ChatGPT: The chatbot changing how we work
We first chatted about ChatGPT last month, and have since been keeping an eye on the incredible ways it's been responding to users from across the world. This week, we consider the implications of this very powerful tool that has just landed in the hands of pretty much everyone, and whether we should be excited or concerned by the prospect that it might become even more powerful...
*Correction - This programme erroneously attributed ChatGPT to Google; it was developed by OpenAI, which is backed by Microsoft. Our apologies for this mistake.
In this episode
01:30 - Chatting with ChatGPT
To demonstrate its potential, and to find out a bit more about how chatbots got to the level they are at today, I thought I’d give ChatGPT a test drive before running its responses through some free online text to speech software to give it a voice…
James - ChatGPT, what was the first chatbot?
ChatGPT - The first chatbot, known as ELIZA, was created in the 1960s by Joseph Weizenbaum at MIT. ELIZA simulated a Rogerian psychotherapist and used pattern matching and substitution to respond to user inputs. It was designed to demonstrate the superficiality of communication between humans and computers.
James - what do you mean, "the superficiality of communication between humans and computers?"
ChatGPT - The idea behind ELIZA was to demonstrate that a computer program could simulate human conversation in a way that was indistinguishable from a real human, despite the program having no understanding of the content of the conversation. The program used simple pattern-matching techniques to respond to users' inputs, which made it appear to be engaging in a meaningful conversation. However, in reality, the program was only able to respond in a superficial way, without any real understanding of the meaning behind the words being used. This demonstrated the limitations of trying to replicate human communication with a computer program, and highlighted the need for more sophisticated techniques to be developed in order to achieve true human-like communication with computers.
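The "pattern matching and substitution" described above can be illustrated with a minimal sketch. The rules and the `respond` and `reflect` helpers below are hypothetical and written for illustration only; they are not Weizenbaum's original script, but they show how a reply can look relevant without the program understanding anything.

```python
import re

# Hypothetical rules for illustration; not ELIZA's original script.
# Each rule pairs a pattern with a reply template that reuses the captured text.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first and second person so the echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment):
    """Replace pronouns word by word; no grammar or meaning is involved."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(user_input):
    """Return the reply for the first matching rule, else a stock phrase."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."


print(respond("I am worried about my exams."))
# prints: Why do you say you are worried about your exams?
```

Anything that matches no rule gets the stock reply "Please go on.", which is one reason this kind of conversation stays superficial.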
James - So are you saying that a modern chatbot, like ChatGPT, does have a real understanding of the meaning of the words used in this conversation?
ChatGPT - As a modern language model, ChatGPT is capable of understanding the meaning of words and phrases, and can generate responses that are contextually appropriate. However, it is important to note that while ChatGPT can understand and respond to a wide range of inputs, it is still a machine and it does not possess the same level of understanding or consciousness as a human. It can understand the surface meaning of the words, but it does not have emotions, beliefs, or personal experiences that influence its understanding of the conversation. It is also limited by the scope of its training data and the information it was given during training. It can make assumptions and predictions based on patterns it has seen during training, but it may not always understand the nuances of a conversation or the true intentions behind a question.
James - Ok. I appreciate the honesty. What was the next development in chatbots after ELIZA?
ChatGPT - After ELIZA, a number of other early chatbots were created, such as PARRY and Jabberwacky. These chatbots improved upon ELIZA by incorporating more advanced natural language processing techniques and attempting to simulate more human-like conversation.
James - And then?
ChatGPT - In the late 1990s and early 2000s, chatbots began to be used more widely in commercial applications, such as customer service and e-commerce. These chatbots used rule-based systems and decision trees to respond to user inputs and provide information to users.
In recent years, with the advance of machine learning and deep learning techniques, chatbots have become more advanced and sophisticated. These chatbots, like ChatGPT, use neural networks to understand and generate text, and can handle more complex and open-ended conversations.
James - Well, I can see that… But how can I trust that anything you've said to this point is true?
ChatGPT - I would advise you to cross-reference any information provided by me with other reliable sources to ensure the information's accuracy and validity. In addition, while I can provide answers to factual questions based on the information I was trained on, I am not able to provide personal opinions or make factual claims without evidence.
James - Interesting. So if I asked you what your favourite colour was what would you say?
ChatGPT - As a language model, I do not have personal preferences or feelings, and therefore, I do not have a favourite color. I am a machine that processes the text based on the patterns it has learned from the data it was trained on. I can generate text that might sound as if it's expressing a preference, but it is not based on any personal experience or preference. I can only provide information based on the patterns and information I have seen in the data during my training.
05:04 - How does ChatGPT work?
Michael Wooldridge, University of Oxford
It's sensible not to take everything ChatGPT told you at face value, not least because the software admitted itself that you shouldn’t take everything it says as gospel. Oxford University’s Mike Wooldridge, who spoke to us when this story was breaking, is with us now to provide the human touch, and hopefully help us understand how all this is possible…
Mike - What's happened is that people have realised that scale matters in artificial intelligence. And what scale means for these systems is three things. Firstly, it means how big are your neural networks? Literally the larger your neural networks are, the more elements that they have. That matters. The amount of training data that you use to train your system - modern artificial intelligence absolutely relies on training data so that matters. And finally, the amount of compute effort that you are prepared to throw at training these programmes - that matters. And so there was this move that started around about five years ago that just said, "let's see how far we can take scale. Let's see how big our neural networks can get. Let's see how much data we can throw at these problems and let's see how much compute resource we are prepared to use." And the first system, the ancestor of ChatGPT was GPT-2, which I think appeared in 2018 or 19. Famously, it was supposedly so good that they were not prepared to release it to the public because this unprecedented power was too much for us to handle. But what happened with GPT-3, the successor system, is basically it was an order of magnitude bigger, an order of magnitude more data, an order of magnitude more compute power. And that's the way things are going. There's been a race for scale. That's what we're seeing. We're seeing the benefits of that.
Chris - I was just pressing James on how quickly it responded because normally you're used to your computer taking a while to load a game or something, and it's generating this output almost instantly as though it were just a human spouting a result back at you. What sort of computing grunt have they got on the back end of that to make that possible?
Mike - Okay, so you've got to distinguish two different things. Firstly, there's building, or training, the model; throwing the data at it to train it so that it learns how to be able to respond. That takes AI supercomputers running for months. Computationally it's one of the heaviest tasks that people are doing in computing. Now there's a big concern here about the amount of CO2 that's being generated while you're doing that. We believe that GPT-3, which is the technology that underpins ChatGPT, uses something like 24,000 GPUs - graphics processing units. And these are high performance AI computers running for a number of months in order to be able to churn through that data. So that's the training part. But once you've got that, you've got your neural network structures, actually using them, the runtime as we call it, what you were doing when you had your conversation, that's much cheaper but you're not going to do it on a desktop computer, you don't need anything like the scale. You don't need super computers to do that but you still need a lot more than a desktop computer. And the reason is those neural networks are very, very big. GPT-3 is 175 billion parameters. Basically, these are the 175 billion numbers that make up that neural network.
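A rough back-of-the-envelope calculation shows why 175 billion parameters will not fit on a desktop computer. The parameter count is the one quoted above; the bytes used per parameter are an assumption, since the interview does not say how the numbers are stored, and the figure covers the weights alone.

```python
PARAMETERS = 175_000_000_000  # GPT-3 parameter count quoted in the interview

# Assumed storage precisions; the interview does not say which one is used.
BYTES_PER_PARAMETER = {"32-bit float": 4, "16-bit float": 2}


def weights_size_gb(parameters, bytes_per_parameter):
    """Storage needed for the weights alone, in gigabytes (10^9 bytes)."""
    return parameters * bytes_per_parameter / 1e9


for name, size in BYTES_PER_PARAMETER.items():
    print(f"{name}: {weights_size_gb(PARAMETERS, size):.0f} GB")
# prints:
# 32-bit float: 700 GB
# 16-bit float: 350 GB
```

Under either assumption the weights alone run to hundreds of gigabytes, far more memory than a typical desktop machine has.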
Chris - That's what I wanted to ask you about, because what has it actually learned? What is sitting in that machine that means when James asks it for its opinion on colors and it says, "well, I don't have one." How is it doing that?
Mike - There's a long answer and a short answer. The short answer is that we don't exactly know. The long answer is that basically what these things are doing is exactly the same as your smartphone does when it suggests a completion for you. So if you open up your smartphone and you start sending a text message to your partner saying, "I'm going to be...", it might suggest "late" or "in the pub." How is it doing that? Because it's looked at all the text messages that I've ever sent and it's seen that whenever I type "I'm going to be..." the likeliest next thing is going to be "late" or "in the pub." GPT systems are doing the same thing, but on a vastly larger scale. The training data for them is not the text messages you've sent, it's every bit of digital text that they could get their hands on when they wanted to train it. They download the entire internet and they train it using all of that text to try to make the prediction of what would be the next likeliest thing in the sentence.
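The smartphone analogy can be sketched as a toy next-word predictor that simply counts which word has followed which. The message history below is made up for illustration, and GPT systems use large neural networks rather than a count table like this; the sketch only shows the idea of predicting the likeliest next word.

```python
from collections import Counter, defaultdict

# Made-up message history standing in for the texts a phone has seen.
MESSAGES = [
    "i'm going to be late",
    "i'm going to be late again",
    "i'm going to be in the pub",
    "sorry i'm going to be late",
]


def train(messages):
    """Count which word follows each word across all messages."""
    following = defaultdict(Counter)
    for message in messages:
        words = message.split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def suggest(following, word, k=2):
    """Return up to k of the most frequent next words after `word`."""
    counts = following.get(word, Counter())
    return [candidate for candidate, _ in counts.most_common(k)]


model = train(MESSAGES)
print(suggest(model, "be"))
# prints: ['late', 'in']
```

Here "late" wins because it followed "be" three times in the history and "in" only once.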
Chris - The problem is, Mike, that the internet is full of rubbish. There's tons and tons of unreliable data out there. So how do you make sure that your system can sort wheat from chaff?
Mike - So you've put your finger on one of the big issues with this technology at the moment. There is so much data that it can't all be checked by humans before it's fed to the machine. Again, the details are sketchy on exactly how it happened in these public systems, but there will be some screening, probably automatic screening, looking for toxic content that will work to a certain extent. But it won't be reliable. It will get things wrong. It will allow through some things that really, ideally, we wouldn't allow. It will not be able to check the veracity of an awful lot of stuff that it's fed. What we're getting out of this is some kind of aggregate picture. It's like an average of what it's seen out there on the internet. But, to be honest, we need to do a ton more work to understand exactly what's going on there and exactly how we can deal with those issues. These are brand new tools that have landed on planet Earth and we've got a lot more work to do to understand them.
Chris - What can we expect to see this do next?
Mike - So the things that they're phenomenally good at are things to do with text. I urge you to try it, right? Go to the BBC News website, cut and paste a story, and ask it to summarise it. In my experience, it usually does a very, very good job of coming out with the summary. Ask for a one paragraph summary, ask it to extract the top three bullet points from the news story, and it will do that. Take two news stories about the same thing and ask it to find the commonalities between them. What are the points of difference? In my experience, the technology is also very, very good at that. It's not perfect. You have to check it. It comes out with falsehoods, but it's very good. Where are you going to see it? You're going to see it in your email system. So instead of showing you every email, you're going to get the top three bullet points from your email. I think that would be quite a useful thing to be able to do.
12:38 - CheatGPT: how will teachers respond?
Vitomir Kovanovic, University of South Australia
Now we understand a bit more about how ChatGPT works, it’s time to have a think about what sort of immediate impacts might be felt by society as a result of its introduction, and how it could change things further into the future. Earlier, we mentioned that some schools in America have already taken a swift, hardline approach to students using ChatGPT as a fast way to finish assignments, banning the chatbot from computers hooked up to the school’s network. Vitomir Kovanovic is a senior lecturer at the Centre for Change and Complexity in Learning at the University of South Australia whose background is in Computer Science. He specialises in learning analytics, and told James about how large language models could change education for the worse, but also for the better…
Vitomir - We had a conversation with teachers in South Australia about the ways you can use this. And there are several potentially really beneficial ways: using it to very quickly generate a set of exam questions that haven't been used in the past, for example. Previously, educators used question banks. With a colleague, we were just testing the system and we used it to generate the syllabus of the course. It actually produced a very impressive syllabus. And then we said, how about you put a little bit more practical work in weeks five to seven or whatever, and then the system generates another one. That's how you have to use it. This kind of system will require a lot of skills to use. It's literally almost as if you hired a great composer and now you want him to compose something for you. You really need to be able to articulate what you want and be able, when it comes back with something, to say, "ah, this is not what I actually wanted. I wanted something slightly more dynamic." "I wanted darker tones." You need to understand and be able to communicate with the machine to achieve what you really want.
James - It's interesting you framing it in the ways it can be used to the benefit of education. I completely agree with those points, but I wonder if we could just rewind a bit and think again about how, in the hands of the students, especially in the short term, playing around with the technology myself I can see how it would take a lot of the heavy lifting out of a task of an essay you've been set for homework. And especially if we continue to see improvements and students learn how to better use the software themselves, presumably this is a point of some concern?
Vitomir - Right away it'll really completely invalidate all the assessments we do because, let's be honest, a big part of the assessment is writing essays, long written responses, and so on. But you can literally say, write me a response to this question pretending that you're a year eight student and it'll simulate the stylistic complexity that would be expected of a year eight or year nine. I'm pretty sure students are starting to use it. A bigger question is, "is this a good way to assess their learning?"
James - Is it possible simply to just police this properly? Can we not just ban the websites on school networks to stop students being able to use it?
Vitomir - Well on school networks, yes. Things like that are very easy to do. But the problem is these systems will become more and more common and, in a sense, why would you? The only reason to do it is to protect the existing assessment models. And we already know they're really not fit for purpose, so we want to change them.
James - I've heard ChatGPT be described before as like a calculator, but for essay writing and idea generation. Calculators, obviously when they became inexpensive and widely available, didn't make maths a redundant subject. Is that a comparison or an analogy you like? Or is ChatGPT even more powerful than, say, a calculator? Will it usher in even more dramatic change?
Vitomir - The comparison makes sense, but it's much more powerful. Grammar check would be something like a calculator; something that does something very small, constrained, and it does help, right? Writing a good essay with spellcheck or without spellcheck, it's not the same. But this is much, much more than that. It's almost like having a professional mathematician sitting next to you. What's interesting is, looking at the responses, people in the 60's, when calculators massively became available, that was the same discussion. Should we allow calculators? Shouldn't we allow them? But solving a big complex mathematical problem requires you to use a calculator 50 times, but how do you assemble the steps? Really critical thinking of solving mathematical problems, you still had to do it yourself. This system is far more powerful than that. I mean, you can still simulate at least some bit of this critical thinking, but if you want to really produce a good essay or a good written response, you have to still intervene there. You're not just writing now, you're being given by the computer a written response and now, as a student, you need to evaluate, is it good enough for my task? You need to go back to the computer and give it more instructions, how do I fix it? And so on. So it'll be far more back and forth.
James - The landscape just feels like it's changing so quickly. What sort of thing are we talking about when we say 'change the assessment?' You mentioned the archaic way of doing it is by just saying "no technology," but it seems like teachers will have a responsibility to integrate this. People will be using this in the workplace before too long.
Vitomir - It's still really open. We first need to see how people are going to be using this. How would somebody writing a script use this? What's the potential of this? Because this is a very, very fuzzy, different technology from the others. Typically, when you develop a technology, you know what the possibilities are. We are not even sure what this technology can do. Let's say you're given an essay on pros and cons of abortion laws or any complex social issue - really what you're testing there is critical thinking: the student's ability to critically summarise different ideas, integrate them, compare them, see where the difference really is. And you'll still have to do that. So the focus should be on that. It'll shift a little bit from being focused on writing perfectly, because machines can do that now. We'll focus on your logic. What are you really writing there? I think in the future we'll see assessments becoming more complicated and more demanding.
19:08 - AI generated science papers
Catherine Gao, Northwestern University
One study published this week has demonstrated that ChatGPT, as well as being more than capable of generating homework indistinguishable from that of a real student, can even pen scientific content to a standard that allows it to go undetected as computer-made. Catherine Gao is a critical care physician at Northwestern University with a side hustle in machine learning research. She saw what ChatGPT was capable of and wondered how it would do at writing science abstracts - the summaries of the results of study papers published in journals. It blew her expectations out of the water and compelled her to systematically deduce just how indistinguishable it was from science written by real scientists…
Catherine - So one, we wondered if it would set off plagiarism detectors. The ChatGPT abstracts performed very well. They scored on average 100% original. So really not using any plagiarism in the traditional sense.
Chris - What that's telling you is that ChatGPT is not just going to some source online and grabbing wholesale that abstract and regurgitating it. It's generating content that is not in existence anywhere else.
Catherine - That's exactly right. It's really writing these abstracts from scratch. Another task that we looked at was whether or not it would score sort of high on different types of detectors. They exist online, these AI output detectors that you can run texts through. And so the real abstracts all scored very, very low. Most of them scored 0.02% fake. Whereas the majority of the ChatGPT abstracts scored much higher, with a median score of 99.98% fake.
Chris - So a machine can spot another machine's work.
Catherine - Yes, exactly. Using the machines to detect the machines. That's very right.
Chris - What about though, if you then thrust those results that it had generated in front of not a machine but people.
Catherine - Like I said, I was really impressed at how good it is. So we gave collections of 25 abstracts that were a mixture of real abstracts and generated abstracts to different team members. So these are all members who are within our biomedical sciences lab. So they're used to engaging with science. We said some of these are real, some of these are generated. Please give us a binary score of which one you think this is, and then you're also welcome to give us some notes to what made you think one way or the other. Even knowing that there are generated abstracts in this combined list, our human reviewers were only able to identify generated abstracts 68% of the time. And these were very, very suspicious, skeptical reviewers. They were so suspicious that they even thought 14% of the real abstracts were generated.
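The two figures above (68% of generated abstracts caught, 14% of real abstracts wrongly flagged) can be combined to ask how often a reviewer's "generated" verdict would be right. This is a minimal sketch: the interview does not give the mix of real and generated abstracts, so the 50/50 split below is an assumption, and the function name is hypothetical.

```python
def share_of_flags_correct(hit_rate, false_alarm_rate, generated_fraction):
    """Fraction of abstracts flagged as generated that really are generated."""
    true_flags = hit_rate * generated_fraction
    false_flags = false_alarm_rate * (1 - generated_fraction)
    return true_flags / (true_flags + false_flags)


HIT_RATE = 0.68          # generated abstracts correctly identified (from the interview)
FALSE_ALARM_RATE = 0.14  # real abstracts wrongly flagged (from the interview)
ASSUMED_GENERATED_FRACTION = 0.5  # assumption: the actual mix is not stated

share = share_of_flags_correct(HIT_RATE, FALSE_ALARM_RATE, ASSUMED_GENERATED_FRACTION)
print(f"{share:.0%} of flagged abstracts would be generated")
# prints: 83% of flagged abstracts would be generated
```

If generated abstracts were much rarer than half of the pile, as they might be for a reviewer who is not expecting them, the same rates would give a lower share.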
Chris - What about the quality of the content? Because you haven't said anything about that yet, whether or not when you looked at what the machine was saying, it was factually accurate?
Catherine - You know, at first I thought maybe it would provide some vague summary that was in the realm, but what we found is that in the generated abstracts ChatGPT actually came up with completely fabricated numbers for their results. Basically reporting full studies that just came out of the ether. What was really surprising to me was that it could hallucinate these numbers and present them in a way that seemed still factually sound enough that a reader might not be able to differentiate. If our reviewers had been able to tell that 95% of the abstracts were generated, I think that would be reassuring. 68% is not that good, and they even knew some of these abstracts were generated. So I think if someone came across the abstract in the wild, or if they were reviewing stuff, they might not realise that large language models have gotten so good at generating them and probably wouldn't think that it could be fake.
Chris - People are also raising concerns about, for instance, the use of tools like this to generate webpage content because, on the web, traffic is everything. Getting people to come to a resource, you throw adverts at them, you make revenue that way, you have a high-footfall site because you are creating content for your webpage. That was the bottleneck because that was where a person had to be involved and that's where money had to be involved.
Catherine - I think it gets to some very interesting questions about where we go from here. In one way, could this be used in the hands of a responsible scientist to help take the burden off writing which sometimes can be, like you said, one of the bottlenecks of disseminating scientific work? Could it help improve equity, specifically for scientists who have to write in a language that's not their own? What worries me also is, what if this technology is used for evil, right? There are these organisations that exist out there called paper mills that are basically generating scientific content for profit. Now, with this technology that's so powerful, that's accessible and free, could this be used by these nefarious organisations to spam science that's factually incorrect and dangerously convincing?
Chris - Well, could you go a step further and say, I've got a pharmaceutical company, it's not a very good one. It's deceitful and it wants to push a product. So what it does is generate hundreds of papers supporting a drug that it's invented, saying how good it is, encouraging real organisations to buy in, either investors or organisations who want to buy the drug or the product making money for that venture, when in fact it's all founded on fake science?
Catherine - The data that these models are trained on is detailed enough that it even knows the right range of patient cohort sizes to present in the generated results. For example, when we asked ChatGPT to write an abstract about a study on diabetes, it included huge, huge numbers of patients because a lot of patients have diabetes, whereas when we asked it to write an abstract about monkeypox, which is a much rarer, newer disease, it knew that the numbers needed to be much smaller. So certainly I think in the hands of these more nefarious or ill-intentioned users, it could be a very dangerous technology.
25:57 - Computer coding with chatbots
Michael Wooldridge, University of Oxford
Mike Wooldridge is still with us. I wonder what he thinks of what he’s heard today. Catherine mentioned AI detection software was able to detect the phoney science papers. But is this foolproof?
Michael - It's not foolproof, it's far from foolproof. I think there's an awful lot of work there to do. I think one of the interesting ideas that work's being done on now is the idea that OpenAI can insert into the text that ChatGPT generates a digital watermark: something which allows you to analyse a piece of text and tell that it was actually produced by a system. So we don't have that yet, but I think that's a very interesting direction. But at the moment, I think us educators have got a headache right now to be able to identify this. For researchers, when we are looking at abstracts and research papers, it's going to be a challenge in the years ahead. And the real big worry for me is that peer review, which is the process that we use to evaluate scientific contributions, is already under strain. But systems like this might be used just to swamp and overwhelm peer review, where you're just getting an awful lot of very plausible looking reports and papers that are being produced by systems like that. So there's a lot of concern around those issues right now.
James - We haven't had a chance to even talk about the potential for ChatGPT to produce computer code. What are the possibilities there?
Michael - As I already mentioned, essentially the way that these programmes are trained is you just download the entire worldwide web and you train it on that. And in amongst all of that, there's a huge amount of computer code. The site that we like to use to upload our code to prove how clever we are is called GitHub. And if you go to GitHub, there are tens of thousands, probably millions, of computer programmes that have been uploaded that you can analyse. Computer programming languages like Python are much simpler to understand than human languages like English. They're much, much simpler. They're very well-defined and actually incredibly simple languages to analyse. So it's no surprise that systems like ChatGPT should be quite good at being able to analyse and produce computer code. Where that technology is up to right now is being able to produce relatively short programmes: tens of lines of computer code, which are very often the useful little tools and utilities that we might use in our computer programming. I don't envisage them being able to produce Microsoft Windows or Microsoft Excel anytime soon. But there are some really fascinating applications of this. One of the most interesting is that ChatGPT can't do arithmetic and it can't do mathematics because that's not what it was designed for. But it can write computer programmes that can do mathematics and arithmetic. In other words, there's a problem that it can't solve itself, but it can write a computer programme to solve that problem. And at this point, I just wish Alan Turing was alive to see this technology. He would love to see this. This would really tickle, I think, his fancy. It's absolutely fascinating from the point of view of computer science.
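As an illustration of the "useful little tools" of a few lines described above, here is a hand-written sketch of a programme that carries out arithmetic exactly, so the answer comes from running the code rather than from predicting text. It is not actual ChatGPT output, and the `exact_sum` name is hypothetical.

```python
from fractions import Fraction


def exact_sum(terms):
    """Add fractions given as strings such as "1/3", with no rounding error."""
    return sum((Fraction(term) for term in terms), Fraction(0))


print(exact_sum(["1/3", "1/6", "1/2"]))
# prints: 1
```

The arithmetic is done by Python's standard `fractions` module, which keeps numerators and denominators as whole numbers.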
James - Unbelievable, isn't it? One other thing, Mike, while we've got you that I wanted to ask, because another artificial intelligence technology that seems to be getting better with each passing day is deep fakes and the mind boggles at the possibilities of when we're able to somehow integrate the sophisticated level of ChatGPT with the deep fake software out there as well. Are we at a stage where we almost need to question anything we see online now?
Michael - I think absolutely. I think certainly we're at the point now where you can't trust text that you find on social media and so on. There's just no reason to do it. And this is why it's incredibly important to have provenance, to know with confidence where this text came from. But computer generated images and videos are not far behind. I mean, this is now very much within sight.