AI hype vs. AI reality

35

big tech companies have been leaning into artificial intelligence we unveiled the new AI powered Microsoft being an edge to reinvent the future of search we want everyone to benefit from what Gemini can do you’re using it to debug code get new insights and to build the next generation of AI applications mik it seems like you might be gearing up to shoot a video or maybe even a live stream yeah in fact we’ve got a new announcement to make is this announcement related to open AI Perhaps it is I and in fact what if I were to say that you’re related to the announcement or that you are the announcement me the announcement is about me well color me intrigued the promise of generative AI technology is that it will change our lives for the better but so far many of these AI rollouts are not living up to expectations there’s the Microsoft being chatbot last year that expressed love for a New York Times Reporter and suggested the reporter leave his partner Joe castaldo is with the Globes report on business and has been extensively covering artificial intelligence Google’s Gemini image generator which did things like depict America’s Founding Fathers as as not white chatbots inventing legal citations and getting lawyers in trouble even back in 2016 Microsoft released a chatbot on Twitter called Tay and people very quickly figured out how to make it say quite heinous things and Tay was never heard from again today he’ll tell us why AI hype is often different from reality why companies roll out these technologies that don’t seem quite ready and what that does to public trust I’m man rammen welms and this is the deciel from the globe in mail [Music] Joe great to have you here thanks for having me so Joe we’ve now seen a number of launches for new AI tools um just tell me though about I guess the usual pattern that these launches often tend to follow yeah there does seem to be a bit of a pattern generally there’s a lot of hype um that this new tool or model is going to improve our lives improve the way we work or get information and and then you know the thing is released to the public and it uh kind of Falls flat it looks a little half-baked people very quickly find out all the ways that this new model or application fails or makes things up and gets things wrong or says things that are just unhinged and share examples online and there’s you know sort of this negative media cycle of bad headlines and bad PR sometimes the company you know acknowledges the the mistake and and Promises a quick fix um and it just uh seems to happen again and again again and Google uh being the latest example um with its AI overviews in search yeah let’s let’s talk about that as an example because I think this is in people’s minds a little bit because this was fairly recent um what happened there how did this not go exactly as planned for Google yeah so this was a big change actually um to Google search which um you know makes the company billions of dollars and it’s like our gateway to the internet you know has been for years and so they started um putting on top of search results um an AI generated summary of whatever the query was and this was probably this wasn’t in Canada it was only being tested in a few places right yes Google only rolled it out uh in the US to start last month and their plans to roll it out into other countries including Canada down the road and Google pitched this as a way to get information you know faster and easier like you don’t have to do the hard work of clicking a link yourself and so very again very quickly um users started noticing that these AI overviews um could be wrong or just flat out nonsensical so an AI overview uh recommended eating rocks eating one rock a day for the nutritional benefits in a pizza recipe it included glue um as a way to you know get that cheese to to stick to the pizza um and they’re just you know flat out factual errors like one query was who was the first Muslim president of the United States and the AI overview said Barack Obama um you know perhaps picking up on this you know conspiracy theory that he’s some kind of secret Muslim so Google responded fairly quickly and said you know these instances are rare but they also made you know about a dozen technical fixes they said and they were clear that this isn’t a case of AI hallucinating which is the phenomenon where an AI model just make stuff up that is actually the term people use hallucinating them yeah and it’s it’s you know people debate if that’s an appropriate word or not but you know AI generative AI makes stuff up basically and they said it wasn’t so much that it was more pulling from websites maybe that it shouldn’t have been and so you know they they tried to address that problem and it just has the appearance of something that was released um when it wasn’t quite ready yeah so I mean this seems to fall into the pattern of what you were talking about earlier show right where we see this happen um with with these new releases that aren’t quite uh set for for the public but I guess the big question is why is this happening why does this continue to happen yeah there’s a few reasons I mean I think the the obvious one is just competition um when chat GPT was released toward the end of 2022 um you know it really touched off an arms race where every company had to do AI now like turn of AI was seen as the next big thing the next huge Market opportunity so companies are willing to make mistakes and risk some bad PR in order to get something out in order to be seen as first like if you’re too slow there are consequences Google for instance has been developing generative AI internally for a long time um but it wasn’t necessarily releasing everything to the public when open AI released chat GPT all of a sudden there’s a lot of pressure on Google to start doing something with all of this research um and as a quick example like Google had an image generator in early 22 called imagine um but it wasn’t released to the public and the team that made that um later left Google and started their own company in Toronto called ideogram um partly because they felt they could move faster outside of Google and look at Apple too um it’s one of the few big tech companies that wasn’t really doing anything with generative AI it’s it’s a device company in in many ways and so there are a lot of questions about what does AI mean for apple and they finally had um an event recently where they’re going to partner with open Ai and integrate generative AI into iOS in a bunch of different ways and the stock price is is up quite a bit since that event um in response to that yeah I think because investors there’s some relief uh on the part of investors who can say okay finally like apple is doing something with AI now okay so what you’re describing is really this pressure on these companies to kind of keep up with each other and and and roll these things out even if they’re not totally ready yet um it’s interesting I think we should dig a little bit deeper into this idea because this concept of releasing something even though it’s not really set seems to be I guess part of the Silicon Valley mindset if I can say that Joe like it goes beyond just AI um why is it the way that these companies tend to operate yeah there’s a couple things there I mean there’s this concept of the minimal viable product which is uh like a barebone version of something some tool some application that a company will build and release to test like market demand customer need rather than spend a lot of time and money releasing something um complete um that might flop so it’s you know it can be a smart way to do things um to make sure there’s a market before you throw a lot of money into it then exactly yeah and also you know the move fast and break things ethos has been part of silicon valy for quite some time um you know Facebook kind of being the poster child for that like when the company was really growing it endured a lot of scandals about you know privacy concerns and data breaches and being you know hijacked to manipulate um elections and so on so yeah that that mentality is there in Tech but I think there’s something a little different going on with generative AI okay um like Facebook for for all of its fault s um the core product more or less worked you know you add friends you post pictures you like you comment you get served up ads generative AI is different in that this technology is a bit more unwieldy it doesn’t always behave the way that you want it to it makes mistakes it outputs things that are that are not true and that’s not a bug that can be fixed with you know some more work and coding it’s just inherent in how these AI models work uh and companies are doing lots of things to try to improve accuracy but it’s a very very hard problem to solve and so you know until that’s addressed you know we’ll see more flubs and mistakes and and launches that uh that go sideways I I guess why is it that these problems are so complex when it comes to generative AI I mean maybe that’s obvious but I guess why yeah why are the problems that we’re seeing now why aren’t they as easily fixable as as previous problems like you were saying with Facebook before yeah I mean this is a a simplification but you know with a chatbot for example or the large language model that underlies the chatbot um you know it’s effectively predicting the next word in a sequence based on you know tons and tons of data that it has been that it has analyzed um but an AI model has no idea what is true and what is fiction we’ll be right back Joe can we talk a little bit about how these products are tested because of course companies are testing these models before they do roll them out um do we know what that means though exactly like what kind of tests are actually run on these tools yeah there’s a a concept called red teaming um which is fairly big uh where employees uh you know team of employees tries to like test the vulnerabilities of an AI model like can you make this AI chat bot say something it’s not supposed to can you make it say conspiracy theories or something discriminatory um so red teaming is kind of like trying to break it in a way to see if it will break yes exactly like ethical hacking in a way um so that you know you can better understand the vulnerabilities and fix them before it’s released to the public um so that’s a big Focus but it’s not sufficient so I guess still though like why are we still seeing these problems even with these measures this testing that is happening why is that still such a struggle yeah it’s hard to know without having insight into our like a particular company before a release but I mean red teaming you know there are you know tensions between needing to commercialize something and you know making sure it’s it’s safe um like there has been a shift where you know generative AI previously was you know it’s kind of a research project like you know University Labs were working on it corate Labs were working on it again with an eye to you know commercialize something down the road it wasn’t seen as like ready for public release so there’s presumably some tension there um you know with red teaming like do they have enough time um are there enough people um is the team diverse enough to you know find bias and and other vulnerabilities this is something that we’ve we’ve talked about generally that can be an issue with tech um how how does that play into this well so let’s take image generation for example it’s a well-known problem that image generators have you know bias and stereotypes kind of built into them um and that’s just a reflection of our society and our bias and our problems because AI models are trained on data that we as humans put out there in the world um so if you ask an image generator to produce a picture of a chances are it’ll be a white man a doctor a man a nurse a teacher a mug shot chances are it it might overly represent black people for example so um it it takes you know a diverse team um to think about these issues and try to address them before launch um in Google’s case with its Gemini image generator earlier this year it may have overcorrected um so people found that it was producing um historically inaccurate pictures so again like America’s Founding Fathers as as uh as black people for example or you know German World War II soldiers depicted as people who are not white Google was trying to inject more diversity into the output but went too far perhaps and and Google paused image generation on Gemini so that it could you know work to address this there was a bit of a narrative that like oh AI is is woke it’s too woke that’s the problem which is you know just silly it’s not about that it’s just an indication that these models are are hard to control hard to have like accurate predictable output and some of the blind spots of teams that are developing them yeah yeah it seems to really illustrate that I’m wondering what experts I guess have told you about all of this Joe because you know obviously companies I guess see an advantage of releasing products this way they they continue to do it um but what did experts tell you so far about the issues that we’ve seen yeah so um Ethan mik who’s a a professor at the Wharton School of Business in the US um has this really interesting take um that you know Perfection is the wrong standard to use um for generative AI so something doesn’t have to be perfect in order for it to be useful so he and his colleagues and uh the Boston Consulting Group did this really interesting study a while back where you know they gave some Consultants access to gp4 which is open ai’s latest model and other Consultants did not have access to Ai and they gave them a bunch of tasks to do and you know to simplify what they found is the Consultants who had access to GP gp4 um were much more productive and they had higher quality results on a lot of tasks than the Consultants who did not have ai and these were tasks like come up with 10 ideas for a new shoe uh write some marketing material for it write a press release for it um so more on the creative end um but they also designed a task that they knew AI could not do well and so what they found is the Consultants who used AI their results were worse like much worse than Consultants who did not use AI so you might think that’s kind of obvious like if a tool isn’t up to the job of course the results are going to be worse but I think the important takeaway is if you don’t have a good understanding of where the limits of generative AI are you will make mistakes like it can be detrimental to you I I know you spoke to another expert who was talking about something called the error rate Joe this is I guess this is a little bit about how we trust these tools uh can you tell me about the error rate yeah so just you know trying to figure out how often an AI model might hallucinate and so that was um I was speaking to Melanie Mitchell who’s a um computer science professor in the US and you know this ties back to like knowing where the limits of AI are but it’s a little tricky because she was saying that you know if we know chat gbt makes mistakes 50% of the time like we won’t trust it as much and we will check the output more often right because we know there could be lot of mistakes but if you know it’s only 5% of the time or 2% of the time we we won’t right like only 2% of the time you know it’s chance it it’s probably fine but it might not be so in that way you know more mistakes could slip through so she was saying that you know the better system in some ways could actually be riskier because we’re more likely to trust it I mean this idea of us how we understand and how we trust these tools I think is really is really fascinating and I I guess I wonder to to come back to how these launches are actually done does the way that they’re rolled out and they’re you know they they break and they do all these strange things does that actually I guess erode our trust the Public’s trust in this technology it could if you know if your first exposure to something new is kind of ho hum or if it’s influenced by a lot of you know negative coverage maybe you won’t try it again or try it at all um but the thing is like we’re getting ative AI whether we like it or not um there’s a lot of AI bloat what what exactly does that mean um I guess I think of AI bloat as unnecessary AI features okay that make products you know in some cases more expensive or just more annoying to use um you know like there are laptops coming out that have a Microsoft co-pilot button on them to get easy access to AI tools I think even like in WhatsApp now right you have the meta ask me anything that’s there as well yes same thing on Instagram so the The Meta AI chatbot is is in search so it’s it’s coming um but there are still a lot of questions about like is this better is this something people want um how does this improve a user’s experience you know and you have to wonder too this company’s add in more AI features are they going to have to jack up the price so it’s a value for money question um just in our last few minutes here Joe so these products are not perfect right now we we’ve talked about this extensively at this point uh is the understanding though that they’re going to get better over time that this is just kind of a temporary phase until we actually you know get over this hump to something better yeah I mean that’s the Arc of Technology generally but the question is you know how fast um I think a lot of AI developers assume that this technology is going to improve very quickly um and if you look at the past few years it certainly has like take for example I’m sure a lot of people saw the AI generated video of Will Smith eating spaghetti Uncle Phil come try this fresh pasta of B air which was hilarious but also horrifying um compare that to some of the AI generated videos now from companies like uh Runway or open AI Sora and like the leap in quality is quite astounding it’s not perfect but it’s huge but you know progress doesn’t necessarily continue at the same rate the approach to AI now is get a whole lot of data a whole lot of you know compute or gpus and the more data the more gpus you have the better the AI at the other end of this process um but there are real challenges in getting more data and you know there was a study earlier this year from Stanford sort of about the state of AI and one of the things that noted is that progress on a lot of benchmark marks has kind of stagnated and that could be a reflection of diminishing returns of this approach progress might not be as linear as as people are uh as some people are assuming just very lastly here Joe um when we look at these releases that the way things are are rolled out here what does this tell us about how seriously these companies are taking the big questions like like ethics the safety of these tools what what can we gleam from this yeah I mean there’s there has been a lot lot of concern about the pace of progress and companies releasing AI into the wild like last year there was a very high-profile open letter uh you know asking for a six-month pause on development so that regulations could catch up and of course nobody paused nobody stopped um and so I guess what we’re seeing now with this kind of Rush and companies scoring own goals and making mistakes that could have been avoided to some extent uh with more care and thought doesn’t necessarily bode well for the future um especially if AI models become more powerful more sophisticated more integrated into our lives that arguably carries more risk so this is where regulation comes in um why so many people are concerned about regulating AI I mean the EU is pass their AI act you know there’s a bill uh here in Canada there’s lots of efforts in the US um so you know there’s an argument to be made that if you want companies to behave responsibly uh you know take ethics and safety seriously you have to force them to through the law Joe this was so interesting thank you for being here thanks for having me that’s it for today I’m man ramman welms Kelsey Arnett is our intern our producers are meline white Cheryl southernland and Rachel Levy mlin David Crosby edits the show Adrien Chung is our senior producer and Matt frer is our managing editor thanks so much for listening and I’ll talk to you soon

Artificial Intelligence has been creeping into our lives more and more as tech companies release new chatbots, AI-powered search engines, and writing assistants promising to make our lives easier. But, much like humans, AI is imperfect and the products companies are releasing don’t always seem quite ready for the public.

The Globe’s Report on Business reporter, Joe Castaldo is on the show to explain what kind of testing goes into these models, how the hype and reality of AI are often at odds and whether we need to reset our expectations of Generative AI.

Subscribe to The Globe and Mail’s Morning Update to get stories directly in your inbox:

Reference

LEAVE A REPLY

Please enter your comment!
Please enter your name here