On Sunday, Wilhelm at The Ancient Gaming Noob posted a very interesting and detailed account of the way the three leading AIs answered the question "Who is Wilhelm Arcturus?" On Monday, Belghast from Tales of the Aggronaut followed suit. Then, later the same day, I saw that Scopique had joined in as well.
Following the lengthy discussion between myself and Andrew Farrell in the comments to Saturday's post, I was already planning to do something similar, although I wasn't intending to use myself as a subject. I'm still going to do something along those lines, but it might be a while before I get around to it, so in the meantime, and in the hope of encouraging one of those post cascades, here's what I got when I asked all three AIs the simple question:
Who is Bhagpuss?
Before we start, I have to say I knew exactly what was going to happen. I was sure at least one of the AIs would fall straight into the gaping maw of the obvious trap. I did think they might at least query it before jumping in with both feet but no...
Two of the three made the same very basic mistake. Both ChatGPT and Copilot assumed I was asking about the 1970s BBC stop-motion animated TV show Bagpuss, starring the eponymous saggy, old cloth cat. They gave me several paragraphs of what appeared to be accurate information about the show and sat back, looking pleased with themselves.
I then pointed out their error and asked them to try again.
No, that is "Bagpuss". I asked "Who is Bhagpuss?" Please note the different spelling.
Copilot apologized but clearly felt it had done all that was required of it.
I know when I'm being given the brush-off so I left it at that.
ChatGPT apologized, made excuses, then offered what felt like a coded insult, but it did at least offer to try and do better if I'd just give it some help.
So I clarified.
Bhagpuss is a blogger who has a blog called Inventory Full on the Blogger blogging platform. Can you now tell me any more about them?
I'm not about to argue with any of that, all of which is true as far as it goes, although the needle does appear to have swung away from dismissive towards almost embarrassingly complimentary. I wonder where it gets those specific phrases from - "thoughtful analysis" and "engaging storytelling"? Has it picked them up from something in a comment or a review or is the AI just programmed to flatter?
ChatGPT was at least able to give me some reasonable detail when pointed in the right direction. I imagine Copilot would have done the same, if it could have been bothered to try. I suspect Belghast is correct when he suggests the main reason some of the AIs couldn't figure out who Wilhelm or I were from just our pseudonyms is that neither of us uses our name in the title of our blogs.
I don't agree with him that the AIs are just souped-up search engines. If only! That would make them far more useful and reliable. They can be made to act something like that, given sufficient prompting, but in the past, if they couldn't find what they were looking for, they just made stuff up.
Although they now seem a lot more willing to just throw up their hands and declare their ignorance, my recent experience with Dustborn suggests they can still get creative if not watched carefully. Search engines, no matter how overclocked, don't make stuff up. AIs most definitely do, even now.
Finally we come to the AI formerly known as Bard - Google's Gemini.
Gemini did so much better than the other two it scarcely seems like a fair competition. Here's its first response, in full.
That's very limited but also 100% accurate. Gemini found the blog, gave correct examples of its content and used a quote about me that I did indeed say. The alternate suggestion is also factually correct. I do indeed use this alias for the stated purposes.
I also give Gemini a bonus point for mentioning that I write about other subjects than gaming, even though it only mentioned blogging itself. I'm always puzzled why none of the AIs, even when pointed to the blog, ever seem to notice the hundreds of posts tagged "Music" or "Movies" or "TV". It's always the games they fixate on. Once a gaming blogger, always a gaming blogger, I guess.
Gemini has already won the contest but since it was offering to go further I thought I'd see what it could do. As it turned out it wasn't much.
Nothing much new there. It's almost exactly the same information as the first time, except for that curious last line. It's oddly specific, not to say random, but it's true to some extent.
It also appears to suggest Gemini might have done some kind of analysis to define not only what games the blog had covered but where those games stood in the overall context of gaming. Either that or someone, somewhere once said something about me to that effect. I think that seems more likely.
On the evidence presented so far, then, Gemini is the clear winner, with ChatGPT coming in a distant second and Copilot not finishing at all. There is one significant fact I've withheld, though.
I had to log in to use both ChatGPT and Copilot, which I did through Google, using a specific account I created just for playing around with AI. It's not linked to any of my other accounts or identities and therefore ought not to be immediately identifiable as me.
When I went to use Gemini, though, I didn't have to log in at all. Gemini already knew who I was.
That was the first thing I saw when I opened Gemini, before I'd asked it anything. It appears I must have logged in under my regular identity at some point in the past and Gemini remembers me.
If it was anything like as "intelligent" as Google would like us to believe, I'd have thought it might have answered the question "Who is Bhagpuss?" with a simple "You are!"
But no-one really believes these things are intelligent, do they?
And there you have it. We can all sleep easy in our beds.
For now...
I wonder where it gets those specific phrases from - "thoughtful analysis" and "engaging storytelling"? Has it picked them up from something in a comment or a review or is the AI just programmed to flatter?
My answer to that is that the program has likely been given coded phrasing for certain answer types. A month or two ago, my oldest and I were visiting my mom when my brother and his family were in town. The kids, who were both in their single digits, asked my oldest to read them a story they brought along, and she obliged. While half listening, I realized, without even knowing anything about the story, that it was religious. It was because the story used the word "pondered" a few times, and I realized that the only times I've ever heard that word in a children's story is when it's a children's Christian story. Such as in the biblical reference "Mary pondered these things and kept them in her heart." So when the story reached the end and explicitly mentioned passages from the Bible, I was already clued in to expect that.
Or, to borrow another reference, when I would hear people in the scientific fields talk, they have certain words and phrases they use. "Consider the lever," one would say when describing a problem in either a textbook or a lecture. I have no idea why my scientific brethren talk this way --I certainly don't and I firmly believe this crap actually harms communication and dissemination of scientific thought by making you sound all self-important-- but it creates a certain unconscious signal to members of the scientific community.
In much the same way, generative AI has been programmed to say certain things in a certain way as signals to the right groups.
I think if the AIs (or any specific one of them) repeated the same specific phrase, as in the examples you mention, it would suggest some specific intent in the background, but if you compare the phrases I picked out to things the AIs said about Wilhelm in his post ("His blog is a treasure trove of experiences", "Wilhelm Arcturus’s blog is a rich source of information and stories", "Wilhelm’s content can serve as an informative and entertaining resource") it's more the generally upbeat, positive, almost fannish tone that I'm talking about. That suggests a general "positivity" setting of some kind.
What I thought the LLMs were supposed to be doing, at least originally, was analyzing huge swathes of data from the internet and using statistical probabilities to synthesize likely responses to questions or prompts. Given all of our experiences with the internet so far, it seems very hard to imagine that would have resulted in such cheery, jovial, encouraging phrasing. It seems to me that someone or something must be nudging the responses in that direction.
I seem to remember you used to be able to set a "tone" for the AI's responses, making them sarcastic or neutral or friendly as you preferred. I wonder if that's still an option or if there's now just this happy voice all the time?
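For what it's worth, the underlying chat APIs do still seem to expose that kind of knob, even if the consumer-facing chat sites hide it. Here's a minimal sketch, assuming the OpenAI Python SDK; the model name, the wording of the instruction and the temperature value are all just illustrative choices, not anything these particular chatbots are known to use.

```python
# Minimal sketch: steering the "tone" of a chat model via a system message.
# Assumes the OpenAI Python SDK (openai >= 1.0); model name is just an example.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model choice
    messages=[
        # The system message nudges the overall tone of every reply.
        {"role": "system", "content": "Answer in a neutral, matter-of-fact tone. Do not flatter the user."},
        {"role": "user", "content": "Who is Bhagpuss?"},
    ],
    temperature=0.2,  # lower values make the output less "creative"
)

print(response.choices[0].message.content)
```

If there is a cheery, flattering default, the system-level instruction is presumably where it would live; the statistical machinery underneath stays exactly the same either way.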
I just wanted to say that I really liked the title of this post. Plus the reveal at the end that Gemini knew your name all along was great.
I went ahead to see what Gemini knew about Shintar (considering I'm also logged in via Google). At first it was nothing, but with a bit of prodding it commented on my SWTOR blog. I was curious to see whether it knew anything about my WoW activity. After having no luck, I directly linked it to my WoW blog and it said it didn't know enough about it "possibly because it doesn't have an about page". I directly linked it to the about page and it said it couldn't read it. I don't know why it could read the about page of my SWTOR blog (which it directly quoted at me) but not of my WoW blog, both of which are on Blogspot. Just one more of those weird things.
Thanks! Sometimes it feels like I put more effort into coming up with the titles than I do writing the posts so it's always nice when someone notices.
I do want to do some more structured testing of these three AIs because it seems to me that, no matter how much slack you want to cut them, they just don't seem to be showing the kind of consistency you'd expect from something that's been released commercially by some of the biggest technology companies on the planet.
It does seem very odd that they can't replicate the basic function of a Search Engine, although I notice that no-one doing this current round of name-checking is using the "Imagine you are..." prefix that used to be common a year or so ago. I might try prompting them with "Imagine you are a search engine..." and see if that improves the results.
Hi! I meant to say in the comments on the last post that I've been reading for a while and not commenting enough - the music posts are great along with everything else!
Also, yes, I'd forgotten this is my Government Name, so it was a little startling to see it - I have another, older, email address that, partly by whim, is the one I usually use for gaming stuff, and I got a Blogger account for that one, too. It's not that I'm trying to prevent the forces of AI from merging my two identities, it's just useful - or it would be, if it weren't for the stronger force of "Oh, I'm on my phone and I'd probably have to swap between 3 apps to change login"