Not to start an entirely different post but...
You know what? The hell with it! This is long enough to qualify as a post in its own right, so why not?
Blimey Charlie! You'd have thought it would be simple enough to get an AI to draw a picture of three kids in a trench-coat, right? I mean, how hard could it be?
I started out at NightCafe with a prompt ripped straight out of the text of the post - "three kids in a raincoat trying to get in to the cinema to see The Creature from the Black Lagoon" to which I added "magazine illustration, 1950s, black and white line art, cartoon, New Yorker style". I left all settings on default.
Running that through HiDream I1 Fast got me the very nice illustration at the top (Not the top of this post. This is the footnote. This post.) with the flawless typography and the unarguably accurate interpretation of what I'd asked for. I mean, yes, that is "three kids in a raincoat". A raincoat each. My bad for not specifying one raincoat between the lot of them, I guess.
I tried the same model four more times, using variations such as "three kids standing on each other's shoulders inside a single raincoat, pretending to be one adult, in a raincoat" (Second image.) and "three kids, standing on each other's shoulders, wearing a single trench-coat, pretending to be one adult" (Third image.) but every time it gave me three separate kids, only now without the movie title, which I'd left untouched in the prompt.
I gave up trying to get HiDream to understand what I wanted and tried half a dozen other models, from the newest and most sophisticated of the non-Pro versions, like Google Imagen 4.0 Fast, to older ones supposedly tuned to produce comic-style images, like ArthemyComics v5.0. The newer models gave me something that looked very similar to HiDream's efforts, except for Imagen, which gave me two kids in separate trench-coats, one of whom seems to have her feet on back to front. That could be an indication that there are two kids under there, only she's exactly the same height as the kid next to her and he really looks like just one person. Here, have a look for yourself and see what you think.
Also the typography is wrong. Not very impressive from Google, there.
The older models were much, much worse. Neither of them stacked the kids up and one of them didn't bother with any kids at all, preferring to give me what looked like an adult super-villain in a trench-coat and some kind of bizarre, adult mutant in a short jacket. You don't want to see it. Trust me.
Or the other model's attempt, which was at least two kids, just not kids from this planet. And not on each other's shoulders, either. And only one of them was wearing a trench-coat.
Oh, alright then...
Now you wish you hadn't asked, don't you? That was by Rabbit v7, by the way.
I was a bit miffed by this point but the HiDream pictures were at least pleasant illustrations so I decided to use them for the post. But then I couldn't leave it alone so I thought I'd google the general idea of "three kids in a trench-coat" to see if I could find a specific reference to give the AIs as a guide.
And do you know what? It appears to be a lot more obscure in origin than I thought. There doesn't seem to be all that much agreement on the first time it appears: the Little Rascals get mentioned, as does Disney's 1937 Snow White. As for examples, Vince Adultman from BoJack Horseman gets a lot of play, which is a pretty damn meta way of trying to explain the thing, if you ask me.
Clearly none of that was going to help, although I did try throwing the Bojack reference into one of the prompts. Didn't work.
Finally, I had the bright idea of asking ChatGPT to draw the image for me, only before I let it try I was going to ask it if it knew the trope. If it could describe the concept, maybe it could draw it. Or get one of its AI pals to do it, anyway.
Me: Are you familiar with the trope of two (or three) small children, standing on each other's shoulders inside a large trench-coat so as to pretend to be an adult?
ChatGPT: Yes! That’s a well-known comedic trope.
It usually involves two or three children (sometimes even small animals or cartoon characters) stacking themselves vertically, hiding inside a long trench coat, overcoat, or robe in order to pass as an adult. The idea is that they can sneak into places children normally aren’t allowed—like bars, nightclubs, or secret meetings—or trick authority figures.
A few things that often come with the trope:
The “adult” looks unnaturally tall and awkward, with strange proportions.
Their voice gives them away (either squeaky, multiple voices arguing, or poorly disguised deep voice).
They often struggle with walking smoothly, balancing, or reaching for things.
It’s usually played for slapstick comedy when they get exposed (the coat falls apart, one kid sneezes, etc.).
It shows up a lot in cartoons (Looney Tunes, Animaniacs, Spongebob, etc.), comics, sitcoms, and even films. Sometimes it’s done straight, sometimes it’s parodied as a ridiculously transparent disguise.
Do you want me to dig up some specific well-known examples of this trope in media?
I didn't want it to do that. I wanted it to draw me a picture. And it did. It's the third image in the post. (Not this post, obviously. But I didn't need to explain that, did I?)
And it's approximately what I asked for, although I'd have preferred it without the extra leg. Also, I'd have preferred a good cartoon, which this isn't. It's about as perfunctory as you could get and still be recognizable.
Also, ChatGPT took ages to produce it. So long, in fact, that I was starting to think I'd have to log out and try again because something had obviously gone wrong. I must be spoiled by NightCafe, which takes maybe five seconds at most to produce a result. This felt like ten times that at least.
All things considered, I'm certain that if I could draw even passably well I wouldn't be bothering with AI at all for anything like this. I'd have been able to knock out three or four decent spot illustrations in the time all of that took (a couple of dozen if you include how long it's taken me to write this footnote!)
And finally, I just generated a new image for the top of this post, now that it's going out on its own, and just bloody look at it! It's not even close!
The prompt was "two kids, standing on each other's shoulders, wearing a single trench-coat, pretending to be one adult, like Vince Adultman in Bojack Horseman, trying to get in to the cinema to see The Creature from the Black Lagoon magazine illustration, 1950s, color, line art, cartoon, New Yorker style".
I got an alien and an adult in trench-coats and a kid in what looks like a Norfolk jacket.
Thanks very much, Stable Core.
I just had a go at this on various sites using various models and also struck out. I even stripped it way down to just try to get a person standing on another person's shoulders and AI just seems baffled by the concept.
I found the oddest part was that ChatGPT was easily able to summarise the concept and produce an appropriate, if poor-quality, representation, where none of the others seemed able to do it. ChatGPT used to use DALL-E but now mostly uses some proprietary image generator of its own, unless it tells you it used DALL-E, which mine didn't. Something must be going on in the interaction between the LLM and its in-house image generator that isn't happening when you send a prompt straight to the AI models via NightCafe, but what it might be I have no clue. It would be interesting to find out.
You know, thanks for restoring some sanity to the universe. After a couple of weeks' worth of AI hype from all over, it's nice to see screw-ups are still there.
(No, I still don't use AI.)
AI is going to be fine for unsupervised, general use one day, I'm fairly sure of that. At the moment, though, it's like a game in something like late alpha. All the systems are in and running but there are tons of issues waiting to be discovered and fixed. I wouldn't say it even feels like a beta product yet. The real problem is the way it's being rolled out as though it's already a polished, commercial proposition. It's very obvious to anyone who uses it that it's nothing like that at all. Or it should be, anyway...
I have had a lot of similar experiences with AI image generation. My 'perpetual' one is trying to get AI to produce a comic book version of someone who sort of looks like myself. My prompts generally look something like:
"Generate an image in comic book style of an older red-haired man wearing glasses doing "
And for some reason, no matter how I ask, the image ends up with a beard. Even when I explicitly add "... without a beard" to the description, and/or argue with the AI about how it got the image wrong.
The responses always make me laugh: "Yes, clearly I got that wrong as the man in the illustration has a beard. Let me try again, this time carefully avoiding including a beard." And the 'corrected' image also has a beard.
Generative / LLM AI has no real clue what it is doing. It is just a cosmically gigantic Gordian knot of stacked up statistical probability hiding inside a trench coat pretending to be intelligent.
Okay, now that's very weird. I just tried it, cutting and pasting your prompt and adding "gardening" as the action. I got a nice cartoon of a very bearded man doing some gardening. Then I added "clean-shaven" in front of "red-haired" and got a very similar image of the same man with the same beard.
But then I re-submitted the same prompt to the same model a third time, only I re-phrased it to separate the concept of hair color from the concept of shaven or bearded. The new prompt was "Generate an image in comic book style of an older, clean-shaven man wearing glasses doing some gardening. The man has red hair." And that produced a picture of a beardless, red-haired man doing some gardening. Granted, he had heavy five-o'clock shadow so he wasn't as "clean-shaven" as all that, but he very definitely didn't have a beard!
One thing people who do a lot of this sort of thing sometimes mention is that the AI focuses on the early words in a prompt, but I've found it tends to focus on the beginning and the end and lose a lot of what's in between. It especially ignores concepts in lists, so it's better to keep them as far apart as possible. Still something of a crap-shoot, though.