Thursday, May 18, 2023

A Whole Load Of AIs Fighting Over A Photocopier


As I'm sure some of you may have noticed, while the last three posts here have been lavishly illustrated as usual, it hasn't been with screenshots drawn from my extensive personal archives. The last time I delved back into the past for those was in the post on forced grouping, when I had to reach down into the very bottom of the barrel to find some EverQuest shots I just possibly might not have used before.

It's a perennial problem here. When I started the blog back in 2011, the model I had in mind, largely unconsciously I think, was the format I'd used for many years during my fanzine and APA days, something a little bit like a magazine page, a little bit like a collage. Something that, on a good day, I might fancifully have considered sticking to my wall like a poster. I did, actually, do that. More than once.

Back in the '80s and '90s, I used to be quite the creative type - for someone with no discernible artistic talent whatsoever. I couldn't draw, which was a bit of a novelty among the crew I ran with, so I had to work with readymades. I kept files of pictures I'd cut out of magazines and gallery catalogs, fliers I'd picked up in clubs, tickets, gum wrappers, bits of paper I'd found in the street... I'd take my own photographs, blow them up on a photocopier until they distorted, then copy them a few more times until you could hardly tell what they were. 

One time I remember driving out into the countryside and hiking across some fields so I could find a rutted mud track, where I artfully dropped the image I wanted to be the front cover of some zine I was working on into a puddle. Then I videoed it with a camcorder and somehow - I can't remember how - got it printed out. After which I photocopied that to make it look even less like anything you could recognize.

I put in some effort is what I'm saying. Compared to all of that, finding and fitting pictures to the blog has always been super-easy. (Go on, say it. You know you want to.) 

For years and years I mostly wrote about mmorpgs I was playing and I've always taken a huge quantity of screenshots anyway, so it was no extra effort to take a few more. If I needed something specific and I didn't have it, I could always log in and get the shot I needed, no hassle.

In recent years the remit of the blog has expanded to include not just games I don't play but music, movies, TV and who knows what else. I've had to grab shots from Netflix or Prime shows, from YouTube, from developers' websites and press releases and generally all over the place. Compared to the old zine days it's still light work but it does take longer and require more processing.

It's also of dubious legitimacy at times. Mostly, since the shots accompany text that comments on the content they're drawn from, it's probably covered by "fair use". "Commentary and criticism" tends to be most of what happens here. 

Even in the music posts, in which the embedded videos themselves are covered by YouTube's copyright umbrella anyway, I tend to comment on or review pretty much everything as well. Whether that's enough to justify the stills I use at the top of the post is less certain, although after I've hacked them about the way I like to do, making them all but unrecognizable (which I do for pretentious "art" reasons, not as a way of avoiding attribution, by the way), I'd hope they'd have a shot at qualifying as "transformational". Then again, if Warhol couldn't swing it...

Sometimes, though, I find myself writing about subjects for which I just don't have any suitable pictures. Those are the times when I either have to try to come up with some referential connection between what's in the text and the illustrations I have on hand or I have to source something from the interweb.

I have never felt comfortable doing that. Oh, it's fine if it comes from an official press release or the "screenshots" or "media" section of a game's official website. That's their purpose, after all.

When it's a screenshot someone else took of a game they played a decade ago, though, it just feels a bit weird. Uncomfortably intimate, shall we say, like leaning over a stranger's shoulder on the train and having a good look at the photographs they keep in their wallet. I mean, you could...but you wouldn't. Would you?

Until recently I didn't have a lot of other options but now that's all changed. Remember back last summer, when a bunch of bloggers, led by Tipa, were posting about the funny pictures they'd made with AI image generators like DALL-E and Craiyon? How charmingly naive and innocent all of that seems now.

Compare the images in the posts I've linked with the ones I've used over the last few days. It's the difference between the poster-paint daubing your seven-year-old nephew gave you, which you blu-tacked to the fridge, and that advertisement you tore out of a 1950s National Geographic and had framed to hang in the hall.

All of those illustrations were generated by prompts I ran through Stable Diffusion, "a latent text-to-image diffusion model capable of generating photo-realistic images given any text input". I have several text-to-image generators bookmarked: the aforementioned DALL-E and Craiyon, Artflow, and the best-known of them all, Midjourney, but of late Stable Diffusion has been my go-to option.

Just to be clear, it's not called "Stable Diffusion" any more, by the way. Or then again, maybe it is.  Between Monday's and Tuesday's posts, the bookmark I was using stopped taking me to Stable Diffusion and took me instead to something called "Dream Studio", which I took to be the new, fancier version of the same thing. It has a more user-friendly front end and a suite of improved functions, none of which I've yet explored. I've just been plugging my prompts in like before and pressing "Dream", the romantically-renamed "Process" button.

The quality of the output is amazing, if not always all that accurate. It's apparent when the AI has plenty of reference material to draw on and when it's winging it. Ask for a picture of Lana del Rey and Frank Sinatra at the same mic in a supperclub in Vegas sometime in the 1950s in the style of a Life magazine cover and you get exactly what you wanted, although I was imagining they'd be singing not presenting an award. 

Ask for "A portrait of Lana del Rey in the exact and precise visual style in which it would appear, were she to be added as a "Phantom" in the online game "Noah's Heart"", however, and you get this...

... from which it's pretty clear the AI's never heard of Noah's Heart.

Part of the fun at this stage of the evolution of these things is to test the boundaries of what they do and don't know and what they can and can't do, so the failures are as interesting as the successes. Sometimes more so. I wanted to be absolutely sure Dream Studio had no idea what Noah's Heart was, so I asked it for "A phantom from the online game "Noah's Heart". Any will do, just to see if you know what the prompt means".

As you can see, I find it difficult to resist anthropomorphizing these things. I always end up chatting to them like they're people. Of course, a person would have snapped back "Dude, I have no idea what your stupid game is! Screw you!" An AI tries to please even when it has absolutely no idea what you want:

That's not very close, is it? Not the least hint of the game in the art style and "phantom" seems to have been interpreted as "vague, shadowy figure". Which is fair enough. Noah's Heart is an obscure game with relatively few images available online. Why should an AI know anything about it?

What's more interesting is the similarity between the results I got from that prompt and those I got from the one I used the next day, when I needed illustrations for my post on the yet-to-exist Amazon Lord of the Rings MMO. I asked very simply for "Screenshots from Amazon Games' recently announced MMORPG based on the works of JRR Tolkein" and got the images I used in yesterday's post.

As this compilation of the two shows, when not given a more specific prompt or when it has no idea what you're talking about, Dream Studio's AI opts for a generic, in-house fantasy game style. It's not a bad one, either. I'd happily play a game that looked like this.

When it does have reference, though, and when you make your request sufficiently clear, the results are much more accurate. I don't have the exact prompt recorded that got me the headline image for the post on forced grouping but I asked for World of Warcraft shots and that's very clearly what I got. 

I also asked for some images of "Lana del Rey as a character in the mmorpg Guild Wars 2". (And you can keep your comments about unhealthy obsessions to yourselves, thank you.) Dream Studio obviously knew just what I was talking about.

Pictures of things that don't exist are obviously the USP of all these AI image generators but anyone who's been paying attention will probably be wondering: if I wanted a WoW screenshot to head up a post, why didn't I just use one of the ones I've got on file?

It's a good question and the answer's a little concerning. The acceptable version is that I only have a limited number of WoW screenshots and I've already used most of the good ones. Since I'm still boycotting Blizzard, I can't log in on the endless free trial to take some more, either. That all makes using the AI sound quite reasonable. I mean, it's better than just "borrowing" someone else's screens from the web, right?

The real reason I used Dream Studio, though, is that it's faster and easier than digging through my files and folders for a suitable shot and much faster and easier than booting up a game and running around looking for a good spot to take a picture. Not to mention that the images the AI created were more aesthetically pleasing than an actual screenshot from WoW would have been.

I have to say that at the moment I find most of the AI images I'm generating to be aesthetically preferable to the screenshots I'm able to take for myself. There's been plenty of talk about how or whether or to what degree AI-generated images are or can be "art" and if they can, whose art they might be - the technicians who wrote the code, the researchers who trained the software, the artists whose images provided the raw data or the individuals who crafted the prompts - but when it comes to the use of AI images for the purpose of illustrating a blog post, there's yet another layer to the argument.

By and large and for a given interpretation of the EULAs, most of the images I upload to Blogger are "mine". I took them using the in-game tools provided or by using a third-party app like FRAPS or with the Windows screenshot function or even just by hitting PrtScr and copy-pasting. I chose the shots and framed them. I may have used some filters if the game offers that facility. If my characters are shown, I'll have dressed them, too. Often, I'll even do a little post-processing to punch them up.

And yet it could be argued that none of that is "my" art in any meaningful way. I didn't design the costumes or place the objects in the world (except, if it's in my houses, maybe I did...) or choose the color palette or, well, anything. But then, neither did any one individual. Every screenshot, no doubt, includes elements that were crafted by any number of people, some of them actual artists who draw and paint, others those who follow on to use that art to make up tableaux, decorate rooms, plant gardens and replicate nature in the wild.

And even then it gets messier. There's procedural generation to consider and the coders who wrote the algorithms that lie behind the hills and valleys and mountains and streams. How far down do we have to dig before we find the true creators, if they can even be said to exist?

As someone who does nothing more than capture fixed images of other people's work and maybe tart it up a little bit, is there any meaningful way in which I can be said to have "created" anything? I'm just borrowing - with permission - and exhibiting the work of others.

When I type a prompt into an AI, though, every word is mine and I would certainly claim authorship of the idea. I would also argue strongly that I was employing my imaginative talents, such as they may be, in an original and unique-to-me way, which is one workable definition of creating art.

That the ideas I create are then made visible by an AI isn't to say much more than that the AI is the tool I use in my act of creation. Granted, it's an extremely powerful tool but it's still a tool all the same. There are legal and moral arguments to be made over ownership, attribution and compensation but those stand to one side of the aesthetic question "Is it art?"

Someone else is going to have to answer that because I'm not touching it. What I will say is that it feels more satisfying in some ways than finding already-existing images that I didn't create either and slotting them in next to words that I did. Not to say that that's not satisfying in its own way - I'm on record as saying that it is and always has been - but at least right now, this feels even more so.

Maybe it's just the novelty but I suspect it's more than that. I think it's that the images are pleasing to me in and of themselves and I know they didn't exist before I summoned them up, not in that exact form. I made them real. It's not so much as though I feel I'm creating art - it's more like I'm doing magic.

Which is highly appropriate, considering the nature of the blog. Magic is a core component of most of the games I write about and much of the TV and movies, too. Maybe not so much the music unless I'm writing about prog rock...

When something's quicker and easier to use and feels more satisfying as well, it's going to take an effort of will to avoid using it. I'd need a good reason not to make AI images my first choice for illustrating posts now, wouldn't I?

Maybe. There is authenticity to consider although that's a worm-can no-one wants to open. There's also specificity, which is a lot more manageable. If I'm reviewing a game or recounting my adventures there, it's going to make a lot more sense to use actual screenshots of what happened than some pretty pictures that just set a mood. I can't see AI imagery replacing screenies of my bags or clips of quest dialog any time soon.

If I want a nice, splashy spot illo to break up a lot of discursive text, though, or something for those not-so-infrequent posts where I'm writing about things I haven't done or seen or played, I'd be crazy not to fire up one of the AIs and get them to do the hard work for me. Expect to see it happening here more and more and to hear me mention it less and less. I think it's just going to become one of those things I do and probably, one day, one of those things most of us do.

At least, it will be until whoever's behind the technology starts to charge me money for using it. I said I'd need a good reason to stop. That would certainly qualify.

Oh, wait...

I've used up my free credits for Dream Studio. Midjourney, which is probably even better, isn't accepting prompts from non-subscribers right now. Craiyon's still free, though, and Google's working on something that'll probably be free, at least while they beta-test it. Even Adobe is in on the act now.

I'm sure I could get by without paying, somehow. I just found out that the original version of Stable Diffusion is still available here - and for free - although how long that will last is anyone's guess. Eventually, though, I'm pretty sure everyone will be out of beta and done with data-gathering and it'll be on to the next phase.

I'm probably going to have to get my credit card out, aren't I?

4 comments:

  1. You'll have to pay for a reasonably modern graphics card — which you need anyway. Then you can download the Automatic1111 version of Stable Diffusion and run it on your home machine. The websites are charging so much for access because the compute time on the platforms is quite expensive.

    1. Ah, thanks for the tip. That's very interesting. I was wondering if we were heading towards stand-alone apps for AI in general. I'd definitely be interested in doing it all "in house". I'll have to look into the possibilities.

  2. Given that I've seen people with far more unhealthy obsessions concerning Emma Watson in DeviantArt, you're pretty normal by comparison. I'd be trying to see how Rush would look as a band in LOTRO, so I haven't a leg to stand on.

    The one thing I've noticed about people using Midjourney, whether they take its output as-is or use it as a base graphic and then clean it up to achieve their vision, is that, well, Midjourney seems to generate a lot of graphics that look quite similar to each other. Same general look, same general face, same general body configuration, etc. It's as if Midjourney wants to... homogenize things as much as possible, and it's only by working hard with prompts that you can truly break away from that.

    1. I haven't had much experience with Midjourney. I registered while it was free but I didn't get round to doing much with it before they moved to subscription-only. I have noticed, though, that it seems to be the app everyone uses for all those parodies on YouTube and they absolutely do all look the same. I like Stable Diffusion/Dream Studio, but it seems to generate some fairly predictable results too. Craiyon, on the other hand, can be so random it's hard to relate the output to the prompt. I guess which is best all depends on what you're trying to achieve at the time.
