Thursday, November 9, 2023

Being Lazy Is Hard Work, Sometimes

I might have had something more substantial to post today if I hadn't spent several hours watching YouTube videos explaining how to make animated movies using AI, then following links to various utilities so I could play around with the tools. After all of that, my takeaway is that, while the technology may have come a long way, it still has a lot further to go before it can do what I want it to do, by which I mean everything.

The way things stand now, what I can see is a whole lot of extremely useful short-cuts that will save both professionals and dedicated amateurs a great deal of time and effort, just like I found when I was looking into whether I could get a team of AIs to write and draw a comic book for me. Unfortunately, that means instead of a project taking a few weeks, it might now take a few days, when what I want is for it to take a few seconds.

Basically, I want AI to be magic. I want to wave my wand and mutter a few incantations and have a fully-finished, polished, animated movie, ready to publish on my YouTube channel or on the blog or some other place where almost no-one will ever see it. In an ideal world I wouldn't even have to wave the wand. The AI would just know what I wanted and make it for me, without my even having to ask.

This would probably be a good time to ladle out that old pudding "Be careful what you wish for." There's an episode of Carole & Tuesday that hints at what will probably happen if and when the AIs reach that level of sophistication. It's Series One Episode 4: Video Killed The Radio Star. (Every episode of both seasons is named after a well-known song, usually with some relevance to what happens within it.)

This blog post gives a rough impression of what happens in that episode. I suggest not reading the comment thread, though. (And now everyone's gonna do exactly that...)

Before anyone brings up my own long-promised post on the show, I'm happy to say it's done! I finished it a few days ago but I've been sitting on it. I'm not really happy with it - it's superficial - but I know I'm never going to finish the five or ten thousand word essay I have roiling around in my head so I'll just have to suck it up. I might publish it tomorrow. Depends what else comes along. 

The impact of AI on music and especially on songwriting is a major theme of Carole & Tuesday. The whole premise of the show is that the two girls are extremely unusual in that they write their own songs with no assistance from AI at all. Almost all songwriting in their time is at least AI-assisted, if not entirely subcontracted to software. 

Carole and Tuesday are both already singer-songwriters before they meet but it's a plot point that they only really attain a standard that's commercially viable when they pool their resources and write in tandem. Before then, barely-solvent Carole tries to supplement her meagre gig economy income by busking, mostly playing keyboard instrumentals no-one stops to hear, while poor little rich girl Tuesday composes songs on her acoustic guitar and sings them to herself in her lonely bedroom, somewhere in the invisible depths of a vast mansion.

Every other musician and singer we see in the entire show either performs work written by AIs or works with AIs to create their own music. To almost everyone in the narrative, the idea that anyone would try to do the whole thing alone seems not just unlikely but faintly ludicrous.

When Gus, the girls' manager, reformed alcoholic and one-time drummer in a speed-metal band called Lazy Sandwich, tries to get a video made on the cheap by buying an AI video bot called IDEA, the results are predictably terrible. Worse, the AI is a con artist, completely fooling everyone into thinking it can somehow whip up a state-of-the-art music video for no money and no effort. (By the bot, that is. It's a lot of effort for everyone else.)

I don't think we have to read very far between the lines to see what's going on there. Even four or five years ago, when Carole & Tuesday was being made, creative artists didn't like what was coming. They were right to be worried. 

The protracted actors' strike wound up today (pending final approval by the members), following an earlier conclusion to similarly-motivated industrial action taken by their screenwriting colleagues. A number of issues were involved but in both cases fear of an AI takeover was one consideration.

The compromise agreements reached revolve not around an abnegation of the upcoming technological revolution but a means of benefitting from it financially. A share of the spoils. 

And, to be fair, some say over whether studios can do away with existing actors, while still enjoying the public interest in those same actors generated by their former appearances as living beings. I'm pretty sure the studios will be looking further ahead, to the day they don't need any actors at all, so it's only a temporary stay of execution for the craft but I guess it'll keep the current crop happy for now. Whether we'll even have live screen actors in fifty years is another question, as is whether anyone around by then would still want them.

As I've said before, I believe all of this will have about as much long-lasting effect on the coming AI-pocalypse as the objections of eighteenth-century weavers had on the success of the Spinning Jenny but, as with the Industrial Revolution, a full turn will take longer than its proponents would like or its opponents fear. That's what I find so frustrating about it.

It's interesting to see the difference in rate of change between the various media to which AI is being applied. Applications requiring just a single, static image could be fast approaching the moment when humans won't be necessary at all. I wouldn't think this would be the best time to take up a career producing book covers for self-published e-books, for a start. I imagine most DIY authors will also be doing the pictures themselves before long.

As I found today, though, getting the images to move convincingly still takes rather more effort than I imagine most dabblers would care to give. I had no difficulty following the processes outlined in the videos I watched, most of which involved little more than typing prompts, selecting suitable results, then a lot of cutting, pasting and uploading. The problem was being bothered to do it.

Of course, that says more about me than the software. Once, the sheer amount of fiddly detail wouldn't even have slowed me down. 

Watching someone else fast-forward through the nitpicking process reminded me of that summer, back in the 'nineties, when I bought a sampler and spent a week making a three-minute song using samples of Kyle MacLachlan as Special Agent Dale Cooper in Twin Peaks. I was quite pleased with the result but even I didn't want to listen to it again once I'd finished. 

Or there was the time in the early 'aughts, when Mrs Bhagpuss was away with the kids for a few days and I spent every evening making "music" with some sequencing software I'd bought. Again, by the time I'd finished, I never wanted to hear any of it ever again.

The first time, I thought it was time well spent; the second time, I wasn't so sure. If I was in my teens or twenties right now I'm pretty sure I'd be staying up 'til 2am every night, working on some ludicrously time-consuming AI project. At the age I am now, I flatter myself I'm over all of that. 

What I want, as I said, is for the AI to do just about all of it for me and I'm already getting tired of waiting. I never got my flying car or my jet pack and we still can't get our meals in pill form but I can't help thinking this new future is a lot nearer than those science fantasies ever were. It's just not near enough!

Still, every day brings something new. The announcement made at OpenAI's DevDay that all their AI tools are now available in one package is drawing a lot of attention. With the integration of DALL·E 3 into GPT4, you don't have to swap from text generator to image generator any more. That's sure to save some time.

Elon Musk is trying to get in on the act but the less said about that the better. Meanwhile his erstwhile partner, Grimes, is - as always - playing both sides. The more I know about her, the better I like her.

There's also a growing trend towards personalised AI packages that allow you to feed in your own data to get responses more closely tailored to your requirements. NightCafe chides me every time I log in because I haven't yet trained my own model, while GPT4 Turbo now lets you input the equivalent of a three hundred page book as a single prompt. With additions and improvements like those, the possibility of my having an AI write and illustrate a whole post that might plausibly pass as my own work gets nearer every day.

I probably wouldn't use it for that even if it were possible. I like writing posts. Then again, it'd be handy on days when I'm just not feeling it but don't want to skip.

What I really want, though, in addition to a GPT-powered wifi mic and speaker combo I can clip to Beryl's collar so I can talk to her and have her seem to talk back, are some AI apps that can produce music I want to listen to and video I want to watch, infinitely and on demand. Games I can play, too, why not? And I'd like it to tailor them to my preferences from nominal input that doesn't take me more than a minute or two.

Is that too much to ask? I mean, come on!

Right now, though, the best I can come up with is something like this. That's Dylan Turner's lo-fi generator and I've had it on in the background the whole time I've been writing this post. It does a job but it's no Julia Holter.

While I was there, I spent a while playing with Dylan's other toy, LudoTune. It lets you build three-dimensional structures with colored blocks and have them play music. 

Well, some people can get them to play music. I made an IF logo for the blog that plays a station ident. Kinda. If you squint. With your ears.

And that is why there's no real post today (although the Carole & Tuesday digression is solid, I think). Once the real AI gets here, there'll be none of this nonsense. It'll be solid gold every time.

I bet you. 

 

Notes on the AI used in this post.

Ironically, I had a ton of trouble getting images to illustrate my text. I used up a load of credits at NightCafe, trying various models and prompts. None of them were really what I wanted. It turns out that if an AI doesn't know who you're talking about, it's really bad at making a picture of it. Who'd have thought? And it seems as though Carole & Tuesday just isn't sufficiently well-represented in the training data for any of the AIs to be able to render a recognizable image of even the main characters, let alone the supporting cast.

This, of course, is where the work comes in. I could have supplied sufficient images to train a model of my own but life's too short to do that for one blog post so instead what we have is:

Top image: DreamShaper XL alpha2 at 50/50, from the prompt "Carole & Tuesday the characters from the anime of the same name . 1970s cartoon. Full color". Background filled out by Uncrop, then manually cropped by me because it kept putting distracting extra characters, peering into the frame or mugging from behind. It's great, being photobombed by an AI.

Third image: same model, same settings. Prompt "drummer in a speed-metal band called Lazy Sandwich. Include the name of the band on the drums. Anime". I had ten goes at this one and that was the only one that even came close. The AI did not, as asked, "Include the name of the band on the drums". I put that in myself in Paint.net.

Fourth image: SDXL Beta 50/50. Prompt "Kyle MacLachlan as Special Agent Dale Cooper in Twin Peaks. Scooby Doo style." I did nine of these. I was trying to get a picture of Dale Cooper making a deal with Gus from C&T but none of the AIs could understand what I wanted. One of them put Cooper in a bar and wrote "Gus" on the wall behind him. Another had a hand reaching in from outside the frame offering Dale something that looked like a mutant squirrel killed by a truck. It sounds better than it was.

4 comments:

  1. That gent in the sportcoat has a broken wrist. Or maybe it was reattached strangely. Although to be fair, I think the blonde in the first graphic might have a broken collarbone given the way it unnaturally slumps once you get past the strap. Having had broken collarbones in my past, I know what that feels like, and it is NOT fun.

    1. I think he's missing a finger, too. The newer models, particularly the ones you have to pay for, are starting to eliminate all of these faults but even the older, free or cheap ones I use can do better. It's another aspect of my laziness. I can't be bothered to revise and repeat the images, instructing the AIs on what to repair or improve. Obviously, if I had some commercial end in mind I'd be doing all of that but for the blog I just go with what amuses me. And I like the imperfections in the images just like I often like bugs or poor translations in games and glitchy, jittery samples in songs.

      Hmm. There might be a post in all of that...

  2. Have you given any consideration to running your own instance of e.g. Stable Diffusion XL? It might take a few minutes to generate images, depending on the power of your hardware, but you have considerably more control, especially with tools like sd-webui which allows you to erase and regenerate portions of the image.

    The hobbyists at CivitAI (https://civitai.com/models) have an enormous quantity of specific LORAs available, too. (Granted, a lot of the hobbyist scene is driven by the desire for, ah, freedom from censorship, so you'll get some libertine content, but there is plenty else besides.)

    1. Thanks for the link. I'll check it out. I have thought about it but unsurprisingly I've been too lazy to do anything to make it happen. At the moment I'm content just to play around with what's publicly available but I am tempted to start digging a bit deeper.

      Conversely, I fully expect a whole slew of commercial options to appear soonish, with much more user-friendly front ends that make the idea of pressing one button and having the AI do all the work much closer to reality. I'd pay money for that, if it could feed me the results I'm looking for with minimum input from me.
