Tuesday, April 16, 2024

Don't Ask Me What's Real. I'll Only Tell You "Everything".


Exactly a month ago I said I wasn't going to be "dabbling with audio and video" any more, unless and until there are some very major advances. Why? Because "It takes ages and I get nothing interesting out of it."

That remains true for AI-generated video, which still seems a long way from becoming a consumer product. I keep a weather eye on it in case anything worth mentioning develops but so far it's mostly more of the same five second pans and uncanny-valley animation, with tiny, incremental adjustments only the initiated will notice.

AI audio - specifically music - is another matter entirely. Seemingly overnight, a cluster of apps have surfaced, each capable of generating segments of songs that seem barely distinguishable from what, for shorthand purposes only, I'll call the real thing. The first one I ran into was Suno, which I wrote about briefly just over a week ago. 

The AI aggregator There's An AI For That claims to be able to point you to more than a hundred alternatives to Suno but the one that's really getting all the attention is Udio. I watched a couple of YouTube videos about Udio and it looked more than interesting enough to justify some "dabbling". 

Udio is currently in Open Beta. While that lasts you're free to create an astonishingly generous 1200 songs a month. All you have to give them is an email address. The ownership rules on what you make are pretty lenient too, although like all such services they do ask you to credit them, while also retaining the right to do it for you if they feel like it.

At first I just played around with the default text-to-song prompt. That gets you two thirty-second clips, like the one below, for which I specified some downtempo electronica about an old horse looking back at his life.

The results were pretty good, although no better to my ear than the ones I got from Suno. Once again, the weak point was the AI-generated lyrics. And the titles, which most confusingly change every time you edit or extend a song. AIs still really aren't great at writing anything you'd want to read for pleasure.

What I really wanted to do was upload my own lyrics and have the AI set them to music for me. Both Suno and Udio can do that but the free version of Suno is quite strict in what it allows you to do with anything you create using the service. Udio, at least while it's in beta, is much less restrictive.

With that in mind, I started playing around with Udio to see if I could get it to show me what one of my songs might have sounded like, had I ever managed to get a band to play it the way I wanted it played, something I only rarely and fleetingly achieved because musicians, even incompetent ones, annoyingly have ideas of their own. 

I can't help but be struck by the similarities with the way I used to have to find a group before I could complete certain content in EverQuest. That all changed with the addition of Mercenaries, after which I pretty much never needed to speak to another human being in the game again. AI might just be my musical mercenary solution...

The first problem I ran into was one of duration. Not the thirty-second limit on segments but the way the AI simply speeds the song up to get all the words in. If you give it half a dozen lines it sounds fine. If you give it two verses it starts sounding like The Dickies.

The answer to that is to break the thing up into sections of suitable size and stitch them together, something that's very easy to do using the simple and intuitive interface. If you get muddled, there's a very helpful FAQ.

It took me about half an hour to complete my song, which clocked in at 2.44. Just about the perfect length.

It's made up of an intro, two verses, a chorus, a third verse, a second chorus and a coda. That's how I originally wrote it except for the intro, which someone else would no doubt have tacked onto the front whether I liked it or not, had I allowed a bunch of actual musicians to get their hands on it. Along with a solo and some kind of break, no doubt, because musicians always try to complicate things.

When it was done, by far the most surprising thing about it was that the vocal melody, particularly in the verses, sounds uncannily similar to the one I actually wrote back in the mid-1980s. Eerily so, in fact. If I had one of my old cassettes, I'd upload a version I recorded back then, for comparison. Sadly, even if I was able to find one, I fear all you'd hear after thirty-five years is tape hiss.

The chorus didn't sound much like the one I wrote. More worryingly, the second chorus didn't sound much like the first. It may be that there's a way to cut and paste sections so they're identical but if so I haven't worked out how to do it. I just told the AI to do it again and it did, but differently.

The effect of having the same lines sung in two different ways works quite well, although if it's not the same each time I don't think it actually qualifies as a chorus. There's also an odd moment when the singer appears to improvise a couple of words I didn't give her, one when she rushes the beginning of a verse and another when she slurs a word. Oddly, all of those seem to add to the faux veracity of the thing.

Not quite as charmingly quirky are the moments when the segments grind a little against each other before they settle in. All told, though, I have to say it's a better job than most line-ups of any band I ever played in would have been able to come up with. It may not be professional standard but it would definitely have gotten us through any audition needed to play the back room of a pub back in 1985.

Once I was passably content with the music I thought about adding some visuals. I was planning on uploading it to my YouTube channel so I could link to it here and it's nice to have something to watch while you're half-listening, I always find.

My immediate thought was to have another AI make me a video based on the audio file but on investigation that turned out to be way more trouble than I was prepared to take. I've futzed around with that sort of thing before and it always seems to be me doing most of the work. 

As far as I can tell, while the actual output of AI-generated video keeps getting more and more sophisticated, the amount of technical expertise and sheer effort needed to produce anything longer than three seconds keeps growing too. I was pretty sure it would be quicker to knock something up myself from some old camcorder footage I had lying around so that's what I did.

Actually, it wasn't that much quicker because once I got started I couldn't stop fiddling about with it. I had it done in about an hour and then I thought it would look better with the lyrics and that took an hour more. In the end I got something a not very imaginative twelve-year-old would probably be mildly embarrassed to hand in for media studies homework. Good enough!

The thing to remember here is that I'm very easily pleased. I can hear and see most of what's wrong with what I've made but I still think it's pretty good anyway. I've already watched it half a dozen times and there's every chance I'll watch it half a dozen more.

In fact, the only thing likely to get me to stop is making another one with another of my old songs. I'm very curious to see whether the shape of the lyric, coupled with the intended style, does indeed force the whole thing into a certain melodic pigeonhole. Did I only imagine I was creating those tunes all those years ago, when really they were inherent in the words I was writing and the subculture I inhabited?

I'm aware that we stand on the very edge of musical annihilation here and that in a matter of years or possibly months it may be literally impossible to know if anything we hear contains any human emotion or experience at all. And yet, I'm not unduly concerned. Against such worries I set my faith in the ability of all true creative souls to turn every technical innovation into a means of self-expression.

I'm old enough to remember when album sleeves sometimes bore the passive-aggressive rubric "No Synthesizers". The line between authenticity and artificiality is constantly being re-drawn.

This video I made for a song on which I played none of the instruments and didn't sing a note has words I wrote and images I shot. It sounds remarkably reminiscent of the demo I recorded more than a quarter of a century ago in a rented room with a friend with a guitar and an acquaintance with a drum kit. Only better. 

What's more, I can feel the new pushing out the old. I can already feel the AI singer's phrasing replacing the way I always heard it in my head.

Don't ask me what's real. I'll only tell you "Everything".

2 comments:

  1. I played around with Suno a bit and got a PERFECT SONG, but then it crashed and I lost it. I was so mad. I used the same prompt but it never managed to make a good one again.

    1. A video I watched did suggest you'd need to keep iterating to get a good result but either I've been very lucky or my standards are really low because I've been quite impressed with about 75% of my attempts. Then again, if you think about the kind of thing I frequently include in music posts here, it's pretty clear I find glitches, jumps and stutters innately appealing in a song and I certainly don't hold any brief for perfection. I might just be an easy mark for AI.
