The thing that hadn't occurred to me when I started playing around with AI music this time around was just how addictive it would be. It certainly hadn't grabbed me that way the first time I tried it, the best part of a year ago. Turns out there's a huge difference between having a machine churn out some tunes you never heard before and having it bring to life the sounds you've been hearing in your head for forty years.
Actually, there's a bit more to it than that. When I first played around with a couple of AI music generators last year, it was very much in the way of playing with an amusing toy: fun but inconsequential. Which isn't surprising, given that having the AI do one hundred per cent of the work leaves you no other role than being the audience.
When all you're doing is typing in prompts, at best it's like being the guy in the mosh pit who keeps yelling out for the band to play that one song from the second album and then they do. (And yes, I have been that guy...) There's a bit of a buzz and a fleeting sense that you might have had some kind of input but then it passes and you never think of it again.
That all changes by at least an order of magnitude when you stop letting the AI make up the words and type in your own lyrics instead. At that point, you do begin to feel some sense of ownership and a degree of artistic involvement in the process. And it's merited, too. I mean, you did write the words. Lyricist is a proper job title.
On a technical level, it also becomes very intriguing to see the extent to which the structure and rhythm inherent in the lyrics, coupled with your instructions on the genre of music and emotional tone to use, all come together to influence the melody. When I was experimenting with it last year, I was quite surprised by how close some of the AI's interpretations were to the original tunes I'd written back in the 'eighties.
Those, though, were the eerie exceptions. Mostly what you get is your familiar words but set to a tune you'd never have thought of and most likely wish you'd never heard. It takes a lot of tries to get the AI to come up with something that feels even okay, let alone right and even when it does it never feels like it's "your" song. It's as frustrating as it is enjoyable.
All of that managed to keep me amused for a couple of afternoons a year ago but I soon lost interest and I hadn't felt the need to go back for another go since. It's been much the same story with all the other generative AI agents I've played around with these last two or three years. It's funny to get an AI to write a story or a blog post now and again but it gets old fast. As for AI video, it's a lot of work for very little reward. A few seconds of something that looks quite fake.
None of which is to suggest these things have no genuine use cases. They certainly do. And that, really, is the point: they're good tools if you have a purpose for them but at the moment that's all they are: tools. It's still you that's going to be doing all the real work, so if you don't have an end in mind, what's the point? You don't buy a hammer just so you can wander around hitting things with it at random. Or I hope you don't, anyway...
With the recovery of my ancient audio-tapes, I finally found a project for which one of the AIs was the exact hammer I needed. That instantly turned the whole experience on its head. Instead of idly playing with the controls to see what would happen, now I was twiddling with them to get a precise result. I was using the tool to a very specific end.
Well... some of the time...
See, here's the thing. Having songs you wrote and recorded back in your youth magically brought to life, almost exactly as you'd always imagined them, that's an amazing experience. But so is hearing those same songs done in a whole range of styles and genres for which they were never intended. And when the results come out sounding exactly like the real songs being covered by a bunch of different bands... well, it's hard to leave it alone.
I've spent half of this last month trying to get Suno to give me the closest possible approximations of the songs in my head and the other half asking it to give me versions I couldn't even imagine. I've been indulging myself wildly, coming up with bizarre and ridiculous interpretations of the very same songs.
The former is by far the more satisfying, when it works, but the latter is arguably even more addictive. It's irresistibly tempting to see what a grim, dark, miserable song might sound like if it was covered by a hyperactive kawaii future bass act or how a 1970s progressive rock band would handle a ninety-second, sugared-up love song meant for a C86-era tweepop outfit.
Mostly the results are either hilarious or unlistenable but occasionally it just somehow works. Some of the unlikeliest suggestions end up being things I'd happily listen to over and over, like the one above, which was what I got when I set Suno loose on one of the nastiest, darkest songs I ever wrote and asked it to give me a "supercute kawaii bass hyperpop" version - one with "supercute female vocals", just to labor the point. That's actually the correct melody and pretty much the correct phrasing and emphasis, too. If you know what it's supposed to sound like it's quite surreal.
What with the one and the other I've done precious little else since the beginning of March. When I subbed to Suno for a month, I immediately cancelled so the subscription wouldn't auto-renew in April. I thought the five hundred songs that got me would be far more than I'd need for the entire project.
Two weeks later and I'd used them all. I had to buy extra credits, even though you get enough free every day for another ten songs.
At time of writing, I have over 750 songs on Suno. I've saved them in four categories ("Workspaces", as Suno calls them): Good, Bad, Unrated and a generic unnamed workspace for stuff I either forgot to categorize or haven't gotten around to yet. I also have a workspace for Uploads, songs I've recorded and worked on so far.
Here's how the various categories stack up:
- Good - 373
- Bad - 51
- Unrated - 228
- Workspace - 104
- Uploads - 53
That doesn't include some that I just deleted as I went along. Also, I don't have fifty-three original songs. More like half that. I uploaded different versions of a lot of them.
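For anyone who wants to check the arithmetic, here's a minimal Python sketch that tallies the numbers above. It assumes the "over 750" figure covers the four generated-song workspaces and leaves out Uploads, which is my reading of the list rather than anything Suno reports. The variable names are just for illustration.

```python
# Workspace counts exactly as listed above.
counts = {
    "Good": 373,
    "Bad": 51,
    "Unrated": 228,
    "Workspace": 104,  # the generic, unnamed workspace
    "Uploads": 53,
}

# Assumption: Uploads are my own recordings, so they are left out of the
# generated-song total.
generated = sum(n for name, n in counts.items() if name != "Uploads")

# Share of Good among the songs that actually got a Good/Bad rating.
rated = counts["Good"] + counts["Bad"]
good_share = counts["Good"] / rated

print(f"Generated songs: {generated}")
print(f"Good share of rated songs: {good_share:.0%}")
```

On those assumptions the four workspaces add up to 756 songs, and roughly 88% of the ones I got around to rating landed in Good.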
Uploading is interesting in itself. Unsurprisingly, the more finished the version, the more faithfully Suno follows it. The full band rehearsals I uploaded from my C86 years come out like more polished, better-recorded takes by the same band. Except with a girl singer instead of me. Huge improvement.
The ones with just me and a guitar tend to follow my phrasing, intonation and melody, such as it is, quite closely. They also determinedly stick to my chords and rhythm, provided I prompt for a genre in which all of the above would be appropriate. That can get very close to what I imagine those songs would have sounded like had I been the band-leader rather than just the hired frontman.
Finally, there are the songs where I don't have any usable recordings, just the lyrics and my fading memory of what they were meant to sound like. I tried singing those a cappella and uploading them but my voice, which wasn't great when I was in my twenties, has very much not improved with age.
I am a much better whistler than I am a singer so I tried whistling a couple instead and that worked surprisingly well. Of course, with only a whistled melody to work from, Suno has to make up the rest. You'd think it wouldn't have a chance of getting anywhere near the result I was looking for. But you'd be wrong.
As you can see, the Good far outweighs the Bad. Suno is really very good at what it does, something I very definitely wouldn't say about its main rival, Udio, on which I wasted ten pounds I wish I hadn't spent. Suno has a lot of idiosyncrasies but it gets the job done. Udio is a waste of time.
The Bad songs are mostly complete failures by the AI to follow instructions although a few are just plain glitches or bugs, where something went badly wrong. The whole generative process is absolutely fascinating. I'd say that about two-thirds of the time the AI is clearly making every attempt to come up with exactly what's been asked for. It doesn't always quite manage it but you can tell that's what it was trying to do.
Then there's a smaller but significant cadre of versions, where the AI appears either to focus wholly on one specific instruction at the expense of everything else or where it sticks closely to the plot for most of the running time then goes completely off-message for brief periods. There's a disturbing tendency for it to go "I've done what you wanted - now it's my turn to have some fun" and produce a decent version of whatever was asked for with ninety seconds of something completely different bolted seemingly randomly onto the end.
Over the course of the month, I've learned a certain amount about how to get exactly what I want but there's still an element of RNG about the whole affair that will feel familiar to any MMORPG player. The exact same prompt that produced a miraculously good result on one song will rarely work as well on another. Part of the reason I have so many versions of the same songs is purely through the necessity for so much trial and error.
Conversely, I finally had to admit to myself that if I wanted the songs to sound like they do in my head, I had to stick to a fairly tight range of instructions. I'd been trying a lot of new things but in the end it was mostly the same few keywords that got me what I was looking for. The whole collection represents the three musical personas I tried on between about 1979 and 1991 and there's no point trying to pretend otherwise. The fourth, missing, persona would have been my punk years, something I have wisely decided to leave where it belongs, back in the past.
Overall, the results have been astonishingly satisfying. I have multiple versions of most of the songs now, which I consider good enough to carry forward to the next stage. That's making lyric videos to post on my new YouTube channel, assuming I have the nerve to go through with making it public. For the moment I'm keeping it strictly private. (Suno automatically creates lyric videos on request, clearly meant for TikTok. Not exactly what I had in mind...)
The biggest problem I have is choosing which final version to go with. For a couple of songs there's been a clear and unequivocal winner, one that I knew immediately was the version, the one that sounded exactly the way I'd always imagined it would.
In most cases, though, I've ended up with several options, each with some small flaw or foible that stops it from being the definitive version. Then it's a case of listening to them over and over and trying to make up my mind. Or, more likely, rolling the dice again, hoping for that perfect take.
I'm about halfway through that stage now. I've completed eleven videos so far, with around a dozen more to go. Making the videos has turned out to be every bit as addictive as making the songs.
But that's a story for next time.
Notes on AI used in this post.
The header image is by StarlightXL at NightCafe. The prompt I entered was very minimalistic: the title of the song, which is "Raised By Wolves (Supercute Mix)".
I'd tried that three times already, along with the exact prompt originally used at Suno to generate the song in the first place: "supercute kawaii bass hyperpop supercute female vocals". I tried it in Flux Schnell and StarlightXL but I didn't get even a single wolf. I just got cute girls with multicolored hair singing into mics.
I've only just noticed that NightCafe now gives you the full "Revised Prompt" that the AI works from. If that was there before, I never noticed it. It's very revealing. The full prompt for the picture I used is "Low-poly art. Medium shot. Wolves raising human children in a futuristic forest. Close-up. Vibrant colors inspired by Syd Mead. Neon blue wolf eyes glowing in the dark. Trees with glowing circuits and wires. Moonlight filtering through the forest canopy. Soft, pastel color scheme with neon accents. Best quality. Futuristic fantasy. Syd Mead style. Low-poly textures. Glowing neon lights. Pastel colors. Moonlit forest. Soft focus."
That is incredibly specific. It also does something I haven't done for a couple of years, which is naming a specific artist. I decided that was a step too far ages ago but it seems the AIs do it anyway. I guess I shouldn't be surprised. I also notice that even though the revised prompt mentions the somewhat essential "raising human children" aspect of the whole thing, there still aren't any humans in the picture. You can have wolves or people but not both, apparently.
So much for the image generators. The other AI in the post is the song itself, which is discussed in the text, and the video that Suno generated for it. I haven't watched the video all the way through so I'm trusting the lyrics are correct. They should be. I typed them in right.
The annoying thing about that video is that you can change the title of the song in Suno but it still uses the title of the uploaded audio anyway. The song is called "Raised By Wolves (Supercute Mix)" but when I uploaded the recording it's a "cover" of, I called it "Raised By Wolves Strangled" to differentiate it from a couple of other uploads of the same song. Even though I later changed the name of that upload to just "Raised By Wolves", the cover remains a cover of "Raised By Wolves Strangled" as far as Suno is concerned and I can't change that in the video.
Lucky I don't plan to use Suno's videos then, isn't it? I'll make my own and call them whatever I want!