Back in the '90s, after I finally got my voluntary redundancy from the very
large company I was working for at the time, I took a year off. Didn't do
very much with it, other than rest, relax and have a pleasant time until the
money ran out and I had to find another job. Hey, it was the nineties, okay?
You've all seen
Slacker, right? If not, you should. Someone
put it up on YouTube
in 2017 and it's still there so if you want to go watch it now, I can wait.
Anyway... way to derail my own post, right? So, the point is I had all
this time off and one of the things I was going to do with it was write a
novel. Only it turns out I don't have a novel in me. How about that? Everyone
but me, huh?
What I did have in me, it turned out, was a whole freakin' mess of characters, their never-ending soap-opera-drama, and some kind of ramshackle, rambling narrative that wasn't going anywhere or at
least not anywhere I could follow. I just used to sit down at the PC I'd bought
specially to write the thing and go into what I can only describe as some kind
of trance and come out of it two or three hours later with a few thousand
words that, when I read them back, seemed to
have nothing to do with me. I had no idea how I'd done it. I still don't.
I'm not saying it was like automatic writing in the days of Conan Doyle but it
was freaky as hell and it really took it out of me.
It took me about a week to recover so the whole thing moved quite slowly.
Still, after a year I had some tens of thousands of words in two unconnected "stories". There was some kind of shape to both of them
but if either had an actual plot, I couldn't have told you what it was.
And then I had to get another job and I started to use the PC to play video
games, discovered EverQuest, and that was that for a quarter of a century. All of which I believe
I've covered here before, not that I fool myself anyone is likely to remember.
While I was writing them, though, those stories, if that's what we're
going to call them, didn't just sit on a floppy disk on a shelf. They did do
that but they also came out in bimonthly installments in the apazine to which I
belonged, meaning they were nominally public, with a theoretical readership of thirty people, assuming everyone bothered to read them. Again,
something I've written about before and not the point of this post so
let's move on.
What is the point of this post? You may well ask. It's this: I still have all
that stuff, both the writing and its physical manifestation in photocopied
zines. Of course I do. I still have freakin' everything I've ever done, going
back to my exercise books from primary school (They're in the loft.). I never
throw anything away.
I mentioned in
a recent post
that I'd been mining a couple of fragments I wrote around the same time for
lyrics to turn into songs to feed into the all-devouring AI maw. That turned
out really well. I mean really well. I spun those two short
pieces up into a seven song cycle and I'm about as happy with it as I could
be, which, since I tend to love my own stuff to a positively nauseating degree to
begin with, is irritatingly predictable.
If you want to judge for yourself, I
made
a playlist. I think they're all pretty good. Your opinion may differ. I'd still like to hear it though.
After the seventh song, even I could tell the well was pretty much
dry. Which was when I had my next big idea. Basically, the same big idea but
who's telling this? If I'd been able to get seven songs out of a couple of
short fragments, how many could I get out of two short novels?
I don't know yet but I've done four so far and once again I'm very pleased
with what I've got. Given I'm finding the process addictive as hell and the
results astonishingly satisfying, there could be a lot more. There
will be a lot more.
And we're still not at the final point of this post, the lede of which I've buried
all the way down here as a reward for the patient, the stubborn and the
determined. Here it is: blogging is great and all and I love it but I'm not sure it's as aesthetically satisfying as what came before.
Here's why.
And that's just the covers. All nine issues of The Final Line, as they originally appeared. Or rather, bleached-out scans that don't do any kind of justice to the grain and texture of the originals, let alone their tactility and feel. Even so, though...
I put a good deal of work into making this blog look as good as I think I can get it and I'm mostly happy with the results but I very much doubt any of the images I've used is going to have the lasting impact on me of any of those covers and I'm not even talking about their ability to trigger memories. (Except the last one. That's not good. I think I knew I was done by then and was already moving on.)
My point, getting to it at last, is that I'm not quite sure why blogs in general are so much less visually and aesthetically satisfying than those old photocopied zines. They are, though. It's a topic I might return to for Blaugust, an event which the observant reader will notice has been on my mind for months now.
I'll tell you how all those covers were made. I spent hours going through photographs I'd taken, flyers I'd picked up, cinema programs, magazines, bits of paper I picked up in the street... anything. Then I blew them up on a photocopier, cut them up with scissors and knives, pasted them with glue and generally hacked them about until I had something that looked good to me.
Is there anything stopping me doing that now? Of course not. I could follow the exact same process or a version of it and at the end, when I'd finished, I could either take a photo of it with my phone or scan it and use it on the blog. But I don't. And why? Because it's easier to take a screenshot and use that. Or, worse, get an AI to fake something up for me. Digitization has made everything orders of magnitude more convenient but at a cost.
The closest I get to that old creativity here, I think, are all the degraded, fucked-about-with images I use at the top of music posts. Making those engages much the same parts of my imagination as those covers did. I'm more likely to look back at those with satisfaction than any other visual images on the blog and yet I pretty much only do it for music posts. Until now, I've never asked myself why. Why not do the same to screenshots or found images or generative AI?
I'll have to go away and think about that. What I also wanted to say is that the whole AI music thing seems to have fired up a creative part of my mind that's been dormant for a long time. I have more creative projects going on now than I can handle, which is unusual for me. Mostly I start something and keep to it until I'm done before I move on to something else, although being "done" should not be mistaken for being finished.
Anyway, the upshot of all this is that there's likely to be more of this sort of thing here for a while, since that's what I'm spending most of my free time on just now. Fair warning. Also, I guess, there's a chance some of the usual stuff might start to look a bit different, although honestly I'd be surprised if that happens. Inertia's a bitch.
Also if anyone knows why it's now impossible to center images in Blogger, I'd love to hear about it.
I was looking at my stats this afternoon, something I very rarely do any more,
when I noticed something odd. This year, which is as far back as I looked, posts about
music or AI or the two together seem to be getting slightly more attention than
posts on gaming.
Portmanteau posts, the grab-bags where I jam in whatever I've got lying
around, also seem to be doing a bit better than average, regardless of whether they cover games or not. What aren't going as well are posts about specific games,
unless the game happens to be one that's in the news at the time, like Defiance or Stars Reach. And
even then, I need to put the name of that game in the title to see an uptick in
views.
Bottom of all, and quite consistently so, come posts about Wuthering Waves. Even
posts about EverQuest II do a little better, although admittedly not by much.
That does seem odd, considering WW has millions of players and EQII probably barely makes six figures but I guess
it's hardly surprising not many Wuthering Waves fans are into long-form blogs, let alone
ones as obscure as mine.
At this stage, I really should clarify that the so-called "statistics"
I'm referring to are the ones I see next to each post in the
Blogger console itself. I long ago gave up even opening the emails
Google sends me, telling me I should break out the bubbly because I had
"800 clicks in 28 days" and I never bothered to swap over to the newer
version of Google Analytics when the old one died.
Supposedly, those eight hundred monthly "clicks" come solely from
Google Search, whereas the page views, an order of magnitude higher and generally settling somewhere between 200 and 300 per post after a couple of weeks, reflect the number of times
someone has loaded the page in a browser, regardless of how they got there.
(That explanation, by the way, comes from ChatGPT because I've never been
able to find a clear, straightforward explanation anywhere else. Treat it with the caution it deserves.)
However the page views are tallied, as Wilhelm often says, even
bad data can be useful if the source is consistent (Well, something
like that...). I guess on that basis I can say some of the posts I
thought were self-indulgent and of interest only to me might well have been of at
least some interest to others, quite possibly more so than the gaming posts I used to assume were the main reason people came here.
Good news for me, I guess. The blog has always reflected my current interests and I'd like to keep it that way. For a long time that meant all MMORPGs all the time here, because my main leisure activity was playing MMORPGs, but I'm just not putting in the same hours any more. I still like to write about the games I do play and the ones I'm interested in but if I'm honest, I get a lot more excited about other things right now.
Anyway, that's a long preamble to what I hope is going to be a fairly short
post (Spoiler: It wasn't!) on what I've been up to these last few days. I'm sure it won't
surprise anyone reading this to hear I've been fiddling around with
Suno some more but the exact details may not be quite what you'd expect. Not what I was expecting, that's for
sure.
As I wrote the other day, the AI got an update, which gave me a burst of
fresh interest. I was looking forward to trying the new model but the problem was I'd already made far too many
versions of the thirty or so original songs I had available. I knew I probably ought
to start over again and run the whole lot through the upgraded model but fun
though that might be, I knew it wouldn't be nearly as much fun as creating some entirely new ones.
Except I didn't have any new songs. In the fifteen or so years I was musically
active I doubt I wrote more than fifty altogether. The ones I haven't used by now got left out for very good reasons. They're no good. Embarrassing, some of them. Possibly actionable. You really don't want to know...
That only left me one choice: write some new ones. Except that's really not as
easy as it sounds. You do kind of have to have an idea, just to get started and
I stopped having ideas for songs in the late 1980s.
I strongly believe there's a very, very good reason nearly all good songs are
written by people who... well, who aren't old. There are exceptions, of course, but even those rare songwriters who still turn out good work in their later years rarely eclipse or even equal the songs they wrote when they were young, when everything just mattered so much more. It gets harder to take all those desperate emotions seriously after you've felt them a thousand times. Or harder to convey them to others, anyway.
So, if I was too old to write any new songs and I'd run out of old ones, what was I going to do? I'm so glad you asked!
Remember five years ago, when I posted
a couple of fragments of fiction
from my apazine days? No. I thought not. I'd forgotten it myself until, for
whatever mysterious reason (I have genuinely forgotten how it came about, even though it was only the day before yesterday.) I ended up looking at it again. And as I
was reading it, finding myself surprised by it once again and thinking how I definitely wouldn't
be able to write anything like it now, I had an idea.
What if I cut it up and turned it into song lyrics? I mean, there's a grand
tradition of that sort of thing in rock music, isn't there? Well, there's
Moonage Daydream... although I never did like that one all that much... and
Kurt Cobain apparently did it too. (And Thom Yorke but I'm going
to pretend I didn't know about that.)
This wouldn't be full-on cut-ups, anyway. Leaving Port Silo is a coherent narrative (Oh yes, it is!). I could take
a few short sections, change them as little as possible. Just reframe the prose
as lyrics, switch a few things around here and there to make it scan. Maybe add a line or two to build some structure...
It certainly
helped that the prose style is imagistic, non-syntactical interior monologue. It comes as close to poetry as prose can and poetry is first cousin to song lyrics. So, is that second cousins or once-removed? Close enough for rock and roll, anyway.
I wasn't really expecting much if I'm honest. At best I'd have the words but words aren't songs. The lyrics are important, sure, but you have to have
a tune.
Suno would have been more than happy to come up with a tune, of course. An infinite
number of tunes. Two problems with that.
First off, in my experience, Suno's own tunes aren't all that great. Second and more important, for me at least, the whole Making Music Using AI As An Instrument trick only works if I feel like it's me doing most of the creative work. There's a huge, existential difference between
hearing an AI turn the song in your head into an actual song coming out of the
speakers and listening to some words you wrote being sung to a melody you
never thought of.
Only way to find out if something works is to try it, though, right? So I picked
a couple of paragraphs and got to work.
When I had something that
looked like a song lyric, I read it through a few times to see if I
could hear the music playing. And I could, if faintly. So I messed about with it some
more, moved a few things about, tried some tentative vocal runs, whistled a few melodies... then I
recorded a guide vocal and uploaded it.
I wasn't expecting a lot. I got a lot more than I expected. This is what I got, first time out:
Those may or may not sound good to you but let's just say I was extremely happy with the results. Although not so happy I didn't
try a whole bunch of times to do it again only better still, if only because neither of the
first two versions follows the lyrics exactly.
Unfortunately, neither did any of the others. Why, I have no idea. It's never happened before or not to that extent, anyway. The odd word, sure. Whole verses missing? Never! I
wonder if it has something to do with the structure, which doesn't conform at all closely to the conventional pop/rock song format? Or maybe it was all the
repetition that confused the AI, the way someone in an old movie will trick a computer into considering a paradox and make its reel-to-reel tapes catch fire.
I did eventually get one take that had all the words in the right
order but it wasn't as vibrant as the first two and I prefer those by a long way, so clearly style really does beat content, the way we all know it does, if we're honest with ourselves. Still, it would be best to have both. Not being able to get the AI to
redo the same version to correct its mistakes is possibly Suno's weakest
point just now.
The first run was so overwhelmingly successful, so much more than I
expected, I spent the next two days doing pretty much nothing else. I've
finagled four songs out of those two prose fragments so far and all of them
are good or at least I like them a lot. So does Suno, apparently. I've got some cracking versions already.
I'm not sure how many more songs I can dig out of the two short fragments but I'm happy to push it as far as it'll go. I always loved those Leaving Port Silo fragments, which is why, when I
found a way to recover all my old fiction from floppy discs back in 2020, they were what I chose to publish here. I never had much of a clue what to do with them, though. The two pieces were only ever meant to be fragments. We used to write
a lot of fragments back in the apazines. No-one really felt the need to finish anything.
Now I finally know what it's going to be: a bunch of sonically and
thematically linked songs. The big thing about songs as opposed to stories is they can be purely impressionistic and still
carry a narrative. They don't need plots. I was always really bad
at plots.
Anyway, that's what I've been up to and what I'm likely to go on being up to
for a while. I'll watch my Blogger stats on this post with interest.
Notes on AI used in this post:
The music, obviously.
Three images, all produced at NightCafe. The first two use the same
model, good old Flux Schnell, my go-to. They also use the same prompt,
three lines from the lyric ("Walking through corn fields/Covered in dust/Lost in this dustbowl") plus a style note ("young female figure, old, worn clothing, line art, color, retro-futurism").
The only difference between the two is that the first one was generated using
the "Short" duration and the second used "Long". I'm not
remotely convinced I can tell the difference, which is concerning because the
Long one costs twice as much. Apart from adding those weird hairy semicircles
and tubular husks, neither of which I asked for or wanted, everything else
looks about the same. I like both images but the first is by a distance the
better, which is why it got the prime spot at the top of the post.
The slightly worrying thing is that, whatever I used to think the unnamed girl
in Leaving Port Silo looked like, now she looks like the girl in picture one. An unrecognized
danger of using AI, that is. You may find it knows your mind better than you do.
For the third picture I gave Gemini the full lyrics to the
song and asked it for "a prompt for a generative AI image that would produce a suitable illustration
for the cover of a vinyl album featuring this song". I did that because I was already there, trying to teach Gemini to mimic the
prose style I used in the original fragment. After a few tries it was getting better at it but that's a post of its own...
Gemini took an extraordinarily long time thinking about the prompt. I was expecting
something lengthy, verbose and highly detailed, like the ones it gives me for Suno, but in the end all it
spat out was this: "Image Prompt: Dusty fields, a lone figure walking away from a desolate town, vinyl album
cover". I was not impressed.
I was even less impressed when I handed that prompt over to one of the Pro
models at NightCafe. I used one of my five free Pro credits to generate an
image using HiDream1Dev and the results were disappointing. For a
start, the central figure is clearly walking towards the town, not away from
it. Also, she seems to have quite short legs. Not unnaturally short but her
proportions look a little odd.
What I want to talk about today, even though I'm pretty sure no-one but me wants to hear it, is Suno. I wasn't expecting to write about AI music again but that's because, as Vic Reeves used to say, I hadn't thought it through.
Here's the thing: I spent two months generating literally hundreds of versions of about thirty songs until I had what I thought was a definitive version of all of them. I moved on to making the videos and I didn't need Suno for those, so I thought I was probably done with it, for a while at least.
I was just about to unsubscribe when, just before the sub was due to renew, Version 4.5 arrived. Which was a complete surprise to me. Even though the "Cover" feature I'd been using was clearly flagged as "Beta" and there was a drop-down menu of previous models in the app itself, it never occurred to me that this meant development was still going on.
I mean, I knew, obviously. I just didn't know.
Several things changed with the update, all of them potentially significant for the project. For a start, the cover feature was much improved in 4.5. I'd been very satisfied with the 4.0 beta version but things can always get better so naturally the first thing I did with my remaining credits was generate a few covers for comparison. The difference was hard to miss.
There was a noticeable improvement in quality and the annoying glitches almost completely vanished. Most importantly, the new covers I was generating stuck much more reliably to the uploads. Under 4.0 that had not always been the case.
One of the reasons I generated so many covers of the same songs was to get one where the AI didn't decide it had a better melody for the bridge than the one I'd given it or that it could phrase my lyrics more evocatively than I could. Which might have been okay except that nine times out of ten the AI was wrong. The original was better. (The other reason I keep making more and more covers, of course, is that it's incredibly satisfying and tons of fun. I'm not saying it'll never get old but it sure hasn't yet.)
With 4.5 the AI rarely deviates from the template it's given. Hardly at all. When it does, it's usually because I'm trying to make it accommodate my lyrics and melody to an entirely different and wholly unsuitable genre, something I do mostly for my own amusement. I can hardly blame the AI for losing patience with me there.
As well as producing much more accurate versions of my originals, the new model can also make them longer. Output from 4.0 was limited to four minutes. That's doubled in 4.5.
Not that I've got anything that's eight minutes long. This isn't prog rock. But I was having problems with a handful of songs that naturally run just over four minutes. They kept getting cut off in the coda or the final verse.
There are ways to get around that with extensions in the old model but it's fiddly and usually doesn't sound quite right. The new model just keeps going until the song's over, which is infinitely better. And it gets the timing right so the songs have an actual ending rather than just stopping like the machine's been switched off.
All of that meant I've had to think again about the whole project. I might need to go back over all the "finished" versions to see if they really are as good as they could be. I suspect some of them are not. More to the point, I'm now aware there may be further improvements to the AI in future releases, so I'm just going to have to draw a line somewhere for sanity's sake.
Either that or keep playing around with it until I finally get bored, something I'm showing absolutely no sign of doing so far. I have to say that at £8 a month I'm getting at least as much use and entertainment out of the subscription as I ever got from an MMORPG sub. It's cheap entertainment by the Dollar per Hour scale and you have something to show for your time and money at the end.
Perhaps the most intriguing addition to Suno's capabilities in this latest model is the new Prompt Enhancement Helper. The AI as a whole has had its interpretative and creative powers increased and extended, something that's apparent from the results. That's probably the most important change but the most fascinating is the way it now lifts the curtain to show how it's doing it.
I had been under the impression that to get the best results I needed to keep my prompts as concise and tight as possible. I didn't think long, descriptive prompts full of adjectives and metaphors did all that much. I was convinced that even listing multiple genres or naming more than one or two specific instruments led to the AI cherry-picking a couple of them and ignoring the rest.
Based on the new feature, I had it all backwards. What happens when you use the Prompt Enhancement Helper is that your brief summary gets translated into a long paragraph, full of clauses and imagery and opinion. The exact opposite of what I expected, in other words.
For example, if I prompt
"1960s garage rock, female vocals"
the Prompt Enhancer turns it into
"Raw, punchy 1960s garage rock led by gritty female vocals. Begins with fuzzed-out guitar riff and tambourine accents; bass and drums drive a stripped-back rhythm. The chorus bursts open with call-and-response backing vocals and harmonized guitar lines. Short, energetic instrumental break before a brisk finale."
That's some "enhancement".
Because I can't help myself, I tested the results on a few songs using the basic and enhanced prompts and the enhanced ones always come closer to what I was imagining. It's still true that the end result appears to have ignored a lot of the specifics but the overall effect is much closer to the intent. That's great in itself but the best part is that, now I know what the AI is looking for, I can write more detailed prompts myself.
Of course, once I'd started playing with the prompt enhancer, I started getting ideas. The first thing I thought of was to see just what the AI would make of that mysterious, seemingly meaningless prompt I wrote a whole post about a while back, Post-Post-Vibe Cassette.
With enhancement
"Post-Post-Vibe Cassette, female vocals"
becomes
"Dreamy lo-fi "post-post-vibe" cassette track: begins with washed-out, tape-warped synth pads and muffled drum machine groove. Minimalist bass pulses subtly beneath airy, melodic female vocals, layered and filtered. Interspersed analog samples weave throughout, adding texture."
Seriously? Where is it getting all that detail from? As far as I can tell, the original prompt is devoid of all context other than "female vocals". You can't be "post-post" a "vibe" unless you know what the vibe is and "cassette" isn't a genre. Cassettes don't even sound significantly different to other forms of recording (Except worse.) And the enhanced prompt admits that by putting the phrase in quotes. It doesn't recognize it as a genre. It just takes my word for it that it might be one.
I was curious to see if the prompt enhancer would always interpret post-post-vibe cassette the same way. It doesn't. It's not a fixed response.
Here's another version it gave me:
"Atmospheric synths lay a hazy, lo-fi bed, layered with warbly cassette textures and saturated drums. Minimalist bass pulsates beneath crisp, delayed guitar plucks. Female vocals glide with dreamy reverb, while tape flutter and subtle noise accent shifting, moody sections."
That explains why the results vary when I just use the basic prompt. Even so, the long prompts are very similar. It's clearly not just making stuff up at random. I have no clue how it's landing where it does but it clearly thinks post-post-vibe cassette means something fairly specific, which explains why the variations always sit within a relatively limited range of styles and sounds. They're always a bit woozy and untethered, which is how the descriptions feel.
I had fun playing with that for a while and then I started to wonder... if the AI can interpret an abstract phrase as effectively as that, what could it do with something more concrete? Quite a lot, it turns out.
All the time I'd been using Suno I'd been seeding my prompts with musical genres and styles, specific instrumentation, production and arrangement and some mood descriptors like "sad" or "wistful". It never occurred to me to leave all that out and just tell the AI the context instead.
So I tried this:
"music heard coming from the local radio station of an orbiting space habitat over a vast conurbation in an anime set in the far future, female vocals"
Suno didn't blink. It gave me two very nice interpretations, neither of which felt particularly "spacey" but either of which I could easily imagine hearing in the scenario described.
I have to say that one of the best things about the generative AIs is the way you can ask them to do all kinds of things that would be too embarrassing to ask a person. They really don't care. Bearing that in mind, I asked Suno to enhance that description:
"Dreamy synth layers open the song, with glitchy electronic percussion and pulsing sub-bass creating a weightless atmosphere. Bright, airy pads ebb and flow, supporting spacious female vocals processed with subtle delay and reverb. Futuristic textures blend with downtempo beats, weaving ethereal melodies and brief robotized vocal fragments between verses and choruses. The arrangement gradually intensifies, introducing swirling arpeggios, metallic accents, and digital noise, cresting with a lush, enveloping bridge before gently dissolving into haunting echoes and cosmic ambience."
I see the 200-character prompt limit is a thing of the past, then...
Obviously, I wanted to hear the results and I wasn't disappointed. They were excellent, although once again I can't quite see what's futuristic about them. But then, it's an impossible task, predicting the future, even in musical style.
Clearly I am nowhere near done with Suno yet. It's a fantastically entertaining toy. I have a bunch more ideas of things to try and projects to work on. I certainly feel more excited about playing around with it than I do about playing games these days.
In fact, I'm starting to wonder if generative AI might not end up becoming a new entertainment medium in its own right. It's clearly not great at facts but it's pretty good at making stuff up. Maybe that's what it'll end up being used for. I can imagine a whole raft of purpose-specific AIs being marketed and sold just as games are now.
And on that happy thought I'll leave you. I have some songs to generate.
I still have one more post left to write for my cyborg music series, to which, should you need to refer back to it someday, unlikely as that seems, I have given the quasi-ironic label "Home Taping Is Killing Music". I slay me!
Also, I remembered to call it a "Label" not a "Tag" - and early on a Monday morning, too. Yay!
The last post, if I get around to writing it, is going to be all about setting up the YouTube channel and how pointless it's going to be, other than as a very convenient place to enjoy my own work. Before I get to that, though, here's an odd little bonus post I wasn't expecting to write at all.
This is a hard post to quantify. Is it about A.I.? Cultural identity? Serendipity? Or maybe it's the new, fast, automatic supernatural.
I suppose I'd better just get on with it so we can all find out together what I'm talking about.
Here's the background for almost everyone reading this who doesn't use Suno (Hi, Tipa!). To get a song at all, the software, which I suppose we're beholden to refer to as A.I., requires you to enter some kind of description of the sort of music you'd like it to replicate.
The FAQ, which I didn't even bother to look at until long after I'd started using the app, tells you virtually nothing about how to do this, referring only in passing to "Style of Music", a phrase whoever wrote the article chose to embolden but not to explain.
The relevant box into which you type your instructions in the app itself is called "Styles" and has the bland instruction "Enter style tags", clearly assuming this is self-explanatory. And it kind of is, although for quite a while I conflated "style" with "genre" in my mind and stuck fairly rigidly to terms I was already familiar with, like "Twee" or "Janglepop" or "Psychobilly".
Fairly soon, however, I started to extemporize, adding descriptive words and phrases indicating moods or techniques such as "sad" or "sweet" or "staccato" or "driving". Then came specific instruments or arrangements - "strings", "cello", "hand-claps", "clean production", "wall of sound" and so on until eventually I was writing mini-essays in note form. There's a 200-character limit but that gets you a lot of description.
Some of this seemed to work, some of it didn't. The software mostly seemed to treat the whole thing as a smorgasbord of suggestions from which it was free to pick and choose as it liked. I noticed that placement of the words and phrases seemed to have some impact so I started putting the most important elements at the beginning and there were always a few instructions, like the gender of the vocalist, that Suno would follow 99% of the time, no matter where they appeared.
Even with experience and care, there was always a significant RNG element to the process. It was impossible to predict which attempt would give me exactly what I'd asked for and which would veer off in some entirely unanticipated direction. Even with the uploaded audio to act as a template, Suno absolutely has a mind of its own and not always a sane mind, either.
The combination of wild unpredictability plus the possibility of hitting the jackpot with a perfect rendition of the song exactly as it was playing in my head made the whole process thrilling, addictive, entertaining and satisfying. If the results had always been what I was looking for first time, I'd have been done with it weeks ago but the randomness keeps me coming back, even though I now have a "finished" version of every song I've uploaded.
Getting back to the style tags, at some point I noticed Suno provides an unlimited number of suggestions in a little box below the input window. For a long time I thought these were from its own Style Library but eventually I figured out they're just examples of things users have actually typed in. The spelling mistakes gave it away.
I found those quite useful on occasion but mostly only because they reminded me of sub-genres I already knew but had forgotten about, like "progressive folk", or of ones I'd not heard of but immediately understood, like "emocore". I did discover a couple of well-established but new-to-me genres that I really like that way, too, though. Both "futurepop" and "kawaii future bass" are actual, existing sub-genres and I'm very happy to have been introduced to them.
And then I ran across post-post-vibe cassette. Say what, now?
The strange combination of words jumped out at me the moment I saw it. It seemed both bizarre and contextually meaningless so I guessed it had to be a micro-genre I'd not happened upon before. Hardly surprising. There are an awful lot of micro-genres now. No-one can be expected to know them all unless they write about the subject for a living.
I was curious so I googled it. There were no relevant results. After some finessing, I finally got Google to spit out one link. It went back to Suno, where I'd begun, and even then all it turned out to be was a song someone had made using the tag. Trying again today, I don't even get that much.
As far as I've been able to tell, there's no such genre as post-post-vibe cassette. It's just a bunch of words that don't appear to have much in the way of semantic value. Still, I wanted to know what it would sound like, if it sounded like anything.
So I gave the tag to one of my most throwaway songs, along with just one other instruction, "supercute kawaiifemale vocals" because I'd just been playing around with some kawaii future bass. I didn't know what to expect.
What I got was something quite lovely. Considerably better, in fact, than the song deserved, since it's really not much more than a draft for another, better song I wrote afterwards.
I thought it was probably a fluke so I tried again. And again. And it kept working. I add descriptive notes to all the covers of my songs as I first hear them, so I can easily find the good ones again. Here are some of the descriptors I've appended to post-post-vibe cassette versions so far:
Lovely
Also Lovely
Really Good And Very Odd
So Weird, So Good
Very Odd But I Like It
Astonishing
More Astonishing
Pure Magic
and finally...
PPVC Never Fails.
Because, so far, it never has. Out of more than a dozen tries I have yet to have a single failure. The nearest was that one time it skipped almost the entire lyric and gave me an instrumental with a vocal coda - and even that worked!
If I had to describe the post-post-vibe cassette aesthetic, I guess I'd say it has elements of vaporwave and futurepop but with a focus on melody, rhythm and coherence. It's very cool and restrained, yet also very welcoming and approachable, contradictory though that sounds.
It needs almost no other tags to do its thing. All I've been doing is specifying the vocal style and leaving everything else to chance. Usually, that's a disaster but here it seems to work every time. I've also been having the best-ever results in terms of the vocals sticking pretty much exactly to my melodies and phrasing, as per the uploads. And for once, on the rare occasions where it improvises, it comes up with something as good as or better than my original guide vocal. Usually that is very much not how Suno works.
Maybe I've just had a great run on the RNG. Or maybe there's something going on I don't understand. Someone made this tag up, after all. Perhaps they knew something.
Either way, I'm going with it as long as it lasts. I plan to make covers of all my songs in the post-post-vibe cassette style, whatever the hell that is, just so I can have the very great pleasure of listening to them myself.
And if they keep on turning out as good as this, I may very well make another YouTube channel just for them. If post-post-vibe cassette isn't a real genre, it damn well should be.
Notes on AI used in this post.
All the pictures. All done at Nightcafe using FluxSchnell on default settings. All using prompts taken directly from the text of the post. In order of appearance:
"Post-post-vibe cassette"
"terms I was already familiar with, like "Twee" or" Janglepop" or "Psychobilly""
"the randomness keeps me coming back"
and "no such genre as post-post-vibe cassette".
I had to use more credits than usual to get anything useable because the post offers very few tangible, visual examples to work with. The output from FluxSchnell using such abstract phrases really is very unpredictable, unlike the results from the equally abstract post-post-vibe cassette tag on Suno, ironically.
Didn't think I was going to have time for a post today but it appears I do,
so I'm going to carry on with the series about making music with the help of
technology, including but not limited to AI. Or, in this case, making videos.
Mostly by brute force.
I started a few weeks ago but I wasn't starting entirely from scratch. I've
made a few videos before. Mostly for game-related stuff to support posts here.
I also have many dozens of hours of camcorder footage from holidays, going
pretty much back to the '90s, some of which I have occasionally bothered to
edit and turn into mini-movies that about two people enjoyed watching, one of
those people being me and the other being my mother. And I don't think she was
all that interested. Mrs Bhagpuss wisely opted out of most of the viewings,
even though she was in them.
I have also, even more occasionally, made videos for songs I liked. Other
people's songs, that is. I quite enjoyed it but it seemed like a lot of work so I
didn't do it often and I don't think I've done any at all for more than ten
years.
In the last month I've spent more than fifty hours making three-minute videos
for songs and I believe I can at least claim some improvement, even if
actually being "any good" at video-making is still somewhere over the
horizon. I am also a lot more motivated than ever before to keep on doing it
because it turns out that, like most things, when you have an actual goal in
mind, working to achieve it becomes satisfying and even fun.
As soon as I decided I was going to put all the songs I was making onto
YouTube I realized there were two things I'd have to do: make a new
YouTube channel and make videos for all of them. I wanted them on
YouTube because that would be where I'd be most likely to watch them - and it
is me I'm mostly making them for. I suffer from Reverse Imposter
Syndrome, where I think everything I create is pretty damn good whether it is
or not and I rarely tire of reading, listening to or watching my own work.
I also wanted them on a channel that wasn't already cluttered up with a load
of other nonsense, just in case I did eventually decide to make them public.
Still thinking about that.
Obviously, I didn't have to make videos at all. I could just have
posted the songs with static images. If I'd done that I'd be finished already.
Millions of people do it that way. It's perfectly acceptable. As a viewer,
though, I dislike it intensely. I've apologized plenty of times in music posts
here for linking to songs on YouTube that have no video. It just seems rude,
somehow. It's orders of magnitude more likely I'll listen to a song that also
has a video than one with just a picture so I feel obligated to extend that
courtesy to others, if I ever do decide to open them up to the world.
Unless it's one of those bloody videos of a turntable going round and round.
Those are worse than no video at all.
Once I'd decided to make the videos, it was pretty obvious they were going to
be lyric videos. Since there's no actual performer, these being songs that
have been brought to life by machines, obviously there weren't going to be any
performances to share. I certainly wasn't going to film myself, as a sixty-six-year-old
man, miming to the vocals of a twenty-something woman, not even one
who only existed because I'd just made her up.
All the songs have female vocals, by the way. Well, almost all. There are a
couple that don't, yet, although they may not make it out of production that
way. Given my near-inability to play male characters any more and now this
strongly negative reaction to hearing my lyrics sung by a male voice, I have
to wonder, sometimes. Then again, I do have a fairly strong and
well-established preference for the female voice over the male, especially in
popular song, so maybe that's all it is: aesthetics.
I like lyric videos, anyway. I quite often prefer them to the
"official" videos, which all too often have the whiff of am-dram about
them, along with far too many rubber masks, animal costumes and food
fights.
It's useful to be able to follow along with the lyrics as you listen, too. That's
how we did it in the olden days, when we used to sit cross-legged with a
gate-fold album cover open on our laps, squinting at the badly-printed words.
It encourages you to sing along, something I do more often than you might
imagine. Or possibly less often, if you're a regular reader of this blog and
already have your own ideas about the kinds of things I might do...
With the necessity of videos established, the next question was what would
they be videos of? They had to be of something. Lyric videos that
consist of nothing but words scrolling across the screen do exist but in my
opinion they probably shouldn't.
Luckily, I immediately realized I was sitting on the ideal resource: all those
countless hours of holiday video, much of it digitized and some already tucked
away on the hard drive of the very PC I was sitting at.
It would be totally unreasonable to expect anyone to watch straight extracts
from my tedious home movies, especially given that what I mostly like to point
the camera at when I'm away are very large, very old buildings, all of which
tend to look much the same. In between all those castles and churches, though,
are fragments of all kinds of things that just happened to take my fancy at
the time. And, of course, a lot of weather.
Running the lyrics over a backdrop of blue skies, clouds, sparkling water and
sunsets seemed like it might work. At least it would be pretty. Better still,
as I mentioned in a previous post, an awful lot of my songs seem to be partly
or even mostly concerned with the weather, so it might even be appropriate.
As it turned out, though, the weather condition I reference most often as a
lyricist is undoubtedly rain and if there's one thing I don't generally like
to film when I'm on holiday it's rain. I kind of want to come back with the
impression the sun never stopped shining. Consequently, I have almost no
footage of rain, other than a couple of torrential downpours and a
thunderstorm or two, where things seemed spectacular enough to be worth
recording for posterity.
Still, no-one said the interpretations had to be literal. That often
looks labored and anyway, as I found very quickly, trying to fit pictures to
words in anything other than the loosest fashion is bloody hard work. So
mostly I haven't bothered with anything more than the odd, felicitous nod to
what the songs might be about. Always assuming I know what that is. Which,
after forty years, I quite frequently do not.
Having decided to do it, the next question was how. In the past, I've only
really ever used the basic Movie Maker software that Microsoft used to
include with Windows. They have, apparently, discontinued it now,
although it's still on my PC so it must have still been there in Windows 10.
Or maybe I had to download it separately. Anyway, I still have it.
The first three or four videos I made for my new-old songs were done entirely
with Movie Maker and they're okay as they go but it became apparent pretty
quickly that I'd already used up most of the possibilities and I still had
more than two dozen to do. They were all going to look pretty samey unless I
came up with a better idea.
That led me to start googling for alternatives, of which there are many. I
wasn't planning on spending any money so I was limited to the free apps. I
also dislike using free trials so that narrowed it down some more.
By far the most widely-recommended free video-editing app seemed to be
ShotCut. I downloaded that and had a play around with it.
It's great for fucking up images, which is one of the main things I wanted it
for. I absolutely did not want my videos to look like generic camcorder
holiday snaps, that being exactly what they are, so I needed to mess them up a
little, in the way everyone does when they upload stuff they shot on their
phones to social media.
After some trial and error I decided my go-tos would be the filters
labeled "Old Film". There are half a dozen of those and they all have
their merits but I particularly like the Technicolor filter, even if it's
not always extreme enough for me. I sometimes use the Saturation and
Vibrance ones as well and in extreme cases the RGB Shift. That
really makes it look like something I would have loved in the early seventies,
when I was young and had no discernment. Not that I have all that much now...
I found the filters very easy to work with but Key Frames needed more thought.
And effort. And patience. As for the subtitles, they didn't seem either
flexible or intuitive. I was finding the learning curve needed to get the most
out of ShotCut a little daunting so I started looking at easier options. Or
maybe just some specialized apps to do specific things.
The app most widely suggested for adding titles and captions was
CapCut. I installed it and found it excellent - until I tried to
download what I'd done. CapCut likes to say it has a free version and it does
but it also has an extremely annoying and quite clever way of getting you to
buy the Pro version instead: it lets you use the Pro features for free, then
asks you to pay for them as soon as you try to download the video you just
made. That's really working the sunk cost levers.
Luckily, there's a workaround for that. It transpires that CapCut used to be
far more generous with what it allowed free users to get away with and it's
still possible
to install older versions of the software that stick to those rules. You do
have to be constantly on the lookout for annoying Upgrade pop-ups because once
you have your nice, free 2024 edition, the last thing you want to do is
accidentally swap it out for the current model. I've already had to uninstall
and re-install the damn thing twice through not paying enough attention to
what I was clicking on.
CapCut is very good indeed for positioning, timing and stylizing captions or
titles but it is very bad at using AI to interpret what a singer is singing.
Useless, in fact, although it claims otherwise.
You don't, of course, need AI to add the lyrics - you can do it perfectly well
manually - but it saves a lot of time if you can get an AI to do it,
especially if it also works out all the timings for you as well. I went
shopping for one that could do both and I found one: Microsoft's very
own Clipchamp.
Clipchamp has an embedded AI because of course it does - everything Microsoft
do has to be AI-enabled these days. This one purports to be able to produce
the full lyrics from a video and turn them into subtitles. And it sort of
does.
There are two problems, other than the inevitable mishearings: firstly, when
Microsoft say subtitles they mean subtitles. The app is meant for
captioning podcasts or spoken-word videos, I believe, and it likes to put the
words along the bottom of the screen, where such things are supposed to go.
Music videos with the lyrics running along the bottom like a
TED Talk look really stupid. Ugly. Just horrible. Even Movie Maker
doesn't limit you to that. Clipchamp probably doesn't either, if you know how
to use it, but I haven't tried to find out because CapCut lets you put text
anywhere and also make it do things like dissolve or fly off the screen. And I
already knew how to use CapCut.
After a few frustrating attempts to get something half-way decent-looking out of
Clipchamp, I figured out how to export the Clipchamp AI-generated lyrics in
SRT format and upload the file into CapCut. Problem solved.
Well, partly. The other thing Clipchamp likes to do is skip verses, usually at
the start. I have no clue why. It doesn't always do it. It doesn't even mostly
do it or I wouldn't bother using it at all. Mostly it gets the whole
song right but sometimes it just... doesn't. And as far as
I can tell, if it decides to start transcribing only after thirty seconds
once, it will always start transcribing thirty seconds in on that
particular video. Even if you re-upload it under a different name. Very
annoying.
Nothing a bit of typing and tweaking in CapCut can't fix, though. And honestly
a lot of that is going to be needed anyway. By the time you've corrected
the bits the AI misheard and changed the font and moved the positioning around
and stretched some bits and shrunk others and changed the phrasing and taken
out all the punctuation the AI thought ought to have been in there, you
sometimes wonder if it mightn't have been easier to type the whole thing in
yourself.
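The punctuation clean-up, at least, is the kind of chore a few lines of script could handle before the SRT file goes into CapCut. Here's a minimal Python sketch, assuming a standard SRT layout (a cue number, a `start --> end` timing line, one or more caption lines, then a blank line). The function name, the sample lyrics and the choice of which marks to strip are illustrative assumptions, and this is a standalone script, not part of either app.

```python
import re

# A standard SRT timing line, e.g. "00:00:12,345 --> 00:00:15,678".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")
# Assumed set of marks to remove; apostrophes stay so "don't" survives.
PUNCT = re.compile(r'[.,;:!?"]')


def strip_caption_punctuation(srt_text):
    """Remove punctuation from caption lines only, leaving cue numbers,
    timing lines and blank separator lines exactly as they were."""
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.isdigit() or TIMING.match(stripped):
            out.append(line)
        else:
            out.append(PUNCT.sub("", line).rstrip())
    return "\n".join(out)


# Made-up sample cues, not real Clipchamp output.
sample = """1
00:00:01,000 --> 00:00:04,000
Rain again, falling down.

2
00:00:04,500 --> 00:00:07,250
Don't you know? It's only weather!
"""

print(strip_caption_punctuation(sample))
```

The timing lines have to be skipped explicitly because SRT uses a comma before the milliseconds, so a blanket find-and-replace would break every timestamp.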
So far I've made fifteen videos and I don't believe a single one of them has
taken me less than three hours. Several took most of a day. For a three minute
song. Actually more like two-and-a-half minutes in most cases. It's a
labor-intensive process, even with all the labor-saving shortcuts.
First I have to flick through the source material to find a few seconds
here, a few seconds there, tiny bits I can use. Then I have to import those
into Movie Maker, line them up, change the speeds, stitch them together to get
a rough cut the same length as the song. Then it's into ShotCut to rough up
the rough cut some more before taking it to Clipchamp to get the lyrics.
Finally it's off to CapCut to finish the whole thing off.
Usually at least one of those stages goes wrong, somehow, and has to be
adjusted or even redone. Sometimes the whole thing just doesn't come together
the way I was imagining and it's back to the drawing board. As yet I haven't had
to completely abandon anything, just move the parts around, but even that
takes a good, long time.
And even when it works, sometimes it still doesn't. Last night I spent
the best part of two hours just getting the timings of a bunch of transitions
exactly right in Movie Maker. When I was happy they were spot-on, I
exported the project to an MP4 file and imported it into ShotCut only to find
half the timings were a second or two adrift, as you can clearly see in the
above video. Apparently it's a known bug but since the app is no longer
supported by Microsoft no-one's going to fix it.
That is the first and only time it's happened so fingers crossed it won't
trouble me again. Anyway, it had the serendipitous effect of making me realize
I didn't have to make these things perfect, as if I even could. They're
backdrops for the words and the music, not works of art in their own right. So
as long as there aren't any spelling mistakes it doesn't really matter if the
color change is a few seconds behind the beat, does it?
Well, probably not, but you like to do your best work at all times, don't you?
And the whole process is extremely involving, addictive and enormously
entertaining, so the temptation to keep at it until it's as good as it can
possibly be is high. Video editing also has the merit of being a potentially
useful skill, so even if it takes up a huge amount of my free time, that's
arguably time well-spent. Certainly better-spent than it would have been
playing video games, anyway, which is what I would be doing otherwise.
Whether I'll ever reach the point where I even feel subjectively good at
making three-minute music videos is very far from certain. I think it's a safe
bet I'm never going to be objectively good at it. I am, at least,
better at it now than I was a month ago, though, so that's something.
I have another nineteen songs to make videos for and then, if I want to carry
on after that, I'll either have to write some new ones or do ones of other
people's stuff. I wouldn't rule either out. I re-tuned my guitar a couple of
days ago. First time I've picked it up to do anything other than dust around
it since 1994.
I mostly did it because I discovered there's an app for tuning now. And it
works. And it's free. I used to really hate tuning. And I was really bad
at it. Now both my guitars are in tune. And they're both completely unplayable
because they both have actions like cheese-graters and my fingers need six
months hardening-up even to hold down the strings.
I'm thinking of buying a new guitar with an easier action. And an amp. Geez. What have I started?
The thing that hadn't occurred to me when I started playing around with AI
music this time around was just how addictive it would be. It certainly hadn't
grabbed me that way
the first time I tried it,
the best part of a year ago. Turns out there's a huge difference between having
a machine churn out some tunes you never heard before and having it bring to
life the sounds you've been hearing in your head for forty years.
Actually, there's a bit more to it than that. When I first played around with
a couple of AI music generators last year, it was very much in the way of
playing with an amusing toy: fun but inconsequential. Which isn't surprising,
given that having the AI do one hundred per cent of the work leaves you no
other role than being the audience.
When all you're doing is typing in prompts, at best it's like being the guy in
the mosh pit who keeps yelling out for the band to play that one song from the
second album and then they do. (And yes, I have been that guy...) There's a bit
of a buzz and a fleeting sense that you might have had some kind of input but
then it passes and you never think of it again.
That all changes by at least an order of magnitude when you stop letting the
AI make up the words and type in your own lyrics instead. At that point, you
do begin to feel some sense of ownership and a degree of artistic
involvement in the process. And it's merited, too. I mean, you
did write the words. Lyricist is a proper job title.
On a technical level, it also becomes very intriguing to see the extent to
which the structure and rhythm inherent in the lyrics, coupled with your
instructions on the genre of music and emotional tone to use, all come
together to influence the melody. When I was experimenting with it last year,
I was quite surprised by how close some of the AI's interpretations were to
the original tunes I'd written back in the 'eighties.
Those, though, were the eerie exceptions. Mostly what you get is your familiar
words but set to a tune you'd never have thought of and most likely wish you'd
never heard. It takes a lot of tries to get the AI to come up with something
that feels even okay, let alone right and even when it does it never feels
like it's "your" song. It's as frustrating as it is enjoyable.
All of that managed to keep me amused for a couple of afternoons a year ago
but I soon lost interest and I hadn't felt the need to go back for another go
since. It's been much the same story with all the other generative AI agents
I've played around with these last two or three years. It's funny to get an AI
to write a story or a blog post now and again but it gets old fast. As for AI
video, it's a lot of work for very little reward. A few seconds of something
that looks quite fake.
None of which is to suggest these things have no genuine use cases. They
certainly do. And that, really, is the point: they're good tools if you have a
purpose for them but at the moment that's all they are: tools. It's still you
that's going to be doing all the real work, so if you don't have an end in
mind, what's the point? You don't buy a hammer just so you can wander around
hitting things with it at random. Or I hope you don't, anyway...
With the recovery of my ancient audio-tapes, I finally found a project for
which one of the AIs was the exact hammer I needed. That instantly turned the
whole experience on its head. Instead of idly playing with the controls to see
what would happen, now I was twiddling with them to get a precise result. I
was using the tool to a very specific end.
Well... some of the time...
See, here's the thing. Having songs you wrote and recorded back in your
youth magically brought to life, almost exactly as you'd always imagined them,
that's an amazing experience. But so is hearing those same songs done in a whole
range of styles and genres for which they were never intended. And when the
results come out sounding exactly like the real songs being covered by a bunch
of different bands... well, it's hard to leave it alone.
I've spent half of this last month trying to get Suno to give me the closest
possible approximations of the songs in my head and the other half asking it
to give me versions I couldn't even imagine. I've been indulging myself
wildly, coming up with bizarre and ridiculous interpretations of the very same
songs.
The former is by far the more satisfying, when it works, but the latter is
arguably even more addictive. It's irresistibly tempting to see what a grim,
dark, miserable song might sound like if it was covered by a hyperactive
kawaii future bass act or how a 1970s progressive rock band would handle a
ninety-second, sugared-up love song meant for a C86-era tweepop outfit.
Mostly the results are either hilarious or unlistenable but occasionally it
just somehow works. Some of the unlikeliest suggestions end up being
things I'd happily listen to over and over, like the one above, which was what
I got when I set Suno loose on one of the nastiest, darkest songs I ever wrote
and asked it to give me a "supercute kawaii bass hyperpop" version -
one with "supercute female vocals", just to labor the point. That's
actually the correct melody and pretty much the correct phrasing and emphasis,
too. If you know what it's supposed to sound like it's quite surreal.
What with the one and the other I've done precious little else since the beginning
of March. When I subbed to Suno for a month, I immediately cancelled so the
subscription wouldn't auto-renew in April. I thought the five hundred songs
that got me would be far more than I'd need for the entire project.
Two weeks later and I'd used them all. I had to buy extra credits, even though
you get enough free every day for another ten songs.
At time of writing, I have over 750 songs on Suno. I've saved them in four
categories ("Workspaces", as Suno calls them): Good, Bad,
Unrated and a generic unnamed workspace for stuff I either forgot to
categorize or haven't gotten around to yet. I also have a workspace for
Uploads, songs I've recorded and worked on so far.
Here's how the various categories stack up:
Good - 373
Bad - 51
Unrated - 228
Workspace - 104
Uploads - 53
That doesn't include some that I just deleted as I went along. Also, I don't
have fifty-three original songs. More like half that. I uploaded different
versions of a lot of them.
Uploading is interesting in itself. Unsurprisingly, the more finished the
version, the more faithfully Suno follows it. The full band rehearsals I
uploaded from my C86 years come out like more polished, better-recorded takes by
the same band. Except with a girl singer instead of me. Huge improvement.
The ones with just me and a guitar tend to follow my phrasing, intonation and
melody, such as it is, quite closely. They also determinedly stick to my chords
and rhythm, provided I prompt for a genre in which all of the above would be
appropriate. That can get very close to what I imagine those songs would have
sounded like had I been the band-leader rather than just the hired frontman.
Finally, there are the songs where I don't have any usable recordings, just
the lyrics and my fading memory of what they were meant to sound like. I tried
singing those a cappella and uploading them but my voice, which wasn't great
when I was in my twenties, has very much not improved with age.
I am a much better whistler than I am a singer so I tried whistling a couple
instead and that worked surprisingly well. Of course, with only a whistled
melody to work from, Suno has to make up the rest. You'd think it wouldn't
have a chance of getting anywhere near the result I was looking for. But you'd
be wrong.
As you can see, the Good far outweighs the Bad. Suno is really very good at
what it does, something I very definitely wouldn't say about its main rival,
Udio, on which I wasted ten pounds I wish I hadn't spent. Suno has a
lot of idiosyncrasies but it gets the job done. Udio is a waste of time.
The Bad songs are mostly complete failures by the AI to follow instructions
although a few are just plain glitches or bugs, where something went badly
wrong. The whole generative process is absolutely fascinating. I'd say that
about two-thirds of the time the AI is clearly making every attempt to come up
with exactly what's been asked for. It doesn't always quite manage it but you
can tell that's what it was trying to do.
Then there's a smaller but significant cadre of versions, where the AI appears
either to focus wholly on one specific instruction at the expense of
everything else or where it sticks closely to the plot for most of the running
time then goes completely off-message for brief periods. There's a disturbing
tendency for it to go "I've done what you wanted - now it's my turn to have some fun" and produce a decent version of whatever was asked for with ninety seconds
of something completely different bolted seemingly randomly onto the end.
Over the course of the month, I've learned a certain amount about how to get
exactly what I want but there's still an element of RNG about the whole affair
that will feel familiar to any MMORPG player. The exact same prompt that
produced a miraculously good result on one song will rarely work as well on
another. Part of the reason I have so many versions of the same songs is
purely through the necessity for so much trial and error.
Conversely, I finally had to admit to myself that if I wanted the songs to
sound like they do in my head, I had to stick to a fairly tight range of
instructions. I'd been trying a lot of new things but in the end it was mostly
the same few keywords that got me what I was looking for. The whole collection
represents the three musical personas I tried on between about 1979 and 1991
and there's no point trying to pretend otherwise. The fourth, missing, persona
would have been my punk years, something I have wisely decided to leave where
it belongs, back in the past.
Overall, the results have been astonishingly satisfying. I have multiple
versions of most of the songs now, which I consider good enough to carry
forward to the next stage. That's making lyric videos to post on my new
YouTube channel, assuming I have the nerve to go through with making it
public. For the moment I'm keeping it strictly private. (Suno automatically
creates lyric videos on request, clearly meant for TikTok. Not exactly what I
had in mind...)
The biggest problem I have is choosing which final version to go with. For a
couple of songs there's been a clear and unequivocal winner, one that I knew
immediately was the version, the one that sounded exactly the way I'd always
imagined it would.
In most cases, though, I've ended up with several options, each with some
small flaw or foible that stops it from being the definitive version. Then it's a
case of listening to them over and over and trying to make up my mind. Or,
more likely, rolling the dice again, hoping for that perfect take.
I'm about halfway through that stage now. I've completed eleven videos so far,
with around a dozen more to go. Making the videos has turned out to be every
bit as addictive as making the songs.
But that's a story for next time.
Notes on AI used in this post.
The header image is by StarlightXL at NightCafe. The
prompt I entered was very minimalistic: the title of the song, which is "Raised By Wolves (Supercute Mix)".
I'd tried that three times already, along with the exact prompt
originally used at Suno to generate the song in the first place: "supercute kawaii bass hyperpop supercute female vocals". I tried it in Flux Schnell and StarlightXL but I didn't get even a
single wolf. I just got cute girls with multicolored hair singing into
mics. Also, I've only just noticed that some of the wolves have more than the requisite number of legs. I thought that was a solved problem with AI image generation but apparently not.
I've also only just discovered that NightCafe now gives you the full "Revised Prompt"
that the AI works from. If that was there before, I never noticed it. It's very
revealing. The full prompt for the picture I used is:
"Low-poly art. Medium shot. Wolves raising human children in a futuristic
forest. Close-up. Vibrant colors inspired by Syd Mead. Neon blue wolf eyes
glowing in the dark. Trees with glowing circuits and wires. Moonlight
filtering through the forest canopy. Soft, pastel color scheme with neon
accents. Best quality. Futuristic fantasy. Syd Mead style. Low-poly
textures. Glowing neon lights. Pastel colors. Moonlit forest. Soft focus."
That is incredibly specific. It also does something I haven't done for a
couple of years: it names a specific artist. I decided that was a step
too far ages ago but it seems the AIs do it anyway. I guess I shouldn't be
surprised. I also notice that even though the revised prompt mentions the
somewhat essential "raising human children" aspect of the whole thing,
there still aren't any humans in the picture. You can have wolves or people but not
both, apparently.
So much for the image generators. The other AI in the post is the song itself,
which is discussed in the text, and the video that Suno generated for it. I
haven't watched the video all the way through so I'm trusting the lyrics are
correct. They should be. I typed them in right.
The annoying thing about that video is that you can change the title of the
song in Suno but it still uses the title of the uploaded audio anyway. The
song is called "Raised By Wolves (Supercute Mix)" but when I uploaded the
original recording that it's a "cover" of, I called it "Raised By Wolves Strangled" to
differentiate it from a couple of other uploads of the same song. Even though
I later changed the name of that upload to just "Raised By Wolves", the cover
remains a cover of "Raised By Wolves Strangled" as far as Suno is concerned
and I can't change that in the video.
Lucky I don't plan to use Suno's videos then, isn't it? I'll make my own and
call them whatever I want!
Time for the next episode in our thrilling story! So far it's been all
set-up and no action. Will today see some new music being made at last?
Maybe, although I see three potential points of contention in that last
sentence alone.
How "new" can music be if it was written and recorded forty years
ago?
Is it really "music" if an AI is making it?
And can an AI actually "make" anything?
I'm going to take a bit of a position on this one right away. Having spent
many hours with the tools and having listened to the results, I'm going to say
yes to all three. This is music. I did make it.
As for it being new, it's new in the form it exists now, although it also
remains time-locked in some essential, existential way. It sometimes seems
to me, as I listen to the new versions of these old songs, that they were
written by someone I scarcely recognize. Someone who isn't me any more. It's
the good old intentional fallacy brought to light yet again.
I'll get to that later, probably. First, the mechanics. So, in the
last two
installments, I reported on how I got the old recordings off tape and onto my
hard drive and how I decided what software to use to try and turn them into
something that sounded like the music I was hearing in my head.
At that point I was under the impression I'd be able to upload the files to
one of the AI apps and tinker around with it to get what I wanted. That turned
out not to be the case at all. Not at first.
Like that's going to help...
The mistake I made was to assume that the "Edit" option in
Suno would allow me to do things like specify instrumentation and vocals
and have the AI replace my strumming and singing with a facsimile of someone who
could do it properly. Then I could ask the AI to add drums and bass and so on
until I'd built up a composite version that sounded how I imagined it should.
This is categorically not how the software works. There is an Edit
function but it doesn't really do any of that. It lets you extend the song,
replace sections of it, crop it, fade it and change the lyrics. It's still the
same damn song, though.
I played around with that for a bit but it was totally useless for my
purposes. It was still me singing and playing and it still sounded like a bad
recording of a bad singer and a bad guitarist.
I was pretty fed up at that point and on the verge of giving up on the whole
idea, at least until the technology advanced some more, when I happened to
notice, purely by chance as I was fiddling with the settings and pressing
buttons to see what they did, that there was something called a "Cover"
function.
"This feature lets you take anything from a simple voice memo to a
fully-produced track and transform it into an entirely new style, all
while preserving the original melody that makes it uniquely yours."
There's a lot more. You can read it all at the link if you're interested. The
tl;dr is that the Cover function does exactly what the name suggests: it
creates a cover of whatever you feed into it. It's just as though you'd asked
a performer or a band to cover a specific song, given them a recording and let
them get on with it.
It's magic, basically. Actual magic.
This immediately had two effects on me, other than making me punch the air and
yell "Yes!", obviously. As I began to play around with the Cover
function I realized I could either try to get the AI to produce the closest
possible match to how I'd always wanted the songs to sound, if I could ever
have gotten a bunch of people to play them the way I wanted - or I
could indulge myself with endless variations, hearing my songs being performed
in all kinds of deeply inappropriate styles, the way Nouvelle Vague or
Postmodern Jukebox have been doing for years.
The Cover feature is still in beta. Can't wait to see what it's going to be when it's done.
Naturally, I did both. I couldn't stop myself. I still can't. I realize now
that doing it the way I have has implications I hadn't considered when I
started but it's too late to worry about that now. Also, those implications
are probably best left for another post altogether.
For now, let's stick to the mechanics. So, how does this Cover thing work?
It's simple - until it's not. You upload your source material to Suno, click
the menu option for "Cover", specify whatever style or genre you want,
along with any other specific instructions such as mood, type of singer etc.
Then you hit "Create" and sit back and wait.
Not for long. It takes Suno maybe thirty seconds to spit out two covers. You
always get two. Suno does everything in pairs. And even though the
instructions are identical, the two versions are often radically different.
They are also often radically different from the instructions and
sometimes from anything you could possibly have expected. Here's something you
really need to know about Suno: it has a mind of its own.
Suno's wild fantasies can be amusing or bizarre but it wouldn't be much use if
it didn't do what it was told most of the time. Better than eighty per cent of
the time, I'd say. In those cases, it sticks fairly closely to whatever you've told it
to do.
I'm not going to buy into the whole "Prompt Engineer" malarkey but
there are some basics anyone needs to pay attention to. Suno doesn't parse
long lists of styles and moods well, for example. If you ask it to produce
something that's "Indiepop, Dark Pop, Dreampop, C86, Janglepop"
it won't usually try to meld all of them together into a pleasing gestalt;
it'll pick one and go with that.
It also doesn't especially like to try and create unlikely style or genre
combos like "Twee Funk Psychobilly" although sometimes it can be
persuaded to give it a go. On the other hand, it clearly finds some
combinations very comfortable, so it will happily give you "Futurepop Kawaii Bass" or "Dark Ambient Vaporwave" if you ask nicely.
Suno also recognizes or can interpret an absolutely astounding number of genres
and moods. I've yet to find anything it can't at least have a recognizable stab
at, although the results can be variable in the deep woods of microgenre.
One thing I discovered early on is that it almost always follows clear
instructions on vocal gender. If you ask for "Female Vocals", that's
what you get.
Trying to get a specific kind of female vocal is a bit harder. I never
really wanted to be the lead singer in any of the bands I was in (Makes it
sound like there were dozens of them - there were four, in fact.) even though
that's what I always ended up being. I always wanted us to have a female
singer but we only ever managed to persuade our girlfriends to sing
back-up.
All the same, I've always imagined most of the songs I've written being sung
with a female voice, so that's what I hear in my head. I know what kind of
voice it is, too, but getting Suno to sing it the way I want to hear it
reminds me altogether too much of how it used to be, when I was on the
receiving end of "Can you sing it a bit more like this...?" at pretty
much every rehearsal.
I tried using just one adjective and I tried using several. I tried whole
descriptive phrases. I was never sure which worked better. A lot of the voices
were almost right but just not quite. In the end, I settled on
"world-weary", sometimes in combination with "innocent" or
"naive", which generally seemed to give me something close to the
personality I was after.
Of course, once I'd got it on one song, I had to try and get it again and
again on the rest. There is a function called "Persona" that allows you
to save a particular voice and/or style and re-use it for other songs, lending
them all a consistency that makes them seem like they were recorded by the
same singer or band. That was exactly what I needed, but frustratingly, it
can't be used for covers of songs that have been uploaded.
The reason for this would seem to be an understandable desire on the part of
Suno's makers not to get sued out of existence by the megacorps. I
mean, they're already being sued but why give them more ammunition? There are a number of
safeguards built into the system to try and prevent that happening.
I never had any intention of making the covers of my songs publicly available
on Suno, which is just as well because it turns out you can't. Even though you
have to tick a box to say you own all the rights before you can upload
anything, all uploads and covers thereof are automatically blocked within the
app from being listed as "Public". No uploading someone else's song then
trying to pass it off as an original after you've had Suno "cover" it.
"it's never going to be exactly what you originally had in mind."
With similar caution, the Persona function cannot be used on uploaded content or
covers made from it. If it could, presumably you'd be able to upload your
favorite singer, clone their voice and then apply it to anything. If you want to
do that, best talk to Grimes. She's up for it.
This is one of the limitations of making covers of your own songs using Suno.
There are several more, probably the most awkward of which is that uploads can
be no more than two minutes long.
I didn't notice at first because the songs I started with were all under two
minutes long. They were the ones I wrote in the
late 'eighties, when I was trying to get a C86/Twee band together.
Unsuccessfully, as it turned out.
I uploaded them and Suno happily gobbled the lot. It was only when I got
around to some of the longer songs from an earlier phase that I realized it
was cutting them off before they'd finished.
In most cases, this is less of a problem than you might imagine, at least if
you're dealing with traditional pop or rock song structure. You can select
which two minutes of the track to upload and Suno easily recognizes verses and
choruses. (You can specify them but I haven't found it necessary.)
The AI picks up melody, phrasing and even intonation almost perfectly and it
can extrapolate from what it knows, so if your three-minute song has a
traditional verse-chorus structure and just carries on the same to the end, it
doesn't matter to Suno that it only has two-thirds of the song to go on. It'll
get the rest right anyway.
The problems start when there's a change of some kind outside the part
Suno knows - a middle-eight or a coda or a solo or something. Structurally,
Suno works from the lyrics, plus any additional instructions you type in in
square brackets, such as [Middle Eight] or [Guitar Solo]. When it hits
something that doesn't match the pre-existing pattern, the AI will have a
jolly good go at coming up with something appropriate but it's never going to
be exactly what you originally had in mind, which means that cover is
going to have a little more AI influence than perhaps you wanted.
"poorly-recorded caterwauling converted from forty-year old
audio-tape"
There are ways around this, although it took me a while to work them out. You
can, for example, cut and shut parts of your original recording so it comes in
under two minutes but has all the separate elements of the structure and
melody. Suno will then apply the right melody, rhythm or phrasing to the
lyrics that match what it knows. It's very clever like that.
And that brings us to the lyrics, where there's another problem to look out
for. When you upload your song to Suno, it does its damnedest to figure out
what you're on about and write it all down as accurately as possible. If you
articulate clearly and don't use a lot of made-up words, it does a more than
fair job of it, too. There were a few uploads where all I had to do was make a
handful of minor corrections.
On the other hand, if what you're feeding it is mostly poorly-recorded
caterwauling converted from forty-year old audio-tape, Suno takes a wild stab at
the few bits it can just about make out and gives up on the rest. For most of
the recordings, I had to get out my old lever-arch file with the hand-written
lyrics and type the whole lot in.
Which I really enjoyed. It was a lot of fun going back over those old lyrics
and figuring out what I'd been trying to say. A bit like reading an old diary,
which I guess is what happens if you tend to write from experience.
There were several distinct phases, the best, at least from my perspective
four decades on, being the time when I seem to have been mildly depressed and
obsessed with weather and the turning of the seasons. Or, to put it another way,
when every bloody song I wrote was about the rain, one way or another.
And that, I fear, is about as far as we're going to get today. This does go
on, doesn't it? And there's quite a lot more to come. In Part Four I promise
I'll get to some of the actual songs although I'm not promising there'll be
examples you can listen to. I'm in the process of making videos for all of
them. I've done five so far. I have about twenty more to do.
But that's another post altogether.
Notes on AI used in this post.
The header picture and two spot illos. All done at NightCafe using Flux Schnell with default settings, other than Aspect Ratio (changed to 4:3) and Runtime (changed to Medium).
All prompts were taken directly from the text and are shown in the captions except for the header image, which was generated using the quote in the post title plus the next part of the quote "at pretty much every rehearsal", and the color illustration, which also had the instruction "Band rehearsing" appended to be sure of getting something suitable.
I also added a style instruction, "Line art", after all prompts to be sure of getting drawings rather than photographs.