‘It’s Christmas time! It’s hot tub time!” sings Frank Sinatra. At least, it sounds like him. With an easy swing, cheery bonhomie, and understated brass and string flourishes, this could just about pass as some long lost Sinatra demo. Even the voice – that rich tone once described as “all legato and regrets” – is eerily familiar, even if it does lurch between keys and, at times, sounds as if it was recorded at the bottom of a swimming pool.
The song in question not a genuine track, but a convincing fake created by “research and deployment company” OpenAI, whose Jukebox project uses artificial intelligence to generate music, complete with lyrics, in a variety of genres and artist styles. Along with Sinatra, they’ve done what are known as “deepfakes” of Katy Perry, Elvis, Simon and Garfunkel, 2Pac, Céline Dion and more. Having trained the model using 1.2m songs scraped from the web, complete with the corresponding lyrics and metadata, it can output raw audio several minutes long based on whatever you feed it. Input, say, Queen or Dolly Parton or Mozart, and you’ll get an approximation out the other end.
“As a piece of engineering, it’s really impressive,” says Dr Matthew Yee-King, an electronic musician, researcher and academic at Goldsmiths. (OpenAI declined to be interviewed.) “They break down an audio signal into a set of lexemes of music – a dictionary if you like – at three different layers of time, giving you a set of core fragments that is sufficient to reconstruct the music that was fed in. The algorithm can then rearrange these fragments, based on the stimulus you input. So, give it some Ella Fitzgerald for example, and it will find and piece together the relevant bits of the ‘dictionary’ to create something in her musical space.”
Admirable as the technical achievement is, there’s something horrifying about some of the samples, particularly those of artists who have long since died – sad ghosts lost in the machine, mumbling banal cliches. “The screams of the damned” reads one comment below that Sinatra sample; “SOUNDS FUCKING DEMONIC” reads another. We’re down in the Uncanny Valley.
Deepfake music is set to have wide-ranging ramifications for the music industry as more companies apply algorithms to music. Google’s Magenta Project – billed as “exploring machine learning as a tool in the creative process” – has developed several open source APIs that allow composition using entirely new, machine-generated sounds, or human-AI co-creations. Numerous startups, such as Amper Music, produce custom, AI-generated music for media content, complete with global copyright. Even Spotify is dabbling; its AI research group is led by François Pachet, former head of Sony Music’s computer science lab.
It’s not hard to foresee, though, how such deepfakes could lead to ethical and intellectual property issues. If you didn’t want to pay the market rate for using an established artist’s music in a film, TV show or commercial, you could create your own imitation. Streaming services could, meanwhile, pad out genre playlists with similar sounding AI artists who don’t earn royalties, thereby increasing profits. Ultimately, will streaming services, radio stations and others increasingly avoid paying humans for music?
Legal departments in the music industry are following developments closely. Earlier this year, Roc Nation filed DMCA takedown requests against an anonymous YouTube user for using AI to mimic Jay-Z’s voice and cadence to rap Shakespeare and Billy Joel. (Both are incredibly realistic.) “This content unlawfully uses an AI to impersonate our client’s voice,” said the filing. And while the videos were eventually reinstated “pending more information from the claimant”, the case – the first of its kind – rumbles on.
Roc Nation declined to comment on the legal implications of AI impersonation, as did several other major labels contacted by the Guardian: “As a public company, we have to exercise caution when discussing future facing topics,” said one anonymously. Even UK industry body the BPI refused to go on the record with regard to how the industry will deal with this brave new world and what steps might be taken to protect artists and the integrity of their work. The IFPI, an international music trade body, did not respond to emails.
Perhaps the reason is, in the UK at least, there’s a worry that there’s not actually a basis for legal protection. “With music there are two separate copyrights,” says Rupert Skellett, head of legal for Beggars Group, which encompasses indie labels 4AD, XL, Rough Trade and more. “One in the music notation and the lyrics – ie the song – and a separate one in the sound recording, which is what labels are concerned with. And if someone hasn’t used the actual recording” – if they’ve created a simulacrum using AI – “you’d have no legal action against them in terms of copyright with regards to the sound recording.”
There’d be a potential cause of action with regards to “passing off” the recording, but, says Skellett, the burden of proof is onerous, and such action would be more likely to succeed in the US, where legal protections exist against impersonating famous people for commercial purposes, and where plagiarism cases like Marvin Gaye’s estate taking on Blurred Lines have succeeded. UK law has no such provisions or precedents, so even the commercial exploitation of deepfakes, if the creator was explicit about their nature, might not be actionable. “It would depend on the facts of each case,” Skellett says.
Some, however, are excited by the creative possibilities. “If you’ve got a statistical model of millions of songs, you can ask the algorithm: what haven’t you seen?” says Yee-King. “You can find that blank space, and then create something new.” Mat Dryhurst, an artist and podcaster who has spent years researching and working with AI and associated technology, says: “The closest analogy we see is to sampling. These models allow a new dimension of that, and represent the difference between sampling a fixed recording of Bowie’s voice and having Bowie sing whatever you like – an extraordinary power and responsibility.”
Deepfakes also pose deeper questions: what makes a particular artist special? Why do we respond to certain styles or types of music, and what happens when that can be created on demand? Yee-King imagines machines able to generate the perfect piece of music for you at any time, based on settings that you select – something already being pioneered by the startup Endel – as well as pop stars using an AI listening model to predict which songs will be popular or what different demographics respond to. “Just feeding people an optimised stream of sound,” he says, “with artists taken out of the loop completely.”
But if we lose all sense of emotional investment in what artists do – and in the human side of creation – we will lose something fundamental to music. “These systems are trained on human expression and will augment it,” says Dryhurst. “But the missing piece of the puzzle is finding ways to compensate people, not replace them.”