The sudden explosion of OpenAI’s ChatGPT and other large language models has brought renewed attention to artificial intelligence capable of mimicking speech, especially that of well-known figures. It has left some observers fearful of a rise in deepfake-fueled misinformation, copyright chaos, or possibly even the destruction of reality itself. It will almost certainly lead to some pretty cool art, too.
Deepfakes and art, specifically music, have a years-long symbiotic history. Musicians, often some of the earliest adopters of bleeding-edge tech, have been tinkering with algorithms to make music and generate strange, and at times unsettling, videos for years. When used creatively, deepfake images can help musicians like Kendrick Lamar buttress their stories by inserting jarring, alternative perspectives from different faces. In other cases, more advanced AI models can ingest the corpus of a musician’s library and spit out an entirely new creation. The evolution of digital music tools means those constructions can come to life in what sounds like a real song. Some creators have used this technique to essentially bring deceased musicians like Nirvana’s Kurt Cobain back from the grave.
The gradual iteration and improvement of deepfakes in music isn’t just about the art, either. Over time, these creations display the growing power and accessibility of AI tools. In the span of just two or three years, deepfake music videos have improved dramatically, thanks in part to the breakneck speed of improvements in AI. Engineers and creators are just now starting to scratch the surface of what’s possible when combining music with more advanced generative AI tools. The slides that follow serve as a brief history of the novel, weird, and often flat-out goofy examples of deepfakes in music up to this point.
Our first entry may not exactly meet a textbook definition of deepfakes, but it nonetheless helps lay the conceptual groundwork for many other examples that would follow it. The song is attributed to Lil Miquela, a first-of-its-kind digitally constructed “influencer” who became a viral sensation in the late 2010s after duping untold numbers of Instagram scrollers into believing it was a real person.
Out of context, Lil Miquela, designed to resemble a half-Brazilian, half-Spanish 19-year-old girl, looks pretty clearly like an NPC wandering around Cyberpunk 2077. However, when smothered in expensive-looking Supreme and Prada clothes and touched up with some filters, the avatar managed to expertly exist within the uncanny valley of the already not-quite-real-looking social media patina. Lil Miquela ushered in a wave of other virtual influencers and made its developers, a company called Brud, a boatload of money in the process.
In 2017, Lil Miquela’s creators took that trickery even further by releasing a lyric video for an original song called “Not Mine.” The song itself is a 2:42 R&B track with “Miquela’s” pitch-corrected vocals layered over top. It’s unclear whether the vocals are computer generated or simply sung by a human, but their heavy electronic feel leaves the listener guessing.
Though artists kept tinkering with various forms of seemingly digitally altered videos over the next few years, the wave of deepfake music videos really started to take off around 2020 with this video made for rapper Lil Uzi Vert’s track “Wassup.”
Produced by Bill Posters, who previously created this deepfake video of Mark Zuckerberg roasting himself, the “Wassup” video is made up of a collection of “Boom” video calls with a variety of celebrities. Boom is a spin on Zoom, and the entire video pokes at the surreal reality of pandemic-era video communication. The opening frames of the video show a number of rappers and politicians, including Obama, Trump, and North Korean leader Kim Jong-un, all joining the remote call. Drake, Ye West, Rihanna, and Young Thug also make appearances.
Though crude compared to the more polished deepfake videos of 2023, “Wassup” animated still celebrity headshots to make it seem like they were lip-synching the lyrics of the song. That breakthrough helped pave the way for future videos that would refine the process of creating a video based solely on an individual’s headshot.
In early 2020, a creative agency called Space150 created an AI music bot modeled after rapper Travis Scott, fittingly named Travis Bott. Travis Bott was trained on all of Scott’s actual lyrics and instructed to spit out a song, complete with lyrics and a melody. Space150 says it extracted the melodies, chords, and beats from Scott’s actual tracks, which were then used to create a series of MIDI files that served as the building blocks of Travis Bott’s work. The resulting product is a roughly three-minute track called “Jack Park Canny Dope Man.”
“The most surprising aspect of the project is that [the] song is good,” Space150 executive director Ned Lampert said in an interview with Muse. “It sounds like a legitimate Travis Scott song—it has emotion, it has lyrical hooks and pop sensibilities.”
To bring the song to life, the agency employed a human rapper whose voice was altered to sound closer to Scott’s. The accompanying music video features real human actors, though Travis Bott’s face is at times swapped out with Scott’s using deepfake technology that sits right at the border between cool and creepy.
“Using the music library of Travis Scott, we influenced and trained our machine to produce original beats and lyrics,” Space150 said in a statement. “Our team taught the machine both grammar and rhyming cadence, building custom workflows to match rhyming couplets and number of syllables in each line.”
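Space150 hasn’t published that tooling, but the couplet-and-syllable matching it describes is easy to picture. Below is a minimal Python sketch of that kind of filter, assuming only crude heuristics: a vowel-group syllable counter and an ending-based rhyme check used to pair machine-generated candidate lines into usable couplets. The helper names and example lines are illustrative, not the agency’s actual workflow.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups, with a silent-'e' adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def line_syllables(line: str) -> int:
    """Total syllables across the words in a lyric line."""
    return sum(count_syllables(w) for w in re.findall(r"[a-zA-Z']+", line))

def rhymes(line_a: str, line_b: str) -> bool:
    """Crude rhyme check: compare word endings from the final vowel group onward."""
    def ending(line: str) -> str:
        last = re.findall(r"[a-zA-Z']+", line)[-1].lower()
        match = re.search(r"[aeiouy]+[^aeiouy]*$", last)
        return match.group(0) if match else last
    return ending(line_a) == ending(line_b)

def pair_couplets(candidates: list[str], tolerance: int = 1) -> list[tuple[str, str]]:
    """Greedily pair candidate lines into couplets that rhyme and roughly match in length."""
    couplets, used = [], set()
    for i, a in enumerate(candidates):
        if i in used:
            continue
        for j in range(i + 1, len(candidates)):
            if j in used:
                continue
            b = candidates[j]
            if rhymes(a, b) and abs(line_syllables(a) - line_syllables(b)) <= tolerance:
                couplets.append((a, b))
                used.update({i, j})
                break
    return couplets

# Example: filter machine-generated candidate lines into usable couplets.
lines = [
    "Midnight lights keep flashing in the rain",
    "Diamonds on my wrist, I feel no pain",
    "Running through the city with the crew",
    "Every time we drop, the sound is new",
]
print(pair_couplets(lines))
```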
Grunge legend and ’90s cultural icon Kurt Cobain may have died in 1994, but AI models honoring his legacy attempted to bring him back to life in 2021 via the generated song “Drowned in the Sun.” Created by the advocacy group Over the Bridge, the AI-generated track came from models trained on around 30 different Nirvana songs run through Google’s Magenta software. Once the songs were analyzed, the models spit out the track’s melody, guitar riffs, and drums. The developers reportedly used a separate neural network to create the song’s lyrics.
Remarkably, the song actually sounds pretty close to Nirvana, odd, dark, brooding imagery and all. Every element of the track is computer-generated except for the raw vocals. Those were laid down by Eric Hogan, a singer in a Nirvana tribute band. Sean O’Connor, a member of Over the Bridge’s board of directors, told Rolling Stone the Nirvana-inspired song was particularly difficult to create because the band’s mix of heavy distortion, fuzz, chorus, and other effects led to a crowded “wall of sound.”
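Over the Bridge hasn’t detailed its exact pipeline, and Magenta’s melody models are recurrent neural networks rather than anything this simple. Still, as a loose sketch of the corpus-in, melody-out idea, here is what the skeleton might look like in Python: read a folder of MIDI transcriptions with the pretty_midi library, learn which pitches tend to follow which, and write a newly sampled melody back out as MIDI. The folder path and the Markov-chain stand-in are assumptions for illustration.

```python
import glob
import random
from collections import defaultdict

import pretty_midi  # pip install pretty_midi

def extract_pitches(midi_dir: str) -> list[int]:
    """Pull a flat pitch sequence from every non-drum instrument in a folder of MIDI files."""
    pitches = []
    for path in glob.glob(f"{midi_dir}/*.mid"):
        midi = pretty_midi.PrettyMIDI(path)
        for instrument in midi.instruments:
            if instrument.is_drum:
                continue
            pitches.extend(note.pitch for note in sorted(instrument.notes, key=lambda n: n.start))
    return pitches

def build_transitions(pitches: list[int]) -> dict[int, list[int]]:
    """First-order Markov chain: which pitch tends to follow which in the corpus."""
    transitions = defaultdict(list)
    for current, nxt in zip(pitches, pitches[1:]):
        transitions[current].append(nxt)
    return transitions

def sample_melody(transitions: dict[int, list[int]], length: int = 64) -> list[int]:
    """Random-walk a new melody through the learned transitions."""
    pitch = random.choice(list(transitions))
    melody = [pitch]
    for _ in range(length - 1):
        choices = transitions.get(pitch)
        pitch = random.choice(choices) if choices else random.choice(list(transitions))
        melody.append(pitch)
    return melody

def melody_to_midi(melody: list[int], out_path: str, note_length: float = 0.25) -> None:
    """Write the sampled pitches out as a simple monophonic MIDI track."""
    midi = pretty_midi.PrettyMIDI()
    instrument = pretty_midi.Instrument(program=29)  # General MIDI overdriven guitar
    for i, pitch in enumerate(melody):
        start = i * note_length
        instrument.notes.append(
            pretty_midi.Note(velocity=100, pitch=pitch, start=start, end=start + note_length)
        )
    midi.instruments.append(instrument)
    midi.write(out_path)

# Hypothetical corpus folder; any batch of MIDI transcriptions would do.
transitions = build_transitions(extract_pitches("nirvana_midis"))
melody_to_midi(sample_melody(transitions), "generated_melody.mid")
```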
Back on the video side of the spectrum, deepfake music videos may have reached their creative zenith last year with this video accompanying Kendrick Lamar’s “The Heart Part 5.” The track begins with a quote reading “I am. All of us.”
Viewers see Lamar looking toward the camera against a burgundy backdrop, saying, “As I get older, I realize life is perspective.” Then, in a twist, when the song picks up, Lamar’s face suddenly transforms into O.J. Simpson’s, all while he keeps performing the song. Over the course of the next few minutes, Lamar’s face transforms again, into Ye West, Jussie Smollett, and Will Smith.
The key factor that makes Lamar’s video stand out is its subtlety. Unlike past deepfake videos that crudely transplanted images onto faces in a janky, slapstick rush, Lamar’s transformations are at times hardly even noticeable. One viewer watching the video with this writer didn’t even realize deepfakes were involved on the first watch.
Lamar’s video was made in collaboration with Deep Voodoo, a creative deepfake studio run by South Park creators Trey Parker and Matt Stone. Deep Voodoo gained fame for its 2020 deepfake of then-president Donald Trump pathetically reading a rendition of “Rudolph the Red-Nosed Reindeer.” Now, it’s working with artists of all stripes to create “original synthetic media projects.”
Before he was getting canceled for spewing anti-Semitic slurs through a gimp mask on InfoWars, Ye West was busy dipping his toes in video deepfakes. In “Life of the Party,” West, who went by Kanye at the time, uses deepfake tech to animate a variety of photos from his childhood.
The end result is a somewhat crude, but nonetheless interesting collage of young West photos with their lips and facial expressions synched up to the song. “Life of the Party” was released on the eve of Mother’s Day and takes the concept of nostalgia, an already common thread in popular music, to its logical conclusion. Given his current state of affairs, West probably wishes he could actually travel back in time.
New advancements in large language models and AIs capable of impersonating voices mean that, with enough tweaking, an intrepid artist or misinformation peddler can use new AI tech to make a celebrity say just about anything they want. That might not be great for society long term, but it can lead to some cool music. French DJ David Guetta proved that point earlier this year at a rave where a voice sounding eerily similar to rapper Eminem came over the speakers saying, “This is the future rave sound. I’m getting awesome and underground.”
In a Twitter post, Guetta said the clip, intended as a joke, was made by using one AI model to generate text in the style of Eminem and another capable of mimicking the rapper’s voice.
“There’s something that I made as a joke and it worked so good!” Guetta said in a clip posted to Twitter. “I could not believe it.”
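Guetta hasn’t said which tools he used. As a loose sketch of the two-stage idea, though, one model writing a line “in the style of” an artist and another cloning a voice to deliver it, here is how the pieces might be chained with the Hugging Face transformers library and Coqui’s open-source TTS; the model choices, prompt, and reference clip are assumptions, not Guetta’s actual setup.

```python
from transformers import pipeline  # pip install transformers
from TTS.api import TTS            # pip install TTS (Coqui)

# Stage 1: generate a short line of text with an off-the-shelf language model.
# GPT-2 is a stand-in here; the original clip's text model is unknown.
generator = pipeline("text-generation", model="gpt2")
prompt = "This is the future rave sound,"
line = generator(prompt, max_new_tokens=20, do_sample=True)[0]["generated_text"]

# Stage 2: speak that line with a voice-cloning TTS model, conditioned on a
# short reference recording of the target voice (hypothetical file path).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text=line,
    speaker_wav="reference_voice_clip.wav",  # a few seconds of the voice to mimic
    language="en",
    file_path="cloned_vocal.wav",
)
```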
If there’s one thing aging nu-metal rap-rockers Limp Bizkit are not, it’s subtle. Naturally, that applies to their recent foray into deepfake videos as well. Last month, the band responsible for “Nookie,” “Break Stuff,” and the popularization of Dickies shorts released a video for their song “Out of Style,” in which each band member’s face is replaced with that of a powerful world leader. Vladimir Putin, Xi Jinping, Kim Jong-un, Joe Biden, Volodymyr Zelensky, and a hot-dog-wielding Tom Cruise are all featured in the video throwing down and getting their groove on. Biden, if you were wondering, is shredding on lead guitar.
The 3:24 track is both an acutely self-aware dig at aging and maintaining relevance and a commentary on a time of geopolitical anxiety. It’s also so stupid it’s funny. Since the deepfake only applies to the faces, the bodies still bear the artists’ tattoo sleeves and early-2000s apparel. In one of the more memorable parts of the video, Biden, Xi, Kim, and Zelensky huddle beside Putin, rooting him on as he struggles to chug a beer through a siphon. Putin succeeds.