By Senne Batsleer, Dorian Van den Heede
Recently, Dataroots received an intriguing request from a journalist at De Tijd, a prominent Flemish newspaper. She was working on an AI-focused podcast series called “De Aionauten”, where the audio team of De Tijd explores the current state of AI in various domains.
As a challenge for their podcast, she approached Dataroots with a specific task in mind: creating a brand new hit song for the acclaimed Belgian artist, Niels Destadsbader. To tackle this ambitious task, De Tijd had already reached out to his producer, Miguel Wiels, seeking the data required for this challenge. With a limited time of 4 working days and armed with only 11 songs, including lyrics, lead sheets, and vocals-only music, generously provided by Miguel himself, we embarked on our creative journey. The challenge ahead was daunting, but we (Dorian Van den Heede, Mateusz Marciniewicz, Senne Batsleer and Virginie Marelli) eagerly embraced the opportunity, determined to craft a remarkable new hit within the constraints of time and resources.
Recognizing the complexity of the task and the need for experimentation, we approached the project from multiple angles, utilizing various AI models and techniques to achieve an authentic Niels Destadsbader-style hit song. In the sections below, we'll discuss the three different approaches we took and explain how we produced a song based on a combination of these techniques.
Following our song preparation work, we were lucky enough to sit together with Miguel himself to showcase our work. He shared his feedback on the music we generated and we even got the opportunity to collaborate on an AI-song. You’ll find his comments and feedback throughout this blogpost.
Edit: On 08/06/2023 Facebook Research released a new model, MusicGen. Although it came too late to make the cut for the podcast, we have played around with MusicGen and used it to enrich our generated samples (see the MusicGen section at the end).
Approach 1: Replicating Niels Destadsbader's Style with Jukebox
We first turned our attention to Jukebox, a model developed by OpenAI, since we had some previous experience with it from participating in the AI Song Contest, an international competition for AI-generated music (blogpost series: 1, 2, 3, 4). Trained on a vast dataset of 1.2 million songs, including metadata such as genre, artist, and keywords, Jukebox offers the capability to generate music samples in a specific genre or emulate the style of a particular artist.
Since Niels Destadsbader's songs were not part of Jukebox's training corpus, replicating his style was not possible out of the box. Therefore we used the songs we received from Miguel to fine-tune Jukebox to Niels' style.
Although we didn't use Jukebox in our final songwriting pipeline, we presented the generated samples to Miguel for evaluation and feedback. Here are his most notable remarks:
- (+) Some of the samples exhibit unquestionably novel attributes, indicating originality that was not derived from existing songs.
- (-) Other samples draw clear inspiration from the songs used during the model's fine-tuning process or contain extended periods of silence.
- (-) The samples are missing a clear structure or recurring musical components such as choruses. This limitation was also mentioned by OpenAI researchers upon the release of Jukebox. According to Miguel, this aspect is crucial for creating recognizable and successful songs.
- (-) The vocals are often faint and unintelligible. In part, this can be explained because the model was trained primarily on English-language songs. Generating comprehensible Dutch vocals is therefore a bit much to ask.
The interested reader can have a look at the Jukebox repository to find out more about the model. Please note that the model is no longer actively maintained, which has an impact on its user-friendliness.
Approach 2: Harnessing the Power of Generative AI
To explore generative AI's potential, Dataroots employed GPT-4, a cutting-edge language model. It played a central role in our songwriting process, enabling us to generate lyrics, chords, and melodies in the style of Niels Destadsbader.
Our approach began by incorporating all available lyrics into the prompt provided to GPT-4. This allowed the model to familiarize itself with Niels' typical lyrical style. Additionally, we included specific details in the prompt, such as the desired emotions (joyful, sad, melancholic, etc.) or themes (sorrow, love, etc.) we wanted to be reflected in the lyrics.
Building upon the lyrics, we asked GPT-4 to generate new chords inspired by the chord progressions in the existing lead sheets. This process allowed us to create harmonies that complemented the intended emotional and thematic aspects of the song.
Lastly, GPT-4 was employed to generate a melody that harmoniously blended with the lyrics and chords. By considering the lyrical content and the corresponding chords, GPT-4 produced melodic lines that were cohesive and fitting for the song.
As you can see, GPT-4 played a crucial role in our songwriting process. We'll cover Miguel’s comments on the quality of GPT-4's work later on in this article.
If you're interested in our prompt engineering techniques, please take a look at our colab notebook.
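The three-step chain described above (lyrics first, then chords, then melody) could be sketched roughly as follows. This is an illustrative sketch only, not the code from the notebook: the function names, the prompt wording and the placeholder inputs are all hypothetical, and the call to GPT-4 itself is left out so that the example stays independent of any particular client library.

```python
from typing import List


def lyrics_prompt(reference_lyrics: List[str], emotion: str, theme: str) -> str:
    """Step 1: show the model the existing lyrics, then ask for new ones."""
    examples = "\n\n".join(
        f"Song {i}:\n{text.strip()}"
        for i, text in enumerate(reference_lyrics, start=1)
    )
    return (
        "Below are lyrics by the Flemish artist Niels Destadsbader.\n\n"
        f"{examples}\n\n"
        "Write new Dutch lyrics in the same style. "
        f"Emotion: {emotion}. Theme: {theme}."
    )


def chords_prompt(new_lyrics: str, reference_progressions: List[str]) -> str:
    """Step 2: ask for chords inspired by progressions from the lead sheets."""
    progressions = "\n".join(f"- {p}" for p in reference_progressions)
    return (
        "Here are chord progressions taken from existing lead sheets:\n"
        f"{progressions}\n\n"
        "Propose a new chord progression in a similar style "
        "that fits these lyrics:\n"
        f"{new_lyrics}"
    )


def melody_prompt(new_lyrics: str, chords: str) -> str:
    """Step 3: ask for a melody that fits both the lyrics and the chords."""
    return (
        "Write a melody as a list of note names with durations, "
        "one line per lyric line.\n\n"
        f"Lyrics:\n{new_lyrics}\n\n"
        f"Chords:\n{chords}"
    )


if __name__ == "__main__":
    # Placeholder inputs; the real lyrics and lead sheets are not reproduced here.
    print(lyrics_prompt(["(lyrics of song 1)", "(lyrics of song 2)"], "joyful", "love"))
```

Each prompt would be sent to GPT-4 in turn, with the answer of one step pasted into the next. Keeping the steps separate makes it easy to iterate on a single stage, for example rewriting only the lyrics after feedback, without regenerating everything.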
Approach 3: Cloning Niels Destadsbader's Vocal Essence
Besides crafting compelling melodies and lyrics, we also needed to capture the essence of Niels Destadsbader's vocal style. Initially, we explored models that convert lyrics into a singing voice; these are readily available for English vocals but not yet for Dutch. Given that Niels primarily sings in Dutch, we sought an alternative solution. While models that convert text into Dutch speech exist, they fall short in generating high-quality singing performances. We required a more specialized approach to achieve the desired vocal emulation.
Therefore we delved into the concept of singing voice conversion, where an existing vocal fragment serves as input, and the singing voice is transformed to resemble Niels' voice. To accomplish this, we leveraged an open-source model and fine-tuned it using Niels' vocals-only music.
One notable challenge we encountered was that the quality of the input greatly influenced the output. Because we are relatively unskilled singers, our own recorded vocals did not accurately replicate Niels' pitch and tone after voice conversion.
Replicating Niels' ability to hit high notes and deliver powerful vocal performances proved to be another hurdle. We experimented with different loss functions to enhance this aspect, yet it remains a difficulty. We believe that more extensive and representative training data would significantly enhance the results in this regard.
While the current rendition may still exhibit some robotic qualities, it undeniably carries the distinct essence of Niels Destadsbader's voice, much to the surprise of Miguel.
Co-composing with Miguel Wiels
With a limited time of only 30 minutes, our collaboration with Miguel focused on composing the chorus of a new song.
As explained before, we used GPT-4 to generate lyrics for the chorus. However, Miguel found the initial lyrics to be lackluster and clichéd. Taking his feedback into account, we iterated and refined the lyrics through prompt engineering, eventually arriving at lyrics of decent quality.
Next, we generated chord progressions for the chorus. Here too, we incorporated feedback from Miguel to reach the desired quality.
Crafting a melody that aligns with the lyrics and chord progressions posed a non-trivial challenge. The melody, while of decent quality on its own, did not take into account the natural intonation of specific words, since, again, GPT-4 is trained mostly on English data. This is one of the major drawbacks of our approach and deserves more attention in future research.
To bring the chorus to life, we needed a vocalist. As many readers may know, Miguel is an accomplished singer, as demonstrated by his recent appearance as 'Champignon' in The Masked Singer. He graciously lent his voice to sing the chorus. Given the difficulty in aligning lyrics and melody, we allowed him the freedom to deviate from the prescribed notes for a more natural vocal delivery.
Finally, we applied voice conversion techniques to transform Miguel's vocals into the distinctive voice of Niels. Curious about the result? Then tune in to De Tijd's podcast 'De Aionauten'.
The fourth episode, released on June 16th, will feature some of our samples!
EDIT: Full instrumentals with MusicGen
With the big players investing heavily in AI research, and with the growing trend of open-sourcing models to speed that research up, it should come as no surprise that our approach could already be improved noticeably in the one month between our project for De Aionauten and the release of the episode.
Facebook Research has released MusicGen, a language model for music generation. Besides generating melodies and short musical pieces out of the box from simple prompts like 'Danceable pop song with energetic drums', it offers a very cool feature: you can condition the generation on a target melody.
We used this melody conditioning to turn the simple melodic and harmonic ideas generated with GPT-4 into full instrumentations. The MusicGen output is purely instrumental, so we (manually) aligned the voice-cloned vocals of Niels to it afterwards. You will get better results if you record the vocals only after generating the MusicGen output, singing along to the instrumental.
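For readers who want to try melody conditioning themselves, the sketch below shows roughly what such a call can look like with the open-source audiocraft package. It is a minimal sketch under stated assumptions, not our exact pipeline: the `facebook/musicgen-melody` checkpoint, the 30-second cap, the helper names and the file names in the usage note are all assumptions, and the GPT-4 melody is assumed to have been rendered to an audio file beforehand.

```python
MAX_SECONDS = 30.0  # assumption: public MusicGen checkpoints target clips of up to ~30 s


def clamp_duration(seconds: float, max_seconds: float = MAX_SECONDS) -> float:
    """Keep the requested clip length between 1 second and max_seconds."""
    return max(1.0, min(float(seconds), max_seconds))


def enrich_melody(melody_path: str, description: str, out_stem: str,
                  seconds: float = 15.0) -> None:
    """Generate an instrumental that follows the melody stored in melody_path."""
    # Heavy dependencies are imported lazily so clamp_duration works without them.
    import torchaudio
    from audiocraft.data.audio import audio_write
    from audiocraft.models import MusicGen

    model = MusicGen.get_pretrained("facebook/musicgen-melody")
    model.set_generation_params(duration=clamp_duration(seconds))

    # torchaudio returns a tensor of shape [channels, samples] and the sample rate.
    melody, sample_rate = torchaudio.load(melody_path)

    # One melody per text description: add a batch dimension in front.
    wav = model.generate_with_chroma([description], melody[None], sample_rate)

    # Writes <out_stem>.wav with loudness normalisation.
    audio_write(out_stem, wav[0].cpu(), model.sample_rate, strategy="loudness")
```

A call could look like `enrich_melody("gpt4_chorus.wav", "Danceable pop song with energetic drums", "chorus_instrumental")`, where both file names are placeholders. The text prompt steers the instrumentation, while the melody file steers the tune.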
While the collaboration between Dataroots, Miguel Wiels, and AI models showcased the potential of AI in music production, Miguel expressed confidence in the continued relevance of human producers. He recognized AI's value as an inspiration tool for writing lyrics and melodies but felt that it was not yet capable of independently creating complete songs. We never got to hear his reaction to the samples we enriched with MusicGen, which is a giant step towards more realistic-sounding musical samples and ideas.
As the boundaries of AI-generated music continue to be pushed, collaborations like this highlight the evolving relationship between AI and human creativity, setting the stage for exciting possibilities in the future.