The use of AI to disrupt the music industry has been gaining traction of late. Earlier this year, Google demonstrated how notes can be encoded into lower-dimensional representations and then passed through fundamental techniques like batch normalisation and autoregressive factorisation to create new soundtracks from old ones. Though using the word ‘disruption’ in domains steered by human creativity is still hyperbole, one can’t help but sense the growing use of AI as an augmentation of creativity.
So far, AI-assisted music production has largely been the pet project of lone researchers or of up-and-coming start-ups. In a piece of somewhat unanticipated news, Sony Computer Science Laboratory (CSL) in Paris has introduced DrumNet, a model that can autonomously generate kick drum tracks.
Inside DrumNet
DrumNet is based on an artificial neural network that learns rhythmic relationships between different instruments and encodes these relationships in a 16-dimensional style space.
The style of a kick drum track is determined by a 16-dimensional vector sampled from independent multivariate Gaussians. This style vector defines the relationship between the kick onsets and onsets of bass, snare, beat and downbeats. The model adjusts the tempo and timing of the output according to the input.
For time-series modelling, the common dense Gated Autoencoder (GAE) architecture was adapted to 1D convolution in time, yielding a Convolutional Gated Autoencoder (CGAE).
As depicted in the figure above, ‘x’ represents 1D signals of length T indicating the onset functions of the instrument tracks and the beat and downbeat information of a song, while ‘y’ represents the onset function of a target instrument. The rhythmic interactions (henceforth referred to as mappings) between ‘x’ and ‘y’ are then defined as
m = W ∗ (U ∗ x · V ∗ y)

where ∗ denotes 1D convolution along time and · denotes element-wise multiplication. The weight matrices W, U and V act as placeholders for several convolutional layers each.
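The gated mapping above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: single 1D kernels stand in for the stacks of convolutional layers, and all signals and kernels are toy values.

```python
import numpy as np

def conv1d_same(signal, kernel):
    """1D convolution with 'same' output length (stands in for a conv layer)."""
    return np.convolve(signal, kernel, mode="same")

def gated_mapping(x, y, U, V, W):
    """Compute the CGAE-style mapping m = W * (U*x . V*y).

    '*' is 1D convolution along time, '.' is element-wise product.
    U, V and W are single kernels here, standing in for the several
    convolutional layers used in the real model (an assumption for brevity).
    """
    factors_x = conv1d_same(x, U)   # filter responses of the input tracks
    factors_y = conv1d_same(y, V)   # filter responses of the target track
    return conv1d_same(factors_x * factors_y, W)  # gated (multiplicative) interaction

# toy onset functions of length T = 16
rng = np.random.default_rng(0)
T = 16
x = rng.random(T)                  # e.g. combined bass/snare/beat onset function
y = rng.random(T)                  # kick-drum onset function
U, V, W = rng.random(3), rng.random(3), rng.random(3)

m = gated_mapping(x, y, U, V, W)
print(m.shape)                     # one mapping value per time step
```

The multiplicative term is what makes the autoencoder "gated": the mapping encodes how x and y co-vary in time rather than either signal on its own.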
To generate a kick drum track, the researchers sample a single mapping code m_t from a 16-dimensional standard Gaussian, repeat it across the time dimension, and reconstruct y given the resulting m and the input x. They then performed k-means clustering over all m_t.
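The sampling step is simple to sketch: draw one 16-dimensional code and tile it along time so the same style governs the whole track. Shapes and the track length T below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 64                              # number of time steps in the track (toy value)

# sample one 16-dimensional style code from a standard Gaussian
m_t = rng.standard_normal(16)

# repeat it across the time dimension to get a constant-style mapping
m = np.tile(m_t, (T, 1))            # shape (T, 16): one identical code per time step
print(m.shape)
```

Because every time step shares the same code, the generated kick pattern keeps a single consistent style for the duration of the song.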
The model was trained for 2,500 epochs with a batch size of 100, using 50% dropout on the inputs ‘x’. During training, a data-augmentation-based regularisation method was used to make the mappings invariant to time shifts and tempo changes.
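An augmentation of this kind could look like the following hypothetical sketch: roll the onset function by a random offset (time-shift invariance) and resample it by a random factor (tempo-change invariance). The function name, parameters, and stretch factors are illustrative, not from the paper.

```python
import numpy as np

def augment(onsets, rng, max_shift=8, stretch_choices=(0.5, 1.0, 2.0)):
    """Hypothetical shift/tempo augmentation of a 1D onset function."""
    # random circular time shift
    shift = rng.integers(-max_shift, max_shift + 1)
    shifted = np.roll(onsets, shift)
    # random tempo change via resampling
    factor = rng.choice(stretch_choices)
    new_len = max(1, int(round(len(shifted) * factor)))
    old_t = np.linspace(0.0, 1.0, len(shifted))
    new_t = np.linspace(0.0, 1.0, new_len)
    # linear interpolation acts as a crude tempo change
    return np.interp(new_t, old_t, shifted)

rng = np.random.default_rng(7)
x = np.zeros(32)
x[::4] = 1.0                        # toy onset function with a regular pulse
x_aug = augment(x, rng)
print(len(x_aug))
```

Training on such perturbed copies pushes the mapping codes to describe the rhythmic relationship itself rather than the absolute position or speed of the pattern.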
Regardless of the success of this technology, the researchers at Sony CSL insist that their goal is not to replace musicians but to provide them with better tools to be more efficient in realizing their creative ideas.
Dawn Of AI-Leveraged Creativity
The turn of this century witnessed a new form of intelligence: the augmentation of human ideas with the computational power of machines. These machines have since become massive data-driven engines, and with every innovation in the algorithms, they got better.
Companies like Sony CSL are working with musicians and content providers to push the boundaries of creativity and understand the complexity of modern music production processes. By combining cutting-edge A.I. research with strong musical expertise, they believe they can pave the way for musical experiences yet to be imagined.
Excessive use of autotune aside, the industry has mostly benefited from advancements in technology. From the way instruments are manufactured to the way a sound wave is electronically manipulated, technology has touched many aspects of music. Without digital technology, popular music in the twenty-first century is almost unthinkable.
Interdisciplinary research approaches such as this, blending theoretical modelling, data science and machine learning, gaming and participation, are aimed at developing a science of the “new”: how the “new” emerges in social and technological systems, and how humans and machines explore the space of possibilities and find new solutions.
Listen to the beats of AI here