Songs like ‘A Hard Day’s Night’ and ‘In My Life’ by The Beatles have long been a point of controversy, because fans have always been curious about who exactly wrote them. The songs are credited to the “Lennon-McCartney Partnership”, but an air of mystery has always surrounded their true authorship.
Now, researchers from Harvard University have trained a machine learning algorithm on hundreds of hits by the British band to build a “musical fingerprint” for each songwriter.
“For Lennon-McCartney songs of known and unknown authorship written and recorded over the period 1962-66, we extracted musical features from each song or song portion,” said the researchers.
Researchers Mark Glickman, Jason Brown, and Ryan Song extracted features like:
- Occurrence of melodic notes
- Melodic note pairs
- Chord change pairs
- Four-note melody contours
They then developed a prediction model based on variable screening followed by logistic regression with elastic net regularisation.
“Out-of-sample classification accuracy for songs with known authorship was 76%, with a c-statistic from an ROC analysis of 83.7%. We applied our model to the prediction of songs and song portions with unknown or disputed authorship,” they explained.
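The two figures quoted above measure different things: accuracy counts correct calls at a fixed cut-off, while the c-statistic (the area under the ROC curve) measures how well the model ranks one author's songs above the other's. A minimal sketch of both follows; the labels and probabilities are made-up numbers for illustration, not the study's data, and the function names are hypothetical.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of songs whose predicted author matches the known author."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


def c_statistic(labels, probs):
    """Area under the ROC curve: the probability that a randomly chosen
    positive song scores higher than a randomly chosen negative song
    (ties count as one half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out predictions (1 = Lennon, 0 = McCartney).
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

acc = accuracy(labels, probs)
auc = c_statistic(labels, probs)
print(f"accuracy={acc:.3f}, c-statistic={auc:.3f}")
```

Because the c-statistic depends only on the ordering of the scores, it can be higher than the accuracy when the 0.5 cut-off is not ideal, which is consistent with the gap between the two numbers the researchers report.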
“Our approach to musical authorship attribution is most closely related to methods applied to genome expression studies and other areas in which the number of predictors is considerably larger than the sample size. In a musical context, we reduce each song to a vector of binary variables indicating the occurrences of specified local musical features,” said the researchers.
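To make the idea of a "vector of binary variables" concrete, here is a minimal sketch of how a song might be reduced to one. The encoding is an assumption made for illustration: melodies are lists of scale degrees, chords are Roman-numeral symbols, and a contour is the up/down/same direction of three consecutive steps. The researchers' actual feature definitions may differ.

```python
def song_features(melody, chords):
    """Return the set of local musical features that occur in one song.

    melody: list of scale degrees (ints); chords: list of chord symbols.
    This encoding is a hypothetical simplification.
    """
    feats = set()
    feats.update(("note", n) for n in melody)
    feats.update(("note_pair", a, b) for a, b in zip(melody, melody[1:]))
    feats.update(("chord_pair", a, b) for a, b in zip(chords, chords[1:]))
    for i in range(len(melody) - 3):
        window = melody[i:i + 4]
        # Contour: direction of each step, up (+1), down (-1) or same (0).
        contour = tuple((b > a) - (b < a) for a, b in zip(window, window[1:]))
        feats.add(("contour",) + contour)
    return feats


def binary_vector(feats, vocabulary):
    """1 if the feature occurs in the song, else 0, in vocabulary order."""
    return [1 if f in feats else 0 for f in vocabulary]


# Two invented song fragments, not real Beatles transcriptions.
songs = {
    "song_a": ([1, 3, 5, 3, 1], ["I", "IV", "V", "I"]),
    "song_b": ([5, 5, 6, 5, 3], ["I", "vi", "IV", "V"]),
}
feature_sets = {name: song_features(m, c) for name, (m, c) in songs.items()}
vocabulary = sorted(set().union(*feature_sets.values()), key=repr)
vectors = {name: binary_vector(f, vocabulary) for name, f in feature_sets.items()}

for name, vec in vectors.items():
    print(name, vec)
```

Even these two short fragments produce a few dozen distinct features, which hints at why the number of predictors quickly outgrows the number of songs.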
The researchers developed their modelling approach as a two-step algorithm. First, they kept only the musical features that had a sufficiently strong bivariate association with authorship, an application of sure independence screening. Second, using the features that remained, they modelled authorship as a logistic regression and estimated its parameters with elastic net regularisation, an approach that penalises the average log-likelihood with a convex combination of a ridge penalty and a lasso penalty.
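The two steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the researchers' implementation: the screening score (difference in occurrence rates between the two authors) is a simple stand-in for their association test, the proximal gradient solver is our choice, and the data, the `keep` cut-off and the hyperparameters `lam`, `alpha` and `lr` are all hypothetical.

```python
import math


def screen(X, y, keep):
    """Step 1: keep the `keep` features whose occurrence rate differs most
    between the two authors (a stand-in for a bivariate association test)."""
    n1 = sum(y)
    n0 = len(y) - n1
    scores = []
    for j in range(len(X[0])):
        rate1 = sum(row[j] for row, label in zip(X, y) if label == 1) / n1
        rate0 = sum(row[j] for row, label in zip(X, y) if label == 0) / n0
        scores.append((abs(rate1 - rate0), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:keep])


def soft_threshold(w, t):
    """Proximal operator of the lasso penalty: shrink towards zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, lr=0.5, steps=2000):
    """Step 2: minimise the mean negative log-likelihood plus
    lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2)
    by proximal gradient descent. The intercept b is not penalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err / n
            for j, xj in enumerate(row):
                grad_w[j] += err * xj / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (likelihood + ridge),
            # then lasso shrinkage via soft-thresholding.
            step = w[j] - lr * (grad_w[j] + lam * (1 - alpha) * w[j])
            w[j] = soft_threshold(step, lr * lam * alpha)
    return w, b


def predict_proba(row, w, b):
    """Predicted probability that the song belongs to class 1."""
    z = b + sum(wj * xj for wj, xj in zip(w, row))
    return 1.0 / (1.0 + math.exp(-z))


# Invented binary feature matrix: 8 songs x 4 features (1 = Lennon, 0 = McCartney).
X = [
    [1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 1, 0], [1, 0, 0, 0],
    [0, 1, 1, 0], [0, 1, 0, 1], [0, 1, 1, 1], [0, 0, 0, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

kept = screen(X, y, keep=2)
X_kept = [[row[j] for j in kept] for row in X]
w, b = fit_elastic_net_logistic(X_kept, y)
print("kept features:", kept)
print("P(class 1 | features [1, 0]):", round(predict_proba([1, 0], w, b), 3))
```

Setting `alpha=1` would give a pure lasso penalty and `alpha=0` a pure ridge penalty; values in between combine the lasso's tendency to zero out weak features with the ridge's stability when features are correlated.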