
Beginner’s Guide To Building A Song Recommender In Python


The number of songs available exceeds what an individual can listen to in a lifetime. Choosing from millions of songs can be tedious, and there is a good chance of missing out on songs that could have become favourites.

Music service providers like Spotify need an efficient way to manage songs and to help their customers discover music through good recommendations. To build such recommendation systems, they deploy machine learning algorithms that process data from a very large number of sources and present the listener with the most relevant songs.

There are three main types of recommendation systems: content-based, collaborative and popularity-based.

The content-based system predicts what a user will like based on what that user liked in the past. The collaborative system predicts what a particular user will like based on what similar users like.
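To make the content-based idea concrete, here is a minimal sketch. It assumes the only item feature is the artist: each unheard song is scored by how many of the user's past plays share its artist. The toy catalogue and the `content_based_recommend` helper are hypothetical and are not part of the code used later in this article.

```python
from collections import Counter

# Hypothetical toy catalogue: song title -> artist (the only content feature assumed here)
CATALOGUE = {
    "Yellow": "Coldplay",
    "Clocks": "Coldplay",
    "Fix You": "Coldplay",
    "Mr Sandman": "The Chordettes",
    "Lollipop": "The Chordettes",
    "Creep": "Radiohead",
}


def content_based_recommend(history, catalogue, top_n=3):
    """Rank unheard songs by how many of the user's past plays share their artist."""
    # User profile: number of plays per artist in the listening history
    artist_profile = Counter(catalogue[song] for song in history)
    candidates = [song for song in catalogue if song not in history]
    # Score each candidate by the user's affinity for its artist
    scored = [(artist_profile[catalogue[song]], song) for song in candidates]
    # Highest score first; ties broken alphabetically by title
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop songs whose artist the user has never played
    return [song for score, song in scored if score > 0][:top_n]


print(content_based_recommend(["Yellow", "Clocks", "Mr Sandman"], CATALOGUE))
```

A real content-based system would use richer features (genre, tempo, lyrics embeddings), but the principle is the same: the user's own history, not other users, drives the ranking.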

The popularity-based system simply recommends the songs that are most popular across all users. The problem with this approach is that it offers no personalisation: even if the user's behaviour is known, it cannot make a personalised recommendation.

Here we illustrate a naive popularity-based approach and a more personalised one using Python:

# Importing essential libraries #

import pandas as pd

from sklearn.model_selection import train_test_split

import numpy as np

import time

import joblib

# Download Recommenders.py and Evaluation.py into your source code directory #

import Recommenders

import Evaluation

#The following lines will download the data directly#

triplets_file = 'https://static.turi.com/datasets/millionsong/10000.txt'

songs_metadata_file = 'https://static.turi.com/datasets/millionsong/song_data.csv'

song_df_1 = pd.read_csv(triplets_file, header=None, sep = "\t")

# The separator in the line above is a TAB, hence sep = "\t"; otherwise the file is read as a single column #

song_df_1.columns = ['user_id', 'song_id', 'listen_count']


print(song_df_1)

#Read song metadata

song_df_2 = pd.read_csv(songs_metadata_file)

#Merge the two dataframes

song_df = pd.merge(song_df_1, song_df_2.drop_duplicates(['song_id']), on="song_id", how="left")

song_df.head()

len(song_df)

#CREATING A SUBSET FROM THE DATASET#

song_df = song_df.head(10000)

#Merge song title and artist_name columns to make a merged column

song_df['song'] = song_df['title'].map(str) + " - " + song_df['artist_name']

# Count the users who listened to each song and express it as a percentage #

song_grouped = song_df.groupby(['song']).agg({'listen_count': 'count'}).reset_index()

grouped_sum = song_grouped['listen_count'].sum()

song_grouped['percentage'] = song_grouped['listen_count'].div(grouped_sum)*100

song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])

# TRAINING AND TESTING THE DATA#

train_data, test_data = train_test_split(song_df, test_size = 0.20, random_state=0)

print(train_data.head(5))

#CREATING AN INSTANCE BASED ON POPULARITY#

pm = Recommenders.popularity_recommender_py()

pm.create(train_data, 'user_id', 'song')

#PREDICTING#

users = song_df['user_id'].unique()

user_id = users[5]

pm.recommend(user_id)

#CREATING AN INSTANCE BASED ON SONG SIMILARITY#

is_model = Recommenders.item_similarity_recommender_py()

is_model.create(train_data, 'user_id', 'song')

#RECOMMENDATION#

user_id = users[9]

user_items = is_model.get_user_items(user_id)

for user_item in user_items:
    print(user_item)

#GET SIMILAR SONGS#

song = 'Yellow - Coldplay'

is_model.get_similar_items([song])
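The loading, merging and grouping steps at the top of this code can be tried without downloading the full files. The sketch below runs the same pandas calls on a few made-up rows held in memory; the IDs, titles and counts are hypothetical, and the metadata is reduced to the three columns the code actually uses (`song_id`, `title`, `artist_name`).

```python
import io

import pandas as pd

# Hypothetical stand-ins for 10000.txt (tab-separated) and song_data.csv
TRIPLETS_TSV = "u1\tS1\t3\nu2\tS1\t1\nu2\tS2\t5\nu3\tS3\t2\n"
METADATA_CSV = (
    "song_id,title,artist_name\n"
    "S1,Yellow,Coldplay\n"
    "S1,Yellow,Coldplay\n"  # duplicate row, removed by drop_duplicates below
    "S2,Mr Sandman,The Chordettes\n"
    "S3,Clocks,Coldplay\n"
)

song_df_1 = pd.read_csv(io.StringIO(TRIPLETS_TSV), header=None, sep="\t")
song_df_1.columns = ["user_id", "song_id", "listen_count"]
song_df_2 = pd.read_csv(io.StringIO(METADATA_CSV))

# Left join keeps every listening event; de-duplicating song_id prevents repeated rows
song_df = pd.merge(song_df_1, song_df_2.drop_duplicates(["song_id"]),
                   on="song_id", how="left")
song_df["song"] = song_df["title"].map(str) + " - " + song_df["artist_name"]

# 'count' counts user-song rows per song (listeners), not the summed listen_count
song_grouped = song_df.groupby(["song"]).agg({"listen_count": "count"}).reset_index()
grouped_sum = song_grouped["listen_count"].sum()
song_grouped["percentage"] = song_grouped["listen_count"].div(grouped_sum) * 100
song_grouped = song_grouped.sort_values(["listen_count", "song"], ascending=[False, True])
print(song_grouped)
```

Without `drop_duplicates`, the duplicated `S1` metadata row would double every listening event for that song after the merge.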

Here, 20% of the data is held out for testing; this split size is chosen arbitrarily. A popularity-based recommender class is used as a black box to train the model: we create an instance of the popularity-based recommender class and feed it our training data.

train_data, test_data = train_test_split(song_df, test_size = 0.20, random_state=0)

print(train_data.head(5))

pm = Recommenders.popularity_recommender_py()

pm.create(train_data, 'user_id', 'song')

user_id = users[9]

pm.recommend(user_id)

Even if we change the user, the system returns the same result, since it is a popularity-based recommendation system.
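The internals of `popularity_recommender_py` are not shown in this article. As a hedged sketch of what a popularity recommender of this kind might do (our own hypothetical `popularity_recommend` function, not the class's code), the following ranks songs by the number of distinct users who played them in the training data. The `user_id` argument is only copied into the output, which is exactly why every user receives the same list.

```python
import pandas as pd


def popularity_recommend(train_data, user_id, top_n=3):
    """Hypothetical popularity recommender: the same top-N songs for every user."""
    # Score each song by the number of distinct users who listened to it
    scores = (
        train_data.groupby("song")["user_id"]
        .nunique()
        .reset_index(name="score")
        .sort_values(["score", "song"], ascending=[False, True])
        .head(top_n)
        .copy()
    )
    # user_id is attached to the output but never influences the ranking
    scores.insert(0, "user_id", user_id)
    scores["rank"] = range(1, len(scores) + 1)
    return scores.reset_index(drop=True)


# Hypothetical training data
train_data = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3"],
    "song": ["Yellow - Coldplay", "Clocks - Coldplay", "Yellow - Coldplay",
             "Creep - Radiohead", "Yellow - Coldplay"],
})
print(popularity_recommend(train_data, "u1"))
print(popularity_recommend(train_data, "u3"))
```

Both calls print the same songs in the same order; only the `user_id` column differs.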

This is a naive approach, and not many insights can be drawn from it. To build a more personalised recommender system, we can use item similarity.

Item Similarity Based Personalized Recommender

Memory-based collaborative filtering consists of two main methods:

  • User-item filtering: “Users who are similar to you also liked…”
  • Item-item filtering: “Users who liked the item you liked also liked…”
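The user-item idea can be sketched in a few lines. This minimal, hypothetical example treats listening data as binary (heard or not heard) and measures how similar two users are with the Jaccard similarity of their song sets; each unheard song is then scored by the summed similarity of the users who played it. The histories and helper names are made up, and the item similarity recommender used below follows the item-item route instead.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_based_recommend(histories, user_id, top_n=3):
    """Score unheard songs by the summed similarity of the other users who played them."""
    target = histories[user_id]
    scores = {}
    for other, songs in histories.items():
        if other == user_id:
            continue
        sim = jaccard(target, songs)
        # Only songs the target user has not heard yet are candidates
        for song in songs - target:
            scores[song] = scores.get(song, 0.0) + sim
    # Highest score first; ties broken alphabetically
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, score in ranked if score > 0][:top_n]


# Hypothetical listening histories (user -> set of songs)
HISTORIES = {
    "u1": {"Yellow", "Clocks"},
    "u2": {"Yellow", "Clocks", "Fix You"},
    "u3": {"Mr Sandman", "Lollipop"},
    "u4": {"Yellow", "Creep"},
}

print(user_based_recommend(HISTORIES, "u1"))
```

Here `u2` shares two of three songs with `u1`, so its extra song ranks above the one contributed by the less similar `u4`, while `u3` shares nothing and contributes nothing.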

Companies such as Netflix use a hybrid approach, which makes recommendations by combining what a user liked in the past with what similar users like.
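One simple way to combine the two signals is a weighted hybrid: rescale each recommender's scores to the same range and blend them with a weight `alpha`. This is a sketch under assumptions of our own (both recommenders return `{song: score}` dictionaries, min-max scaling, `alpha=0.5` by default); it does not describe how Netflix or any other company actually builds its hybrid.

```python
def min_max(scores):
    """Rescale a {song: score} dict to [0, 1]; constant scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {song: 1.0 for song in scores}
    return {song: (s - lo) / (hi - lo) for song, s in scores.items()}


def weighted_hybrid(content_scores, collab_scores, alpha=0.5, top_n=3):
    """Blend scores as alpha * content + (1 - alpha) * collaborative, then rank."""
    content, collab = min_max(content_scores), min_max(collab_scores)
    songs = set(content) | set(collab)
    # A song missing from one recommender contributes 0 from that side
    blended = {s: alpha * content.get(s, 0.0) + (1 - alpha) * collab.get(s, 0.0)
               for s in songs}
    # Highest blended score first; ties broken alphabetically
    return sorted(blended, key=lambda s: (-blended[s], s))[:top_n]


# Hypothetical outputs of a content-based and a collaborative recommender
CONTENT_SCORES = {"Fix You": 2, "Lollipop": 1}
COLLAB_SCORES = {"Creep": 3, "Fix You": 2, "Lollipop": 1}

print(weighted_hybrid(CONTENT_SCORES, COLLAB_SCORES))
```

Setting `alpha=0.0` reduces the blend to the collaborative ranking alone, and `alpha=1.0` to the content-based one.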

#Personalised System Part II

#Creating an instance of item similarity based recommender class

is_model = Recommenders.item_similarity_recommender_py()

is_model.create(train_data, 'user_id', 'song')

#Use the personalized model to make some song recommendations

#Print the songs for the user in training data

user_id = users[9]

user_items = is_model.get_user_items(user_id)

for user_item in user_items:

    print(user_item)

#Recommend songs for the user using personalized model

is_model.recommend(user_id)

is_model.get_similar_items(['Mr Sandman - The Chordettes'])

song = 'Yellow - Coldplay'

is_model.get_similar_items([song])

In the item similarity recommender, the main method is “generate_top_recommendation”. It relies on a co-occurrence matrix, in which each entry measures how much two songs are listened to by the same users; the matrix therefore captures user preferences across pairs of songs.
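To illustrate, here is a minimal sketch of such a co-occurrence matrix. It assumes each entry is the Jaccard similarity between the sets of users who listened to two songs, which is one common choice; the exact normalisation used inside `Recommenders.py` is not shown in this article. Rows are the user's songs and columns are all songs in the training data, and the toy data and `cooccurrence_matrix` function are hypothetical.

```python
import numpy as np
import pandas as pd


def cooccurrence_matrix(train_data, user_songs, all_songs):
    """Jaccard co-occurrence between each user song (rows) and every song (columns)."""
    # Set of listeners for every song in the training data
    listeners = train_data.groupby("song")["user_id"].apply(set).to_dict()
    matrix = np.zeros((len(user_songs), len(all_songs)))
    for i, u_song in enumerate(user_songs):
        users_i = listeners.get(u_song, set())
        for j, song in enumerate(all_songs):
            users_j = listeners.get(song, set())
            union = users_i | users_j
            # Share of listeners the two songs have in common
            matrix[i, j] = len(users_i & users_j) / len(union) if union else 0.0
    return matrix


# Hypothetical training data
train_data = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3"],
    "song": ["Yellow", "Clocks", "Yellow", "Fix You", "Clocks"],
})
all_songs = sorted(train_data["song"].unique())  # ['Clocks', 'Fix You', 'Yellow']
print(cooccurrence_matrix(train_data, ["Yellow"], all_songs))
```

The diagonal-like entry for a song against itself is always 1.0, while songs with no shared listeners score 0.0.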

A snippet of code from the file

 

Here, songs are the items. For each song, we calculate the weighted average of its scores in the co-occurrence matrix over all of the user's songs. The song indices are then sorted by their corresponding scores.
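The scoring step can be sketched as follows, starting from a small hypothetical co-occurrence matrix. We assume equal weights, so the weighted average becomes a plain mean of each column over the user's songs, and we also assume songs the user has already heard are dropped before ranking. The `top_recommendations` function is our own illustration, not the `generate_top_recommendation` code.

```python
import numpy as np


def top_recommendations(cooc, user_songs, all_songs, top_n=3):
    """Average co-occurrence scores over the user's songs and return the best unheard songs."""
    # cooc has one row per user song and one column per candidate song;
    # equal weights are assumed, so the weighted average is a plain mean
    scores = cooc.mean(axis=0)
    # Sort candidate indices by score, highest first (stable keeps column order on ties)
    order = np.argsort(-scores, kind="stable")
    recs = [(all_songs[j], float(scores[j])) for j in order
            if all_songs[j] not in user_songs and scores[j] > 0]
    return recs[:top_n]


# Hypothetical inputs
all_songs = ["Clocks", "Creep", "Fix You", "Yellow"]
user_songs = ["Yellow", "Clocks"]
cooc = np.array([
    [0.5, 0.0, 0.5, 1.0],  # Yellow against every song
    [1.0, 0.2, 0.0, 0.5],  # Clocks against every song
])
print(top_recommendations(cooc, user_songs, all_songs))
```

Clocks and Yellow have the highest averages but are skipped because the user already knows them, leaving Fix You ahead of Creep.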

is_model = Recommenders.item_similarity_recommender_py()

is_model.create(train_data, 'user_id', 'song')

# Print the songs for this user in the training data

user_id = users[5]

user_items = is_model.get_user_items(user_id)

for user_item in user_items:
    print(user_item)

is_model.recommend(user_id)

 

Output:

The output consists of user_id and its corresponding song name.

This article aims to give beginners a guide to implementing a simple song recommender, and briefly explains how to run the source code for a simple application so that readers can take it further and experiment with it.

Check the full notebook here.


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.