Many students complete a beginner online course in machine learning and then find themselves unsure of what to do next. Rather than taking another similar or slightly more advanced ML course, most people want to apply the skills they learnt in their first course to a project, which gives them a practical outlet for that knowledge.
What Does A Beginner Course Teach?
The following applies to most beginner online machine learning courses. They give a brief overview of basic machine learning algorithms such as Support Vector Machines (SVM) and neural networks, strengthen concepts like matrix operations and linear regression, and thoroughly introduce the basic concepts of supervised and unsupervised learning. They also include programming assignments, generally in Matlab, R, Python or Octave, with projects such as ‘text recognition’, ‘spam classifier’ and ‘movie recommender systems’.
Things to Keep In Mind
When noted computer scientist and entrepreneur Andrew Ng was asked what projects could be done after completing his popular machine learning course on Coursera, he said that a great way to get ideas for new projects is to spend time studying previous projects. He explained that, after studying many examples in a category, the human brain learns to invent new examples in that category, building on the ones it has already seen. His advice for people who want to do new, interesting projects was to read about previous projects they liked as a way to start generating their own ideas.
In the same Quora answer, he suggested writing an arXiv paper or a blog post, or open-sourcing your code on GitHub, once the project is done. This helps you get feedback on the project and also helps others in the community learn from it. He also suggested spending time talking to people, including experts in areas other than ML, to find inspiration for new projects.
List Of Projects
Here is a list of top 5 project ideas that you can do right after your beginner course in machine learning:
1. Predict The Data Scientists' Salary In India
The dataset is hosted on MachineHack.com and is based on salaries and job postings in India collected from across the internet. The train and test data contain rich information about each job posting, such as the designation and the key skills required for the job. Using these attributes and the salary information, build a robust machine learning model that predicts the salary range of each job posting.
Get the data here and participate.
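As a starting point, the key-skills text can be turned into features for a classifier over salary ranges. The sketch below is a from-scratch multinomial naive Bayes baseline in plain Python. It assumes each posting has been reduced to a key-skills string and a salary-range label; the rows and labels such as "3to6" are hypothetical placeholders, not the actual MachineHack columns or classes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training rows: (key skills, salary range label).
# The real dataset's column names and label values may differ.
TRAIN = [
    ("python sql excel", "3to6"),
    ("excel reporting sql", "3to6"),
    ("python machine learning deep learning", "10to15"),
    ("deep learning nlp python", "10to15"),
]


def tokenize(text):
    """Lower-case the skills string and split on commas/whitespace."""
    return text.lower().replace(",", " ").split()


def train_nb(rows, alpha=1.0):
    """Count class frequencies and per-class token frequencies."""
    class_counts = Counter(label for _, label in rows)
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in rows:
        tokens = tokenize(text)
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab, alpha


def predict_nb(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_counts, token_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label, n in class_counts.items():
        lp = math.log(n / total)  # log prior
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore skills never seen in training
                lp += math.log((token_counts[label][tok] + alpha) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best


model = train_nb(TRAIN)
print(predict_nb(model, "nlp deep learning"))
```

Naive Bayes is only a quick baseline; other attributes such as designation or experience could be added as extra tokens or features once this works.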
2. Iris Flowers Classification
Since beginner courses cover support vector machines, Iris flower classification is a very popular beginner-level project for understanding the SVM algorithm. The task is to predict the species of an Iris flower in the Iris dataset, which contains physical measurements of three species: Versicolor, Setosa and Virginica. The four numeric features are sepal length, sepal width, petal length and petal width. These continuous values describe the dimensions of each flower, and the model is trained on them to predict its class. Here is a guide for this project.
Download the dataset here.
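The SVM idea can be sketched without any libraries. The code below is a minimal illustration under stated assumptions, not a production classifier: it standardizes the features, trains one linear soft-margin SVM per pair of species (one-vs-one) by full-batch subgradient descent on the regularized hinge loss, and predicts by majority vote. The nine rows in `SAMPLES` are a small illustrative subset in the Iris format; in practice you would load all 150 rows from the downloaded file, and a library such as scikit-learn's `SVC` would usually be preferred.

```python
from itertools import combinations
from statistics import mean, pstdev

# Illustrative subset in the Iris format:
# (sepal length, sepal width, petal length, petal width) in cm, species.
SAMPLES = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def fit_scaler(rows):
    """Per-feature (mean, std) for z-score standardization."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def scale(x, stats):
    return [(v - m) / s for v, (m, s) in zip(x, stats)]


def train_linear_svm(xs, ys, lam=0.001, lr=0.1, epochs=5000):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by
    full-batch subgradient descent; labels y must be -1 or +1."""
    n, w, b = len(xs), [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge term is active for this point
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_one(samples):
    """Train one binary SVM for every pair of classes."""
    stats = fit_scaler([x for x, _ in samples])
    classes = sorted({label for _, label in samples})
    models = {}
    for a, c in combinations(classes, 2):
        pair = [(scale(x, stats), lbl) for x, lbl in samples if lbl in (a, c)]
        xs = [x for x, _ in pair]
        ys = [1 if lbl == a else -1 for _, lbl in pair]
        models[(a, c)] = train_linear_svm(xs, ys)
    return stats, models


def predict(x, stats, models):
    """Each pairwise SVM votes; the class with the most votes wins."""
    z = scale(x, stats)
    votes = {}
    for (a, c), (w, b) in models.items():
        score = sum(wi * zi for wi, zi in zip(w, z)) + b
        winner = a if score >= 0 else c
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)


stats, models = train_one_vs_one(SAMPLES)
print(predict((5.0, 3.4, 1.5, 0.2), stats, models))
```

One-vs-one is used here because each pair of species is easier to separate with a single line than one species against the other two combined.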
3. MovieLens 100K
Since recommender systems are also covered in beginner courses, a project built on a recommendation dataset is a good way to test these skills. Recommender systems have many applications, and services from YouTube to Netflix use them to improve the browsing experience. MovieLens 100K is one such dataset. The MovieLens datasets, released by GroupLens, were collected over various periods of time depending on their size. The 100K version contains 100,000 ratings from 943 users on 1,682 movies, which makes it small enough to experiment with simple recommendation algorithms.
Download the dataset here.
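A simple item-based collaborative filter is a natural first algorithm for this dataset. The sketch below assumes the `u.data` layout of MovieLens 100K (tab-separated user id, item id, rating, timestamp) and uses a tiny made-up sample in that layout instead of the real file. It measures item-to-item similarity with cosine similarity over the rating vectors (missing ratings count as zero) and predicts a user's rating for an unseen movie as the similarity-weighted average of that user's other ratings.

```python
import math
from collections import defaultdict

# Tiny made-up sample in the u.data layout: user id, item id, rating, timestamp.
RAW = """1\t10\t5\t881250949
1\t20\t3\t881250949
2\t10\t4\t881250949
2\t20\t2\t881250949
2\t30\t5\t881250949
3\t10\t1\t881250949
3\t30\t2\t881250949"""


def load_ratings(text):
    """Return (item -> {user: rating}, user -> {item: rating}) dictionaries."""
    by_item, by_user = defaultdict(dict), defaultdict(dict)
    for line in text.strip().splitlines():
        user, item, rating, _timestamp = line.split("\t")
        by_item[int(item)][int(user)] = float(rating)
        by_user[int(user)][int(item)] = float(rating)
    return by_item, by_user


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (missing ratings are 0)."""
    dot = sum(r * b[u] for u, r in a.items() if u in b)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(user, item, by_item, by_user):
    """Similarity-weighted average of the user's ratings on other items."""
    num = den = 0.0
    for other, rating in by_user[user].items():
        if other == item:
            continue
        sim = cosine(by_item[item], by_item[other])
        if sim > 0:  # keep only positively similar items
            num += sim * rating
            den += sim
    return num / den if den else None


by_item, by_user = load_ratings(RAW)
print(predict_rating(1, 30, by_item, by_user))
```

To use the real data, pass the contents of `u.data` to `load_ratings`; holding out some ratings and measuring prediction error is a simple way to compare this against other recommenders.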
4. Turkiye Student Evaluation Dataset
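This project is to test your understanding of unsupervised learning. The dataset, hosted in the UCI Machine Learning Repository, contains evaluation scores given by students of Gazi University in Ankara, Turkey. Each record holds a student's answers to 28 course-specific questions, rated on a scale of 1 to 5, along with attributes such as the instructor, the course, the student's attendance level and the perceived difficulty of the course. Since the records have no target label, the task is to group students with similar responses, for example by clustering.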
Download the dataset here.
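Clustering is the natural unsupervised task here. The sketch below is a plain-Python k-means (Lloyd's algorithm) with a deterministic farthest-point initialization. The rows are hypothetical answer vectors on the 1-to-5 scale, shortened to four questions for readability, standing in for the real question columns.

```python
# Hypothetical answer vectors (4 questions instead of the real 28), scale 1-5.
ROWS = [
    [5, 5, 4, 5],
    [4, 5, 5, 5],
    [5, 4, 5, 4],
    [1, 2, 1, 1],
    [2, 1, 1, 2],
    [1, 1, 2, 1],
]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_farthest(rows, k):
    """Start from the first row, then repeatedly add the row farthest
    from all centroids chosen so far."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(rows, k, max_iter=100):
    centroids = init_farthest(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every row.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its rows.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans(ROWS, k=2)
print(labels)
```

On the real data the number of clusters is a choice to explore, for example by comparing the within-cluster squared distances for several values of `k`.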
5. BigMart Sales Prediction
This project tests your supervised learning skills. The objective is to build a predictive model that estimates the sales of each product at a particular store. It is a regression problem, and the data consists of sales records of products across different stores. The resulting model can be used to understand which properties of products and stores play a key role in increasing sales.
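A one-feature ordinary least squares fit is a reasonable first baseline for this regression task. The sketch below assumes a price column named `Item_MRP` and a sales target named `Item_Outlet_Sales`; both names and the sample numbers are assumptions for illustration. It computes the slope and intercept in closed form and reports the root-mean-squared error (RMSE).

```python
import math

# Hypothetical rows: (Item_MRP, Item_Outlet_Sales). Column names are assumptions.
ROWS = [(100.0, 250.0), (200.0, 450.0), (300.0, 650.0)]


def fit_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(xs, ys, intercept, slope):
    """Root-mean-squared error of the fitted line on (xs, ys)."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))


xs = [mrp for mrp, _ in ROWS]
ys = [sales for _, sales in ROWS]
intercept, slope = fit_ols(xs, ys)
print(intercept, slope, rmse(xs, ys, intercept, slope))
```

With the real data, the RMSE of this single-feature line gives a baseline that a multiple regression using product and store attributes should beat.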