MachineHack recently concluded the “Predict The Data Scientists Salary In India Hackathon” and announced prizes in the form of exclusive passes to the Machine Learning Developers Summit (MLDS), India’s largest summit for machine learning developers. Analytics India Magazine talked to the winners of the hackathon to get insights into their journey in data science and their experience competing in this classification hackathon of MachineHack.
The first rank went to Saurabh Kumar, who has already won multiple hackathons on MachineHack. Saurabh first got interested in data science back in 2014, when he heard that a machine learning algorithm named Random Forest was performing really well in classification tasks compared to traditional classifiers. He started exploring and was overwhelmed by the amount of information available online and the variety of real-world problems that can be solved using such machine learning algorithms. Since then, he has kept up his curiosity and consistency in learning about this field. For this problem of predicting data science salaries, Saurabh did a lot of feature engineering on the text features, such as word counts, number of punctuation marks and stop words, SVDs, NB, word vectors, etc. He then set up a 5-fold validation strategy along with an XGBoost model for prediction. He was getting a decent score on the leaderboard (LB), so he did not try deep learning methods.
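To make the count-based text features mentioned above concrete, here is a minimal sketch in plain Python. It is not Saurabh's actual code: the function name `text_meta_features`, the tiny stop-word list and the sample sentence are all assumptions made for illustration.

```python
import string

# Tiny illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "to", "for", "with"}


def text_meta_features(text):
    """Return simple count-based features for one text field."""
    words = text.split()
    # Strip surrounding punctuation so "the," still matches "the".
    cleaned = [w.strip(string.punctuation).lower() for w in words]
    return {
        "n_words": len(words),
        "n_punctuation": sum(ch in string.punctuation for ch in text),
        "n_stop_words": sum(w in STOP_WORDS for w in cleaned),
    }


# Hypothetical job-description snippet.
features = text_meta_features("Experience in Python, SQL and the basics of ML.")
print(features)
```

Counts like these would typically be appended as extra numeric columns next to the vectorised text before training a model.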
Talking about his experience at MachineHack, Saurabh said “This is my third Win in MachineHack. And all three were different problems. I first heard about Machine Hack when they organised Beer Hackathon and since then I have been participating in all their hackathons. My experience on this platform is great as they are continuously evolving in enriching user’s experience. Also, moderators are really helpful and prompt in answering participant’s queries.”
The second rank went to Pravin Mhaske, who is a Mechanical Engineer by qualification, a Scrum Master/Project Manager by profession and a Statistician by passion. He has been working with Infosys for 15 years now. He started with the Big Data buzz in 2016 but eventually landed in data science, where he has been doing a variety of things since 2017. Initially, he trained in Statistics, Probability and Linear Algebra from sources like Udacity, edX and Khan Academy, followed by courses in Machine Learning and Data Science from Udemy and Coursera. After getting inspiration from MachineHack, he completed certifications in Python and Data Analytics from IIT Madras and is currently pursuing an MS in Analytics. The approach taken by Pravin was the following:
- DATA: Data is of utmost importance while solving any Data Science/Machine Learning problem, so he spent most of his time on understanding the data, cleaning and preprocessing it, and feature engineering.
- ALGORITHMS: He started with SVC and Logistic Regression and then moved to his favourite, XGBoost, but his machine couldn’t sustain that load. Finally, LightGBM (executed on Google Colab’s GPU) came to the rescue. Blazing fast!
- ENSEMBLING: The final solution was an Ensemble of 3 LightGBM models along with 1 Logistic Regression model.
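The article does not say how the four models were combined. One common option, shown here purely as an assumption, is to average the class probabilities predicted by each model and pick the most likely class. The helper names and the probability values below are made up for illustration and stand in for the outputs of three LightGBM models and one Logistic Regression model.

```python
def average_probabilities(model_probs, weights=None):
    """Blend per-class probabilities from several models.

    model_probs: list of predictions, one per model, each shaped
    [n_samples][n_classes]. weights default to a plain average.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    blended = [[0.0] * n_classes for _ in range(n_samples)]
    for w, probs in zip(weights, model_probs):
        for i in range(n_samples):
            for j in range(n_classes):
                blended[i][j] += w * probs[i][j]
    return blended


def predict_labels(blended):
    """Pick the class index with the highest blended probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in blended]


# Made-up probabilities for 2 samples and 3 salary classes.
lgbm_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
lgbm_b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
lgbm_c = [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]]
logreg = [[0.4, 0.5, 0.1], [0.3, 0.3, 0.4]]

blended = average_probabilities([lgbm_a, lgbm_b, lgbm_c, logreg])
labels = predict_labels(blended)
print(labels)
```

Passing unequal `weights` would let a stronger model count for more in the blend, which is a frequent refinement once validation scores are available.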
Talking about MachineHack, Pravin said, “MachineHack is a great stress buster! I’m with you guys right from your first hackathon and it has become a playground for me. Thinking to change my homepage from Google to MachineHack.com now. Your team is doing a wonderful job in providing this platform to DS/ML practitioners (and giving attractive prizes). Tickets to MLDS was indeed a great bait!!”
The third rank was taken by Chetan Ambi, who has been working as a Technology Lead at Infosys Ltd (Mysore) for about five years and has a total of nine years of experience in the IT industry.
Chetan got attracted to Machine Learning about a year back when he accidentally watched Andrew Ng’s Stanford Machine Learning lecture video on YouTube. Andrew does a great job of explaining complex things in a lucid manner, and Chetan gained a lot of knowledge from his Machine Learning and Deep Learning courses on Coursera. Udemy courses from Kirill Eremenko and Jose Portilla also helped him gain a good understanding of Machine Learning. He is an avid reader of Analytics India Magazine, Machinelearningmastery, Pyimagesearch, etc.
He started his problem-solving approach with a separate TF-IDF vectoriser for each text feature, then combined the results with the numerical feature (i.e. experience). Later, he started experimenting with combining two or more text features (columns) into one, which started giving good accuracy. He tried several classification algorithms, starting from logistic regression, and finally settled on LightGBM. He also spent a good amount of time tuning the TF-IDF vectorisers and LightGBM to get his best score on the leaderboard.
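The idea of one TF-IDF vectoriser per text column, joined with a numeric column, can be sketched in plain Python as follows. This is a simplified illustration and not Chetan's code: it uses whitespace tokenisation, a smoothed IDF and no row normalisation, and the column names `job_description`, `key_skills` and `experience_years` are hypothetical.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Learn a vocabulary and smoothed IDF weights from a list of strings."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(t for doc in tokenised for t in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf


def transform_tfidf(docs, vocab, idf):
    """Turn each document into a dense row of term count times IDF."""
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[t] * idf[t] for t in vocab])
    return rows


def build_matrix(records, text_columns, numeric_columns):
    """One TF-IDF block per text column, followed by the numeric columns."""
    blocks = []
    for col in text_columns:
        docs = [r[col] for r in records]
        vocab, idf = fit_tfidf(docs)
        blocks.append(transform_tfidf(docs, vocab, idf))
    matrix = []
    for i, r in enumerate(records):
        row = []
        for block in blocks:
            row.extend(block[i])
        row.extend(float(r[c]) for c in numeric_columns)
        matrix.append(row)
    return matrix


# Hypothetical records; the real dataset's columns may differ.
records = [
    {"job_description": "data scientist python", "key_skills": "python sql", "experience_years": 3},
    {"job_description": "data analyst", "key_skills": "excel sql", "experience_years": 5},
]
matrix = build_matrix(records, ["job_description", "key_skills"], ["experience_years"])
print(len(matrix), len(matrix[0]))
```

Merging two text columns into one, as described above, would amount to concatenating their strings per record before fitting a single vectoriser, so that both columns share one vocabulary.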
Talking about MachineHack, Chetan says, “MachineHack is a really wonderful platform for everyone from beginners to experts to showcase their data science skills. I am really enjoying solving industry curated problems on MachineHack. Author Identification hackathon was first ever ML hackathon I have attended and being a beginner, I was able to secure 3rd position. I am expecting more challenging problems in the future from MachineHack.”