In this article, we present the winners of “Predict A Doctor’s Consultation Fee Hackathon”, the recently concluded hackathon on Analytics India Magazine’s MachineHack platform. All three winners received tickets to The Rising 2019 as a prize. We talked to them about their experience and how they solved the problem. The three winners are Utkarsh Phirke, Sujit Horakeri and Chetan Ambi.
Utkarsh Phirke ranked 1 on the leaderboard
Phirke studied Electrical Engineering at IIT Bombay and is currently a business analyst at Capital One DataLabs. There, he got plenty of opportunities to work on a variety of analytical problems and build familiarity with the modelling exercise. He says, “Focusing less on accuracy and more on questions like ‘Does deduping of data create a bias in the population?’ helped me understand the nuances of modelling.”
Approach to solving the problem
Phirke discussed the hackathon with his dad, and they figured that location was the most important predictor, followed by the specialization degree. Since both of these variables are categorical with more than 100 levels, he spent the bulk of his time on feature creation, building indicator variables for important locations and degrees. He tried Random Forests, XGBoost, Linear Regression and KNN, and their performance, he says, ranked in that order. According to Phirke, two things helped him perform well in the hackathon. First, he used the logarithm of fees as the target variable, since Root Mean Squared Logarithmic Error (RMSLE) was the scoring metric. Second, he augmented the dataset using the Google Maps Geocoding API to get latitude/longitude and created location clusters with a 5 km radius.
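To see why a log-transformed target pairs naturally with RMSLE, here is a minimal Python sketch. It is not Phirke's code: the fee values are made up, and the use of log(1 + x), the common RMSLE convention, is an assumption, since the article only says "the logarithm of fees".

```python
import math

def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error, using log(1 + x) on both sides."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

def rmse(actual, predicted):
    """Plain Root Mean Squared Error."""
    squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def to_log_target(fees):
    """Transform fees into the log space a model would be trained on."""
    return [math.log1p(fee) for fee in fees]

def from_log_target(log_values):
    """Map log-space predictions back to fees for submission."""
    return [math.expm1(value) for value in log_values]

# Made-up fees, purely for illustration.
actual_fees = [100, 300, 500, 800]
predicted_fees = [120, 250, 500, 700]

# RMSLE on raw fees equals ordinary RMSE on log-transformed fees, so a model
# that minimises squared error on the log target optimises the contest metric.
score_on_fees = rmsle(actual_fees, predicted_fees)
score_on_logs = rmse(to_log_target(actual_fees), to_log_target(predicted_fees))
recovered_fees = from_log_target(to_log_target(actual_fees))
```

Because most regressors minimise squared error by default, training on the log target lets them target the leaderboard metric directly; predictions only need to be mapped back with `from_log_target` before submission.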
Experience on MachineHack
Phirke discovered MachineHack when he attended AIM’s Cypher conference in Bangalore. He was excited because the problems in MachineHack hackathons had an Indian context. MachineHack, he says, gave him a seamless experience, and he could solve the hackathon problem without encountering technical glitches on the website. He thinks that MachineHack is a great platform for both beginners and experts to take a crack at solving data science problems.
Sujit Horakeri ranked 2 on the leaderboard
Currently working as a data scientist at Landmark Group, Horakeri is an Electronics & Communication graduate who started his career as a Java Developer at Mercedes-Benz, where he worked on topics involving data analytics. This was when he began seriously considering a move into a fully data science-oriented role. He participated in various hackathons, which also brought him job offers. Horakeri added that he had very supportive supervisors at MBRDI who helped him pursue his interest.
Approach to solving the problem
First, Horakeri converted the ‘Rating’ and ‘Experience’ variables to categorical ones based on their distributions. He extracted the city and the area from the ‘Place’ variable and then cleaned the ‘Miscellaneous_Info’ variable. Some of the rows in Miscellaneous_Info had the consultation fee mentioned. He found an interesting pattern: if the consultation fee in Miscellaneous_Info was greater than 999, the actual consultation fee was 100. This, according to him, helped improve his score on the leaderboard. He then used CountVectorizer to vectorize the text variables. He said that GridSearchCV helped him find the best parameters for LightGBM. Finally, he post-processed the predictions, rounding them up or down to the nearest multiple of 50.
Experience on MachineHack
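The final post-processing step can be sketched in a few lines of Python. This is an illustrative guess at the idea rather than Horakeri's code: the function name `round_to_nearest_step`, the sample predictions, and the choice to round exact halves upward are all assumptions.

```python
import math

def round_to_nearest_step(value, step=50):
    """Round a prediction to the nearest multiple of `step`.

    Exact halves (e.g. 125 with step 50) are rounded up; this tie-breaking
    rule is an assumption, not something stated in the article.
    """
    return int(math.floor(value / step + 0.5) * step)

# Made-up raw model outputs, purely for illustration.
raw_predictions = [124.0, 125.0, 376.2, 449.9, 500.0]
rounded = [round_to_nearest_step(p) for p in raw_predictions]
print(rounded)
```

The intuition is that consultation fees tend to be quoted in round amounts, so snapping continuous predictions to the nearest such amount can reduce the error on rows where the model was already close.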
Horakeri said that his experience with MachineHack was good, and he also offered some suggestions regarding leaderboard updates and the selection of submission files.
Chetan Ambi ranked 3 on the leaderboard
Ambi has been working as a Technology Lead at Infosys Ltd (Mysore) for about five years and has a total of nine years of experience in the IT industry. He became intrigued by machine learning about a year ago when he watched Andrew Ng’s Stanford Machine Learning lecture videos on YouTube. He learned from various ML and deep learning courses on Coursera. Udemy courses from Kirill Eremenko, Jose Portilla and LazyProgrammer also helped him gain a good understanding of machine learning. Ambi says that he is an avid reader of Analytics India Magazine, machinelearningmastery, pyimagesearch and, most importantly, Kaggle.
Approach to solving the problem
Ambi said that his approach to this hackathon problem was similar to Pravin Mhaske’s approach in the “Predict The Data Scientists Salary In India Hackathon”.
Of the data features, only one, the Fees column, has numeric values, whereas the rest have text values. So he converted the text features into numeric ones before feeding them to any machine learning algorithm. According to Ambi, data cleaning and feature engineering played an important role.
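As a rough illustration of turning text features into numbers, the Python sketch below shows two common tactics: pulling a number out of free text and mapping category labels to integer codes. The helper names, the sample rows and the value formats are hypothetical; the article does not describe the exact transformations Ambi used.

```python
import re

def first_number(text, default=0):
    """Return the first integer found in `text`, or `default` if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else default

def build_label_encoder(values):
    """Map each distinct string to an integer code, assigned in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

# Hypothetical rows: the column names echo the article, the values are made up.
rows = [
    {"Experience": "24 years experience", "Place": "Bangalore"},
    {"Experience": "5 years experience", "Place": "Mumbai"},
    {"Experience": "12 years experience", "Place": "Bangalore"},
]

place_codes = build_label_encoder(row["Place"] for row in rows)

# Each row becomes a list of numbers that a regression model can consume.
numeric_rows = [
    [first_number(row["Experience"]), place_codes[row["Place"]]]
    for row in rows
]
print(numeric_rows)
```

Integer codes suit tree-based models reasonably well; for linear models, indicator (one-hot) variables are usually a safer encoding because the codes carry no meaningful order.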
After understanding the data, he spent time on data cleaning, pre-processing and feature engineering. Because the data was messy, he says, feature engineering was crucial. He said, “Only with good feature engineering and data cleaning one can get a score above 75 (RMSLE).”
He used the XGBoost algorithm, which gave him good cross-validation (CV) and leaderboard (LB) scores. After trying other regression algorithms, he selected four models for the next step, the ensemble. His final solution is an ensemble of LightGBM, XGBoost, Gradient Boosting and Random Forest.
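The article does not say how the four models were combined. One common option, shown here purely as an assumption, is to average their predictions, optionally with weights. The function name `average_ensemble` and the prediction values in this Python sketch are made up for illustration.

```python
def average_ensemble(predictions_by_model, weights=None):
    """Blend per-model predictions row by row with a (weighted) average.

    `predictions_by_model` maps a model name to its list of predictions;
    all lists must have the same length. Missing `weights` means equal weights.
    """
    model_names = list(predictions_by_model)
    if weights is None:
        weights = {name: 1.0 for name in model_names}
    total_weight = sum(weights[name] for name in model_names)
    n_rows = len(predictions_by_model[model_names[0]])

    blended = []
    for i in range(n_rows):
        weighted_sum = sum(
            weights[name] * predictions_by_model[name][i]
            for name in model_names
        )
        blended.append(weighted_sum / total_weight)
    return blended

# Made-up predictions from the four model families named in the article.
predictions = {
    "lightgbm": [300.0, 500.0],
    "xgboost": [320.0, 480.0],
    "gradient_boosting": [280.0, 520.0],
    "random_forest": [300.0, 500.0],
}

blended = average_ensemble(predictions)
print(blended)
```

Averaging tends to help when the individual models make different kinds of errors, which is one reason to mix boosting-based models with a Random Forest.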
Experience on MachineHack
Ambi shared that MachineHack is a wonderful platform for everyone, from beginners to experts, to showcase their data science skills. He enjoys solving industry-curated problems on the platform and previously secured rank 3 on the leaderboards of the Author Identification and Predict Data Scientist Salary hackathons.
Talking about his prize for a past hackathon win, tickets to the Machine Learning Developers Summit (MLDS) 2019, he said, “Tickets to MLDS’2019 for winning Data Scientist Salary hackathon was really awesome. It was really a very good experience attending MLDS and I would love to attend MLDS going forward. Three cheers to Analytics India Magazine for organizing such a beautiful event. I am expecting similar experience from ‘The Rising’ event.”
MachineHack hosts plenty of interesting hackathons and offers attractive prizes for the winners. It has recently launched a new hackathon called Predict The Flight Ticket Price Hackathon, which tackles the unpredictability of flight ticket prices.