Most of us check online reviews before trying a new restaurant. Written by ordinary diners, these reviews do more than help people tell good restaurants and dishes from bad ones. With the help of machine learning, the same data can also be used to predict health risks, which matters because reports suggest that restaurants are the most common source of foodborne illness.
Why This Research Stands Out
The research paper, Where Not to Eat? Improving Public Policy by Predicting Hygiene Inspections Using Online Reviews, by Jun Seok Kang, Polina Kuznetsova and Yejin Choi of Stony Brook University and Michael Luca of Harvard Business School, looks for a way for governments to harness the information contained in social media to make public inspections and disclosure more efficient.
Earlier studies have approached this problem through data analysis, but they concentrated on specific issues such as influenza or food poisoning, so their NLP models were trained on only a very small set of words.
But this research differs from the rest of the attempts in the following two ways:
(1) It accounts for all the words that people use in online restaurant reviews, including words that are not directly related to hygiene but are still relevant to some extent. This gave the researchers more data to work with and allowed them to draw a broader set of conclusions from the analysis.
(2) This is the first work to use online reviews in the context of improving public policy and to suggest them as an additional source of information for policymakers.
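To make point (1) concrete, here is a minimal Python sketch of a bag-of-words representation that keeps every word in a review rather than a hand-picked hygiene vocabulary. The function names, the tokenisation rule and the sample reviews are assumptions made for illustration; the paper's actual feature extraction is not described in this article.

```python
import re
from collections import Counter

def unigram_counts(review):
    """Lowercase the review and count every alphabetic token."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return Counter(tokens)

def restaurant_features(reviews):
    """Add up the word counts over all reviews of one restaurant."""
    total = Counter()
    for review in reviews:
        total.update(unigram_counts(review))
    return total

# Made-up reviews, for illustration only.
sample_reviews = [
    "The noodles were fine but the floor was sticky and dirty.",
    "Dirty tables. Great frosting on the cake though.",
]
features = restaurant_features(sample_reviews)
print(features["dirty"])     # 2
print(features["frosting"])  # 1
```

Because no word is filtered out in advance, a classifier trained on such counts is free to pick up terms that are only indirectly related to hygiene.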
Here’s How The Researchers Worked
The data set consisted of restaurant reviews from 2006 to 2013. Reviews whose words only pointed to unpleasant taste, such as “raw” or “salty”, were left out because the research was about health concerns, not taste. Only reviews describing serious health-related problems were considered, since those deserve the most attention and matter most to food inspectors.
The team used two methods, because one kind of classification alone was not enough to produce a meaningful prediction. They combined two kinds of predictions: the first based on hygiene reviews, and the second on content.
1. Hygiene-Based Reviews:
The first step was to check whether customer sentiment correlated with the hygiene of the restaurant. This was done by studying the average review rating and counting the negative reviews. Next, the correlation between hygiene violations and the degree of deception was studied, using the bimodal distribution of review ratings and the volume of deceptive reviews identified through certain linguistic patterns.
Reviews that were too far from the average review rating were removed. Some reviews seemed dubious and acted as noise, so they were removed from the dataset as well.
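The rating-based signals described above can be sketched in a few lines of Python. This is only an illustration: the thresholds, the function names and the "share of 1-star and 5-star ratings" used here as a crude stand-in for bimodality are all assumptions, not the measures the researchers actually used.

```python
from statistics import mean

def rating_features(ratings, negative_threshold=2):
    """Summarise one restaurant's star ratings (1 to 5)."""
    negative = sum(1 for r in ratings if r <= negative_threshold)
    extreme = sum(1 for r in ratings if r in (1, 5))
    return {
        "average_rating": mean(ratings),
        "negative_count": negative,
        # Assumed proxy: many 1s and 5s hint at a two-peaked distribution.
        "extreme_share": extreme / len(ratings),
    }

def drop_outliers(ratings, max_distance=2.0):
    """Keep only ratings within max_distance stars of the average."""
    avg = mean(ratings)
    return [r for r in ratings if abs(r - avg) <= max_distance]

# Made-up ratings, for illustration only.
ratings = [5, 5, 4, 5, 1, 4, 5, 4]
feats = rating_features(ratings)
print(feats["average_rating"])  # 4.125
print(feats["negative_count"])  # 1
print(drop_outliers(ratings))   # [5, 5, 4, 5, 4, 5, 4]
```

In this toy example the single 1-star rating sits more than two stars from the average, so it is treated as noise and dropped.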
2. Customer Opinion And Restaurant Metadata:
The researchers also examined features based on customer opinion, such as aggregated opinion, and features based on restaurant metadata, which included the cuisine and the area where the restaurant was located. The cuisine and area features alone showed a predictive accuracy of about 66%, significantly higher than the 50% expected from random guessing. They also found that past restaurant performance was a good predictor of future performance. With all the features combined, the accuracy was 81.37%. For this method, the researchers used Support Vector Machines (SVM) and Support Vector Regression (SVR).
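As a rough illustration of how cuisine and area can feed a classifier, the sketch below one-hot encodes the two metadata fields and trains a small linear SVM by subgradient descent on the regularised hinge loss. Everything in it is assumed for the example: the placeholder categories, the labels, the hyperparameters and the hand-written trainer. The researchers' own SVM setup is not described in this article, and in practice a library implementation would be used.

```python
import random

def one_hot(rows, keys):
    """Turn categorical rows into binary vectors, one slot per (key, value)."""
    vocab = sorted({(k, row[k]) for row in rows for k in keys})
    index = {pair: i for i, pair in enumerate(vocab)}
    vectors = []
    for row in rows:
        vec = [0.0] * len(vocab)
        for k in keys:
            vec[index[(k, row[k])]] = 1.0
        vectors.append(vec)
    return vectors

def train_linear_svm(X, y, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Shrink the weights (regulariser), then fix margin violations.
            w = [wj * (1 - lr * reg) for wj in w]
            if margin < 1:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    """Return +1 (violation predicted) or -1 (no violation predicted)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Placeholder metadata and labels, invented for illustration only.
rows = [
    {"cuisine": "cuisine_a", "area": "area_x"},
    {"cuisine": "cuisine_a", "area": "area_y"},
    {"cuisine": "cuisine_b", "area": "area_x"},
    {"cuisine": "cuisine_b", "area": "area_y"},
    {"cuisine": "cuisine_c", "area": "area_x"},
    {"cuisine": "cuisine_c", "area": "area_y"},
]
labels = [1, 1, -1, -1, 1, -1]

X = one_hot(rows, keys=("cuisine", "area"))
w, b = train_linear_svm(X, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(X, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

The printed figure is accuracy on the same six toy rows used for training, so it says nothing about the 66% and 81.37% results reported above. Its only purpose is to show the comparison against the 50% that random guessing would give on balanced classes.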
Results Of The Research
The research found that hygiene-related words appear overwhelmingly in negative contexts. The researchers therefore concluded that people pay special attention to bad hygiene when they visit restaurants and rarely bother to report when a place is clean. The research also found that words describing food items, such as “noodle” and “egg”, tend to be a bad sign, whereas words describing how a dish is prepared or presented, like “grill” and “frosting”, point to a good experience. Cuisines had a clear correlation with the inspection outcome. Using machine learning and data analysis, the research showed how online restaurant reviews can help educate people about health risks.
This research is another example of the potential of analytics. With data, we can do many things, and predicting food inspection outcomes is now one of them. As this technology becomes more popular and dependable, people will know which restaurants to visit and which to avoid. Restaurants, in turn, will take stronger measures and pay extra attention to hygiene, both in their food and on their premises.