Machine learning has found applications across many industries. One of the toughest challenges for an intelligent system is building something sensible out of raw inputs, as in the culinary arts. But what if an algorithm could compose and generate a recipe for you? This is the question taken up by a team of researchers from MIT and the Qatar Computing Research Institute (QCRI).
A joint team from these institutes worked on a machine learning system that can follow a recipe to make a pizza. The researchers treat food preparation both as following a set of instructions and as a change in how the food looks after a key ingredient is added or the food is put through a process. To build a system that perceives food making as following a manual, they compose operators that add or remove ingredients from a dish. Each operator is a Generative Adversarial Network (GAN) that predicts how the food looks after that step.
The researchers' aim is to build a model that will:
- Classify pizza toppings by using supervised learning
- Remove the toppings and show what is underneath the topping
- Infer the ordering of the pizza topping
The researchers built a custom synthetic dataset of clip-art-style pizza images. They see two main advantages in using such images as training data. They say, “First, it allows us to generate an arbitrarily large set of pizza examples with zero human annotation cost. Second and more importantly, we have access to accurate ground-truth ordering information and multi-layer pixel segmentation of the toppings.”
The synthetic pizzas also came with ground-truth annotations marking the toppings on each one. In addition, the team downloaded about half a million pizza images from Instagram using the hashtag #pizza and had human annotators label the toppings on more than 9,000 of them.
Given image-level labels for the RGB training images, the team has a binary vector of topping labels for each pizza image. The goal is to learn how each topping looks from the training data. For this purpose, they create small datasets of images with and without a particular topping. In this architecture, one generator adds a topping to the pizza image and a second generator removes it, showing what lies underneath. The discriminator judges the quality of the generated composite images.
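The with-and-without split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topping vocabulary, the file names, and the helper `split_by_topping` are hypothetical and do not come from the researchers' code.

```python
from typing import Dict, List, Tuple

# Hypothetical topping vocabulary; the article does not list the real label set.
TOPPINGS = ["pepperoni", "mushrooms", "olives"]


def split_by_topping(
    labels: Dict[str, List[int]], topping: str
) -> Tuple[List[str], List[str]]:
    """Split image names into (with topping, without topping).

    `labels` maps an image name to a binary vector with one entry per
    topping in TOPPINGS (1 = present, 0 = absent).
    """
    idx = TOPPINGS.index(topping)
    with_topping = [name for name, vec in labels.items() if vec[idx] == 1]
    without_topping = [name for name, vec in labels.items() if vec[idx] == 0]
    return with_topping, without_topping


# Made-up image-level labels for three images.
labels = {
    "pizza_001.jpg": [1, 0, 0],
    "pizza_002.jpg": [1, 1, 0],
    "pizza_003.jpg": [0, 0, 1],
}

with_pepperoni, without_pepperoni = split_by_topping(labels, "pepperoni")
```

Each pair of subsets would then serve as the two image domains that the add and remove generators for that topping learn to translate between.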
The two generators and the discriminator are trained jointly. At test time, the model can generate pizzas, which can be seen as assembling a pizza with its generator and discriminator architecture (GAN), or equivalently as following a set of instructions. A reverse scenario can also be envisioned. The researchers put it this way: “The reverse scenario is to predict the ordered set of instructions that were used to create an image.”
The inference procedure works as follows:
- Classification: The discriminator identifies the pizza toppings.
- Ordering: The model works out how the toppings are layered and which one to remove next.
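One way to picture the two steps above is as a loop that keeps removing the topmost topping until none is left, then reverses the removal order to recover the recipe. The sketch below is a toy illustration, not the researchers' implementation: a pizza is represented as a tuple of topping names listed bottom to top, and `classify`, `pick_top`, and `remove` are hypothetical stand-ins for the trained networks.

```python
from typing import Callable, List, Set, Tuple

# Toy stand-in for an image: topping names listed from bottom to top.
Pizza = Tuple[str, ...]


def infer_recipe(
    pizza: Pizza,
    classify: Callable[[Pizza], Set[str]],
    pick_top: Callable[[Pizza, Set[str]], str],
    remove: Callable[[Pizza, str], Pizza],
    max_steps: int = 20,
) -> List[str]:
    """Return toppings in the order they were added (bottom first)."""
    removed: List[str] = []
    for _ in range(max_steps):  # guard against a remover that never finishes
        visible = classify(pizza)
        if not visible:
            break
        top = pick_top(pizza, visible)
        pizza = remove(pizza, top)
        removed.append(top)
    # Toppings come off top-down, so the recipe is the reverse order.
    return list(reversed(removed))


# Hypothetical stand-ins that read the toy representation directly.
def toy_classify(pizza: Pizza) -> Set[str]:
    return set(pizza)


def toy_pick_top(pizza: Pizza, visible: Set[str]) -> str:
    return pizza[-1]


def toy_remove(pizza: Pizza, topping: str) -> Pizza:
    return tuple(t for t in pizza if t != topping)


recipe = infer_recipe(
    ("mushrooms", "pepperoni", "olives"), toy_classify, toy_pick_top, toy_remove
)
```

In the real system each stand-in would operate on pixels, so the quality of the recovered ordering depends on how well the removal step reveals what lies underneath.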
Training Process and Results
The researchers trained with a learning rate of 0.0002 for the first 100 epochs and then decayed it to zero over the next 100 epochs. Real pizza images were center-cropped and resized to 256 by 256 pixels. The researchers report 99.9% mAP on topping classification. Furthermore, the average normalized Damerau–Levenshtein distance for PizzaGAN, which measures how far the predicted topping order is from the true one, is reported to be 0.33.
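To make the schedule and the ordering metric concrete, here is a minimal sketch. It rests on assumptions the article does not confirm: the decay is taken to be linear, the distance is the optimal string alignment variant of Damerau–Levenshtein (adjacent transpositions only), and normalization divides by the length of the longer sequence. The function names are illustrative.

```python
from typing import Sequence

BASE_LR = 0.0002
CONSTANT_EPOCHS = 100  # epochs at the base learning rate
DECAY_EPOCHS = 100     # epochs over which the rate falls to zero


def learning_rate(epoch: int) -> float:
    """Learning rate for a 0-indexed epoch (linear decay is an assumption)."""
    if epoch < CONSTANT_EPOCHS:
        return BASE_LR
    progress = (epoch - CONSTANT_EPOCHS) / DECAY_EPOCHS
    return BASE_LR * max(0.0, 1.0 - progress)


def osa_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance with insert, delete, substitute, adjacent transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                i > 1 and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Distance divided by the longer length (normalization is an assumption)."""
    longest = max(len(a), len(b))
    return osa_distance(a, b) / longest if longest else 0.0


predicted = ["pepperoni", "olives", "mushrooms"]
truth = ["pepperoni", "mushrooms", "olives"]
score = normalized_distance(predicted, truth)
```

In this example the predicted order differs from the true one by a single swap of adjacent toppings, so the raw distance is 1 and the normalized score is one third. A score of 0 means the predicted ordering is exactly right.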
This is a good step towards understanding food science and an innovative way of looking at how AI can change food for humans. The approach could also transfer to other layered food items. The researchers say, “Though we have evaluated our model only in the context of pizza, we believe that a similar approach is promising for other types of foods that are naturally layered such as burgers, sandwiches, and salads.”