In deep learning, recurrent neural networks (RNNs) have improved remarkably over the past few decades. RNNs have progressed from a theoretical concept to a standard component of neural network applications, and are now used in machine learning areas such as handwriting recognition, language learning and speech recognition, among others.
In this article, we discuss a new type of RNN developed by Google Brain, Google's artificial intelligence research team, which analyses human sketches and represents them in a vector format.
The Issue Of Consistency In RNN
An RNN is a type of neural network whose connections form a loop across time steps: at each step, the network combines the current input with a hidden state carried over from the previous step. This lets RNNs learn and remember context in sequence prediction problems, which is also why they are now being explored for modelling images.
However, the inherent problem with basic RNN models lies in the inconsistency of the output images. They sometimes add an unnecessary image feature, produce a low-quality image or even alter the image altogether, which is undesirable.
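The way an RNN carries context from one input to the next can be sketched in a few lines of plain Python. This is a minimal illustration of a generic (vanilla) RNN step, not the Sketch-RNN model itself; the function names and the toy weight values are hypothetical and chosen only for illustration.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_new = []
    for j in range(len(h_prev)):
        total = b_h[j]
        total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

def run_rnn(inputs, h0, w_xh, w_hh, b_h):
    """Feed a sequence through the same cell, carrying the hidden state forward."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states

# Toy weights (hypothetical values, for illustration only).
W_XH = [[0.5, -0.3], [0.1, 0.8]]
W_HH = [[0.2, 0.0], [0.0, 0.2]]
B_H = [0.0, 0.0]

# Two inputs in sequence; the second hidden state depends on the first.
states = run_rnn([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], W_XH, W_HH, B_H)
```

The same weights are reused at every step, so the only thing that changes between steps is the hidden state, which is how the network "remembers" earlier inputs.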
Using Sketch-RNN To Analyse And Generate Sketch Drawings
David Ha and Douglas Eck, scientists at Google Brain, have come up with a novel RNN that generates hand-drawn sketches of common objects. This RNN, called Sketch-RNN, learns from a large number of human drawings, which are grouped into different classes, and is trained to generate similar drawings. This may help artists and graphic designers come up with new patterns or ideas.
First, they define two modes, unconditional and conditional generation, for creating vector images based on the human drawings obtained. They also consider the mathematical aspects of generating a vector image. The RNN is then trained extensively on these images. They ensure that coherence in the images is maintained based on the type of strokes in the drawing (this is required to calculate the loss function).
The dataset for the neural network model was obtained from Google’s QuickDraw, an AI web application in which users are asked to draw objects of a specific class. The drawings were compiled into a dataset called ‘quickdraw-75’, which consists of 75 classes. Each class has a training set of 70,000 samples. In addition, 2,500 samples each were set aside for validation and testing.
The sketch is represented as a sequence of pen stroke actions, each stored as coordinate offsets, which gives the drawing its vector form. The researchers describe the format as follows:
“A sketch is a list of points, and each point is a vector consisting of 5 elements: (∆x, ∆y, p1, p2, p3). The first two elements are the offset distance in the x and y directions of the pen from the previous point. The last 3 elements represents a binary one-hot vector of 3 possible states. The first pen state, p1, indicates that the pen is currently touching the paper and that a line will be drawn connecting the next point with the current point. The second pen state, p2, indicates that the pen will be lifted from the paper after the current point, and that no line will be drawn next. The final pen state, p3 indicates that the drawing has ended, and subsequent points, including the current point, will not be rendered.”
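The quoted format can be illustrated with a short conversion routine. The sketch below assumes the input is a list of strokes, each a list of absolute (x, y) points, and that the pen starts at the origin; the function name and the input layout are hypothetical and not taken from the researchers' code.

```python
def strokes_to_stroke5(strokes):
    """Convert strokes of absolute (x, y) points into (dx, dy, p1, p2, p3) tuples."""
    points = []
    prev_x, prev_y = 0, 0  # assumption: the pen starts at the origin
    for stroke in strokes:
        for i, (x, y) in enumerate(stroke):
            last_in_stroke = (i == len(stroke) - 1)
            p1 = 0 if last_in_stroke else 1  # pen stays down, line to next point
            p2 = 1 if last_in_stroke else 0  # pen lifts after this point
            points.append((x - prev_x, y - prev_y, p1, p2, 0))
            prev_x, prev_y = x, y
    # End-of-drawing marker: p3 = 1, and this point itself is not rendered.
    points.append((0, 0, 0, 0, 1))
    return points

# Two strokes: an L-shaped line, then a short horizontal line.
drawing = [[(0, 0), (10, 0), (10, 10)], [(20, 20), (30, 20)]]
encoded = strokes_to_stroke5(drawing)
# encoded == [(0, 0, 1, 0, 0), (10, 0, 1, 0, 0), (0, 10, 0, 1, 0),
#             (10, 10, 1, 0, 0), (10, 0, 0, 1, 0), (0, 0, 0, 0, 1)]
```

Storing offsets rather than absolute coordinates means the same shape has the same encoding regardless of where on the canvas it was drawn, apart from its first point.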
They use an autoencoder-type neural network called a Sequence-to-Sequence Variational Autoencoder, which is used for sequential predictions. The block diagram is given below.
From the diagram, it can be seen that the encoder is a bidirectional RNN that takes a sketch as input and produces a latent vector of a specific size, while the decoder generates the output sketch conditioned on that vector. At each step, the decoder's output is sampled from a probability distribution over the next pen movement and pen state.
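In a variational autoencoder, the latent vector is usually not produced directly but sampled from a Gaussian whose mean and variance are computed by the encoder. The following is a minimal sketch of that sampling step and of the matching regularisation term, assuming the standard VAE formulation; the function names and the log-variance parameterisation are assumptions for illustration, not details taken from the article.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Mean KL divergence between N(mu, sigma^2) and N(0, 1) over all dimensions."""
    terms = [-0.5 * (1.0 + lv - m * m - math.exp(lv))
             for m, lv in zip(mu, log_var)]
    return sum(terms) / len(terms)

rng = random.Random(0)       # fixed seed so the example is repeatable
mu = [0.5, -1.0, 0.0]        # hypothetical encoder outputs
log_var = [0.0, 0.0, 0.0]    # log-variance of 0 means sigma = 1
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

Sampling the latent vector, rather than using a fixed encoding, is what lets the decoder produce different but related sketches from the same input.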
A key challenge for the RNN lies in learning when to stop drawing, because the three pen states occur with highly unbalanced probabilities: the pen-down state is far more common than the pen-up state, and the end-of-drawing state occurs only once per sketch. For unconditional generation, the researchers train only the decoder RNN module, without the encoder.
Finally, Sketch-RNN was tested experimentally under various settings and on different image classes (single-sketch, multi-sketch drawing, etc.). A long short-term memory (LSTM) network forms the encoder part of the RNN, while a HyperLSTM forms the decoder. When tested, the RNN performed well, with an accuracy of 80% in generating as well as distinguishing the 75 classes.
Although the accuracy achieved in the output drawings is not exceptional, Sketch-RNN can still help designers and artists explore patterns on a large scale. In addition, Sketch-RNN could be tried with variations of the encoder-decoder architecture to improve the accuracy of the generated sketches.
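To see why the stop signal is hard to learn, it helps to count how often each pen state occurs in a sketch stored in the five-element format quoted earlier. The snippet below does that for a made-up sketch and also shows one possible remedy, padding every sketch to a fixed length with the end-of-drawing point; the sketch data, the function names and the length of 20 are hypothetical.

```python
def pen_state_counts(sketch):
    """Count how often each of the three pen states occurs in a sketch."""
    counts = {"p1": 0, "p2": 0, "p3": 0}
    for _dx, _dy, p1, p2, p3 in sketch:
        counts["p1"] += p1
        counts["p2"] += p2
        counts["p3"] += p3
    return counts

def pad_sketch(sketch, n_max):
    """Pad a sketch to n_max points by repeating the end-of-drawing point."""
    if len(sketch) > n_max:
        raise ValueError("sketch is longer than n_max")
    return list(sketch) + [(0, 0, 0, 0, 1)] * (n_max - len(sketch))

# Hypothetical sketch: a long stroke, a short stroke, then the end marker.
sketch = ([(1, 0, 1, 0, 0)] * 9 + [(1, 0, 0, 1, 0)]
          + [(0, 1, 1, 0, 0)] * 3 + [(0, 1, 0, 1, 0)]
          + [(0, 0, 0, 0, 1)])

counts = pen_state_counts(sketch)       # pen-down dominates, one end marker
padded = pad_sketch(sketch, n_max=20)   # every sketch now has the same length
padded_counts = pen_state_counts(padded)
```

In this example the pen-down state appears 12 times, the pen-up state twice and the end-of-drawing state once, so a model that never predicts "stop" would still be right at almost every step.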