Automation is slowly replacing many manual jobs, but in art, artificial intelligence has so far played only the role of an assistant to human creativity. New and improved AI-based tools invigorate artists by providing a platform to visualise every idea that inspires them, allowing them to draw, erase and redraw, and effectively eliminating the process of filling trash cans with discarded ideas.
Experiments like Sketch-RNN, Google's AI experiment, allow artists to draw something no matter how rough the idea is. Essentially, the sketch is passed through recurrent neural networks trained on millions of doodles collected from the Quick, Draw! game. Built on TensorFlow, Sketch-RNN comes up with many possible ways to continue drawing the object from where the artist left off. The model can also mimic drawings and produce similar doodles.
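Rather than pixels, Sketch-RNN represents each doodle as a sequence of pen movements, the "stroke-5" format described in its paper. A minimal sketch of that encoding in numpy (the stroke values here are made up for illustration):

```python
import numpy as np

# Stroke-5 format used by Sketch-RNN: each step is
# (dx, dy, p_down, p_up, p_end) -- a pen offset plus a
# one-hot pen state (drawing, lifted, end of sketch).
strokes = np.array([
    [ 5.0,  0.0, 1, 0, 0],   # pen down, move right
    [ 0.0,  5.0, 1, 0, 0],   # pen down, move up
    [-5.0,  0.0, 0, 1, 0],   # pen lifted: this stroke ends
    [ 0.0,  0.0, 0, 0, 1],   # end of the whole sketch
])

def absolute_points(strokes):
    """Convert relative pen offsets back to absolute positions."""
    return np.cumsum(strokes[:, :2], axis=0)

print(absolute_points(strokes))
```

An RNN trained on such sequences predicts the next offset and pen state at every step, which is what lets the model continue a drawing from wherever the artist stopped.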
Brown University Researchers Are Trying To Ace Perfection
Though GANs have stunned the world with their surrealistic art pieces, AI is still a long way from replicating human-like writing or drawing skills. Given the complexity of human hand movement, not to mention other factors like the speed of each stroke and the spacing between them, it is difficult for a robotic arm to do the job elegantly.
To perform these functions efficiently, Brown researchers Atsunobu Kotani, an undergraduate student, and Stefanie Tellex have devised two separate models: a local model to draw each stroke and a global model to learn the shifting action. The researchers demonstrated a robot that was able to write “hello” in 10 languages that employ different character sets. The robot was also able to reproduce rough sketches, including one of the Mona Lisa.
Given an image of just-drawn handwritten characters, the robot infers a plan to replicate the image. This method lets the machine learn in real time and with more ease: just by observing the target image, the robot arm immediately attempts to draw or write it.
The researchers behind this project split writing into two steps:
- a drawing action, which draws each stroke, and
- a shifting action, which moves the pen to the point where the next stroke starts.
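The two-step split above can be sketched as a simple control loop. Here `draw_stroke` and `choose_next_start` are trivial stand-ins for the local and global models (hypothetical names, not from the paper's code):

```python
# Illustrative draw/shift loop; the "models" are stand-ins,
# not the paper's trained networks.

def draw_stroke(start, target_strokes):
    """Local-model stand-in: draw the stroke beginning at `start`."""
    for stroke in target_strokes:
        if stroke[0] == start:
            return stroke
    return None

def choose_next_start(done, target_strokes):
    """Global-model stand-in: pick the start of the first undrawn stroke."""
    for stroke in target_strokes:
        if stroke not in done:
            return stroke[0]
    return None  # nothing left to draw: terminate

# Target image as two strokes, each a list of (x, y) pen positions.
target = [[(0, 0), (0, 5)], [(3, 0), (3, 5)]]

drawn = []
start = choose_next_start(drawn, target)      # shifting action
while start is not None:
    drawn.append(draw_stroke(start, target))  # drawing action
    start = choose_next_start(drawn, target)  # shift to the next stroke

print(len(drawn))  # -> 2
```

The real system replaces both stand-ins with learned networks, but the alternation between them is the same.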
How Does The Network Work?
To decide on an action, the network considers the following attributes:
- Already visited regions
- Current location
- Difference image
- Continuously connected target region
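Each of these attributes can be encoded as a 100×100 map and stacked channel-wise into one input tensor (the 100×100 size follows the dimensions quoted below; the map contents here are dummy values for illustration):

```python
import numpy as np

H = W = 100
visited    = np.zeros((H, W))  # regions the pen has already covered
location   = np.zeros((H, W))  # one-hot map of the current pen position
location[50, 50] = 1.0
difference = np.zeros((H, W))  # target image minus what has been drawn
connected  = np.zeros((H, W))  # target region connected to the pen

# Stack the four attribute maps into a single (100, 100, 4) input tensor.
x = np.stack([visited, location, difference, connected], axis=-1)
print(x.shape)  # -> (100, 100, 4)
```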
The network stacks these attribute images for a given word into a single input tensor. This combined input is encoded with a residual network to obtain a full-size feature tensor, say of size (100,100,64), from which a local patch, say (5,5,64), is extracted around the current position.
This extracted patch is flattened into a vector of fixed length and fed into an LSTM cell, which outputs a new vector.
The final prediction is made by mapping these vectors through two fully connected layers, and this process is repeated until termination. This whole process constitutes the local model. When the local process terminates, a tensor is constructed from a new set of images, a fully connected layer is applied, and the result is an image of, say, size (100,100,1); this constitutes the global model.
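The patch extraction and flattening steps can be illustrated directly in numpy; random values stand in for the residual network's (100, 100, 64) feature tensor:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((100, 100, 64))  # stand-in for encoder output

def extract_patch(features, row, col, size=5):
    """Cut a size x size local window around the current pen position."""
    half = size // 2
    return features[row - half:row + half + 1, col - half:col + half + 1, :]

patch = extract_patch(features, 50, 50)  # (5, 5, 64) local view
vector = patch.reshape(-1)               # length-1600 vector, fed to the LSTM
print(patch.shape, vector.shape)         # -> (5, 5, 64) (1600,)
```

This zoomed-in view is what lets the local model reason about the next pen movement pixel by pixel instead of over the whole canvas.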
Source: Paper by Atsunobu Kotani
The algorithm makes use of the deep learning networks discussed above, which analyse images of handwritten words or sketches and deduce the likely series of pen strokes that created them. The robot can then easily reproduce the words or sketches using the pen strokes it learned.
Key to making the system work, Kotani says, is that the algorithm uses two distinct models of the image it’s trying to reproduce. Using a global model that considers the image as a whole, the algorithm identifies a likely starting point for making the first stroke. Once that stroke has begun, the algorithm zooms in, looking at the image pixel by pixel to determine where that stroke should go and how long it should be. When it reaches the end of the stroke, the algorithm again calls the global model to determine where the next stroke should start, then it’s back to the zoomed-in model. This process is repeated until the image is complete.
Hard-coding a robot to perform all of the above skills, even poorly, takes a lot of computational heavy lifting and some ingenious constraint assumptions to make the robot perform decently, especially in unstructured, real-world situations.
Asking a robot to run, do a cartwheel or throw a pitch would have sounded like a chapter from a generic sci-fi novel a few years ago. With the advancement of hardware acceleration and the optimisation of machine learning algorithms, techniques like reinforcement learning are now being put to practical use.
With this new human-like ability to see and draw simultaneously, machine vision has reached new heights.
Know more about the work here.