3D objects on a computer screen can look remarkably lifelike, and with 3D glasses on, it is almost like witnessing the event live. But is it possible to convert a two-dimensional image into a 3D object and make it “come alive” with artificial intelligence? Let us dive into a research project developed in Japan.
Neural mesh rendering produces detailed 3D objects by constructing them with the help of neural networks. The process usually involves converting a 2D image into 3D by overlaying the image on a 3D object. The result is then refined through the backward pass of 3D rendering and passed through a neural network. This approach has been explored by researchers Hiroharu Kato, Yoshitaka Ushiku and Tatsuya Harada from the RIKEN institute in Japan. Their paper explains how the team built a mesh renderer that works with approximate gradients.
A polymesh is a promising candidate to get this job done. So what is a polymesh? It is a 3D model composed of polygons. The polygons are connected to each other to form a sort of net (or ‘mesh’) that defines the shape. Rendering a polymesh inside a neural network is not easy, because rendering includes a discrete step called rasterization, which prevents back-propagation.
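To make the idea of a polymesh concrete, here is a minimal sketch of one common way to store a triangle mesh: an array of vertex positions plus an array of faces that index into it. The tetrahedron and the `face_areas` helper are illustrative assumptions, not taken from the researchers' code.

```python
import numpy as np

# Four vertices of a tetrahedron, one (x, y, z) row per vertex.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Each face is a triangle given as three indices into `vertices`.
faces = np.array([
    [0, 2, 1],
    [0, 1, 3],
    [0, 3, 2],
    [1, 2, 3],
])

def face_areas(vertices, faces):
    """Area of every triangle: half the norm of the edge cross product."""
    a, b, c = (vertices[faces[:, i]] for i in range(3))
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)

areas = face_areas(vertices, faces)
print(areas)
```

Because the shape is defined entirely by the vertex coordinates, moving a vertex deforms the surface smoothly, which is what makes a mesh attractive as something a network can learn to adjust.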
Single-Image 3D Reconstruction
If an aeroplane can be reconstructed from a single 2D image by this method, the same approach should extend to many other kinds of 3D objects. Rendering with a voxel-based method gives a not-so-sharp 3D model like the one below, but with this technique it is possible to develop more detailed objects with higher resolution.
Comparison With Voxel-Based Method
Mesh reconstruction does not suffer from the low-resolution problem and cubic artifacts of voxel reconstruction. Let us look at how these techniques incorporate the neural network to build high-resolution models:
This approach outperforms the voxel-based model in 10 out of 13 categories. This table shows the comparison of the metric with the voxel-based model.
2D-To-3D Style Transfer
The styles of the paintings are accurately transferred to the textures and shapes by the mesh-based rendering method. In the examples below, one can observe this in the outline of the rabbit and the lid of the teapot.
The style images are Thomson No. 5 (Yellow Sunset) (D. Coupland, 2011), The Tower of Babel (P. Bruegel the Elder, 1563), The Scream (E. Munch, 1910), and Portrait of Pablo Picasso (J. Gris, 1912).
The team has also worked on applying Google’s DeepDream technique to their 3D models. This technique gives a qualitative sense of the level of abstraction that a particular layer has achieved in its understanding of images. It is also known as Inceptionism, in reference to the neural net architecture used. (Explore the Inceptionism gallery for more pairs of images and their processed results, plus some cool video animations.)
Connecting 2D images to the 3D world is one of the hardest problems in the field of computer vision, and rendering (3D-to-2D conversion) lies on the boundary between the 3D and 2D worlds. A polygon mesh has been found to be an efficient and intuitive way of representing 3D shapes. Hence, a backward pass for rendering a 3D mesh is worth implementing.
Rendering cannot be integrated into neural networks without modification because the renderer blocks back-propagation. In this work, the researchers propose an approximate gradient for rendering. The pipeline of this technique is represented below.
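Why rasterization blocks back-propagation, and how an approximate gradient can help, is easier to see on a toy problem. The sketch below is a one-dimensional simplification written for this article, not the formulation from the paper: a "face" is the interval from `left` to `right`, a pixel is on when its centre falls inside it, and the hypothetical `approx_grad` function replaces the true (zero) derivative with the change in pixel value divided by the distance the edge would have to move to cause it.

```python
import numpy as np

def rasterize(right, width=8, left=0.0):
    """Hard 1D rasterizer: pixel i is 1 when its centre lies in [left, right]."""
    centres = np.arange(width, dtype=float)
    return ((centres >= left) & (centres <= right)).astype(float)

def loss(right, target):
    """Half the squared error between the rendered and target images."""
    return 0.5 * np.sum((rasterize(right, len(target)) - target) ** 2)

def approx_grad(right, target, left=0.0):
    """Hypothetical surrogate gradient of the loss with respect to `right`."""
    centres = np.arange(len(target), dtype=float)
    image = rasterize(right, len(target), left)
    dl_di = image - target      # exact per-pixel gradient of the loss
    dist = centres - right      # signed edge move that would flip each pixel
    grad = 0.0
    for i in range(len(target)):
        if centres[i] < left or dist[i] == 0.0:
            continue            # pixel is not controlled by this edge
        if image[i] == 0.0 and dl_di[i] < 0.0:
            grad += dl_di[i] * (1.0 / dist[i])    # turns on if edge moves right
        elif image[i] == 1.0 and dl_di[i] > 0.0:
            grad += dl_di[i] * (-1.0 / dist[i])   # turns off if edge moves left
    return grad

target = rasterize(6.0)         # desired image: pixels 0..6 switched on
right = 3.5                     # initial edge position: pixels 0..3 on

# The hard rasterizer is piecewise constant, so a finite difference sees nothing.
eps = 1e-3
finite_diff = (loss(right + eps, target) - loss(right - eps, target)) / (2 * eps)
surrogate = approx_grad(right, target)

# Plain gradient descent using the surrogate gradient.
for _ in range(10):
    right -= 0.5 * approx_grad(right, target)
final_loss = loss(right, target)
print(finite_diff, surrogate, right, final_loss)
```

The finite difference is zero because a small move of the edge changes no pixel, while the surrogate gradient is negative and pushes the edge to the right until the rendered image matches the target. A real renderer has to do this for triangle edges in two dimensions, which this sketch does not attempt.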
The 3D mesh generator was trained with silhouette images. The generator tries to minimise the difference between the silhouettes of the reconstructed 3D shape and the true silhouettes.
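A silhouette comparison of this kind can be sketched with an intersection-over-union measure on binary masks. The `silhouette_iou_loss` function below is an assumed, illustrative formulation (one minus IoU); the article does not spell out the exact loss the researchers used.

```python
import numpy as np

def silhouette_iou_loss(pred, target, eps=1e-6):
    """1 - IoU between two silhouettes with values in [0, 1]."""
    intersection = np.sum(pred * target)
    union = np.sum(pred + target - pred * target)
    return 1.0 - intersection / (union + eps)

# Target silhouette: a 2x2 square in the middle of a 4x4 image.
target = np.zeros((4, 4))
target[1:3, 1:3] = 1.0

# Predicted silhouette: the same square shifted one pixel to the right.
pred = np.zeros((4, 4))
pred[1:3, 2:4] = 1.0

shifted_loss = silhouette_iou_loss(pred, target)
perfect_loss = silhouette_iou_loss(target, target)
print(shifted_loss, perfect_loss)
```

The loss is close to zero when the rendered silhouette overlaps the target exactly and grows as the two drift apart, so minimising it pulls the mesh towards a shape whose outline matches the image.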
2D-to-3D style transfer was performed by optimising the shape and texture of a mesh to minimise style loss defined on the images. 3D DeepDream was also performed in a similar way.
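The article does not define the style loss, but a common choice in image style transfer is to compare Gram matrices of feature maps. The sketch below assumes that choice and uses random arrays as stand-ins for the features a pretrained network would produce from the rendered image and the style image; `gram_matrix` and `style_loss` are illustrative names.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a (channels, height, width) map."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(rendered_features, style_features):
    """Sum of squared differences between the two Gram matrices."""
    diff = gram_matrix(rendered_features) - gram_matrix(style_features)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
style_features = rng.normal(size=(8, 16, 16))      # stand-in for style image features
rendered_features = rng.normal(size=(8, 16, 16))   # stand-in for rendered image features

mismatch = style_loss(rendered_features, style_features)
match = style_loss(style_features, style_features)
print(mismatch, match)
```

Because the Gram matrix discards spatial layout, it captures texture statistics rather than content. With a differentiable renderer, the gradient of such a loss can flow from the rendered image back to the vertices and texture of the mesh.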
Both applications were realised by letting information in 2D image space flow into 3D space through the renderer.
The neural network renderer code is available to the public here. One can use this code to replicate the model and experiment with it. Other applications like 3D reconstruction, style transfer and DeepDream are currently being constructed. By tuning the parameters, it is possible to obtain higher-resolution images than the ones posted above.