In the tech world, hardly a day goes by without mention of a popular buzzword that many believe will soon become scientific reality. We are talking about the Singularity: that dreaded and much-awaited moment when AI exceeds its makers and takes over the world. Robots and artificial agents already at work in the real world seem to be hastening that apocalyptic future, but are we any closer to the Singularity? AI advances in computer vision, speech recognition, object recognition and natural language processing, accurate as they may be, each focus on a single task, and the applications are fed many objectively labelled examples to learn from. Human beings, on the other hand, become intelligent in an unsupervised manner, often working from few examples and few objective labels, notes Daniel Lemire, a computer science professor at the University of Quebec.
IBM Research’s Raghavendra Singh, a member of the Cognitive Computing group, holds a similar view. During a recent Cypher Talk, Singh shared his insights on computer vision, the AI application that is transforming the e-commerce and automotive industries, and on why it remains so difficult. “We talk about Deep Learning and how it has transformed computer vision but if I still give a photograph and ask you to segment out the people and the objects, it is going to be fairly difficult for a computer vision algorithm to do it,” he said during the talk.
Over the years, computers have been outfitted with hardware to capture data and are now fitted with algorithms to crunch that data. However, getting an algorithm to perform deductive reasoning is the challenging part. So, what exactly makes computer vision hard? “Computer vision is a challenging problem, and the problem in vision is basically that it sees the image as a matrix of numbers. So, when one inputs an image or a video, you are just giving it a matrix of numbers and there is nothing else to it. The vision algorithm has to basically process these numbers, analyse the numbers and make sense out of them just as we make sense out of it,” Singh shared.
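To make Singh's point concrete, here is a minimal sketch in Python with NumPy. The 4x4 image and the helper names `column_brightness` and `strongest_vertical_edge` are hypothetical, made up for illustration. The program never sees "a dark region next to a bright region"; it only sees numbers and has to compute its way to that conclusion.

```python
import numpy as np

# A hypothetical 4x4 grayscale "image": 0 is black, 255 is white.
# To the computer this is nothing more than a grid of integers.
image = np.array([
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
], dtype=np.uint8)


def column_brightness(img):
    """Mean intensity of each column, as floats."""
    return img.astype(np.float64).mean(axis=0)


def strongest_vertical_edge(img):
    """Index i such that the jump between column i and i+1 is largest."""
    jumps = np.abs(np.diff(column_brightness(img)))
    return int(np.argmax(jumps))


print(image.shape)                     # rows and columns of the grid
print(column_brightness(image))        # one average per column
print(strongest_vertical_edge(image))  # where dark meets bright
```

Even this trivial "dark meets bright" judgement has to be derived arithmetically, which hints at why segmenting people and objects in a real photograph is so much harder.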
Adversarial Images can fool AI systems
Over the years, researchers have discovered that the patterns AI looks for in images can be reverse-engineered and exploited through an “adversarial example.” Earlier research showed how a Google team fooled its AI system into seeing an ostrich by tweaking an image of a school bus by just 3. Adversarial images imply that computer vision systems are still not foolproof and can be tricked quite easily into seeing something else. Google’s research is backed by a new study from an MIT team of five researchers, who developed an algorithm that can fool an AI system with adversarial images and that applies both to two-dimensional images and to 3D-printed objects. So, what’s an adversarial image? It is an image that uses patterns, either overlaid on it or embedded in it, to trick the AI system. The research paper presented a general-purpose algorithm for generating adversarial examples. Here’s the striking part of the result: the MIT team 3D-printed a toy turtle with a pattern that fooled Google’s object detection AI system into thinking it was seeing a gun with more than 90% accuracy.
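The idea can be illustrated with a toy sketch. This is not the Google or MIT teams' algorithm; it applies a well-known technique, the fast gradient sign method, to a hypothetical 100-pixel linear classifier whose weights `w`, bias `b` and input `x` are invented for the example. Each pixel is nudged by a small amount `epsilon` in whichever direction increases the classifier's error.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(w, b, x):
    """Probability that x belongs to class 1 under a linear model."""
    return sigmoid(np.dot(w, x) + b)


def fgsm_perturb(w, b, x, y_true, epsilon):
    """One fast-gradient-sign step: move every pixel by +/- epsilon
    in the direction that increases the cross-entropy loss."""
    p = predict(w, b, x)
    grad_x = (p - y_true) * w  # gradient of the loss w.r.t. the input
    x_adv = x + epsilon * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)  # keep pixels in the valid range


# Hypothetical toy classifier: 100 "pixels", weights alternate +1 / -1.
w = np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
b = 0.0

# An input the model confidently assigns to class 1.
x = 0.5 + 0.02 * w

x_adv = fgsm_perturb(w, b, x, y_true=1.0, epsilon=0.05)

print(predict(w, b, x))            # above 0.5: classified as class 1
print(predict(w, b, x_adv))        # below 0.5: now classified as class 0
print(np.max(np.abs(x_adv - x)))   # no pixel moved by more than epsilon
```

No pixel changes by more than 0.05 on a 0-to-1 scale, yet the predicted class flips, because many small, coordinated changes add up across all the pixels.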
So why is Computer Vision difficult?
According to writer and analyst David Amerland, in computer vision (and object recognition in search) computers require a higher standard of processing than humans do. The development of artificial intelligence, in both the specific and the general sense, is closely tied to cracking the computer vision challenge. Citing an example, he emphasized that even though search has become good at recognizing objects in images and videos, computer logic still stumbles on inductive reasoning (context) when it is unsupervised. This also affects computer vision wherever specific context is involved.
Case in point: a self-driving car outfitted with a telemetry system can deliver vision far better than human vision, but will it be able to see through fog, know what’s coming around the corner and optimize the journey accordingly?
It is a view echoed by Singh as well. Machines now beat humans at poker, lip-reading and the ancient game of Go, and 2015 brought major advances in image classification. “Unlike machines, we don’t need one million different examples to go through to learn what an apple is. So, when people talk about status of human vs machine, I think it should be taken in a larger context. While Deep Learning models have transformed the world with various applications, machines are just focused on one task,” he said.
Understanding Convolutional Neural Networks and how they are biologically inspired
It is well known that the science behind building artificial computational systems is inspired by the human brain, and that the cross-pollination between neuroscience and computer science led to the advances in such systems. During the talk, Singh delved into the connections between neuroscience and computing and the recent progress in computer vision, with an emphasis on visual object recognition, an area that is fueling a lot of development in the e-commerce, logistics and automotive sectors.
One of the most popular artificial neural networks is the Convolutional Neural Network (CNN), a biologically inspired model (inspired by the visual cortex) that builds on multilayer perceptrons, Singh shared during the talk, emphasizing how CNNs have gained popularity for recognizing thousands of object categories from natural image databases. Despite an architecture similar to the human vision system, CNNs are still far from matching human accuracy in object recognition.
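The core operation of a CNN layer can be sketched in a few lines. The example below is a simplified illustration, not a full network: the function name `conv2d_valid`, the tiny image and the hand-picked edge-detecting kernel are all assumptions made for the example (a real CNN learns its kernels from data). As in most deep learning libraries, the "convolution" here is technically a cross-correlation, since the kernel is not flipped.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    sum the elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def relu(x):
    """Keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)


# Hypothetical 4x4 image: dark on the left, bright on the right.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hand-picked 2x2 kernel that responds to dark-to-bright vertical edges.
kernel = np.array([
    [-1.0, 1.0],
    [-1.0, 1.0],
])

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)
```

The resulting feature map is non-zero only in the middle column, where the window straddles the dark-to-bright boundary. The same small kernel is reused at every position, which loosely mirrors how cells in the visual cortex respond to local patterns within their own receptive fields.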
According to Singh, invariant object recognition has always been a very challenging task in computer vision, yet a three-year-old child achieves it easily. CNNs lack the processing mechanisms that humans possess. Case in point: recognition accuracy is even harder to improve in cluttered images, where objects appear in both the foreground and the background.
Of late, there has been a groundswell of research in this area, and this feed-forward architecture has inspired a new generation of bio-inspired computer vision systems termed Deep Convolutional Neural Networks (DCNNs), which are pegged as “the best algorithms for object recognition in natural images”. The research suggests that humans recognize objects via 2D template matching, as opposed to constructing 3D object models, and that DCNNs are gradually inching towards close-to-human performance in object recognition accuracy.
Citing a use case from the e-commerce sector, Singh demonstrated how users are demanding an in-store experience with search recommendations built in. Users have gravitated from catalog images to photos clicked on the street. “A photograph from the street visual search use case is quite complicated since the input image is very different and could have lots of different variations. The difference between visual search and visual browse could be more how to classify. One will have to learn invariances to certain things – pose invariance and background invariance which is extremely important for visual browsing,” he shared.
Outlook: gearing towards cognitive systems
In the last few years, there has been a lot of buzz around cognitive systems. IBM describes the move to cognitive systems as a shift from “deterministic, programmable systems that perform operations to a world of probabilistic, cognitive systems that create hypotheses and make recommendations”. According to a white paper authored by IBM Research’s John E Kelly, these cognitive systems are not meant to replace human thought or actions but rather to augment them, and they will lead to products and services that can think. Powered by continuous learning, these cognitive systems will automate repetitive tasks. So, in a way, the much-touted ‘man vs. machine’ argument is just a misguided marketing gimmick, since it is still early days for cognitive computing.