Do you own a smartphone with a powerful camera, yet still struggle to take good pictures of surprising, beautiful and memorable moments? That may soon change. Artificial intelligence (AI) and machine learning (ML) are beginning to command cameras and photography equipment to capture your favourite pictures and videos, using trained algorithms and patterns learned from data. One such camera device is Google Clips, which uses ML to detect and identify faces and continuously watches for photogenic moments to record.
Google Clips and the User Experience
Released in 2017, Clips was conceptualised as part of Google's User Experience (UX) programme. Google wanted users (UXers) to experience more than just a typical interface, by introducing ML concepts that address realistic needs. It stressed "human-centered machine learning", so that users acquainted with the core ideas of ML get a seamless experience. The emphasis on AI rests on three agendas:
- Seeing a real human need: Capturing pictures at the perfect moment is a challenging photography task, so Google set out to build a product that focuses on spontaneous moments.
- Focusing on intelligence: UX ensures that the task given to the AI is one humans themselves could perform. If humans cannot do it, AI will not do it either. Data collection and modelling only make sense if they imitate human intelligence.
- Gaining trust: One crucial aspect is how far AI should surpass humans. Google takes a "reductionist" approach to keep the process simple and effective, arguing that there is no one-size-fits-all intelligence approach for AI systems.
Clips underwent a lot of work before it was ready for the market. Google spent three years on the design and user interface, mainly because ML has to be aligned with human needs: it relies heavily on the type of data collected, the methods used to train models on that data and so on.
How Does Google Clips Work?
Clips is a small, square handheld device that activates when its protruding lens is twisted. Placed on a steady surface, it is ready to go. It records moving images in bursts of seven seconds, with a 130-degree field of view for the lens. A silicone clip lets the device be rotated to any angle and clipped onto other equipment. It adjusts to the objects surrounding it and picks up only the familiar ones to learn a pattern; this way it recognises objects of interest such as family members and friends. The device has no viewfinder. It does have a shutter button, though AI makes it largely unnecessary by taking care of the recording.
- It sports a 12-megapixel sensor behind the 130-degree field-of-view lens.
- The storage capacity is 16GB, which is relatively small by today's standards. However, Google provides the option to store photos on Google Photos.
- It takes pictures at a rate of 15 frames per second (fps).
- The device is currently available only in the US, priced at $249. Google says it will roll the device out globally in the coming months.
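The burst behaviour described above can be sketched from the published specs alone: at 15 fps, a seven-second clip contains 105 frames. The following is a minimal, hypothetical sketch of such a capture loop; the function names, the scoring callback and the keep-threshold are illustrative assumptions, not Google's implementation.

```python
# Hypothetical sketch of Clips-style burst capture, based only on the
# public specs (7-second clips at 15 fps). Names and thresholds are
# assumptions for illustration.

CLIP_SECONDS = 7
FPS = 15
FRAMES_PER_CLIP = CLIP_SECONDS * FPS  # 105 frames per burst


def capture_clip(score_frame, read_frame, keep_threshold=0.8):
    """Record one seven-second burst and keep it only if the on-device
    model scores some moment in it as photogenic enough."""
    frames = [read_frame() for _ in range(FRAMES_PER_CLIP)]
    best = max(score_frame(f) for f in frames)  # best moment in the burst
    return frames if best >= keep_threshold else None
```

In this sketch `score_frame` stands in for the on-device ML model, so discarded bursts never leave the camera.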
Juston Payne, product head for Google Clips, explained that the device is trained on ML models of cats and dogs. He went on to say that it can also identify goats.
ML in Google Clips
Google has focussed on four aspects to achieve consistency in ML: capture, composition, social norms and editing. The data for the model is built from photographic factors such as the sharpness and stability of the pictures. Diversity and redundancy in the images, meanwhile, are addressed using three vectors for the model (time, visual and people), since these factors prevent too much duplicated data from entering the ML model. Because the product is AI-driven, Google tested the UX in stages rather than implementing it directly on the ML model before finalising the device's behaviour. As a result, the device uses less power and processes the media much faster.
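The redundancy idea along the three vectors can be sketched as a simple filter: drop a candidate clip if an already-kept clip is close in time, nearly identical visually, and shows the same people. Everything here (the field names, the thresholds, the cosine-similarity measure) is an assumption for illustration, not Google's actual code.

```python
# Illustrative redundancy filter along the article's three vectors:
# time, visual and people. All names and thresholds are hypothetical.
import math


def too_similar(a, b, time_gap=10.0, visual_sim=0.95):
    """Two clips are redundant if shot close together in time, looking
    nearly identical, and showing the same people."""
    close_in_time = abs(a["timestamp"] - b["timestamp"]) < time_gap
    same_people = a["people"] == b["people"]
    dot = sum(x * y for x, y in zip(a["visual"], b["visual"]))
    norm = (math.sqrt(sum(x * x for x in a["visual"]))
            * math.sqrt(sum(y * y for y in b["visual"])))
    similar_look = norm > 0 and dot / norm >= visual_sim
    return close_in_time and same_people and similar_look


def deduplicate(clips):
    """Keep a clip only if it is not redundant with an already-kept one."""
    kept = []
    for clip in clips:
        if not any(too_similar(clip, k) for k in kept):
            kept.append(clip)
    return kept
```

A filter like this would let the device discard near-duplicate bursts early, which is consistent with the article's point about lower power use and faster processing.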
The difficulty with Google Clips is that it cannot be classified under a single category such as a surveillance camera or even an action camera. UX expectations will dip, and UXers will not find the device interesting, if it interferes with basic functionality and affords no creativity.