
How Google’s Machine Learning Model Is Helping Decode Whale Songs


 

 

The single largest direct human impact on marine ecosystems comes from the over-exploitation of resources through activities like fishing. Now, researchers have devised a technique to track the populations of endangered marine life, such as humpback whales, by listening to their distinctive, hour-long songs.

 

Audio analysis techniques developed by Google's AI Perception team have previously been used to generate non-speech captions for YouTube videos. Now, similar techniques are being applied to conservation work.

 

Google, in association with the Pacific Islands Fisheries Science Center of the US National Oceanic and Atmospheric Administration (NOAA), has developed algorithms to identify humpback whale calls in 15 years of underwater recordings from a number of locations in the Pacific. This research provides new and important information about humpback whale presence, seasonality, daily calling behaviour, and population structure.

 

HARP

The audio data, 9.2 terabytes in all, was collected with HARP (high-frequency acoustic recording package) devices over a period of 15 years.

 

Manually marking humpback whale calls is extremely time-consuming. That is why researchers at Google turned to supervised machine learning: the audio is converted into spectrogram images, and audio event detection is treated as an image classification problem.

 

The sound intensities, at their different magnitudes, are plotted on time-frequency axes to produce these spectrogram images.

Spectrograms of audio events found in the dataset, with time on the x-axis and frequency on the y-axis. Left: a humpback whale call; center: narrow-band noise from an unknown source; right: hard-disk noise from the HARP.
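As a rough illustration of this audio-to-image framing, a clip can be turned into a log-scaled spectrogram with librosa and handed to an image classifier. This is a minimal sketch, not Google's actual pipeline; the file name and parameter values are assumptions.

```python
# Minimal sketch (not Google's pipeline): turn an audio clip into a
# spectrogram "image" so an ordinary image classifier can score it.
import librosa
import numpy as np

def clip_to_spectrogram(path, sr=10000, n_fft=1024, hop_length=256):
    """Load an audio clip and return a log-magnitude spectrogram
    (frequency x time) -- the image the classifier sees."""
    audio, sr = librosa.load(path, sr=sr)
    stft = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
    magnitude = np.abs(stft)
    # Log compression keeps quiet calls visible alongside loud noise.
    return librosa.amplitude_to_db(magnitude, ref=np.max)

# Example usage (the file name is hypothetical):
# image = clip_to_spectrogram("harp_clip.wav")
# image.shape -> (frequency_bins, time_frames), ready as CNN input
```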

 

For classifying these spectrogram images, a ResNet-50 was used, an architecture that has given reliable results in classifying non-speech audio.

 

A ResNet is a residual learning framework introduced in 2015 to ease the training of deep networks. The input to a stack of layers is used as a reference, so the stack learns a residual function relative to that input rather than an unreferenced mapping. As a result, accuracy continues to improve as the depth of the network increases.

 

The motivation for residual learning, rather than simply stacking layer upon layer, is that deeper plain networks run into problems such as delayed convergence caused by vanishing or exploding gradients. In the vanishing gradient case, for example, the weight updates computed from the error function at each iteration become vanishingly small, which can effectively halt the network's training. ResNets have also proved effective at tackling the degradation problem, where training accuracy saturates and then drops as depth increases.
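To make the residual idea concrete, here is a generic residual block sketched in Keras. It is illustrative only, not the exact ResNet-50 configuration used for the whale-call classifier, and it assumes the input already has the target number of channels.

```python
# Generic residual block sketch in Keras (illustrative, not the exact
# ResNet-50 blocks used in the whale-call work).
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, kernel_size=3):
    """y = x + F(x): the convolutions only have to learn the residual F(x).
    Assumes x already has `filters` channels; real ResNets switch to a
    projection shortcut when the shapes differ."""
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Identity shortcut: gradients flow directly through the addition,
    # which is what eases training as depth grows.
    y = layers.Add()([shortcut, y])
    return layers.ReLU()(y)
```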

 

Humpback whale calls have varied but sustained frequencies. If a frequency did not vary at all, the spectrogram would show a horizontal bar; the arcs seen in whale calls indicate frequency-modulated signals. The challenge with collecting humpback audio is the noise mixed in with it, such as noise from ship propellers and other equipment. This noise is largely unmodulated and therefore appears as horizontal bars on a spectrogram.

 

PCEN (per-channel energy normalization) is a technique originally used in far-field speech recognition tasks. It can be implemented as part of a deep neural network and uses dynamic compression instead of logarithmic compression to spot keywords in distant or noisy acoustic environments. Here, it suppresses the stationary narrow-band noise generated by machinery, reducing the error rate by 24%.
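librosa ships a PCEN implementation, so the contrast with plain log compression can be sketched in a few lines. The file name is hypothetical and the parameters are library defaults, not the settings behind the 24% figure.

```python
# Sketch: per-channel energy normalization vs. log compression (librosa).
import librosa
import numpy as np

audio, sr = librosa.load("harp_clip.wav", sr=10000)  # hypothetical clip
mel = librosa.feature.melspectrogram(y=audio, sr=sr, power=1)

log_mel = librosa.amplitude_to_db(mel, ref=np.max)  # static log compression
pcen_mel = librosa.pcen(mel * (2**31), sr=sr)        # dynamic per-channel normalization

# PCEN's temporal smoothing suppresses stationary narrow-band noise
# (e.g. ship machinery) while keeping transient, modulated calls prominent.
```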

 

A whale song is generally a structured, sequential audio signal that can last over 20 minutes, and a new song often begins within a few seconds of the previous one ending. Feeding the model audio units over such large time windows provides extra context that improves the precision of its predictions. The test set consists of 75-second audio clips, on which the model achieved accuracy scores above 90%.
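A minimal sketch of that windowed scoring is shown below, assuming a classifier wrapped in a hypothetical `predict_clip` helper; the 75-second window length comes from the article, everything else is an assumption.

```python
# Sketch: score a long recording in consecutive 75-second context windows.
# `model.predict_clip` is a hypothetical stand-in for spectrogram + ResNet scoring.
import numpy as np

def score_windows(audio, sr, model, window_seconds=75):
    """Yield (start_time_seconds, probability) for each 75-second window."""
    window = int(window_seconds * sr)
    for start in range(0, len(audio) - window + 1, window):
        clip = audio[start:start + window]
        yield start / sr, model.predict_clip(clip)
```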

 

Unsupervised Learning For Similar Song Units

In this approach, no manual labels are needed; similarity is learned from the ResNet output itself. Audio units are compared by the Euclidean distance between their ResNet output vectors, with units occurring in similar time frames expected to lie close together. This helps in distinguishing different humpback unit types from each other.

 

The distance calculation follows the approach of Unsupervised Learning of Semantic Audio Representations. The basic idea is to exploit the correlation between closeness in time and closeness in meaning. A training sample consists of three vectors representing a humpback unit (anchor), a nearby, similar unit (positive), and noise (negative). The model minimizes a loss that forces the anchor-negative Euclidean distance to exceed the anchor-positive distance. Nearest neighbours across the entire dataset are then retrieved using the Euclidean distance between embedding vectors.
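A minimal sketch of that triplet objective and the nearest-neighbour retrieval in plain NumPy follows; the margin value and helper names are illustrative assumptions, not the published training code.

```python
# Sketch of the triplet idea: an anchor unit should sit closer (in Euclidean
# distance) to a positive drawn from a nearby time window than to a negative
# such as noise. Margin value is illustrative.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss pushing d(anchor, negative) above d(anchor, positive)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def nearest_neighbours(query, embeddings, k=5):
    """Retrieve the k closest embedding vectors by Euclidean distance."""
    dists = np.linalg.norm(embeddings - query, axis=1)
    return np.argsort(dists)[:k]
```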

Time density of presence on year/month axes (via the Google AI blog).

The plot above summarises the model output with respect to time and location (Kona and Saipan). The results clearly show a seasonal variation consistent with the known pattern in which humpback populations spend summers feeding near Alaska and then migrate to the vicinity of the Hawaiian Islands to breed and give birth.

 

These results can further be used to assess the effects of anthropogenic activity, and the success of this project argues for applying machine learning tools to a wider spectrum of environmental challenges.


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.