
Unlocking Value from Speech Data


Today, a large amount of unstructured data is generated in the form of text, images, video, and speech. This data can contain valuable information that companies can use to make better decisions. In this article, we focus on one such form of unstructured data: speech.

We present a use case in which we analyzed speech from clinical trials to automate a significant part of the operational process, with the potential to cut quality control costs by half.

What Is Speech Analytics?

Speech has several aspects. Some elements, such as words, speech rate, tone, and emotion, are discernible by humans.

Other elements, such as minor variations in pitch and speech rate, are not so easily identified by humans.

Speech analytics is the characterization of speech based on these factors to derive actionable business insights from the data.

There are several ways in which speech can be analyzed, based on the type of application:

Full transcription

Full transcription converts speech into text, as in applications like Siri or in transcribing meetings (for example, between a doctor and a patient), conferences, etc. Converting speech into text makes it easier to search.

Speaker diarization

Speaker diarization separates sections of speech by speaker. When transcribing speech with more than one speaker, such as a meeting or a conference, it is important not just to convert speech to text but also to identify who is speaking.

Keyword detection

Keyword detection entails identifying specific keywords in an audio recording. Customer care centers can detect keywords like “unhappy” and “disappointed” and use them to monitor agent performance.

Speaker authentication/identification (voice fingerprinting)

Speaker authentication/identification (voice fingerprinting) involves identifying the unique characteristics in every speaker’s voice that allow humans to differentiate between and identify speakers. Some fraud detection applications capture these unique features, create voice fingerprints during customer care interactions, and compare them against known blacklists.

Emotion detection

Emotion detection involves identification of the emotional state of the speaker. This can help identify irate customers during customer care interactions, among other applications.

Other characteristics of conversation

These include pauses, noise, etc. Loud noises or long pauses could indicate a bad customer care conversation.

Depending on the type of business problem, the analysis framework would have one or more of the above.
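As a minimal sketch of the simplest of these analyses, keyword detection, the snippet below counts watched words in a call that has already been transcribed. The function name `count_keywords`, the watch list, and the sample transcript are illustrative assumptions; spotting keywords directly in audio would need a speech recognizer, which is not shown here.

```python
import re
from collections import Counter

# Hypothetical watch list; the two words come from the customer care example above.
KEYWORDS = {"unhappy", "disappointed"}

def count_keywords(transcript, keywords=KEYWORDS):
    """Count how often each watched keyword occurs in a transcript."""
    # Lowercase and keep only word-like tokens so punctuation does not block a match.
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(w for w in words if w in keywords)

# Made-up transcript of a customer care call, for illustration only.
transcript = "I am very unhappy with the delay. Honestly, I am disappointed and unhappy."
hits = count_keywords(transcript)
print(dict(hits))
```

A per-agent or per-call tally of such counts is one way the monitoring described above could be approached.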

Problems Faced During Clinical Trials

Testing the efficacy of drugs for mental illnesses involves the doctor having detailed discussions with the patients to evaluate their mental state at various stages of the treatment.

These clinical trials evaluate both the quality of the interviews and whether or not the drug meets its targets. Interview quality evaluation typically involves experts listening to audio recordings of the interviews and scoring them on various quality metrics. This manual review is quite expensive.

The objective here is to use speech analytics to assist the manual reviewers and significantly cut down the costs associated with review time.

Role of Speech Analytics

Pre-processing

Our first step was to remove background noise so that the spoken dialog could be heard clearly. We then split the files into sections of alternating speech and silence. Following this, we grouped the speech sections into clusters, each representing a different speaker.
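The speech/silence split can be sketched with a simple energy threshold. This is an illustrative assumption, not necessarily the method used in the project: the frame length, the threshold, and the names `frame_rms` and `split_speech_silence` are hypothetical, and a synthetic tone stands in for a real recording. Noise removal and speaker clustering are not covered.

```python
import math

def frame_rms(samples, frame_len):
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [
        math.sqrt(sum(x * x for x in samples[s:s + frame_len]) / frame_len)
        for s in starts
    ]

def split_speech_silence(samples, sample_rate, frame_len=400, threshold=0.05):
    """Return (label, start_sec, end_sec) runs of 'speech' and 'silence'."""
    segments = []
    for idx, rms in enumerate(frame_rms(samples, frame_len)):
        label = "speech" if rms >= threshold else "silence"
        start = idx * frame_len / sample_rate
        end = (idx + 1) * frame_len / sample_rate
        if segments and segments[-1][0] == label:
            # Same label as the previous frame: extend the current run.
            segments[-1] = (label, segments[-1][1], end)
        else:
            segments.append((label, start, end))
    return segments

# Synthetic stand-in for a recording: 0.5 s silence, 1 s of a 200 Hz tone, 0.5 s silence.
RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE)]
signal = [0.0] * (RATE // 2) + tone + [0.0] * (RATE // 2)

segments = split_speech_silence(signal, RATE)
for label, start, end in segments:
    print(f"{label:8s} {start:.2f}s to {end:.2f}s")
```

On real recordings a fixed threshold is fragile, so the threshold would likely need tuning per file or replacing with a trained voice activity detector.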

Feature extraction

We then extracted several hundred features from the audio files, ranging from direct features like duration and amplitude to more abstract features like speech rate, frequency-wise energy content, and mel-frequency cepstral coefficients (MFCCs). Among other things, these features helped capture information that is characteristic of a person, similar to how a human would identify a person by their voice.
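A few of the direct features can be computed from raw samples alone, as the sketch below shows. The feature names and the function `basic_features` are assumptions made for illustration, and the zero-crossing rate is only a crude proxy for frequency content; frequency-wise energy and MFCCs would normally come from a signal processing library and are not reproduced here.

```python
import math

def basic_features(samples, sample_rate):
    """Compute a few direct, per-segment features from raw audio samples."""
    n = len(samples)
    duration = n / sample_rate
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    # Zero-crossing rate: sign changes per second between neighbouring samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {
        "duration_s": duration,
        "peak_amplitude": peak,
        "rms_energy": rms,
        "zero_crossings_per_s": crossings / duration,
    }

# Synthetic stand-in for one speech segment: 1 s of a 200 Hz tone at 8 kHz.
RATE = 8000
segment = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE)]

features = basic_features(segment, RATE)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

For a pure tone the zero-crossing rate comes out close to twice the tone frequency, which is a quick sanity check on the function.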

Prediction

The objective was to predict an interview quality score, a single number constructed by combining several qualitative aspects of interview quality. We computed this score manually for a few audio files and then developed machine learning models to identify inherent patterns and predict the score for all other audio files. We tried various supervised machine learning techniques: logistic regression, boosted trees, random forests, support vector machines, etc. The best-performing algorithm improved the accuracy of identifying bad interviews by more than 50% compared to the random baseline, meaning the cost of identifying potentially bad interviews was halved. In other words, in the same amount of time, one could identify and review twice the number of bad interviews and gain rich insights that will eventually help improve the quality of clinical trials significantly.
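One way to compare a model against a random baseline is to review only the interviews the model ranks as riskiest and measure how many of them are truly bad. The sketch below assumes that setup; the scores, labels, review budget `k`, and the function `precision_at_k` are made up for illustration and are not results or code from the study.

```python
def precision_at_k(scores, labels, k):
    """Fraction of truly bad interviews among the k highest-risk predictions."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

# Made-up model outputs (higher = more likely to be a bad interview) and
# made-up reviewer labels (1 = bad). These are NOT results from the study.
risk_scores = [0.91, 0.15, 0.78, 0.40, 0.66, 0.05, 0.83, 0.30, 0.22, 0.57]
is_bad = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

k = 4  # hypothetical review budget: only four interviews can be checked by hand
base_rate = sum(is_bad) / len(is_bad)  # expected hit rate when reviewing at random
model_rate = precision_at_k(risk_scores, is_bad, k)
lift = model_rate / base_rate

print(f"random: {base_rate:.2f}, model top-{k}: {model_rate:.2f}, lift: {lift:.2f}x")
```

A lift above 1 means that a reviewer who follows the model's ranking finds more bad interviews per hour than one who samples at random, which is the kind of saving described above.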

Conclusion

Speech analytics has potential applications in almost every business that involves some form of verbal interaction, from call centers to classrooms. With the increase in computing power and the growth of big data technologies, analyzing large volumes of unstructured speech data is becoming increasingly mainstream. When used appropriately, it can give a company a significant reduction in cost as well as a strong competitive advantage. Some functions, like customer care, have started incorporating speech analytics, but there is still a long way to go before its full potential is realized.


About the Authors/Tiger Analytics

Patanjali V, the primary author, is a Lead Data Scientist at Tiger Analytics. He leads advanced analytics engagements that involve complex/unstructured data.

Anand Bharadwaj, the co-author, is a Director at Tiger Analytics. He has 18+ years of experience in the consulting industry and loves to ensure business value realization of analytics solutions.

Tiger Analytics (www.tigeranalytics.com) provides Big Data and advanced analytics solutions to help businesses make data-driven decisions. We bring deep expertise in data science along with an understanding of business needs and state-of-the-art technologies to solve business problems.


Anand Bharadwaj

Anand leads business development for Tiger Analytics, bringing in a strong consulting perspective which puts success of our clients first. He has led teams in IBM, Cognizant, among others, for 18+ years building start-ups and key client relationships. He has helped clients solve a variety of problems across verticals using IT, analytics or consulting. He has an MBA from Xavier Institute of Management Bhubaneswar (XIMB), India and has done executive education programs from London Business School and Carnegie Mellon University.