Reports state that 70 percent or more of enterprise data is usually inaccessible for analysis. This data is either locked away or hidden in email message files, word processing documents, spreadsheets, PDF files, drawings, photographs, handwritten notes, scanned documents, and flags. Enterprises use the term “Dark Data” to describe such data.
Insights generated from these types of data can be combined with insights from the structured data already available, and used to address various industry pain points. Computer vision and pattern recognition have made it possible for enterprises to unlock insights from unstructured data that was considered lost until now. There is a separate field dedicated to this, called “Dark Analytics.”
Deloitte’s annual Technology Trends report analyzes the trends that could disrupt businesses in the next 18-24 months. The 2017 report recognizes dark analytics as one of several disruptive technologies.
Understanding Dark Analytics
Dark Analytics transcends the barriers of structured data, casting a much wider net that can capture a broad array of unstructured data that was previously untapped or hidden.
The three data dimensions that dark analytics emphasizes are:
Traditional unstructured data: Most organizations have heaps of structured and unstructured data. Emails, notes, messages, documents, logs, and notifications usually constitute the “traditional” unstructured data. These are usually text-based, and largely remain untapped. With dark analytics, they could reveal information on pricing, customer behavior, and competitors. “The 80 percent rule” states that 80 percent of the total data enterprises hold is traditional unstructured data.
Nontraditional unstructured data: The second dimension of dark analytics focuses on a different category of unstructured data: audio and video files and still images, among others. New opportunities for signal detection and response emerge when a layer of analytics is added in real time to audio and video feeds.
However, this type of data can’t be mined using traditional reporting and analytics techniques. Today, progress in the fields of computer vision, advanced pattern recognition, and video and sound analytics has enabled companies to mine that data to better understand their customers, employees, operations, and markets.
Data that lies within the deep web: The third dimension to dark analytics features the infamous deep web. This could contain the largest body of untapped information, comprising data curated by academics, consortia, government agencies, communities, and other third-party domains.
The sheer size of the domain and its lack of structure make data search a daunting task for businesses. However, the intelligence community constantly monitors the volume and context of deep web activity to identify potential threats. Along the same lines, enterprises might soon have the tools required to curate competitive intelligence from the deep web. For instance, Deep Web Technologies designs search tools for retrieving and analyzing data that is usually inaccessible to standard search engines.
Pitfalls in mining dark data
The field is relatively new, and venturing into it carries risks for the continued business health and well-being of enterprises. Some of the typical risks faced by an enterprise leveraging dark analytics include:
- Legal and regulatory risk: Data covered by mandate or regulation, for instance confidential financial information (credit card or other account data) or patient records, could end up appearing anywhere in dark data collections. This exposure could involve legal and financial liability.
- Intelligence risk: Dark data can often contain proprietary or sensitive information that reveals details of business operations, practices, competitive advantages, important partnerships, and joint ventures. Disclosure of such classified information could compromise important business activities and relationships.
- Open-ended exposure: Dark data can contain unknown and unevaluated sources of intelligence, entailing exposure to loss or harm. However, users usually lack the tools to determine whether such exposure exists.
- Reputation risk: A data breach usually reflects badly on the organizations affected. Enterprises could end up losing customer trust and reputation if such an incident occurs.
Handling dark data better with Dark Analytics
Undoubtedly, dark analytics promises growth for enterprises in the upcoming years, despite the risks involved. The space will draw a lot of investment from companies worldwide. However, it’s unlikely that all the dark data will be valuable. Enterprises will have to devise a strategy before approaching the space.
Care must be taken to regularly audit and trim the database. Old data must be structured and assigned categories, so that businesses can swiftly retrieve the stored information later. To deal with data security concerns, it’s advisable to encrypt the data. Companies must ensure that encryption is applied both to data on in-house servers and to data in cloud storage.
Furthermore, businesses must put data retention and safe disposal policies in place, which allow them to retain valuable data for later use. The policies must be aligned with the prescriptions of the Department of Defense.
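The audit, categorization, and retention steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the extension-to-category mapping, the retention windows, and the sample file records are all hypothetical, and real values would come from an organization's own policy and applicable regulation.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePath

# Hypothetical mapping from file extension to a data category.
CATEGORY_BY_EXTENSION = {
    ".eml": "email",
    ".msg": "email",
    ".docx": "document",
    ".pdf": "document",
    ".xlsx": "spreadsheet",
    ".jpg": "image",
    ".png": "image",
    ".log": "log",
}

# Hypothetical retention windows in days (assumed values, not a standard).
RETENTION_DAYS = {
    "email": 365 * 2,
    "document": 365 * 5,
    "spreadsheet": 365 * 5,
    "image": 365,
    "log": 90,
    "uncategorized": 30,
}


@dataclass
class FileRecord:
    path: str
    last_modified: date


def categorize(record):
    """Assign a category based on the (case-insensitive) file extension."""
    suffix = PurePath(record.path).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "uncategorized")


def audit(records, today):
    """Split records into those to retain and those past their retention window."""
    result = {"retain": [], "dispose": []}
    for record in records:
        category = categorize(record)
        age_days = (today - record.last_modified).days
        action = "dispose" if age_days > RETENTION_DAYS[category] else "retain"
        result[action].append((record.path, category))
    return result


# Illustrative records only; these files do not come from the article.
records = [
    FileRecord("shared/pricing_2015.xlsx", date(2015, 3, 1)),
    FileRecord("logs/router-07.log", date(2017, 1, 10)),
    FileRecord("scans/contract.PDF", date(2016, 6, 30)),
]
report = audit(records, today=date(2017, 6, 1))
print(report)
```

Here the log file is older than its assumed 90-day window and is flagged for disposal, while the spreadsheet and the scanned contract are retained under their categories.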
How can Dark Analytics lead to unexplored opportunities?
Dark data is potentially a land of undiscovered, neglected opportunities. It has much to offer across the entire length and breadth of the industry, and companies can gain valuable insights from it to drive their business.
Dark analytics can help organizations forecast demand for products and services more precisely by analyzing clickstream data or product telematics. It can also help them isolate and resolve customer issues. In addition, dark analytics can help companies build a powerful supply chain by furnishing them with granular information.
Dark analytics can help reveal key insights from customer feedback, which can be leveraged to improve product quality. Companies often analyze customer call detail records to gauge customer sentiment. Companies can also use server log files to obtain statistics related to website traffic.
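As a rough illustration of the server log idea, the sketch below parses web server access logs and derives simple traffic statistics. It assumes the logs follow the Common Log Format; the sample lines are invented for the example, and the `summarize` helper is a hypothetical name.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarize(lines):
    """Count hits per path, responses per status code, and unique client hosts."""
    hits_per_path = Counter()
    status_counts = Counter()
    hosts = set()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like access-log entries
        hits_per_path[match["path"]] += 1
        status_counts[match["status"]] += 1
        hosts.add(match["host"])
    return {
        "hits_per_path": dict(hits_per_path),
        "status_counts": dict(status_counts),
        "unique_hosts": len(hosts),
    }


# Invented sample lines using documentation-reserved IP addresses.
sample_lines = [
    '203.0.113.5 - - [10/Oct/2017:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2017:13:56:01 +0000] "GET /checkout HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2017:14:02:11 +0000] "GET /products HTTP/1.1" 404 -',
    'not a log line',
]
stats = summarize(sample_lines)
print(stats)
```

Malformed lines are skipped rather than raising an error, since log files that have sat unused for a long time often contain truncated or mixed-format entries.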
A case in point is a project at Copenhagen Airport, which serves as a good example of dark analytics. The airport gathered useful information by crunching the data in the log files of its Wi-Fi routers. Passengers’ smartphones would “ping” routers as they walked through the terminals, yielding data on passenger movements. This data could answer commercial questions such as, “Which is the most visited area of duty free?”
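The kind of question the airport asked can be approximated with a simple aggregation. The sketch below is not the airport's actual method: the CSV log layout, the router-to-zone mapping, and the device identifiers are all assumptions made for illustration. It counts the distinct devices seen by the routers in each zone and picks the busiest one.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping from router ID to terminal zone.
ROUTER_ZONES = {
    "r1": "duty_free_perfume",
    "r2": "duty_free_spirits",
    "r3": "gate_a",
}

# Invented sample of router ping records (assumed CSV layout).
SAMPLE_LOG = """timestamp,router_id,device_id
2017-05-01T08:00:05,r1,dev-a
2017-05-01T08:00:09,r1,dev-a
2017-05-01T08:01:30,r2,dev-a
2017-05-01T08:02:00,r2,dev-b
2017-05-01T08:03:10,r2,dev-c
2017-05-01T08:04:45,r3,dev-b
"""


def unique_devices_per_zone(log_text):
    """Count distinct devices per zone, so repeated pings are not double-counted."""
    devices = defaultdict(set)
    for row in csv.DictReader(io.StringIO(log_text)):
        zone = ROUTER_ZONES.get(row["router_id"])
        if zone is None:
            continue  # ignore routers that are not mapped to a zone
        devices[zone].add(row["device_id"])
    return {zone: len(ids) for zone, ids in devices.items()}


visits = unique_devices_per_zone(SAMPLE_LOG)
busiest_zone = max(visits, key=visits.get)
print(visits, busiest_zone)
```

Counting distinct devices rather than raw pings matters because a single phone may ping the same router many times while its owner stands in one shop.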
Dark data can contain sensitive information. Inadvertent or deliberate leakage of information could mean trouble. Therefore, organizations need to implement reliable data tagging and structuring technologies to identify and categorize the data obtained. This step is crucial; otherwise, financial, regulatory, and legal troubles, as well as a loss of competitive advantage, could soon follow.
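A very small example of data tagging is shown below. It is a sketch, not a production classifier: the tag names and regular expressions are assumptions, and it only looks for email addresses and for digit sequences that pass the Luhn checksum used by payment card numbers. Real tagging tools cover far more data types and contexts.

```python
import re

# Simplified patterns (assumptions for illustration, not exhaustive).
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def passes_luhn(digits):
    """Return True if the digit string satisfies the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def tag_document(text):
    """Return a set of hypothetical sensitivity tags found in the text."""
    tags = set()
    if EMAIL_PATTERN.search(text):
        tags.add("contains_email")
    for candidate in CARD_PATTERN.findall(text):
        digits = re.sub(r"\D", "", candidate)
        if passes_luhn(digits):
            tags.add("possible_card_number")
            break
    return tags


# 4111 1111 1111 1111 is a widely used test card number, not a real account.
sensitive = tag_document("Contact jane.doe@example.com, card 4111 1111 1111 1111 on file")
harmless = tag_document("Invoice ref 1234 5678 9012 3456")
print(sensitive, harmless)
```

The Luhn check reduces false positives from arbitrary long numbers such as invoice references, although a match remains only a hint that a human or a stricter tool should review the document.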