Analytics India Magazine presents the third in its series of interviews with our valued speakers at Cypher 2016: Prasad Y.
Prasad Y is the CEO of HIDDIME.COM, a cloud analytics service, and the founder of Lead Semantics, a semantic Big Data analytics company that owns HIDDIME.COM.
Prior to Lead Semantics, Prasad founded a telecom analytics company that developed and sold a carrier-grade analytics appliance built with COTS hardware, a first of its kind in 2008, offering peta-scale network data mining and analysis at a sub-$1 million price.
He has also spearheaded a five-year AI promotion effort through a lab focused on Semantic Technology. This lab delivered solutions in NLP, knowledge-based systems for digital learning, and domain-specific rapid integration and analytics, with project support from Franz, IIIT Hyderabad and institutes such as Stanford Research Institute, Palo Alto, US, and the AI labs at Auckland University of Technology, NZ, among others.
Here are excerpts from the interview.
AIM: Would you like to tell us about your talk at Cypher?
Prasad: Sure. The talk is about ‘Smart Big Data Lakes’. We are at yet another exciting phase in the BIG DATA story.
Harvesting analytics from BIG DATA has resulted in the concept of DATA LAKES: assorted BIG DATA clusters of storage, compute and software pipeline infrastructure that are set up as the ‘staging area’ for source-formatted data, for the purpose of computing analytics (separate from the analytics that come from Datawarehouses). DATA LAKES are part of the newer BI ecosystem (Gartner).
The promise of bigger benefits from combining and processing together, as needed, the (largely) unstructured source data (in DATA LAKES) and the structured source data (fed to Datawarehouses) has recently led to the idea of a comprehensive BI and Analytics substrate. This new BI ecosystem, with BIG DATA supported DATA LAKES, is still in a nascent state with many open questions.
Alongside the noisy BIG DATA drumbeat, mature, standards-based Semantic and Knowledgebase technologies have successfully delivered on very large data integration and analysis projects in intelligence, defence, life sciences and government. In the process they have accumulated decade(s) of tried and tested techniques and best practices, which now seem appropriate for the challenges facing the new BI ecosystem.
So, enter the SMART BIG DATA LAKE (SBDL). SBDL answers the challenges of both DATA LAKES and the older, rigid Datawarehouses.
AIM: Tell us about your journey in the analytics industry.
Prasad: My background is in Theoretical Computer Science. Prior to founding Lead Semantics, I was part of teams that worked on midsize projects involving AI and Analytics, using mostly knowledge-based techniques.
Lead Semantics has worked with Semantic Technology and BIG DATA from its inception. Our teams had the privilege of working with experts (Franz and others) who are considered pioneers and leaders in Semantic Technology and BIG DATA. Our teams have executed several large projects where the solutions involved Knowledge Bases, Semantic Technology, NLP, and distributed and in-memory processing. We are naturally cut out to do big data analytics.
AIM: Why do you think analytics is important for an organization?
Prasad: Organizations collect data so they can draw insights from it. Analytics are a way to generate insights. Big Data presents an unprecedented opportunity for organizations to collect and process vast amounts of data. Hidden within the interconnections that exist in data (which may extend back to historical data) lies latent meaning that can be gold for marketers and product developers in virtually every industry vertical.
Today and going forward, organizations without an ‘analytics’ strategy can no longer compete! In my view, this is as true for small businesses as it is for the largest organizations.
AIM: Would you like to tell us something about HIDDIME.COM?
Prasad: HIDDIME.COM (from Lead Semantics) is a new-generation IDEA (‘Interactive Discovery and Exploratory Analytics’) tool that runs in the browser, for frontline business managers and data analysts.
Hiddime can work with current BI Datawarehouses, DATA LAKES and, importantly, the new SMART BIG DATA LAKES. So, with Hiddime, customers are future-proofing for the seemingly inevitable shifts in the evolving BI ecosystem within the enterprise.
Hiddime.com is built around a patent-pending ‘storage-tied interactive visual-grammar’, which enables fast and easy point-and-click exploration of large data sets in the browser by business managers and data analysts who are not IT-savvy but are experts in their line of business and data.
It is a volume-based subscription service that is fully secure, with back-to-back guarantees for data security and privacy on Amazon AWS.
AIM: Would you like to share some of the analytics solutions that you have worked on?
Prasad: We worked on several large projects before embarking on our own product-driven service (Hiddime.com). I can highlight a couple:
- Analysed ~100K insurance buyers’ data to detect and score ‘propensity to discontinue’, based on models generated from available data and knowledge of buyer behaviour
- Analysed ~340K anonymous e-commerce buyer patterns against catalog items, life-cycle, price, geographies and demographics for broad categorization for targeted marketing
AIM: Could you tell us about some important contemporary trends that you see emerging in the present analytics space across the globe?
Prasad: Hybrids of Data Lakes and Datawarehouses are being tried today to address BI requirements, and this will continue for some time. JSON, as a representation of unstructured data, is being stored alongside structured relational data. Systems like Apache Drill will also play a role as a common query layer over multiple types of BIG DATA systems such as Elasticsearch, HBase, Hive, Cassandra etc.
More advanced development shops will move to integrate entities from dynamic sources or dynamically computed soft data, resulting in a higher level of automation and decision support, which will remain a key differentiator among leaders in their respective industries. These capabilities are possible with Semantic Technology and Smart Big Data Lakes.
AIM: What are the most significant challenges you see in the Analytics space?
Prasad: In my view, the dearth of skills, know-how and best practices is the weakest link and poses the biggest challenge.
- Technology is accessible, and increasingly data collection and the application of complex pipelines of algorithmic processing (ML, graph, text and language, image, voice processing etc.) are likely no longer the differentiators among competitors. Instead, the challenge will be to assemble the teams, skills and know-how that can leverage the easily available technology.
- Security, provenance and governance will remain big challenges
- Substantiating assumptions, data selection and hypotheses, and confirming ML models with verified knowledge and a mathematical basis, will remain a challenge – the skills shortage will play a big role here too
- The shortage of information modelling skills will be a big challenge as well