An interesting talk delivered by Seth DeLand and Amit Doshi at the recently concluded MATLAB Expo in Bangalore offered a lot to chew on about how MATLAB is pegging its future growth on big data and machine learning.
MATLAB, essentially a language for technical computing, has long been a favorite among engineers globally, who use it to analyze and design products. It is a less well-known fact that MATLAB can also be used for machine learning, signal processing, image processing, computer vision and control design, competing with established analytics tools such as SAS and R.
The talk began with the four key levels of analytics adoption (descriptive, diagnostic, predictive and prescriptive) and how customers use them to put large volumes of data to effective use. It then walked through how big data and machine learning can be handled in MATLAB. Amit Doshi said that analytics, big data and machine learning are used in almost every industry, citing Gas Natural Fenosa, a Spain-based integrated gas and electricity company, as an example. “It uses analytics to forecast power consumption and machine learning to develop price simulations”, he said.
Seth DeLand, Product Marketing Manager, Data Analytics, and Amit Doshi, Senior Application Engineer, Data Analytics, at MathWorks illustrated working with big data and machine learning in MATLAB through a use case, the “Taxi Fare Predictor Web App”.
Working with Big Data in MATLAB-
Opening this part of the talk, DeLand said, “Let’s follow the objective of creating a model to predict the cost of a taxi ride in New York City. With about 21 GB of data, let’s find out how well we can do that using MATLAB.”
To estimate fares for various locations, they followed these steps:
The first step is data access. With a variety of file formats, sampling the data is a big challenge, and quality is compromised in most cases. “80% of the time is spent in data cleaning”, said DeLand. “Whereas with MATLAB, you can download the data from an open source in just four lines of code, which makes the process simpler and safer and lets it run independently”, he added.
The second step is to get the data from the hard drive into MATLAB. “The datastore feature of MATLAB lets you establish a connection between your computer and other locations such as Hadoop, MySQL, etc. It can give a preview of the first few observations to make sure that you are accessing the right data in the right format”, said Doshi.
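The idea of previewing the first few observations without loading the whole file can be sketched in a few lines. This is a language-neutral illustration written in Python, not MATLAB's datastore API; the function name `preview`, the column names and the sample rows are made up for the example.

```python
import csv
import io
from itertools import islice


def preview(lines, n=3):
    """Return the first n data rows as dicts, reading no further than needed."""
    reader = csv.DictReader(lines)
    return list(islice(reader, n))


# Made-up rows standing in for a large taxi-trip file (hypothetical columns).
sample = io.StringIO(
    "trip_distance,fare_amount\n"
    "1.2,6.5\n"
    "3.4,13.0\n"
    "0.8,5.0\n"
    "7.1,24.5\n"
)

first_rows = preview(sample, n=2)
for row in first_rows:
    print(row["trip_distance"], row["fare_amount"])
```

Because the reader is consumed lazily, only the header and the requested rows are read, which is what makes a quick format check cheap even on a very large file.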
The third step is processing big data. “The tall arrays feature provides a new data type for data that doesn’t fit into memory. It looks like a normal MATLAB array and supports numeric types, tables, etc.”, said Doshi. The biggest advantage is that it requires no special coding: the data is processed in chunks, taking advantage of parallel computing, which reduces processing time significantly.
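The chunk-by-chunk principle described above can be illustrated with a small sketch. It is written in Python purely to show the idea and is not how MATLAB implements tall arrays; the helper names `read_in_chunks` and `chunked_mean` and the fare values are made up for the example.

```python
def read_in_chunks(values, chunk_size):
    """Yield successive lists of at most chunk_size items from an iterable."""
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def chunked_mean(chunks):
    """Mean over all chunks, keeping only running totals in memory."""
    total = 0.0
    count = 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data to average")
    return total / count


# Made-up fares; in practice the iterable would stream from disk.
fares = [6.5, 13.0, 5.0, 24.5, 11.0]
mean_fare = chunked_mean(read_in_chunks(iter(fares), chunk_size=2))
print(mean_fare)
```

Only one chunk and two running totals are held at any time, so the same code works whether the data has five values or five billion.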
The next step is to visualize the big data and check whether it makes sense. “It can show information such as how much distance a cab driver has covered, in the form of a histogram or scatter plot”, said DeLand.
This is followed by developing predictive analytics. Machine learning uses data to produce a program that performs a task. “It can collect a lot of data and then train machine learning models for decision making”, noted Doshi. “Supervised learning develops a predictive model based on input and output”, he added.
In a nutshell, MATLAB provides a single, high-performance environment for working with big data. It is easy, as it uses familiar MATLAB functions and syntax to work with big datasets; convenient, as it works with big data storage systems; and scalable, as it lets customers use the processing platform that suits their needs without rewriting algorithms.
Working with Machine Learning-
Machine learning in MATLAB comes into play as engineers and data scientists work with large amounts of data, as discussed above, which can come from sensors, images, video, telemetry, databases and more. Machine learning can be used to find patterns in data and build models that predict future outcomes based on historical data.
MATLAB, as is widely known, gives access to prebuilt functions and extensive toolboxes for regression, clustering, classification and more.
Giving an insight into the Regression Learner, Doshi noted, “With this, users can build models to predict continuous data and make predictions about future data points.”
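To make "predicting continuous data" concrete, here is a minimal sketch of the simplest regression model, a straight line fitted by ordinary least squares. It is written in Python as an illustration only and does not reproduce the Regression Learner app; the function `fit_line` and the distance and fare figures are made up for the example.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: trip distance (input) and fare (output).
distances = [1.0, 2.0, 3.0, 4.0]
fares = [5.0, 7.5, 10.0, 12.5]

slope, intercept = fit_line(distances, fares)

# Predict the fare for a trip distance the model has not seen.
predicted_fare = intercept + slope * 5.0
print(slope, intercept, predicted_fare)
```

This is supervised learning in miniature: the model is derived from known input and output pairs and then applied to a new input.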
Highlighting the Classification Learner, DeLand said, “It builds models to classify data into different categories. This can help to accurately analyze and visualize the data. It allows classification for applications such as credit scoring, tumor detection and face recognition.”
On a concluding note-
With the above use case, it becomes evident that MATLAB comes with a complete set of statistics and machine learning functionality, along with advanced methods such as nonlinear optimization and system identification.
Its ability to access data from a wide variety of sources and formats, such as databases, financial data servers, IoT devices, spreadsheets and XML, extends that functionality further.