Last updated February 5, 2020

How GraphLab’s Framework Powers Parallel Computing For Machine Learning

Published on June 12, 2018
by Abhishek Sharma

Advances in computing hardware have paved the way for easier implementation of machine learning and artificial intelligence applications that solve real-world problems. But most of these developments in computer architecture have focussed on parallel scaling rather than frequency scaling. This shift is unfavourable for applications built on sequential ML algorithms. ML models therefore need a general framework that lets them work efficiently in a parallel computing environment.

In this article, we will discuss GraphLab (now called Turi), a framework for parallel and distributed computation written in C++. It was originally developed by academics at Carnegie Mellon University for handling ML tasks, but has since been extended to data mining applications as well. It provides a high-level programming interface for graphical ML algorithms, and is best suited to sparse data and iterative algorithms. It also has a Python library to meet ML algorithmic requirements. It enables easy and efficient design of parallel ML algorithms by taking care of computation requirements, data consistency and scheduling.

Graphical ML Models: The Basis For Creating GraphLab

Carlos Guestrin, the founder of GraphLab, recalls that the platform was developed to speed up graphical ML model computations. These models were difficult to run on computation frameworks such as Hadoop, where they also took a very long time. Because the graphs were computation-intensive, they demanded a computation framework designed specifically for ML algorithms that work on graphical data. Thus emerged GraphLab, a platform that could handle both the data and the computations on it together on a shared-memory architecture.

This is made possible by a high-level data abstraction for users, which is evident in the data graph. GraphLab thus strikes a balance between low-level abstractions such as POSIX threads and high-level abstractions such as MapReduce, without burdening ML experts with intricate details. As mentioned earlier, GraphLab targets sparse data and iterative computations to achieve this balanced level of abstraction.

The GraphLab Data Model

According to the design developed by Carlos Guestrin and team at Carnegie Mellon University, the data model in GraphLab consists of a data graph and a shared data table. In the words of the researchers,

“The data graph G = (V, E) encodes both the problem specific sparse computational structure and directly modifiable program state. The user can associate arbitrary blocks of data (or parameters) with each vertex and directed edge in G. We denote the data associated with vertex v by D_v, and the data associated with edge (u → v) by D_u→v. In addition, we use (u → ∗) to represent the set of all outbound edges from u and (∗ → v) for inbound edges at v. To support globally shared state, GraphLab provides a shared data table (SDT) which is an associative map, T [Key] → Value, between keys and arbitrary blocks of data.”
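The data model quoted above can be pictured with a small Python sketch. This is not GraphLab's API: the `DataGraph` class, its method names and the example keys are all hypothetical, chosen only to mirror the notation in the quote (D_v, D_u→v, (u → ∗), (∗ → v) and the shared data table T).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Edge = Tuple[str, str]  # a directed edge (u, v), read as u -> v


@dataclass
class DataGraph:
    """Toy data graph G = (V, E); illustrative only, not the GraphLab API."""
    vertex_data: Dict[str, Any] = field(default_factory=dict)  # D_v
    edge_data: Dict[Edge, Any] = field(default_factory=dict)   # D_{u->v}

    def add_vertex(self, v: str, data: Any) -> None:
        self.vertex_data[v] = data

    def add_edge(self, u: str, v: str, data: Any) -> None:
        # Both endpoints must already be vertices of the graph.
        if u not in self.vertex_data or v not in self.vertex_data:
            raise KeyError("both endpoints must be added before the edge")
        self.edge_data[(u, v)] = data

    def out_edges(self, u: str) -> List[Edge]:
        """(u -> *): all outbound edges from u."""
        return [e for e in self.edge_data if e[0] == u]

    def in_edges(self, v: str) -> List[Edge]:
        """(* -> v): all inbound edges at v."""
        return [e for e in self.edge_data if e[1] == v]


graph = DataGraph()
for name in ("a", "b", "c"):
    graph.add_vertex(name, {"value": 0.0})
graph.add_edge("a", "b", {"weight": 0.5})
graph.add_edge("a", "c", {"weight": 0.2})
graph.add_edge("c", "b", {"weight": 0.9})

# Shared data table: an associative map T[Key] -> Value for global state.
# The key "tolerance" is a made-up example of a globally shared parameter.
shared_data_table: Dict[str, Any] = {"tolerance": 1e-3}
```

In this sketch the data on vertices and edges is directly modifiable program state, while anything that every update needs to see, such as a convergence tolerance, lives in the shared table.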

The researchers use loopy belief propagation on Markov Random Fields (MRFs) as a running example to demonstrate how GraphLab’s framework applies to ML.

The Structure For Developing A Parallel Model In GraphLab

Returning to the design for executing parallel ML algorithms, GraphLab follows three key steps, which are set out below:

User-defined computation: Computation in GraphLab is expressed through a user-defined ‘update function’, which operates on the data associated with a vertex and its neighbourhood. Computation can also be invoked through a sync mechanism that aggregates over the graph data.
Data consistency: GraphLab provides three consistency models to balance consistency and system performance. They are labelled as the full consistency model, the edge consistency model and the vertex consistency model. Considering the scope of an update function f(v), that is, the data on vertex v, its adjacent edges and its neighbouring vertices, the full consistency model protects the entire scope while the function runs. The edge consistency model protects only the vertex and its adjacent edges, whereas the vertex consistency model protects only the vertex itself. Therefore, it is essential to choose the appropriate model carefully for ML algorithms that run in parallel. The consistency models are illustrated below.
Consistency models depiction (Image courtesy: Carlos Guestrin)
Scheduling: The ‘update schedule’ determines the order in which update functions are applied to the vertices, subject to the consistency model, and is represented by a data structure called the scheduler. The scheduler maintains a dynamic list of vertex-function pairs (tasks) that are executed by the GraphLab engine. Schedulers can be chosen to suit the ML algorithm. For example, task schedulers such as the FIFO scheduler and the priority scheduler order ML tasks differently, and an algorithm such as residual belief propagation relies on a priority scheduler.
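The interplay between an update function and a scheduler can be sketched in a few lines of Python. This is a sequential toy, not GraphLab's engine: the names `update`, `FifoScheduler`, `PriorityScheduler` and `run` are hypothetical, and the example problem (spreading the smallest label through each connected component) is chosen only because its result is easy to check by hand.

```python
import heapq
from collections import deque

# Undirected toy graph as adjacency lists, with one integer label per vertex.
NEIGHBOURS = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
INITIAL_LABELS = {"a": 3, "b": 1, "c": 5, "d": 4, "e": 2}


def update(v, labels):
    """Update function: adopt the smallest label among v and its neighbours.

    Returns the vertices to reschedule (v's neighbours, if v changed).
    """
    best = min([labels[v]] + [labels[u] for u in NEIGHBOURS[v]])
    if best == labels[v]:
        return []
    labels[v] = best
    return NEIGHBOURS[v]


class FifoScheduler:
    """Runs tasks in arrival order; a vertex is queued at most once at a time."""

    def __init__(self, labels):
        self._queue = deque(labels)
        self._pending = set(labels)

    def add(self, v, priority):
        if v not in self._pending:  # the priority is ignored by FIFO
            self._queue.append(v)
            self._pending.add(v)

    def pop(self):
        v = self._queue.popleft()
        self._pending.discard(v)
        return v

    def __bool__(self):
        return bool(self._queue)


class PriorityScheduler:
    """Runs the task with the smallest priority value first (duplicates allowed)."""

    def __init__(self, labels):
        self._heap = [(label, v) for v, label in labels.items()]
        heapq.heapify(self._heap)

    def add(self, v, priority):
        heapq.heappush(self._heap, (priority, v))

    def pop(self):
        return heapq.heappop(self._heap)[1]

    def __bool__(self):
        return bool(self._heap)


def run(scheduler_cls):
    """Toy engine: pop a task, apply the update, schedule any follow-up tasks."""
    labels = dict(INITIAL_LABELS)
    scheduler = scheduler_cls(labels)
    while scheduler:
        v = scheduler.pop()
        for u in update(v, labels):
            scheduler.add(u, priority=labels[v])
    return labels


fifo_result = run(FifoScheduler)
priority_result = run(PriorityScheduler)
```

Both schedulers reach the same fixed point here, with every vertex holding the smallest label in its component; what differs is the order in which tasks run. That ordering is the property a priority-driven algorithm depends on.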

For a GraphLab program to execute successfully, these steps must be followed in sequence.
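To make the difference between the three consistency models more concrete, the sketch below computes which data an update at a vertex would hold exclusively under each model, and uses that to decide whether two updates could safely run at the same time. It is a simplification written for illustration: the `Consistency` enum and both functions are hypothetical names, not part of GraphLab, and the check considers only exclusive access.

```python
from enum import Enum


class Consistency(Enum):
    FULL = "full"      # vertex, adjacent edges and neighbouring vertices
    EDGE = "edge"      # vertex and adjacent edges
    VERTEX = "vertex"  # vertex only


def exclusive_data(v, edges, model):
    """Return (vertices, edges) that an update at v accesses exclusively."""
    if model is Consistency.VERTEX:
        return {v}, set()
    adjacent = {e for e in edges if v in e}
    if model is Consistency.EDGE:
        return {v}, adjacent
    # FULL: the whole scope, including the neighbouring vertices.
    neighbours = {u for e in adjacent for u in e if u != v}
    return {v} | neighbours, adjacent


def can_run_in_parallel(u, v, edges, model):
    """Two updates may overlap in time only if their exclusive data is disjoint."""
    u_vertices, u_edges = exclusive_data(u, edges, model)
    v_vertices, v_edges = exclusive_data(v, edges, model)
    return not (u_vertices & v_vertices) and not (u_edges & v_edges)


# A chain a - b - c - d, with each edge stored once as a pair of endpoints.
CHAIN = {("a", "b"), ("b", "c"), ("c", "d")}
```

On this chain, vertex consistency lets any two distinct vertices update together, edge consistency rules out adjacent pairs such as a and b, and full consistency also rules out pairs such as a and c that share a neighbour. Weaker models therefore allow more parallelism, at the cost of weaker guarantees for the update function.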

Conclusion

GraphLab gives parallel ML algorithms a computing environment built around graph-structured data. Much of the potential of such data structures, for ML as well as for computation in general, is yet to be fully realised. GraphLab’s API is a step towards realising it, and ML professionals can also port algorithms from other frameworks such as MapReduce.


