Last updated February 20, 2019

Microsoft’s ADF Automates Data Movement & Transformation Without Coding

Published on February 20, 2019

by Disha Misal

Raw, unorganised data often ends up scattered across relational, non-relational and other storage systems. On its own, however, raw data lacks the context needed to give a data science team relevant, meaningful information it can learn from.

Microsoft Azure Data Factory (ADF) is a cloud-based data integration platform designed to solve problems like this. It is a managed cloud service built for complex data integration projects.

What Is ADF?

Data Flow is a feature of ADF that lets you develop graphical data transformation logic that can be executed as activities within ADF pipelines. The objective of data flows is to provide a visual experience with no need to write code. ADF can process large volumes of data quickly and takes care of all the code translation, Spark optimization and execution of the transformations in Data Flows.

The key feature is that the user does not have to write a single line of code. Entire business logic can be designed from scratch in the Data Flow UX. Behind the scenes, the ADF JSON definition is converted to the appropriate code in the Scala programming language, which is then compiled and executed in Azure Databricks. This frees up time for the data science team to focus on important work such as data cleaning, aggregation and data preparation, and to build code-free data flow pipelines.
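To make the idea of translating a declarative description into executable steps more concrete, here is a toy sketch in plain Python. It is not ADF's actual translation, which targets Scala on Spark, and the JSON layout, the step names ("filter", "aggregate") and the sample rows are all made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical declarative description of a flow, similar in spirit to what a
# visual designer might save. The step names and fields are invented for this
# example and are NOT ADF's actual JSON schema.
FLOW_JSON = """
{
  "steps": [
    {"op": "filter", "column": "amount", "min": 0},
    {"op": "aggregate", "group_by": "region", "sum": "amount"}
  ]
}
"""


def build_step(step):
    """Translate one declarative step into a function over a list of rows."""
    if step["op"] == "filter":
        col, minimum = step["column"], step["min"]
        return lambda rows: [r for r in rows if r[col] >= minimum]
    if step["op"] == "aggregate":
        key, value = step["group_by"], step["sum"]

        def aggregate(rows):
            totals = defaultdict(int)
            for r in rows:
                totals[r[key]] += r[value]
            return [{key: k, value: v} for k, v in sorted(totals.items())]

        return aggregate
    raise ValueError(f"unknown op: {step['op']}")


def run_flow(flow_json, rows):
    """Parse the description, build each step, and apply the steps in order."""
    for step in json.loads(flow_json)["steps"]:
        rows = build_step(step)(rows)
    return rows


rows = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": -3},  # dropped by the filter step
    {"region": "west", "amount": 7},
    {"region": "east", "amount": 5},
]
result = run_flow(FLOW_JSON, rows)
print(result)
```

The person designing the flow only edits the description. The code that actually runs is derived from it, which is the same division of labour the article describes between the Data Flow UX and the generated Scala.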

ADF enables the creation of data-driven workflows for automating data movement and transformation. It can be used to create and schedule data pipelines that ingest data from different data stores. It can then transform the data using processing services such as Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics and Azure Machine Learning.

No Need To Code

ADF uses Azure Databricks as the compute for the data transformations it builds. It also has activities that invoke Azure Databricks as a control flow component. These activities call a Python file, a Jupyter Notebook or compiled Scala in a Jar file. All three options require the user to write either Python or Scala to process the data. With ADF Data Flow, the JSON output from the graphical ADF-DF user interface is used to generate the Scala, which gets compiled into a Jar file and passed to Azure Databricks to execute as a job on a given cluster.
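As a rough sketch of what the three hand-coded options look like in a pipeline definition, the Python below builds one activity of each kind as a plain dictionary and prints the resulting JSON. The activity type names and property keys follow the ADF pipeline JSON format as best recalled and should be checked against the official documentation. The pipeline name, linked service name, paths and class name are hypothetical.

```python
import json


def databricks_activity(name, activity_type, type_properties,
                        linked_service="AzureDatabricksLinkedService"):
    """Build the dict for one Databricks activity in a pipeline definition.

    The linked service name is a placeholder for whatever Databricks
    connection is defined in the factory.
    """
    return {
        "name": name,
        "type": activity_type,
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": type_properties,
    }


activities = [
    # Option 1: run a notebook stored in the Databricks workspace.
    databricks_activity(
        "RunNotebook", "DatabricksNotebook",
        {"notebookPath": "/Shared/clean_sales"},
    ),
    # Option 2: run a Python file.
    databricks_activity(
        "RunPythonFile", "DatabricksSparkPython",
        {"pythonFile": "dbfs:/scripts/clean_sales.py"},
    ),
    # Option 3: run compiled Scala packaged as a Jar.
    databricks_activity(
        "RunJar", "DatabricksSparkJar",
        {
            "mainClassName": "com.example.CleanSales",
            "libraries": [{"jar": "dbfs:/jars/clean_sales.jar"}],
        },
    ),
]

pipeline = {
    "name": "DatabricksOptionsPipeline",
    "properties": {"activities": activities},
}

print(json.dumps(pipeline, indent=2))
```

In each case the notebook, script or Jar still has to be written by hand. Data Flow removes that step by generating the Jar from the visual design.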

ADF Features

ADF V2 is a cloud data integration tool that coordinates both data movement and activity dispatch. With Data Flow, ADF has become a genuine cloud replacement for SQL Server Integration Services (SSIS). It makes it easy to move massive amounts of data within Azure and also supports on-premises data movement. It can dispatch activities for data transformation via scripting or by using the custom mode.

Because no code needs to be written, the user can now perform code-free data transformation, scaled out on Databricks, without leaving the browser-based ADF UI. Every data flow you create is a reusable entity that can be executed in many different pipelines and in multiple activities.
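The reuse described here can be pictured as several pipelines pointing at one data flow by name. The sketch below assumes the "ExecuteDataFlow" activity type and "DataFlowReference" reference type from the ADF pipeline JSON format, which should be verified against the official documentation. The pipeline, activity and data flow names are hypothetical.

```python
import json


def data_flow_activity(activity_name, data_flow_name):
    """Build an activity that runs an existing data flow, referenced by name."""
    return {
        "name": activity_name,
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataflow": {
                "referenceName": data_flow_name,
                "type": "DataFlowReference",
            }
        },
    }


def make_pipeline(name, activities):
    """Wrap a list of activities in a minimal pipeline definition."""
    return {"name": name, "properties": {"activities": activities}}


def referenced_flows(pipeline):
    """List the data flow names that a pipeline's activities point at."""
    return [
        a["typeProperties"]["dataflow"]["referenceName"]
        for a in pipeline["properties"]["activities"]
    ]


SHARED_FLOW = "CleanAndAggregateSales"  # hypothetical data flow name

# One pipeline uses the data flow once...
nightly = make_pipeline(
    "NightlyLoad", [data_flow_activity("CleanSales", SHARED_FLOW)]
)
# ...another uses the same data flow from two separate activities.
backfill = make_pipeline(
    "HistoricalBackfill",
    [
        data_flow_activity("Clean2017", SHARED_FLOW),
        data_flow_activity("Clean2018", SHARED_FLOW),
    ],
)

print(json.dumps(nightly, indent=2))
print(referenced_flows(backfill))
```

Because each activity holds only a reference, a change to the data flow itself is picked up by every pipeline that uses it.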

Advantages Of Data Flow

Data Flow provides a GUI-based solution with no need for coding: the user builds the solution with the drag-and-drop features of the ADF interface to perform data cleaning, data preparation and data aggregation.
Because the logic is designed visually, ETL and ELT solutions are easier to develop and maintain.
The use of Spark in ADF data flows allows for high-speed transformation runs.


Disha Misal

Found her way to Data Science and AI through her fascination for Technology. Likes to read, watch football and has an enormous amount of affection for Astrophysics.
