Introduction to GCP Dataflow

Harshini Komali
Feb 9, 2021

Google Cloud Dataflow is a cloud-based data processing service that supports both batch and streaming data applications. It enables developers to integrate, prepare, process and analyze large volumes of data for use cases such as web analytics, IoT analytics and big data analytics.

Figure: Stream and batch data processing with Cloud Dataflow

Dataflow is an evolution of MapReduce, Google’s earlier programming paradigm for large-scale data processing. According to Google, Cloud Dataflow helps businesses get actionable insights from data with less operational overhead.
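To make the MapReduce comparison concrete, here is a minimal plain-Python sketch of the classic word-count job split into map, shuffle and reduce phases. It runs in a single process and uses no Dataflow or Beam API; the function names are hypothetical and only illustrate the paradigm that Dataflow builds on.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in every input line.
    return [(word, 1) for line in lines for word in line.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by their key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: combine the grouped values for each key into one result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hypothetical driver chaining the three phases in a single process.
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["the quick fox", "The lazy dog", "the end"])
print(counts["the"])  # 3
```

In a real MapReduce or Dataflow job these phases run in parallel across many workers. Beam expresses the same steps with transforms such as `Map`, `GroupByKey` and `CombinePerKey`.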

Key features of Dataflow:

  1. Serverless
  2. Fully managed service
  3. Horizontal autoscaling of worker resources to maximize resource utilization
  4. Fast and cost-effective
  5. Compatible with big data workloads
  6. Unified batch and streaming processing (a single service that handles both batch and streaming data efficiently)

How does Dataflow benefit existing Google Cloud customers?

Dataflow is designed to complement the rest of Google Cloud’s services. If you use Google BigQuery, Dataflow can clean, prepare and filter data before it is loaded into BigQuery. Dataflow can also read from BigQuery when you want to join BigQuery data with other sources, and the results can be written back to BigQuery.
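The clean, prepare, filter and join steps described above can be sketched in plain Python. This is an illustration under stated assumptions: in-memory lists and dicts stand in for the source data and for a BigQuery table, the field names are hypothetical, and no Dataflow or BigQuery client API is used. The returned list represents the rows that would then be loaded into BigQuery.

```python
from typing import Any, Dict, List, Optional

# Hypothetical raw click events, standing in for data read from an upstream source.
raw_events = [
    {"user_id": " 42 ", "page": "/home", "ms": "120"},
    {"user_id": "", "page": "/home", "ms": "80"},       # missing user: filtered out
    {"user_id": "7", "page": "/Pricing", "ms": "abc"},  # bad duration: filtered out
    {"user_id": "7", "page": "/docs", "ms": "300"},
]

# Hypothetical lookup table, standing in for a BigQuery table to join against.
users = {"42": "DE", "7": "US"}


def clean(event: Dict[str, str]) -> Optional[Dict[str, Any]]:
    """Normalize one event; return None when it cannot be repaired."""
    user_id = event["user_id"].strip()
    if not user_id or not event["ms"].isdigit():
        return None
    return {"user_id": user_id, "page": event["page"].lower(), "ms": int(event["ms"])}


def prepare(events: List[Dict[str, str]], lookup: Dict[str, str]) -> List[Dict[str, Any]]:
    rows = []
    for event in events:
        row = clean(event)
        if row is None:
            continue  # filter step: drop records that failed cleaning
        # join step: enrich with the country from the second source (None if unknown)
        row["country"] = lookup.get(row["user_id"])
        rows.append(row)
    return rows


rows_to_load = prepare(raw_events, users)
print(rows_to_load)
```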

How is data processed using Dataflow?

Developers build pipelines that consume data, transform it and produce the desired output. Dataflow pipelines are written with the Apache Beam SDK. Apache Beam is an open-source, unified programming model for defining both batch and streaming data processing pipelines, and Cloud Dataflow is one of its distributed processing backends, executing those pipelines on Google Cloud Platform. The Beam model provides abstractions that let developers focus on the logic of their data processing rather than on the physical details of execution, while Dataflow takes care of low-level distributed processing details such as coordinating individual workers and sharding datasets.
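The idea of one programming model for both bounded (batch) and unbounded (streaming) data can be illustrated with a toy pipeline in plain Python. This sketch does not use the Apache Beam SDK: the `pipeline` helper and the transform names are hypothetical, chaining functions only loosely mirrors Beam's `|` operator, and the fixed-window counting assumes events arrive in timestamp order (Beam and Dataflow additionally handle late data with watermarks and triggers).

```python
import itertools
from typing import Callable, Iterable, Iterator, Tuple

# A "transform" here is just a function from one iterable of elements to another.
Transform = Callable[[Iterable], Iterator]


def pipeline(*steps: Transform) -> Transform:
    """Compose transforms left to right; evaluation stays lazy."""
    def run(elements: Iterable) -> Iterator:
        for step in steps:
            elements = step(elements)
        return iter(elements)
    return run


def parse(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    # Each input line is "<timestamp_seconds>,<event_name>".
    for line in lines:
        ts, name = line.split(",")
        yield int(ts), name.strip()


def keep_purchases(events: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str]]:
    return (e for e in events if e[1] == "purchase")


def count_per_window(size: int) -> Transform:
    """Count events in fixed, non-overlapping windows of `size` seconds.

    Assumes in-order events, so a window is emitted once the next one starts.
    """
    def step(events: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, int]]:
        for start, group in itertools.groupby(events, key=lambda e: e[0] // size * size):
            yield start, sum(1 for _ in group)
    return step


purchases_per_minute = pipeline(parse, keep_purchases, count_per_window(60))

# Batch: a bounded list of records.
batch = ["5,purchase", "20,view", "50,purchase", "70,purchase"]
print(list(purchases_per_minute(batch)))  # [(0, 2), (60, 1)]


# Streaming: an unbounded generator; the same pipeline emits results incrementally.
def endless_events() -> Iterator[str]:
    for t in itertools.count(start=0, step=15):
        yield f"{t},purchase"


print(list(itertools.islice(purchases_per_minute(endless_events()), 3)))
# [(0, 4), (60, 4), (120, 4)]
```

The same `purchases_per_minute` logic runs unchanged over a finite list and an endless generator; only the input differs. In Beam, a comparable pipeline would be built from transforms such as `beam.Map`, `beam.Filter` and `beam.WindowInto` with fixed windows, and the runner (for example Dataflow) decides how to distribute the work across workers.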

