What is a data pipeline?

A data pipeline, or simply a pipeline, is a series of data processing steps. First, data is ingested at the beginning of the pipeline. It then passes through a series of steps in which the output of one step becomes the input of the next. This continues until the final step has run and the pipeline is complete. The steps of a pipeline are often executed in parallel or in a time-sliced fashion.
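The chaining described above, where each step consumes the previous step's output, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names (run_pipeline, strip_whitespace, drop_empty, to_upper) and the sample records are hypothetical, and the steps run one after another rather than in parallel.

```python
from functools import reduce
from typing import Callable, List

# A step takes a list of records and returns a new list of records.
Step = Callable[[List[str]], List[str]]


def strip_whitespace(records: List[str]) -> List[str]:
    return [r.strip() for r in records]


def drop_empty(records: List[str]) -> List[str]:
    return [r for r in records if r]


def to_upper(records: List[str]) -> List[str]:
    return [r.upper() for r in records]


def run_pipeline(data: List[str], steps: List[Step]) -> List[str]:
    # Feed the output of each step into the next one, in order.
    return reduce(lambda current, step: step(current), steps, data)


result = run_pipeline(
    ["  alpha ", "", " beta"],
    [strip_whitespace, drop_empty, to_upper],
)
print(result)  # ['ALPHA', 'BETA']
```

Because every step has the same shape (records in, records out), steps can be reordered, removed, or added without changing run_pipeline itself.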

Data pipelines consist of three key elements: a source, a processing step or steps, and a destination. The source may be a database, an application, or a cloud service. The destination may be a data consumer, such as a machine learning or data visualization algorithm, or even another database.
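As a rough sketch of these three elements, the example below uses a CSV string as a stand-in for an application export (the source), a type conversion as the processing step, and an in-memory list as a stand-in for a database table (the destination). All names and sample data here are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# Hypothetical sample data standing in for an application export.
RAW_CSV = "user,amount\nana,10.5\nbob,3\n"


def source(text: str) -> Iterator[Dict[str, str]]:
    # Source: yield one record at a time from the CSV text.
    yield from csv.DictReader(io.StringIO(text))


def process(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    # Processing step: convert the amount field from text to a number.
    for record in records:
        yield {"user": record["user"], "amount": float(record["amount"])}


def destination(records: Iterable[Dict[str, object]], sink: List[Dict[str, object]]) -> None:
    # Destination: append records to a list standing in for a database table.
    sink.extend(records)


table: List[Dict[str, object]] = []
destination(process(source(RAW_CSV)), table)
print(table)
```

The source and processing step are generators, so records move through the pipeline one at a time instead of being loaded all at once.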

Data pipelines enable data to flow, for example, from an application to a data warehouse, from a data lake to an analytics database, or into a payment processing system.

Common processing steps in data pipelines include transformation, augmentation, enrichment, filtering, grouping, aggregation, and running algorithms against the data.
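Several of these steps can be illustrated together. The sketch below filters, enriches, groups, and aggregates a small list of order records; the records, the REGION_BY_COUNTRY lookup table, and the field names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records.
orders = [
    {"country": "DE", "amount": 20.0, "status": "paid"},
    {"country": "US", "amount": 5.0, "status": "cancelled"},
    {"country": "DE", "amount": 7.5, "status": "paid"},
    {"country": "US", "amount": 12.0, "status": "paid"},
]

# Hypothetical lookup table used for enrichment.
REGION_BY_COUNTRY = {"DE": "EMEA", "US": "AMER"}

# Filtering: keep only paid orders.
paid = [o for o in orders if o["status"] == "paid"]

# Enrichment: add a region field derived from the country code.
enriched = [{**o, "region": REGION_BY_COUNTRY[o["country"]]} for o in paid]

# Grouping: collect order amounts per region.
grouped = defaultdict(list)
for o in enriched:
    grouped[o["region"]].append(o["amount"])

# Aggregation: total amount per region.
totals = {region: sum(amounts) for region, amounts in grouped.items()}
print(totals)  # {'EMEA': 27.5, 'AMER': 12.0}
```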