Real-Time Data Ingestion with Greenplum Database and Apache NiFi

VMware Tanzu Greenplum is a massively parallel processing database server designed to manage large-scale analytic data warehouses and business intelligence workloads. Apache NiFi is a framework that provides an interactive user interface through which you create and manage automated dataflows between systems. The VMware Tanzu Greenplum Connector for Apache NiFi gives organizations a fast, simple, code-free way to build data ingestion pipelines for Greenplum Database.

You can use the web-based Apache NiFi user interface and built-in NiFi processors to set up a data pipeline that employs the Connector’s PutGreenplumRecord processor to load record-oriented data into Greenplum Database for subsequent analytics.

The Connector:

  • Uses the drag-and-drop Apache NiFi user interface to configure components and data pipelines.
  • Supports CSV, Avro, Parquet, JSON, and XML input data formats using built-in NiFi Record Readers.
  • Converts NiFi records into Greenplum tuples.
  • Loads the tuples into Greenplum Database.
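
The record-to-tuple step in the list above can be pictured with a small, self-contained sketch. This is not the Connector's implementation; it only illustrates the idea of reading records from different input formats and ordering their fields to match a target table. The column list, sample data, and helper names (COLUMNS, read_csv_records, read_json_records, record_to_tuple) are all hypothetical.

```python
import csv
import io
import json

# Hypothetical target table column order (an assumption for illustration).
COLUMNS = ("id", "name", "amount")


def read_csv_records(text):
    """Parse CSV text with a header row into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse a JSON array of objects into a list of dict records."""
    return json.loads(text)


def record_to_tuple(record, columns=COLUMNS):
    """Order a record's fields by target column; missing fields become None."""
    return tuple(record.get(col) for col in columns)


# Two sample inputs in different formats that describe the same kind of record.
csv_text = "id,name,amount\n1,alice,10.5\n"
json_text = '[{"id": 2, "name": "bob"}]'

records = read_csv_records(csv_text) + read_json_records(json_text)
tuples = [record_to_tuple(r) for r in records]
print(tuples)  # [('1', 'alice', '10.5'), (2, 'bob', None)]
```

Note that the CSV values stay as strings while the JSON values keep their parsed types. A real pipeline relies on the Record Reader's schema to settle such type differences before the data reaches the database.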

The Greenplum Connector for Apache NiFi uses the Greenplum Streaming Server to load data in parallel into Greenplum Database. Compared to a JDBC-based NiFi processor, this approach supports higher concurrency and throughput during data ingestion and places less load on the Greenplum Database master host.

Next Steps