Accessing HDFS and Hive Data with PXF

Data managed by your organization may already reside in external sources. The Greenplum Database PXF Extension Framework (PXF) provides access to this external data via built-in connectors that map an external data source to a Greenplum Database table definition.

PXF is installed with HDFS and Hive connectors. These connectors enable you to read data from external HDFS files and Hive tables stored in text, Avro, RCFile, Parquet, SequenceFile, and ORC formats.
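To make the idea of mapping an external data source to a table definition more concrete, the following minimal Python sketch assembles the kind of CREATE EXTERNAL TABLE statement that such a mapping typically takes. It is an illustration under stated assumptions, not the definitive syntax for this release: the helper functions, the table name sales_ext, the column list, and the HDFS path data/example/sales.txt are all hypothetical, and the pxf:// location form and the HdfsTextSimple profile name are assumed here and should be checked against the Using PXF with External Data documentation.

```python
from urllib.parse import urlencode


def pxf_location(data_path: str, profile: str, **options: str) -> str:
    """Build a pxf:// LOCATION URI (hypothetical helper).

    Assumed form: pxf://<path-to-data>?PROFILE=<profile>[&<option>=<value>]
    """
    query = urlencode({"PROFILE": profile, **options})
    return f"pxf://{data_path}?{query}"


def external_table_ddl(table: str, columns: dict, location: str,
                       format_clause: str) -> str:
    """Assemble a CREATE EXTERNAL TABLE statement (hypothetical helper).

    Illustration only: identifiers and values are not escaped, so do not
    pass untrusted input to this function.
    """
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"  LOCATION ('{location}')\n"
        f"  FORMAT {format_clause};"
    )


# Hypothetical example: a comma-delimited text file in HDFS exposed as a table.
ddl = external_table_ddl(
    table="sales_ext",
    columns={"region": "text", "amount": "numeric"},
    location=pxf_location("data/example/sales.txt", "HdfsTextSimple"),
    format_clause="'TEXT' (delimiter=E',')",
)
print(ddl)
```

The generated statement would be run in a Greenplum Database session, after which the external data could be queried like an ordinary table. A Hive table would follow the same pattern with a different location and profile.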

Note: PXF does not currently support filter predicate pushdown in the HDFS and Hive connectors.

The PXF Extension Framework includes a protocol C library and a Java service. After you configure and initialize PXF, you start a single PXF JVM process on each Greenplum Database segment host. This long-running process concurrently serves multiple query requests.

For detailed information about the architecture of the PXF Extension Framework and how to use it, refer to the Using PXF with External Data documentation.