Registering PXF Library Dependencies
You use PXF to access data stored on external systems. Depending upon the external data store, this access may require that you install and/or configure additional components or services for the external data store.
PXF depends on JAR files and other configuration information provided by these additional components. The $PXF_HOME/conf/pxf-private.classpath file identifies PXF internal JAR dependencies. In most cases, PXF manages the pxf-private.classpath file, adding entries as necessary based on the connectors that you use.
Should you need to add further JAR or native library dependencies, you must register them with PXF.
To add a JAR dependency for PXF, for example a MySQL driver JAR file, you must log in to the Greenplum Database master host, copy the JAR file to the PXF user configuration runtime library directory ($PXF_CONF/lib), sync the PXF configuration to the Greenplum Database cluster, and then restart PXF on each segment host. For example:
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ cp new_dependent_jar.jar $PXF_CONF/lib/
gpadmin@gpmaster$ pxf cluster sync
gpadmin@gpmaster$ pxf cluster restart
PXF loads native libraries from the following directories in this order:
1. The directories that you specify in the LD_LIBRARY_PATH setting of the $PXF_CONF/conf/pxf-env.sh user configuration file. Starting in PXF version 5.15.1, the pxf-env.sh file includes this commented-out block:
    # Additional native libraries to be loaded by PXF
    # export LD_LIBRARY_PATH=
2. The default PXF native library directory, $PXF_CONF/lib/native
3. The default Hadoop native library directory, /usr/lib/hadoop/lib/native
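The search order above can be sketched as a small shell helper that reports which copy of a library would be found first. This is only an illustration of the documented precedence, not how PXF itself loads libraries. The function name `find_native_lib` and the `HADOOP_NATIVE_DIR` override are hypothetical and exist only to keep the sketch testable.

```shell
# Illustrative sketch only: mirrors the documented search order.
# find_native_lib and HADOOP_NATIVE_DIR are hypothetical names, not part of PXF.

# find_native_lib LIB_NAME
# Prints the first path at which LIB_NAME exists, searching in this order:
#   1. each colon-separated directory in LD_LIBRARY_PATH
#   2. $PXF_CONF/lib/native
#   3. /usr/lib/hadoop/lib/native (or $HADOOP_NATIVE_DIR, if set)
# Runs in a subshell so the IFS and globbing changes do not leak to the caller.
find_native_lib() (
    lib_name="$1"

    search_path="${LD_LIBRARY_PATH:-}"
    if [ -n "${PXF_CONF:-}" ]; then
        search_path="$search_path:$PXF_CONF/lib/native"
    fi
    search_path="$search_path:${HADOOP_NATIVE_DIR:-/usr/lib/hadoop/lib/native}"

    IFS=':'   # split the search path on colons
    set -f    # disable globbing while the path is split
    for dir in $search_path; do
        # Skip empty entries, such as the one left by an unset LD_LIBRARY_PATH
        if [ -n "$dir" ] && [ -f "$dir/$lib_name" ]; then
            printf '%s\n' "$dir/$lib_name"
            exit 0
        fi
    done
    exit 1
)
```

Running `find_native_lib <library-file-name>` on a host prints the path of the first match, which can help spot a stale copy in a higher-precedence directory that shadows the library you meant to register.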
When you register a native library dependency with PXF, for example Hadoop native libraries, you copy the native library to a location known to PXF or inform PXF of a custom location, and then you must synchronize and restart PXF.
You have three file location options when you register a native library with PXF:
- Copy the library to the default PXF native library directory, $PXF_CONF/lib/native, on only the Greenplum Database master host. When you next synchronize PXF, PXF copies the native library to all hosts in the Greenplum cluster.
- Copy the library to the default Hadoop native library directory, /usr/lib/hadoop/lib/native, on the Greenplum master, standby, and each segment host.
- Copy the library to the same custom location on the Greenplum master, standby, and each segment host, then uncomment the LD_LIBRARY_PATH setting in the $PXF_CONF/conf/pxf-env.sh file and add the directory to it.
Copy the native library file to one of the following:
- The $PXF_CONF/lib/native directory on the Greenplum Database master host. (You may need to create this directory.)
- The /usr/lib/hadoop/lib/native directory on all Greenplum Database hosts.
- A user-defined location on all Greenplum Database hosts; note the file system location of the native library.
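For the first option, the copy step might look like the following minimal sketch, which creates $PXF_CONF/lib/native when it does not exist yet. The function name `stage_native_lib` is hypothetical, and the sketch assumes that PXF_CONF is set in the environment of the gpadmin user on the master host.

```shell
# Minimal sketch; stage_native_lib is a hypothetical helper, not a PXF command.

# stage_native_lib LIB_FILE
# Copies LIB_FILE into $PXF_CONF/lib/native on the current host, creating
# the directory first if needed, and prints the destination path.
stage_native_lib() {
    local lib_file="$1"
    local native_dir

    if [ -z "${PXF_CONF:-}" ]; then
        echo "PXF_CONF is not set" >&2
        return 1
    fi
    if [ ! -f "$lib_file" ]; then
        echo "native library not found: $lib_file" >&2
        return 1
    fi

    native_dir="$PXF_CONF/lib/native"
    mkdir -p "$native_dir" || return 1
    cp "$lib_file" "$native_dir/" || return 1
    printf '%s\n' "$native_dir/$(basename "$lib_file")"
}
```

The helper only stages the file on the host where it runs. The library reaches the standby and segment hosts when you synchronize the PXF configuration in a later step.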
If you copied the native library to a custom location:
Open the $PXF_CONF/conf/pxf-env.sh file in the editor of your choice, and uncomment the LD_LIBRARY_PATH setting:
# Additional native libraries to be loaded by PXF
export LD_LIBRARY_PATH=
Specify the custom location in the LD_LIBRARY_PATH setting. For example, if you copied a library to the /usr/local/lib directory on all Greenplum hosts, your LD_LIBRARY_PATH setting would look as follows:
export LD_LIBRARY_PATH=/usr/local/lib
Save the file and exit the editor.
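If you prefer to script this edit, a minimal sketch with sed could look like the one below. The function name `set_pxf_ld_library_path` is hypothetical. The sketch assumes that the file already contains the `export LD_LIBRARY_PATH=` line, commented out or not, and that the directory path contains no `|` or `&` characters.

```shell
# Minimal sketch; set_pxf_ld_library_path is a hypothetical helper, not a PXF command.

# set_pxf_ld_library_path ENV_FILE LIB_DIR
# Rewrites the "export LD_LIBRARY_PATH=" line in ENV_FILE, whether or not it
# is commented out, so that it is active and points to LIB_DIR.
set_pxf_ld_library_path() {
    local env_file="$1"
    local lib_dir="$2"
    local pattern='^[[:space:]]*#?[[:space:]]*export LD_LIBRARY_PATH='
    local tmp_file
    local status=0

    # Stop if the expected line is missing rather than guess where to add it
    if ! grep -Eq "$pattern" "$env_file"; then
        echo "no LD_LIBRARY_PATH line found in $env_file" >&2
        return 1
    fi

    tmp_file="$(mktemp)" || return 1
    # Write to a temporary file first, then overwrite the original in place
    # so that its ownership and permissions are preserved
    sed -E "s|${pattern}.*|export LD_LIBRARY_PATH=${lib_dir}|" "$env_file" > "$tmp_file" \
        && cat "$tmp_file" > "$env_file" || status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

For example, `set_pxf_ld_library_path "$PXF_CONF/conf/pxf-env.sh" /usr/local/lib` would leave the comment line untouched and produce the `export LD_LIBRARY_PATH=/usr/local/lib` setting.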
Synchronize the PXF configuration from the Greenplum Database master host to the standby and segment hosts.
gpadmin@gpmaster$ pxf cluster sync
If you copied the native library to the $PXF_CONF/lib/native directory, this command copies the library to the same location on the Greenplum Database standby and segment hosts.
If you updated the LD_LIBRARY_PATH setting, this command copies the configuration change to the Greenplum Database standby and segment hosts.
Restart PXF on all segment hosts:
gpadmin@gpmaster$ pxf cluster restart