Pivotal Greenplum Platform Extension Framework 5.x Release Notes
The Pivotal Greenplum Platform Extension Framework (PXF) is included in the Pivotal Greenplum Database distribution in Greenplum versions 5.28, 6.11, and older. Starting with PXF version 5.13.0, PXF for Red Hat/CentOS and Oracle Enterprise Linux is updated and distributed independently of Greenplum Database. You may need to download and install the PXF package to obtain the most recent version of this component.
The independent PXF distribution is compatible with these operating system platforms and Greenplum versions:

|OS Version|Greenplum Version|
|---|---|
|RHEL 7.x, CentOS 7.x|5.21.2+, 6.x|
|RHEL 6.x, CentOS 6.x|5.21.2+|
PXF no longer bundles `cURL` and instead loads the system-provided library. PXF requires `cURL` version 7.29.0 or newer. The officially-supported `cURL` version for the CentOS 6.x and Red Hat Enterprise Linux 6.x operating systems is 7.19.*. Due to this limitation, Greenplum Database 6 does not support running PXF on CentOS 6.x or RHEL 6.x.
PXF is compatible with these Java and Hadoop component versions:

|PXF Version|Java Versions|Hadoop Versions|Hive Server Versions|HBase Server Version|
|---|---|---|---|---|
|5.15.x, 5.14, 5.13|8, 11|2.x, 3.1+|1.x, 2.x, 3.1+|1.3.2|
PXF 5.15.1

Release Date: September 11, 2020
PXF 5.15.1 includes these changes:
- PXF bundles a new version of Tomcat, 7.0.105.
- PXF improves the performance of Parquet write operations (see Resolved Issues 30788 and 30779) by:
  - No longer splitting files that are over 128MB in size.
  - Bundling Parquet version 1.11.1 libraries.
  - Providing a new `ENABLE_DICTIONARY` option to enable or disable dictionary encoding when PXF writes Parquet data.
  - Using the Parquet logical `int32` type when writing dates. See Resolved Issue 174433819.
  - When dictionary encoding is enabled, the default `DICTIONARY_PAGE_SIZE` that PXF uses when writing Parquet data is now `1 * 1024 * 1024` (it was previously `1 * 512 * 1024`).
- PXF provides integrated native library registration support by exposing the new user configuration directory `$PXF_CONF/lib/native` and a template for setting the `LD_LIBRARY_PATH` option. See Resolved Issue 264 and Registering PXF Library Dependencies.
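As a sketch of how the new Parquet write option might be used, the following writable external table disables dictionary encoding (the table name and HDFS path here are illustrative, and the `hdfs:parquet` profile assumes a default PXF server configuration):

```sql
-- Hypothetical writable external table; PXF writes Parquet data to HDFS.
-- ENABLE_DICTIONARY=false turns off dictionary encoding for this table;
-- omit the option to keep the default (enabled) behavior.
CREATE WRITABLE EXTERNAL TABLE sales_parquet_w (id int, total numeric(10,2))
  LOCATION ('pxf://data/sales?PROFILE=hdfs:parquet&ENABLE_DICTIONARY=false')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```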
PXF 5.15.1 resolves these issues:
|Bug Id|Summary|
|---|---|
|264|Resolves an issue where it was not clear how to register a native library with PXF. PXF now provides integrated native library registration support and related documentation.|
|30788, 30779|Improves PXF performance when writing Parquet data by not splitting files larger than 128MB, using newer Parquet libraries, and exposing a new `ENABLE_DICTIONARY` option.|
|174433819|Resolves an issue with the Parquet type that PXF used when writing dates; PXF now writes dates using the Parquet logical `int32` type.|
PXF 5.15.0

Release Date: August 25, 2020
PXF 5.15.0 includes these new and changed features:
- PXF bundles the `opencsv` library to satisfy a missing transitive dependency that is required when PXF reads Hive tables created with the `OpenCSVSerde`.
- PXF bundles newer versions of its dependent libraries.
- PXF supports `xz` compression when reading from or writing to Avro files.
- PXF introduces a new option named `SKIP_HEADER_COUNT=<N>` that you can use to instruct PXF to skip the first `N` lines in the first split of a text file.
- PXF includes improvements to Hive error handling and error surfacing.
- PXF no longer restricts operations using `bzip2` compression to a single thread.
- PXF 5.15.0 deprecates and ignores the `THREAD-SAFE` custom option setting. All query and write operations on a PXF external table are now always thread-safe.
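The new header-skipping option is supplied in the external table's `LOCATION` URI. A sketch, with an illustrative table and file name and the `hdfs:text` profile assuming a default PXF server configuration:

```sql
-- Hypothetical readable external table; SKIP_HEADER_COUNT=1 instructs PXF
-- to skip the first line (a header row) in the first split of the file.
CREATE EXTERNAL TABLE sales_csv (id int, total numeric(10,2))
  LOCATION ('pxf://data/sales.csv?PROFILE=hdfs:text&SKIP_HEADER_COUNT=1')
FORMAT 'CSV';
```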
PXF 5.15.0 resolves these issues:
|Bug Id|Summary|
|---|---|
|30788|Resolves a PXF performance degradation issue that was encountered when writing very wide (greater than 1MB) rows.|
|30787|PXF did not surface a meaningful error when it encountered a problem accessing Hive 1.x. This issue is resolved.|
|30767|There was no way to instruct PXF to skip reading one or more lines at the beginning of a text file. This issue is resolved; PXF now exposes the `SKIP_HEADER_COUNT` option.|
PXF 5.14.0

Release Date: July 7, 2020
PXF 5.14.0 includes these new and changed features:
- PXF supports the `snappy` compression codec when writing Avro data to an external data store. By default, PXF now compresses all Avro data with the `deflate` codec before writing it to the external store.
- Before writing Avro data, PXF converts `smallint`-type columns to the `int` data type. You must specify an `int`-type column in an external table definition to read this data back.
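For illustration, a writable Avro external table might look like the following (the table name and HDFS path are hypothetical; with no codec option specified, PXF 5.14.0 applies `deflate` compression by default):

```sql
-- Hypothetical writable external table that writes Avro data to HDFS.
-- A smallint column in the source data is written as an Avro int, so a
-- readable external table over this data must declare the column as int.
CREATE WRITABLE EXTERNAL TABLE events_avro_w (id int, qty int)
  LOCATION ('pxf://data/events?PROFILE=hdfs:avro')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```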
PXF 5.14.0 resolves these issues:
|Bug Id|Summary|
|---|---|
|30708|PXF can now compress Avro data before writing it to an external data store.|
|30671|PXF fixes an issue where it did not correctly handle writing Avro data when the external table definition included a `smallint`-type column.|
PXF 5.13.0

Release Date: June 30, 2020
PXF 5.13.0 includes these new and changed features since PXF 5.12.0:
- PXF 5.13.0 is the first standalone release of PXF for Red Hat/CentOS that is distributed separately from Greenplum Database.
PXF 5.13.0 resolves these issues:
|Bug Id|Summary|
|---|---|
|364|PXF fixes an issue where it did not correctly read from an external table when the|
|30640|The use of the|
Deprecated features may be removed in a future major release of PXF. PXF version 5.x deprecates:
- The `THREAD-SAFE` custom option setting. All query and write operations are thread-safe (deprecated since PXF version 5.15.0).
- The `PXF_KEYTAB` settings in the `pxf-env.sh` file. You can use the `pxf-site.xml` file to configure Kerberos and impersonation settings for your new Hadoop server configurations (deprecated since PXF version 5.10.0).
- The `pxf.impersonation.jdbc` property setting in the `jdbc-site.xml` file. You can use the `pxf.service.user.impersonation` property to configure user impersonation for a new JDBC server configuration (deprecated since PXF version 5.10.0).
- The HDFS profile names for the Text, Avro, JSON, Parquet, and SequenceFile data formats (deprecated since PXF version 5.0.1). Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for more information.
PXF 5.x has these known issues and limitations:
|Issue|Description|
|---|---|
|168957894|The PXF Hive Connector does not support using the Workaround: Use the PXF JDBC Connector to access Hive.|