Tanzu Greenplum Platform Extension Framework 5.x Release Notes
The Tanzu Greenplum Platform Extension Framework (PXF) is included in the Tanzu Greenplum distribution in Greenplum versions up to and including 5.28 and 6.11. Starting with PXF version 5.13.0, PXF for Red Hat/CentOS and Oracle Enterprise Linux is updated and distributed independently of Greenplum Database.
You may need to download and install the PXF package to obtain the most recent version of this component.
The independent PXF distribution is compatible with these operating system platforms and Greenplum versions:
| OS Version | Greenplum Version |
|------------|-------------------|
| RHEL 6.x, CentOS 6.x | 5.21.2+ |
| RHEL 7.x, CentOS 7.x | 5.21.2+, 6.x |
| OEL 7.x, Ubuntu 18.04 LTS | 6.x |
PXF does not bundle the cURL library and instead loads the system-provided library. PXF requires cURL version 7.29.0 or newer. The officially-supported cURL for the CentOS 6.x and Red Hat Enterprise Linux 6.x operating systems is version 7.19.*. Greenplum Database 6 does not support running PXF on CentOS 6.x or RHEL 6.x due to this limitation.
PXF is compatible with these Java and Hadoop component versions:
| PXF Version | Java Versions | Hadoop Versions | Hive Server Versions | HBase Server Version |
|-------------|---------------|-----------------|----------------------|----------------------|
| 5.16.x, 5.15.x, 5.14, 5.13 | 8, 11 | 2.x, 3.1+ | 1.x, 2.x, 3.1+ | 1.3.2 |
Release Date: September 1, 2021
PXF 5.16.4 resolves this issue:
Release Date: August 20, 2021
PXF introduces a new property in the `pxf-site.xml` per-server configuration file to resolve issue 31657. PXF uses this property, `pxf.sasl.connection.retries`, to specify the maximum number of times that it retries a SASL connection request to an external data source after a refused connection returns a `GSS initiate failed` error.
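For reference, a `pxf-site.xml` entry that sets this property might look like the following sketch; the retry count of `5` is an illustrative value, not a documented default:

```xml
<property>
    <!-- maximum number of SASL connection retries after a "GSS initiate failed" refusal (illustrative value) -->
    <name>pxf.sasl.connection.retries</name>
    <value>5</value>
</property>
```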
PXF 5.16.3 resolves this issue:
| Bug Id | Summary |
|--------|---------|
| 31657 | Queries on an external table intermittently failed in some Kerberos-secured environments because the Hadoop NameNode erroneously detected a replay attack during Kerberos authentication. This issue is resolved by PR-681. |
You can determine whether this issue is the cause of your external table error when accessing Hadoop as follows:

1. Verify that you are accessing a Kerberos-secured Hadoop environment.
2. Examine the error message returned by the client and confirm that it contains this text:

   `org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed`

3. Examine the `catalina.<date>.log` file on the segment host that reported the error and confirm that it contains this error message:

   `<date> com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed`

4. Inspect the Hadoop NameNode log(s) and confirm that they contain this message:

   `<date> INFO org.apache.hadoop.ipc.Server: Socket Reader #1 for port 8020: readAndProcess from client <address> threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))]]`
If you can confirm each point above, this issue is indeed the cause of your error. You must upgrade to PXF 5.16.3, and you can optionally perform certain configuration tasks:

1. (Required) Upgrade to PXF version 5.16.3.
2. (Recommended) Configure PXF to use a host-specific Kerberos principal for each segment host. If you specify the following `pxf.service.kerberos.principal` property setting in the PXF server's `pxf-site.xml` file, PXF automatically replaces `_HOST` with the FQDN of the segment host:

   ```xml
   <property>
       <name>pxf.service.kerberos.principal</name>
       <value>gpadmin/_HOST@REALM.COM</value>
   </property>
   ```

   (If you update `pxf-site.xml`, be sure to run `pxf cluster sync` to propagate the changes to the Greenplum Database standby and all segment hosts in the cluster.)
Release Date: February 25, 2021
PXF 5.16.2 resolves these issues:
| Bug Id | Summary |
|--------|---------|
| 31253 | Resolves an issue where |
| 31219 | Resolves an issue where an insert from a PXF external table defined with a |
| 176987367 | Performance improvement that avoids a reverse DNS lookup. |
Release Date: February 5, 2021
PXF 5.16.1 resolves this issue:
| Bug Id | Summary |
|--------|---------|
| 31105 | Resolves an issue where PXF returned improperly formatted data from Hive when it accessed a nested struct that contained strings with escaped special characters. PXF now correctly resolves and escapes strings in nested complex data types that it reads from Hive. |
Release Date: November 6, 2020
PXF 5.16.0 includes these new and changed features:
- Version 5.16 is the first standalone PXF release that includes a `.deb` installation package for Ubuntu 18.04 LTS systems.
- PXF adds support for reading files from, and writing files to, a network file system mounted on each Greenplum Database segment host. See Accessing Files on a Network File System with PXF for prerequisites, configuration, and usage information for this feature.
- PXF disallows specifying relative paths and environment variables in the `CREATE EXTERNAL TABLE` `LOCATION` clause file path.
- PXF adds the new `pxf.fs.basePath` property to the PXF `pxf-site.xml` template file. The property is commented out by default; you set this property to specify the base path from which PXF accesses the path that you specify in the `CREATE EXTERNAL TABLE` `LOCATION` clause. See About the pxf.fs.basePath Property for more information, and the configuration sketch after this list.
- The `pxf.service.user.name` property in the PXF `pxf-site.xml` template file is now commented out by default, and the file now includes an enhanced description of the property. Additionally, the documentation has been enhanced to provide specific use cases and configuration scenarios for access to secured and non-secured Hadoop clusters. See secured cluster use cases and non-secured cluster use cases.
- The default value of the `jdbc.pool.property.maximumPoolSize` property in the `jdbc-site.xml` template file was increased to `15` to better support out-of-the-box reads and writes of large amounts of data.
- PXF now supports reading from, and writing to, external tables altered by dropping columns.
- PXF adds the new `--skip-register` flag to the `pxf [cluster] init` command. This flag instructs PXF to skip the initialization step that copies the PXF extension files to the Greenplum installation on the host(s).
- The PXF `Hive` profile now supports column projection and predicate pushdown when you use it to access Hive tables `STORED AS Parquet`.
- Column projection and predicate pushdown are now enabled by default when you use PXF and the `Hive` profile to access Hive tables.
- PXF now supports using the `Hive` profile to read data from a Hive table `STORED AS Parquet` when the underlying Parquet file(s) has a different column order than the defining Hive table.
- PXF now reduces and optimizes its memory usage during fragmentation for
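The `pxf.fs.basePath` property mentioned above takes a directory path. A minimal sketch, assuming a hypothetical NFS mount point of `/mnt/nfs/exports`:

```xml
<property>
    <!-- all LOCATION paths for this server resolve relative to this base directory (hypothetical mount point) -->
    <name>pxf.fs.basePath</name>
    <value>/mnt/nfs/exports</value>
</property>
```

With this setting, a `LOCATION` path such as `pxf://sales/2020/?PROFILE=file:text` would be read from `/mnt/nfs/exports/sales/2020/`.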
PXF 5.16 resolves these issues:
| Bug Id | Summary |
|--------|---------|
| 459 | Resolves an issue where PXF failed to write NULL |
| 30987 | PXF generated a buffer that exceeded a 1GB limit during fragmentation of a user query that specified the |
| 30953, 30855 | Resolves an issue where PXF failed to release some resources when it encountered an error during filter execution. |
| 30930 | The use of the |
| 30905 | Resolves an issue where PXF returned an error when it was used to access a readable external table in which one or more columns had been dropped. PXF now supports both reading from and writing to external tables that have had columns dropped. |
| 30638 | Resolves an issue where PXF failed to read a Hive table |
Release Date: September 11, 2020
PXF 5.15.1 includes these changes:
- PXF bundles a new version of Tomcat, 7.0.105.
- PXF improves the performance of Parquet write operations (see Resolved Issues 30788, 30779) by:
  - No longer splitting files that are over 128MB in size.
  - Bundling Parquet version 1.11.1 libraries.
  - Providing a new `ENABLE_DICTIONARY` option to enable or disable dictionary encoding when PXF writes Parquet data (see the sketch after this list).
  - Using the Parquet logical `int32` type when writing dates. See Resolved Issue 174433819.
  - When dictionary encoding is enabled, the default `DICTIONARY_PAGE_SIZE` that PXF uses when writing Parquet data is now `1 * 1024 * 1024` (it was previously `1 * 512 * 1024`).
- PXF provides integrated native library registration support by exposing the new user configuration directory `$PXF_CONF/lib/native` and a template for setting the `LD_LIBRARY_PATH` option. See Resolved Issue 264 and Registering PXF Library Dependencies.
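A minimal sketch of how such a Parquet write option is supplied; the table name and HDFS path are hypothetical, and like other PXF custom options it rides as a query parameter in the `LOCATION` URI:

```sql
-- hypothetical writable external table that disables dictionary encoding for Parquet writes
CREATE WRITABLE EXTERNAL TABLE metrics_pq (id int, val float8)
  LOCATION ('pxf://data/metrics?PROFILE=hdfs:parquet&ENABLE_DICTIONARY=false')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```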
PXF 5.15.1 resolves these issues:
| Bug Id | Summary |
|--------|---------|
| 264 | Resolves an issue where it was not clear how to register a native library with PXF. PXF now provides integrated native library registration support and related documentation. |
| 30788, 30779 | Improves PXF performance when writing Parquet data by not splitting files larger than 128MB, using newer Parquet libraries, and exposing a new |
| 174433819 | Resolves an issue where PXF used the |
Release Date: August 25, 2020
PXF 5.15.0 includes these new and changed features:
- PXF bundles the `opencsv` library to satisfy a missing transitive dependency that is required when PXF reads Hive tables created with the
- PXF bundles newer
- PXF supports `xz` compression when reading from or writing to Avro files.
- PXF introduces a new option named `SKIP_HEADER_COUNT=<N>` that you can use to instruct PXF to skip the first `N` lines in the first split of a text file (see the sketch after this list).
- PXF includes improvements to Hive error handling and error surfacing.
- PXF no longer restricts operations using `bzip2` compression to a single thread.
- PXF 5.15.0 deprecates and ignores the `THREAD-SAFE` custom option setting. All query and write operations on a PXF external table are now always thread-safe.
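A minimal sketch of the `SKIP_HEADER_COUNT` option in use; the table definition and HDFS path are hypothetical:

```sql
-- hypothetical readable external table that skips one header line in the first split
CREATE EXTERNAL TABLE ext_orders (id int, amount numeric)
  LOCATION ('pxf://data/orders.csv?PROFILE=hdfs:text&SKIP_HEADER_COUNT=1')
FORMAT 'CSV';
```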
PXF 5.15.0 resolves these issues:
| Bug Id | Summary |
|--------|---------|
| 30788 | Resolves a PXF performance degradation issue that was encountered when writing very wide (greater than 1MB) rows. |
| 30787 | PXF did not surface a meaningful error when it encountered a problem accessing Hive 1.x. This issue is resolved. |
| 30767 | There was no way to instruct PXF to skip reading one or more lines at the beginning of a text file. This issue is resolved; PXF now exposes the |
Release Date: July 7, 2020
PXF 5.14.0 includes these new and changed features:
- PXF supports the `deflate` and `snappy` compression codecs when writing Avro data to an external data store. By default, PXF now compresses all Avro data with the `deflate` codec before writing it to the external store (see the sketch after this list).
- Before writing Avro data, PXF converts `smallint`-type columns to the `int` data type. You must specify an `int`-type column in an external table definition to read this data.
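A minimal sketch of selecting a non-default Avro codec on write; the table and path are hypothetical, and `COMPRESSION_CODEC` is assumed here to be the write option that names the codec:

```sql
-- hypothetical writable external table that writes snappy-compressed Avro instead of the deflate default
CREATE WRITABLE EXTERNAL TABLE events_avro (id int, payload text)
  LOCATION ('pxf://data/events?PROFILE=hdfs:avro&COMPRESSION_CODEC=snappy')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```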
PXF 5.14.0 resolves these issues:
| Bug Id | Summary |
|--------|---------|
| 30708 | PXF can now compress Avro data before writing it to an external data store. |
| 30671 | PXF fixes an issue where it did not correctly handle writing Avro data when the external table definition included a |
Release Date: June 30, 2020
PXF 5.13.0 includes these new and changed features since PXF 5.12.0:
- PXF 5.13.0 is the first standalone release of PXF for Red Hat/CentOS that is distributed separately from Greenplum Database.
PXF 5.13.0 resolves these issues:
| Bug Id | Summary |
|--------|---------|
| 364 | PXF fixes an issue where it did not correctly read from an external table when the |
| 30640 | The use of the |
Deprecated features may be removed in a future major release of PXF. PXF version 5.x deprecates:
- The `THREAD-SAFE` custom option setting. All query and write operations are thread-safe (deprecated since PXF version 5.15.0).
- The `PXF_KEYTAB` settings in the `pxf-env.sh` file. You can use the `pxf-site.xml` file to configure Kerberos and impersonation settings for your new Hadoop server configurations (deprecated since PXF version 5.10.0).
- The `pxf.impersonation.jdbc` property setting in the `jdbc-site.xml` file. You can use the `pxf.service.user.impersonation` property to configure user impersonation for a new JDBC server configuration (deprecated since PXF version 5.10.0; see the sketch after this list).
- The HDFS profile names for the Text, Avro, JSON, Parquet, and SequenceFile data formats (deprecated since PXF version 5.0.1). Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for more information.
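A minimal sketch of the replacement impersonation property named in the list above, placed in a JDBC server's `jdbc-site.xml`; the `true` value shown is illustrative:

```xml
<property>
    <!-- enable per-user impersonation for this JDBC server (illustrative value) -->
    <name>pxf.service.user.impersonation</name>
    <value>true</value>
</property>
```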
PXF 5.x has these known issues and limitations:
| Issue | Description |
|-------|-------------|
| 168957894 | The PXF Hive Connector does not support using the |

Workaround: Use the PXF JDBC Connector to access Hive 3 managed tables.