Pivotal Greenplum 6.1 Release Notes

Pivotal Greenplum 6.1 Release Notes

Note: Greenplum Stream Server (GPSS) and Greenplum-Kafka Integration users: Do not upgrade to Greenplum Database 6.1 if you plan to re-submit Kafka load jobs that you initiated with GPSS in Greenplum 6.0.x. Due to a regression, GPSS may load duplicate Kafka messages into Greenplum. Refer to Known Issues and Limitations for more information.

This document contains pertinent release information about Pivotal Greenplum Database 6.1 releases. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see Pivotal Greenplum Database end of life policy.

Pivotal Greenplum 6 software is available for download from the Pivotal Greenplum page on Pivotal Network.

Pivotal Greenplum 6 is based on the open source Greenplum Database project code.

Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum Database is supported by Pivotal Support.

Release 6.1.0

Release Date: 2019-11-1

Pivotal Greenplum 6.1.0 is a minor release that includes new features and resolves several issues.

New Features

Greenplum Database 6.1.0 includes these new features:

  • Greenplum Stream Server 1.3 is included, which introduces new features and bug fixes.
    Note: Greenplum Stream Server (GPSS) and Greenplum-Kafka Integration users: Do not upgrade to Greenplum Database 6.1 if you plan to re-submit Kafka load jobs that you initiated with GPSS in Greenplum 6.0.x. Due to a regression, GPSS may load duplicate Kafka messages into Greenplum. Refer to Known Issues and Limitations for more information.

    New GPSS features include:

    • GPSS now supports log rotation, utilizing a mechanism that you can easily integrate with the Linux logrotate system. See Managing GPSS Log Files for more information.
    • GPSS has added the new INPUT:FILTER load configuration property. This property enables you to specify a filter that GPSS applies to Kafka input data before loading it into Greenplum Database.
    • GPSS displays job progress by partition when you provide the --partition flag to the gpsscli progress command.
    • GPSS enables you to load Kafka data that was emitted since a specific timestamp into Greenplum Database. To use this feature, you provide the --force-reset-timestamp flag when you run gpsscli load, gpsscli start, or gpkafka load.
    • GPSS now supports update and merge operations on data stored in a Greenplum Database table. The load configuration file accepts MODE, MATCH_COLUMNS, UPDATE_COLUMNS, and UPDATE_CONDITION property values to direct these operations. Example: Merging Data from Kafka into Greenplum Using the Greenplum Stream Server provides an example merge scenario.
    • GPSS supports Kerberos authentication to both Kafka and Greenplum Database.
    • GPSS supports SSL encryption between GPSS and Kafka.
    • GPSS supports SSL encryption on the data channel between GPSS and Greenplum Database.
  • The DataDirect JDBC and ODBC drivers were updated to versions (F000450.U000214) and 07.16.0334 (B0510, U0363), respectively.

    The DataDirect JDBC driver introduces support for the prepareThreshold connection parameter, which specifies the number of prepared statement executions that can be performed before the driver switches to using server-side prepared statements. This parameter defaults to 0, which preserves the earlier driver behavior of always using server-side prepare for prepared statements. Set a number greater than 1 to set a threshold after which server-side prepare is used.

    Note: ExecuteBatch() always uses server-side prepare for prepared statements. This matches the behavior of the Postgres open source driver.
    When the prepareThreshold value is greater than 1, parameterized operations do not send any SQL prepare calls with connection.prepareStatement(). The driver instead sends the query all at once, at execution time. Because of this limitation, the driver must determine the type of every column using the JDBC API before sending the query to the server. This determination works for many data types, but does not work for the following types that could be mapped to multiple Greenplum data types:
    • JSON

    You must set prepareThreshold to 0 before using parameterized operations with any of the above types. Examine the ResultSetMetaData object in advance to determine if any of the above types are used in a query. Also keep in mind that GPORCA does not support prepared statements that have parameterized values, and will fall back to using the Postgres Planner.

    See PrepareThreshold in the DataDirect documentation.

Resolved Issues

Pivotal Greenplum 6.1.0 is a minor release that resolves these issues:

8804 - Server
In some cases, running the EXPLAIN ANALYZE command on a sorted query in utility mode would cause the segment to crash. This issue is fixed. Greenplum Database no longer crashes in this situation.
8636 - Server
Some users encountered Error: unrecognized parameter "appendoptimized" while creating a partitioned table that specified the appendoptimized=true storage parameter. This issue is fixed; the Greenplum Database server now properly recognizes the appendoptimized parameter when it is specified on partition table creation.
26225 - gpcheckcat
The gpcheckcat utility failed to generate a summary report if there was an orphan TOAST table entry in one of the segments. This is fixed. The string "N/A" is reported when there is no relation OID to report.
29580 - Management and Monitoring
During Greenplum Database startup, an extra empty log file was produced ahead of the current date while performing time-based rotation of log files. For example, if Greenplum started at midnight September 2nd, two log files were generated, gpdb-2019-09-02_000000.csv and gpdb-2019-09-03_000000.csv. This issue has now been fixed.
29984 - Server
During startup, idle query executor (QE) processes can commit up to 16MB of memory each, but they are not tracked by the Linux virtual memory tracker. In a worst-case scenario, these idle processes could trigger OOM errors that were difficult to diagnose. To prevent these situations, Greenplum now hard-codes a startup memory cost to account for untracked QE processes.
30112 - Query Optimizer
For some queries against partitioned tables that contain a large amount of data, GPORCA generated a sub-optimal query plan because of inaccurate cardinality estimation. This issue has been resolved. GPORCA cardinality estimation has been improved.
30183, 30184 - analyzedb
When running the analyzedb command with the --skip_root_stats option, the command could take a long time to finish when analyzing a partitioned table with many partitions due to how the root partition statistics were handled when the partitions were analyzed. This issue has been resolved. Now, only partition statistics are updated.
Note: GPORCA uses root partition statistics. If you use --skip_root_stats option, you should ensure that root partition statistics are up to date so that GPORCA does not produce inferior query plans due to stale root partition statistics.
30149 - Query Execution
A query might fail and return an error with the message invalid seek in sequential BufFile when the server configuration file gp_workfile_compression is on and the query spills to temporary workfiles. The error was caused due to an issue working with workfiles that contain compressed data. The issue has been resolved by correctly handling the compressed workfile data.
30150 - Query Execution
A query might fail and return with the message AssignTransactionId() called by Segment Reader process when the server configuration parameter temp_tablespaces is set. The error was cause by an internal locking and transaction ID issue. This issue has been resolved by removing the requirement to acquire the lock.
30160 - Query Optimizer
GPORCA might return incorrect results when a the query contains a join predicate where one side is distributed on a citext column, and the other is not. GPORCA did not use the correct hash when generating a plan that redistributes the citext column. Now Greenplum Database falls back to the Postgres Planner for the specified type of query.
30183 - analyzedb
The analyzedb command could take a long time to finish when analyzing a table with many partitions. The command's performance has been greatly improved by waiting to update the root partition statistics until all leaf partitions of a table have been analyzed.
164823612 - gpss
GPSS incorrectly treated Kafka jobs that specified the same Kafka topic and Greenplum output schema name and output table name, but different database names, as the same job. This issue has been resolved. GPSS now includes the Greenplum database name when constructing a job definition.
167997441 - gpss
GPSS did not save error data to the external table error log when it encountered an incorrectly-formatted JSON or Avro message. This issue has been fixed; invoking gp_read_error_log() on the external table now displays the offending data.
168130147 - gpss
In some situations, specifying the --force-reset-earliest flag when loading data failed to read from the correct offset. This problem has been fixed. (Using the --force-reset-xxx flags outside of an offset mismatch scenario is discouraged.)
168393571 - Query Optimizer
Certain queries with btree indexes on Append Optimized (AO) tables were unnecessarily slow due to GPORCA selecting a scan with high transformation and cost impact. This issue has been fixed by improving GPORCA handling of btree type indexes.
168393645 - Query Optimizer
In some situations, a query ran slow because GPORCA did not produce an optimal plan when it encountered a null-rejecting predicate where an operand could be false or null, but not true. This issue is fixed; GPORCA now produces a more optimal plan when evaluating null-rejecting predicates for AND and OR operands.
168705484 - Query Optimizer
For certain queries with a UNION operator over a large number of children, GPORCA query optimization required a long time. This issue has been addressed by adding the ability to derive scalar properties on demand.
168707515 - Query Optimizer
Some queries in GPORCA were consuming more memory than necessary due to suboptimal memory tracking. This has been fixed by optimizing memory accounting inside GPORCA.
169081574 - Interconnect
Greenplum Database might generate a PANIC when the server configuration parameter gp_interconnect_type is TCP due to an issue with memory management during interconnect setup. The issue has been resolved by properly managing the internal interconnect object memory.
169117536 - Execution
Greenplum Database might generate a PANIC when the server configuration parameter log_min_messages is set to debug5. Greenplum Database did not properly handle a debug5 message correctly. The issue is resolved.
169198230 - Plan Cache
A prepared statement might run slow because a cost model issue prevented Greenplum Database from generating a direct dispatch plan for the statement. This issue is fixed. Greenplum Database now introduces non-direct dispatch cost into the cost model only for cached plans, and tries to use direct dispatch for prepared statements when possible.

Upgrading to Greenplum 6.1

Note: Greenplum 6 supports direct upgrades, using gpupgrade, from Greenplum 5.x releases to Greenplum 6.x. For more information, see the gpupgrade documentation. See also Migrating Data from Greenplum 4.3 or 5 for guidelines and considerations for migrating existing Greenplum data to Greenplum 6, using standard backup and restore procedures.

See Upgrading from an Earlier Greenplum 6 Release to upgrade your existing Greenplum 6.x software to Greenplum 6.1.

Deprecated Features

Deprecated features will be removed in a future major release of Greenplum Database. Pivotal Greenplum 6.x deprecates:

  • The server configuration parameter gp_ignore_error_table (deprecated since 6.0).

    To avoid a Greenplum Database syntax error, set the value of this parameter to true when you run applications that execute CREATE EXTERNAL TABLE or COPY commands that include the now removed Greenplum Database 4.3.x INTO ERROR TABLE clause.

  • Specifying => as an operator name in the CREATE OPERATOR command (deprecated since 6.0).
  • The Greenplum external table C API (deprecated since 6.0).

    Any developers using this API are encouraged to use the new Foreign Data Wrapper API in its place.

  • Commas placed between a SUBPARTITION TEMPLATE clause and its corresponding SUBPARTITION BY clause, and between consecutive SUBPARTITION BY clauses in a CREATE TABLE command (deprecated since 6.0).

    Using this undocumented syntax will generate a deprecation warning message.

  • The timestamp format YYYYMMDDHH24MISS (deprecated since 6.0).

    This format could not be parsed unambiguously in previous Greenplum Database releases, and is not supported in PostgreSQL 9.4.

  • The createlang and droplang utilities (deprecated since 6.0).
  • The pg_resqueue_status system view (deprecated since 6.0).

    Use the gp_toolkit.gp_resqueue_status view instead.

  • The GLOBAL and LOCAL modifiers when creating a temporary table with the CREATE TABLE and CREATE TABLE AS commands (deprecated since 6.0).

    These keywords are present for SQL standard compatibility, but have no effect in Greenplum Database.

  • The Greenplum Platform Extension Framework (PXF) HDFS profile names for the Text, Avro, JSON, Parquet, and SequenceFile data formats (deprecated since 5.16).

    Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for more information.

  • Using WITH OIDS or oids=TRUE to assign an OID system column when creating or altering a table (deprecated since 6.0).
  • Allowing superusers to specify the SQL_ASCII encoding regardless of the locale settings (deprecated since 6.0).

    This choice may result in misbehavior of character-string functions when data that is not encoding-compatible with the locale is stored in the database.

  • The @@@ text search operator (deprecated since 6.0).

    This operator is currently a synonym for the @@ operator.

  • The unparenthesized syntax for option lists in the VACUUM command (deprecated since 6.0).

    This syntax requires that the options to the command be specified in a specific order.

  • The plain pgbouncer authentication type (auth_type = plain) (deprecated since 4.x).

Migrating Data to Greenplum 6

Note: Greenplum 6 supports direct upgrades, using gpupgrade, from Greenplum 5.x releases to Greenplum 6.x. For more information, see the gpupgrade documentation.

See Migrating Data from Greenplum 4.3 or 5 for guidelines and considerations for migrating existing Greenplum data to Greenplum 6, using standard backup and restore procedures.

Known Issues and Limitations

Pivotal Greenplum 6 has these limitations:

  • Upgrading a Greenplum Database 4 or 5 release, or Greenplum 6 Beta release, to Pivotal Greenplum 6 is not supported.
  • MADlib, GPText, and PostGIS are not yet provided for installation on Ubuntu systems.
  • Greenplum 6 is not supported for installation on DCA systems.
  • Greenplum for Kubernetes is not yet provided with this release.

The following table lists key known issues in Pivotal Greenplum 6.x.

Table 1. Key Known Issues in Pivotal Greenplum 6.x
Issue Category Description
10216 ALTER TABLE, ALTER DOMAIN In some cases, heap table data is lost when performing concurrent ALTER TABLE or ALTER DOMAIN commands where one command alters a table column and the other rewrites or redistributes the table data. For example, performing concurrent ALTER TABLE commands where one command changes a column data type from int to text might cause data loss. This issue might also occur when altering a table column during the data distribution phase of a Greenplum system expansion. Greenplum Database did not correctly capture the current state of the table during command execution.

This issue is resolved in Pivotal Greenplum 6.9.0.

N/A PXF Starting in 6.x, Greenplum does not bundle cURL and instead loads the system-provided library. PXF requires cURL version 7.29.0 or newer. The officially-supported cURL for the CentOS 6.x and Red Hat Enterprise Linux 6.x operating systems is version 7.19.*. Greenplum Database 6 does not support running PXF on CentOS 6.x or RHEL 6.x due to this limitation.

Workaround: Upgrade the operating system of your Greenplum Database 6 hosts to CentOS 7+ or RHEL 7+, which provides a cURL version suitable to run PXF.

29703 Loading Data from External Tables Due to limitations in the Greenplum Database external table framework, Greenplum Database cannot log the following types of errors that it encounters while loading data:
  • data type parsing errors
  • unexpected value type errors
  • data type conversion errors
  • errors returned by native and user-defined functions
LOG ERRORS returns error information for data exceptions only. When it encounters a parsing error, Greenplum terminates the load job, but it cannot log and propagate the error back to the user via gp_read_error_log().

Workaround: Clean the input data before loading it into Greenplum Database.

170824967 gpfidsts For Greenplum Database 6.x, a command that accesses an external table that uses the gpfdists protocol fails if the external table does not use an IP address when specifying a host system in the LOCATION clause of the external table definition.
169807372 Greenplum Stream Server GPSS version 1.3.0, shipped with Greenplum 6.1 and the Greenplum 6.1 Clients Packages, does not recognize internal history tables that were created with GPSS v1.2.6 and earlier. If you re-submit a load job that was originally initiated with the GPSS from a Greenplum Database 6.0.x distribution or Greenplum 6.0.x Clients Package, GPSS will read Kafka messages starting from the earliest available offset in the topic. This may cause GPSS to load duplicate messages into Greenplum Database.

Workaround: Do not upgrade to Greenplum Database 6.1; wait for a Greenplum release that includes GPSS v1.3.1 or later.

169806983 Greenplum Stream Server In some cases, reading from Kafka using the default MINIMAL_INTERVAL (0 seconds) causes GPSS to consume a large amount of CPU resources, even when no new messages exist in the Kafka topic.

Workaround: Specify a MINIMAL_INTERVAL in the load configuration YAML file when you submit the job; for example, specify a value of 2000 (2 seconds) or 10000 (10 seconds).

169200795 Greenplum Stream Server When loading Kafka data into Greenplum Database in UPDATE and MERGE modes, GPSS requires that a MAPPING exist for each column name identified in the MATCH_COLUMNS and UPDATE_COLUMNS lists.
168548176 gpbackup When using gpbackup to back up a Greenplum Database 5.7.1 or earlier 5.x release with resource groups enabled, gpbackup returns a column not found error for t6.value AS memoryauditor.
164791118 PL/R PL/R cannot be installed using the deprecated createlang utility, and displays the error:
createlang: language installation failed: ERROR:  
no schema has been selected to create in
Workaround: Use CREATE EXTENSION to install PL/R, as described in the documentation.
170202002 Greenplum-Kafka Integration Updating the METADATA:SCHEMA property and restarting a previously-run load job could cause gpkafka to re-read Kafka messages published to the topic, and load duplicate messages into Greenplum Database.
30437 JDBC Driver JDBC cached query plans do not store the partition selector parameter that is required for performing partition elimination. If you create a JDBC prepared statement that operates against a partitioned table, partition elimination is performed the first 5 times the query is executed. After that point, the JDBC driver may choose to cache the query plan, in which case partition elimination is no longer performed and the query may suffer from extremely degraded performance.

Workaround: Use the DataDirect JDBC driver version (F000450.U000214) or later, introduced in Greenplum 6.1, and set the prepareThreshold connection parameter to a very large value. For example: jdbc:Pivotal:greenplum://<ip>:<port>;DatabaseName=<name>;prepareThreshold=1000000000 See PrepareThreshold in the DataDirect documentation.

N/A Greenplum Client/Load Tools on Windows The Greenplum Database client and load tools on Windows have not been tested with Active Directory Kerberos authentication.

Differences Compared to Open Source Greenplum Database

Pivotal Greenplum 6.x includes all of the functionality in the open source Greenplum Database project and adds:
  • Product packaging and installation script.
  • Support for data connectors:
    • Greenplum-Spark Connector
    • Greenplum-Informatica Connector
    • Greenplum-Kafka Integration
    • Greenplum Stream Server
  • Data Direct ODBC/JDBC Drivers
  • gpcopy utility for copying or migrating objects between Greenplum systems.
  • Support for managing Greenplum Database using Pivotal Greenplum Command Center.
  • Support for full text search and text analysis using Pivotal GPText.