Pivotal Greenplum 6.0 Release Notes

A newer version of this documentation is available. Use the version menu above to view the most up-to-date release of the Greenplum 6.x documentation.

Pivotal Greenplum 6.0 Release Notes

This document contains pertinent release information about Pivotal Greenplum Database 6.0 releases. For previous versions of the release notes for Greenplum Database, go to Pivotal Greenplum Database Documentation. For information about Greenplum Database end of life, see Pivotal Greenplum Database end of life policy.

Pivotal Greenplum 6 software is available for download from the Pivotal Greenplum page on Pivotal Network.

Pivotal Greenplum 6 is based on the open source Greenplum Database project code.

Important: Pivotal Support does not provide support for open source versions of Greenplum Database. Only Pivotal Greenplum Database is supported by Pivotal Support.

Release 6.0.1

Release Date: 2019-10-11

Pivotal Greenplum 6.0.1 is a maintenance release that includes changed features and resolves several issues.

Changed Features

Greenplum Database 6.0.1 includes these changed features:

  • The default value for the server configuration parameter optimizer_use_gpdb_allocators has been changed to true. Now, as the default, GPORCA uses Greenplum Database memory management when executing queries instead of GPORCA-specific memory management. Greenplum Database memory management has several enhancements when compared to GPORCA-specific memory management. See optimizer_use_gpdb_allocators.
  • Writing parquet data using the PXF Hadoop and object store connectors is no longer considered a Beta feature in this release.

Resolved Issues

Pivotal Greenplum 6.0.1 is a maintenance release that resolves these issues:

29712 - Query Execution
Greenplum Database writes unnecessary could not unlink file log messages for spill files when gp_enable_query_metrics is on and log_min_messages is set to INFO. This issue has been resolved. Logging has been improved to not write the log message.
30058 - Query Execution
An internal EXPLAIN function, cdbexplain_localExecStats, operated under the assumption that it was always executed by a query dispatcher (QD) process. However, certain queries could generate plans where the function was executed by a query executor (QE) process. Running such queries with EXPLAIN ANALYZE would cause all segments to crash with segment faults, and error messages referencing cdbexplain_localExecStats. This problem has been resolved.
30094 - Resource Management
In some cases Greenplum Database generated a PANIC when a query was terminated and the query involved catalog tables. The PANIC was caused when a backend process did not properly clean up shared memory before exiting. This issue has been resolved. Now backend process memory management has been improved for the specified situation.
30098 - COPY
Greenplum Database generated a PANIC when a COPY command attempted to write to a catalog table. This issue has been resolved, now Greenplum Database returns an error.
30120 - GRANT
The command GRANT ALL ON ALL TABLES IN SCHEMA <schema> TO <role> caused a PANIC when tables are partitioned or inherited by child tables. This issue has been fixed.
30130 - gpexpand
The gpexpand utility might have failed with a Cannot allocate memory error when system memory or swap space is low. The error was due to an issue with a python process library. This has been resolved by updating the python library.
165660593 - Resource Groups
When resource groups are enabled, Greenplum Database might return an out of memory error when executing a SET or SHOW command, or when executing a query when the Greenplum Database server configuration parameter gp_resource_group_bypass is set to true. The error is due to an issue with resource group memory accounting. This issue has been resolved, resource group memory accounting has been improved for individual statements.
167847839 - gpconfig
The code dispatched from the master to segments to set a configuration parameter enclosed the value in single quotes. This did not handle values containing embedded single quotes, and parameters with the GUC_LIST_QUOTE flag, such as search_path, ended up having different values on the segments than on the master. For example,
SELECT set_config('search_path', 'my_schema,public', false);
was dispatched as
SET search_path TO 'my_schema,public';
instead of
SET search_path TO my_schema,public;
This is fixed. The set_config() call is now dispatched to the segments as well, passing the (quoted) arguments directly so that the same code runs on the master and segments.
Known Issue 167851039 - PXF
pxf cluster reset did not reset PXF configuration on the standby master. This issue is fixed; the command now resets PXF configuration on all hosts in the Greenplum cluster, including the standby master.
Known Issue 167851065 - PXF
PXF allowed you to initialize PXF without setting PXF_CONF. This issue has been resolved. PXF now correctly checks for this setting before continuing with initialization.
Known Issue 167948506 - PXF
In some cases, accessing an S3 object store with PXF failed when PXF was configured for Kerberized Hadoop. This issue is fixed. PXF now handles token renewal appropriately on concurrent access to S3 and a Kerberized Hadoop cluster.
While not required, if you have previously instituted any of the workarounds identified for this issue that are described in Known Issues and Limitations, you may consider reverting the s3-site.xml modifications or removing the yarn-site.xml file from your S3 server directory.
168167337- gpinitsystem
When running gpinitsystem to initialize a Greenplum Database system with mirroring enabled, the utility configured the pg_hba.conf file in a manner that did not permit incremental recovery of the primary segment instances. This issue is fixed in release 6.0.1. The gpexpand, gprecoverseg, and gpaddmirrors utilities were also updated to ensure that primary and mirror segments always have compatible pg_hba.conf entries in place after performing their respective operations.
Known Issue 168271005 - PXF
The PXF JDBC Connector failed to access any database when PXF was configured to use the MapR Hadoop distribution, and MapR libraries were present in $PXF_CONF/lib. This issue has been resolved. A PXF MapR server configuration no longer affects JDBC access using PXF.
168759361 - psql
Greenplum Database was previously built and dynamically linked against libedit as a required dependency. However, the version of libedit that is available on Redhat 7 is not compatible with features such as tab-completion in psql. The readline library provides the same functionality as libedit, and all versions available across all supported platforms are compatible with the desired features. Therefore, instead of dynamically linking to libedit, Greenplum Database is now being built and dynamically linked to readline. This changes the required dependencies for the installers from libedit to readline.

Upgrading to Greenplum 6.0.1

Note: Greenplum 6 supports direct upgrades, using gpupgrade, from Greenplum 5.x releases to Greenplum 6.x. For more information, see the gpupgrade documentation. See also Migrating Data from Greenplum 4.3 or 5 for guidelines and considerations for migrating existing Greenplum data to Greenplum 6, using standard backup and restore procedures.

See Upgrading from an Earlier Greenplum 6 Release to upgrade your existing Greenplum 6.x software to Greenplum 6.0.1.

Release 6.0.0

Release Date: 2019-09-03

Pivotal Greenplum 6.0.0 is a major new release of Greenplum that includes new and changed features.

New Features

PostgreSQL Core Features

Pivotal Greenplum 6 incorporates several new features from PostgreSQL versions 8.4 through version 9.4.

INTERVAL Data Type Handling
PostgreSQL 8.4 improves the parsing of INTERVAL literals to align with SQL standards. This changes the output for queries that use INTERVAL labels between versions 5.x and 6.x. For example:
$ psql
psql (8.3.23)
Type "help" for help.

gpadmin=# select INTERVAL '1' YEAR;
(1 row)
``` sql
$ psql
psql (9.2beta2)
Type "help" for help.

gpadmin=# select INTERVAL '1' YEAR;
 1 year
(1 row)

See Date/Time Types for more information.

Additional PostgreSQL Features
Greenplum Database 6.0 also includes these features and changes from PostgreSQL:
  • Support for user-defined I/O conversion casts. (PostgreSQL 8.4).
  • Support for column-level privileges (PostgreSQL 8.4).
  • The pg_db_role_setting catalog table, which provides support for setting server configuration parameters for a specific database and role combination (PostgreSQL 9.0).
  • Values in the relkind column of the pg_class catalog table were changed to match entries in PostgreSQL 9.3.
  • Support for GIN index method (PostgreSQL 8.3).
  • Postgres Planner support for the SP-GiST index access method (PostgreSQL 9.2). (GPORCA ignores SP-GiST indexes.)
  • Postgres Planner support for ordered-set aggregates and moving-aggregates (PostgreSQL 9.4).
  • Support for jsonb data type (PostgreSQL 9.4).
  • DELETE, INSERT, and UPDATE supports the WITH clause, CTE (common table expression) (PostgreSQL 9.1).
  • Collation support to specify sort order and character classification behavior for data at the column level (PostgreSQL 9.1).
    Note: GPORCA supports collation only when all columns in the query use the same collation. If columns in the query use different collations, then Greenplum uses the Postgres Planner.

Zstandard Compression Algorithm

Greenplum Database 6.0 adds support for zstd (Zstandard) compression for some database operations. See Enabling Compression.

Relaxed Rules for Specifying Table Distribution Columns

In previous releases, if you specified both a UNIQUE constraint and a DISTRIBUTED BY clause in a CREATE TABLE statement, then the DISTRIBUTED BY clause was required to be equal to or a left-subset of the UNIQUE columns. Greenplum 6.x relaxes this rule so that any subset of the UNIQUE columns is accepted.

This change also affects the rules for how Greenplum 6.x selects a default distribution key. If gp_create_table_random_default_distribution is off (the default) and you do not include a DISTRIBUTED BY clause, then Greenplum chooses the table distribution key based on the command:
  • If a LIKE or INHERITS clause is specified, then Greenplum copies the distribution key from the source or parent table.
  • If a PRIMARY KEY or UNIQUE constraints are specified, then Greenplum chooses the largest subset of all the key columns as the distribution key.
  • If neither constraints nor a LIKE or INHERITS clause is specified, then Greenplum chooses the first suitable column as the distribution key. (Columns with geometric or user-defined data types are not eligible as Greenplum distribution key columns.)

Resource Groups Features

Greenplum Database includes these new resource group features:
  • You no longer are required to specify a MEMORY_LIMIT when you configure a Greenplum Database resource group. When you specify MEMORY_LIMIT=0, Greenplum Database will use the resource group global shared memory pool to service queries running in the group.
  • When you specify MEMORY_SPILL_RATIO=0, Greenplum Database will now use the statement_mem server configuration parameter setting to identify the initial amount of query operator memory.

    When used together to configure a resource group (MEMORY_LIMIT=0 and MEMORY_SPILL_RATIO=0), these new capabilities provide a memory management scheme similar to that provided by Greenplum Database resource queues.

    The default values of the MEMORY_SHARED_QUOTA, MEMORY_SPILL_RATIO, and MEMORY_LIMIT attributes for the admin_group and default_group resource groups have been set to use this resource queue-like memory management scheme so that when you initially enable resource groups, your queries will run in a memory environment similar to before.

    Table 1. Resource Group Attribute Defaults
    Resource Group admin_group default_group

PL/pgSQL Procedural Language Enhancements

PL/pgSQL in Greenplum Database 6.0 includes support for the following new features:

  • Attaching DETAIL and HINT text to user-thrown error messages. You can also specify the SQLSTATE and SQLERRMSG codes to return on a user-thrown error (PostgreSQL 8.4).
  • The RETURN QUERY EXECUTE statement, which specifies a query to execute dynamically (PostgreSQL 8.4).
  • Conditional execution using the CASE statement (PostgreSQL 8.4). See Conditionals in the PostgreSQL documentation.

Replicated Table Data

The CREATE TABLE command supports DISTRIBUTED REPLICATED as a distribution policy. If this distribution policy is specified, Greenplum Database distributes all rows of the table to all segment instances in the Greenplum Database system.
Note: The hidden system columns (ctid, cmin, cmax, xmin, xmax, and gp_segment_id) cannot be referenced in user queries on replicated tables because they have no single, unambiguous value. Greenplum Database returns a column does not exist error for the query.

Concurrency Improvements in Greenplum 6

Greenplum Database 6 includes the following concurrency improvements:

  • Global Deadlock Detector - Previous versions of Greenplum Database prevented global deadlock by holding exclusive table locks for UPDATE and DELETE operations. While this strategy did prevent deadlocks, it came at the cost of poor performance on concurrent updates. Greenplum Database 6 includes a global deadlock detector. This backend process collects and analyzes lock waiting data in the Greenplum cluster. If the Global Deadlock Detector determines that deadlock exists, it breaks the deadlock by cancelling one or more backend processes. By default, the global deadlock detector is disabled and table-level exclusive locks are held for table updates. When the global deadlock detector is enabled, Greenplum Database holds row-level exclusive locks and concurrent updates are allowed. See Global Deadlock Detector.
  • Transaction Lock Optimization - Greenplum Database 6 optimizes transaction lock usage both when you BEGIN and COMMIT a transaction. This benefits highly concurrent mixed workloads.
  • Upstream PostgreSQL Features - Greenplum 6 includes upstream PostgreSQL features, including those for fastpath lock, which reduce lock contention. This benefits concurrent short queries and mixed workloads.
  • VACUUM can more easily skip pages it cannot lock. This reduces the frequency of a vacuum appearing to be "stuck," which occurs when VACUUM waits to lock a block for cleanup and another session has held a lock on the block for a long time. Now VACUUM skips a block it cannot lock and retries the block later.
  • VACUUM rechecks block visibility after it has removed dead tuples. If all remaining tuples in the block are visible to current and future transactions, the block is marked as all-visible.
  • The tables that are part of a partitioned table hierarchy, but that do not contain data, are age-frozen so that they do not have to be vacuumed separately and do not affect calculation of the number of remaining transaction IDs before wraparound occurs. These tables include the root and intermediate tables in the partition heirarchy and, if they are append-optimized, their associated meta-data tables. This makes it unnecessary to vacuum the root partition to reduce the table's age, and eliminates the possibly needless vacuuming of all of the child tables.

Additional Contrib Modules

Greenplum Database 6 is distributed with these additional PostgreSQL and Greenplum contrib modules:

PXF Version 5.8.1

Greenplum Database 6.0 includes PXF 5.8.1, which introduces the following new and changed features:

  • The PXF S3 Connector now supports accessing CSV and Parquet data on S3 using the Amazon S3 Select service. Refer to Reading CSV and Parquet Data on S3 Using S3 Select.
  • PXF bundles new and upgraded libraries to provide Java 11 support.
  • PXF has added support for the timestamptz type when writing Parquet data to an external data source.
  • PXF now provides a reset command to reset your local PXF server instance, or all PXF server instances in the cluster, to an uninitialized state. See Resetting PXF.
  • PXF no longer supports specifying a DELIMITER in the CREATE EXTERNAL TABLE command LOCATION URI.

Additional Greenplum Database Features

Greenplum Database 6.0 also includes these features and changes from version 5.x:
  • Recursive WITH Queries (Common Table Expressions) are no longer considered a Beta feature, and are now enabled by default. See WITH Queries (Common Table Expressions).
  • VACUUM was updated to more easily skip pages that cannot be locked. This change should greatly reduce the incidence of VACUUM getting "stuck" while waiting for other sessions.
  • appendoptimized alias for the appendonly table storage option.
  • New gp_resgroup_status_per_host and gp_resgroup_status_per_segment gp_toolkit views to display resource group CPU and memory usage on a per-host and/or per-segment basis.
  • The new gp_stat_replication view contains replication statistics when master or segment mirroring is enabled. The pg_stat_replication view contains only master replication statistics.
  • The gpfdists and psql programs in the Greenplum Client and Loader Tools package for Windows support OpenSSL encryption.
  • Greenplum 6 includes some PostgreSQL 9.6 aggregate-related performance improvements.
  • The gpload utility program provided in the Greenplum Client and Loader Tools package for Windows is compatible with Greenplum Database 5.

Beta Features

Because Pivotal Greenplum Database is based on the open source Greenplum Database project code, it includes several Beta features to allow interested developers to experiment with their use on development systems. Feedback will help drive development of these features, and they may become supported in future versions of the product.

Warning: Beta features are not recommended or supported for production deployments.
Key experimental features in Greenplum Database 6 include:
  • Storage plugin API for gpbackup and gprestore. Partners, customers, and OSS developers can develop plugins to use in conjunction with gpbackup and gprestore.

    For information about the storage plugin API, see Backup/Restore Storage Plugin API.

  • Using the Greenplum Platform Extension (PXF) connectors to write Parquet data is a Beta feature.

Changed Features

Greenplum Database 6 includes these feature changes:
  • The performance characteristics of Greenplum Database under heavy loads have changed in version 6 as compared to previous versions. In particular, you may notice increased I/O operations on primary segments for changes related to GPSS, WAL replication, and other features. All customers are encouraged to perform load testing with real-world data to ensure that the new Greenplum 6 cluster configuration meets their performance needs.
  • gpbackup and gprestore are no longer installed with Greenplum Database 6, but are available separately on Pivotal Network and can be upgraded separately from the core database installation.
  • Greenplum 6 uses a new jump consistent hash algorithm to map hashed data values to Greenplum segments. The new algorithm ensures that, after new segments are added to the Greenplum 6 cluster, only those rows that hash to the new segment need to be moved. Greenplum 6 hashing has performance characteristics similar to earlier Greenplum releases, but should enable faster database expansion. Note that the new algorithm is more CPU intensive than the previous algorithm, so COPY performance may degrade somewhat on CPU-bound systems.
  • The older, legacy hash functions are represented as non-default hash operator classes, named cdbhash_*_ops. The non-default operator classes are used when upgrading from Greenplum Database earlier than 6.0. The legacy operator classes are compatible with each other, but if you mix the legacy operator classes with the new ones, queries will require Redistribute Motions.

    The server configuration parameter gp_use_legacy_hashops controls whether the legacy or default hash functions are used when creating tables that are defined with a distribution column.

    The gp_distribution_policy system table now contains more information about Greenplum Database tables and the policy for distributing table data across the segments including the operator class of the distribution hash functions.

  • The gpcheck utility is no longer included in Greenplum Database 6.
  • The input file format for the gpmovemirrors, gpaddmirrors, gprecoverseg and gpexpand utilities has changed. Instead of using a colon character (:) as a separator, the new file format uses a pipe character (|). For example, in previous releases a line in a gpexpand input file would resemble:
    The updated file format is:
    In addition, gpaddmirrors removes the mirror prefix from lines in its input file. Whereas a line from the previous release might resemble:
    The revised format is:
  • Greenplum uses direct dispatch to target queries that use IS NULL, similar to queries that filter on the table distribution key column(s).
  • The gpinitsystem option to specify the standby master data directory changed from -F to -S. The -S option no longer specifies spread mirroring. A new gpinitsystem option is introduced to specify the mirroring configuration: --mirror-mode={group|spread}.
  • The default value of the server configuration parameter log_rotation_size has changed from 0 to 1GB. This changes the default log rotation behavior so that a new log file is opened when more than 1GB has been written to the current log file or when the current log file has been open for 24 hours.
  • The default value of the server configuration parameter effective_cache_size has changed from 512MB to 16GB.
  • The gpssh-exkeys utility now requires that you have already set up passwordless SSH from the master host to every other host in the cluster. Running gpssh-exkeys then sets up passwordless SSH from every host to every other host.
  • The gpstop smart shutdown behavior has changed. Previously, if you ran gpstop -M smart (or just gpstop), the utility exited with a message if there were any active client connections. Now, gpstop waits for current connections to finish before completing the shutdown. If any connections remain open after the timeout period, or if you interrupt with CTRL-C, gpstop lists the open connections and prompts whether to continue waiting for connections to finish, or to perform a fast or immediate shutdown. The default timeout period is 120 seconds and can be changed with the -t timeout_seconds option.
  • In the pg_stat_activity and pg_stat_replication system views, the procpid column was renamed to pid to match the associated change in PostgreSQL 9.2.
  • In the pg_proc system table, the proiswin column was renamed to proiswindow and relocated in the table to match the pg_proc system table in PostgreSQL 8.4.
  • Queries that use SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer necessarily return sorted output. Previously these queries always removed duplicate rows by using Sort/Unique processing. They now implement hashing to conform to behavior introduced in PostgreSQL 8.4; this method does not produce sorted output. If your application requires sorted output for these queries, alter the queries to use an explicit ORDER BY clause. Note that SELECT DISTINCT ON never uses hashing, so its behavior is unchanged from previous versions.
  • In the gp_toolkit schema, the gp_resgroup_config view no longer contains the columns proposed_concurrency, proposed_memory_limit, proposed_memory_shared_quota and proposed_memory_spill_ratio.
  • In the pg_resgroupcapability system table, the proposed column has been removed.
  • The pg_database system table datconfig column was removed. Greenplum Database now uses the pg_db_role_setting system table to keep track of per-database and per-role server configuration settings (PostgreSQL 9.0).
  • The pg_aggregate system table aggordered column was removed, and several new columns were added to the table to support ordered-set aggregates and moving-aggregates with the Postgres Planner (PostgreSQL 9.4). The ALTER/CREATE/DROP AGGREGATE SQL command signatures have also been updated to reflect the pg_aggregate catalog changes.
  • The pg_authid system table rolconfig column was removed. Greenplum Database now uses the pg_db_role_setting system table to keep track of per-database and per-role server configuration settings (PostgreSQL 9.0).
  • When creating and altering a table that has a distribution column, you can now specify the hash function used to distribute data across segment instances.
  • Pivotal Greenplum Database 6 removes the RECHECK option from ALTER OPERATOR FAMILY and CREATE OPERATOR CLASS DDL (PostgreSQL 8.4). Greenplum now determines whether an index operator is "lossy" on-the-fly at runtime.
  • Operator-related system catalog tables are modified to support operator families, compatibility, and types (ordering or search).
  • System catalog table entries for HyperLogLog (HLL) functions, aggregates, and types are modified to prefix names with gp_. Renaming the HLL functions prevents name collisions with external Greenplum Database extensions that use HLL. Any user code written to use the built-in Greenplum Database HLL functions must be updated to use the new gp_ names.
  • The "legacy optimizer" from previous releases of Greenplum is now referred to as the Postgres planner in both the code and documentation.
  • The transaction isolation levels in Greenplum Database 6.0 are changed to align with PostgreSQL transaction isolation levels since the introduction of the serializable snapshot isolation (SSI) mode in PostgreSQL 9.1. The new SSI mode, which is not implemented in Greenplum Database, provides true serializability by monitoring concurrent transactions and rolling back transactions that could introduce a serialization anomaly. The existing snapshot isolation (SI) mode guarantees that transactions operate on a single, consistent snapshot of the database, but does not guarantee a consistent result when a set of concurrent transactions is executed in any given sequence.

    Greenplum Database 6.0 now allows the REPEATABLE READ keywords with SQL statements such as BEGIN and SET TRANSACTION. A SERIALIZABLE transaction in PostgreSQL 9.1 or later uses the new SSI mode. A SERIALIZABLE transaction in Greenplum Database 6.0 falls back to REPEATABLE READ, using the SI mode. The following table shows the SQL standard compliance for each transaction isolation level in Greenplum Database 6.0 and PostgreSQL 9.1.

    Table 2. Transaction Level Compliance with SQL Standard
    Requested Transaction Isolation Level Greenplum Database 6.0 Compliance PostgreSQL 9.1 Compliance
  • The CREATE TABLESPACE command has changed.
    • The command no longer requires a filespace created with the gpfilespace utility.
    • The FILESPACE clause has been removed.
    • The WITH clause has been added to allow specifying a tablespace location for a specific segment instance.
    • A primary-mirror pair sharing the same content ID must use the same tablespace location.
  • The ALTER SEQUENCE SQL command has new clauses START [WITH] start and OWNER TO new_owner (PostgreSQL 8.4). The START clause sets the start value that will be used by future ALTER SEQUENCE RESTART commands, but does not change the current value of the sequence. The OWNER TO clause changes the sequence's owner.
  • The ALTER TABLE SQL command has a SET WITH OIDS clause to add an oid system column to a table (PostgreSQL 8.4). Note that using oids with Greenplum Database tables is strongly discouraged.
  • The CREATE DATABASE SQL command has new parameters LC_COLLATE and LC_CTYPE to specify the collation order and character classification for the new database.
  • The CREATE FUNCTION SQL command has a new keyword WINDOW, which indicates that the function is a window function rather than a plain function (PostgreSQL 8.4).
  • Specifying the index name in the CREATE INDEX SQL command is now optional. Greenplum Database constructs a default index name from the table name and indexed columns.
  • In the CREATE TABLE command, the Greenplum Database parser allows commas to be placed between a SUBPARTITION TEMPLATE clause and its cooresponding SUBPARTITION BY clause, and between consecutive SUBPARTITION BY clauses. Using this undocumented syntax will generate a deprecation warning message.
  • Superuser privileges are now required to create a protocol. See CREATE PROTOCOL.
  • The CREATE TYPE SQL command has a new LIKE=type clause that copies the new type's representation (INTERNALLENGTH, PASSEDBYVALUE, ALIGNMENT, and STORAGE) from an existing type (PostgreSQL 8.4).
  • The GRANT SQL command has new syntax to grant privileges on truncate, foreign data wrappers, and foreign data servers (PostgreSQL 8.4).
  • The LOCK SQL command has an optional ONLY keyword (PostgreSQL 8.4). When specified, the table is locked without locking any tables that inherit from it.
  • Using the LOCK table statement outside of a transaction raises an error in Greenplum Database 6.0. In earlier releases, the statement executed, although it is only useful when executed inside of a transaction.
  • The SELECT and VALUES SQL commands support the SQL 2008 OFFSET and FETCH syntax (PostgreSQL 8.4). These clauses provide an alternative syntax for limiting the results returned by a query.
  • The FROM clause can be omitted from a SELECT command, but Greenplum Database no longer allows queries that omit the FROM clause and also reference database tables.
  • The ROWS and RANGE SQL keywords have changed from reserved to unreserved, and may be used as table or column names without quoting.
  • In Greenplum 6, a query on an external table with descendants will by default recurse into the descendant tables. This is a change from previous Greenplum Database versions, which never recursed into descendants. To get the previous behavior in Greenplum 6, you must include the ONLY keyword in the query to restrict the query to the parent table.
  • The default value for the optimizer_force_multistage_agg server configuration parameter has changed from true to false. GPORCA will now by default choose between a one-stage or two-stage aggregate plan for a scalar distinct qualified aggregate based on cost.
  • The TRUNCATE SQL command has an optional ONLY keyword (PostgreSQL 8.4). When specified, the table is truncated without truncating any tables that inherit from it.
  • The createdb command-line utility has new options -l (--locale), --lc-collate, and --lc-ctype to specify the locale and character classification for the database (PostgreSQL 8.4).
  • The pg_dump, pg_dumpall, and pg_restore utilities have a new --role=rolename option that instructs the utility to execute SET ROLE rolename after connecting to the database and before starting the dump or restore operation (PostgreSQL 8.4).
  • The pg_dump and pg_dumpall command-line utilities have a new option --lock-wait-timeout=timeout (PostgreSQL 8.4). When specified, instead of waiting indefinitely the dump fails if the utility cannot acquire shared table locks within the specified number of milliseconds.
  • The -d and -D command-line options are removed from the pg_dump and pg_dumpall utilities. The corresponding long versions, --inserts and --column-inserts are still supported. A new --binary-upgrade option is added, for use by in-place upgrade utilities.
  • The -w (--no-password) option was added to the pg_dump, pg_dumpall, and pg_restore utilities.
  • The -D option is removed from the gpexpand utility. The expansion schema will be created in the postgres database.
  • The gpstate utility has a new -x option, which displays details of an in-progress system expansion. gpstate -s and gpstate with no options specified also report if a system expansion is in progress.
  • The pg_restore utility has a new option -j (--number-of-jobs) parameter. This option can reduce time to restore a large database by running tasks such as loading data, creating indexes, and creating constraints concurrently.
  • The vacuumdb utility has a new -F (--freeze) option to freeze row transaction information.
  • ALTER DATABASE includes the SET TABLESPACE clause to change the default tablespace.
  • CREATE DATABASE includes the COLLATE and CTYPE options for setting the collation order and character classification of the new database.
  • In the gp_toolkit schema, the gp_workfile_* views have changed due to Greenplum Database 6 workfile enhancements. See Checking Query Disk Spill Space Usage for information about gp_workfile_* views.
  • The server configuration parameter gp_workfile_compress_algorithm has been changed to gp_workfile_compression. When workfile compression is enabled, Greenplum Database uses Zstandard compression.
  • The Oracle Compatibility Functions are now available in Greenplum Database as an extension, based on the PostgreSQL orafce project at https://github.com/orafce/orafce. Instead of executing a SQL script to install the compatibility functions in a database, you now execute the SQL command CREATE EXTENSION orafce. The Greenplum Database 6.0 orafce extension is based on the orafce 3.7 release. See Oracle Compatibility Functions for information about differences between the Greenplum Database compatibility functions and the PostgreSQL orafce extension.
  • Greenplum Database 6 supports specifying a table column of the citext data type as a distribution key.
  • Greenplum Database 6 provides a single client and loader tool package that you can download and install on a client system. Previous Greenplum releases provided separate client and loader packages. For more information about the Greenplum 6 Clients package, refer to Client Tools in the platform requirements documentation.
  • Greenplum Database 6 includes both PostgreSQL-sourced and Greenplum- sourced contrib modules. Most of these modules are now packaged as extensions, and you register an extension in Greenplum with the CREATE EXTENSION name command. Refer to Installing Additional Supplied Modules for more information about registering contrib modules in Greenplum Database 6.
  • When Greenplum Database High Availability is enabled, a primary segment instance is kept up to date with the mirror segment instance using Write-Ahead Logging (WAL)-based streaming replication. See Overview of Segment Mirroring.

    The gp_stat_replication view contains replication statistics when master or segment mirroring is enabled.

    In previous releases, segment mirroring employed a physical file replication scheme.

  • In the gp_segment_configuration table, the replication_port has been removed. The datadir column has been added to display the segment instance data directory. The mode column values are now s (synchronized) or n (not synchronized). Use the gp_stat_replication view to determine the synchronization state.
  • The Greenplum Database 6 Client and Loader Tools package for Windows does not support running the gpfdist program as a native Windows service.

Removed Features

Pivotal Greenplum Database 6 removes these features:
  • The gpseginstall utility is no longer included. You must install the Greenplum software RPM on each segment host, as described in Installing the Greenplum Database Software.
  • The gptransfer utility is no longer included; use gpcopy for all functionality that was provided with gptransfer.
  • The gp_fault_strategy system table is no longer used. Greenplum Database now uses the gp_segment_configuration system table to determine if mirroring is enabled.
  • Pivotal Greenplum Database 6 removes the gpcrondump, gpdbrestore, and gpmfr management utilities. Use gpbackup and gprestore to back up and restore Greenplum Database.
  • Pivotal Greenplum Database 6 no longer supports Veritas NetBackup.
  • Pivotal Greenplum Database 6 no longer supports the use of direct I/O to bypass the buffering of memory within the file system cache for backup.
  • Pivotal Greenplum Database 6 no longer supports the gphdfs external table protocol to access a Hadoop system. Use the Greenplum Platform Extension Framework (PXF) to access Hadoop in version 6. Refer to pxf:// Protocol for information about using the pxf external table protocol.
  • Pivotal Greenplum Database 6 no longer supports SSLv3.
  • Pivotal Greenplum Database 6 removes the following server configuration parameters:
    • gp_analyze_relative_error
    • gp_backup_directIO
    • gp_backup_directIO_read_chunk_mb
    • gp_connections_per_thread
    • gp_enable_sequential_window_plans
    • gp_idf_deduplicate
    • gp_snmp_community
    • gp_snmp_monitor_address
    • gp_snmp_use_inform_or_trap
    • gp_workfile_checksumming
  • The undocumented gp_cancel_query() function, and the configuration parameters gp_cancel_query_print_log and gp_cancel_query_delay_time, are removed in Greenplum Database 6.
  • The string_agg(expression) function is removed from Greenplum 6. The function concatenates text values into a string. The string_agg(expression, delimiter) function is still supported.
  • Pivotal Greenplum Database 6 no longer supports the ability to configure a Greenplum Database system to trigger SNMP (Simple Network Management Protocol) alerts or send email notifications to system administrators if certain database events occur. Use Pivotal Greenplum Command Center alerts to detect and respond to events that occur in a Greenplum system.
  • Pivotal Greenplum Database 6 removes the gpfilespace utility. The CREATE TABLESPACE command no longer requires a filespace created with the utility.
  • Pivotal Greenplum Database 6 no longer automatically casts text from the deprecated timestamp format YYYYMMDDHH24MISS. The format could not be parsed unambiguously in previous Greenplum Database releases. The format is not supported in PostgreSQL 9.4.
    For example, this command returns an error in Greenplum Database 6. In previous releases, a timestamp is returned.
    # select to_timestamp('20190905140000');
    In Greenplum Database 6, this command returns a timestamp.
    # select to_timestamp('20190905140000','YYYYMMDDHH24MISS');
  • Pivotal Greenplum Database 6 removes the --ignore-version option from the pg_dump, pg_dumpall, and pg_restore utilities.
  • The gpcheck utility is no longer included.

Differences Compared to Open Source Greenplum Database

Pivotal Greenplum 6.x includes all of the functionality in the open source Greenplum Database project and adds:
  • Product packaging and installation script.
  • Support for data connectors:
    • Greenplum-Spark Connector
    • Greenplum-Informatica Connector
    • Greenplum-Kafka Integration
    • Greenplum Stream Server
  • Data Direct ODBC/JDBC Drivers
  • gpcopy utility for copying or migrating objects between Greenplum systems.
  • Support for managing Greenplum Database using Pivotal Greenplum Command Center.
  • Support for full text search and text analysis using Pivotal GPText.

Deprecated Features

Deprecated features will be removed in a future major release of Greenplum Database. Pivotal Greenplum 6.x deprecates:

  • The server configuration parameter gp_ignore_error_table (deprecated since 6.0).

    To avoid a Greenplum Database syntax error, set the value of this parameter to true when you run applications that execute CREATE EXTERNAL TABLE or COPY commands that include the now removed Greenplum Database 4.3.x INTO ERROR TABLE clause.

  • Specifying => as an operator name in the CREATE OPERATOR command (deprecated since 6.0).
  • The Greenplum external table C API (deprecated since 6.0).

    Any developers using this API are encouraged to use the new Foreign Data Wrapper API in its place.

  • Commas placed between a SUBPARTITION TEMPLATE clause and its corresponding SUBPARTITION BY clause, and between consecutive SUBPARTITION BY clauses in a CREATE TABLE command (deprecated since 6.0).

    Using this undocumented syntax will generate a deprecation warning message.

  • The timestamp format YYYYMMDDHH24MISS (deprecated since 6.0).

    This format could not be parsed unambiguously in previous Greenplum Database releases, and is not supported in PostgreSQL 9.4.

  • The createlang and droplang utilities (deprecated since 6.0).
  • The pg_resqueue_status system view (deprecated since 6.0).

    Use the gp_toolkit.gp_resqueue_status view instead.

  • The GLOBAL and LOCAL modifiers when creating a temporary table with the CREATE TABLE and CREATE TABLE AS commands (deprecated since 6.0).

    These keywords are present for SQL standard compatibility, but have no effect in Greenplum Database.

  • The Greenplum Platform Extension Framework (PXF) HDFS profile names for the Text, Avro, JSON, Parquet, and SequenceFile data formats (deprecated since 5.16).

    Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for more information.

  • Using WITH OIDS or oids=TRUE to assign an OID system column when creating or altering a table (deprecated since 6.0).
  • Allowing superusers to specify the SQL_ASCII encoding regardless of the locale settings (deprecated since 6.0).

    This choice may result in misbehavior of character-string functions when data that is not encoding-compatible with the locale is stored in the database.

  • The @@@ text search operator (deprecated since 6.0).

    This operator is currently a synonym for the @@ operator.

  • The unparenthesized syntax for option lists in the VACUUM command (deprecated since 6.0).

    This syntax requires that the options to the command be specified in a specific order.

  • The plain pgbouncer authentication type (auth_type = plain) (deprecated since 4.x).

Migrating Data to Greenplum 6

Note: Greenplum 6 supports direct upgrades, using gpupgrade, from Greenplum 5.x releases to Greenplum 6.x. For more information, see the gpupgrade documentation.

See Migrating Data from Greenplum 4.3 or 5 for guidelines and considerations for migrating existing Greenplum data to Greenplum 6, using standard backup and restore procedures.

Known Issues and Limitations

Pivotal Greenplum 6 has these limitations:

  • Upgrading a Greenplum Database 4 or 5 release, or Greenplum 6 Beta release, to Pivotal Greenplum 6 is not supported.
  • MADlib, GPText, and PostGIS are not yet provided for installation on Ubuntu systems.
  • Greenplum 6 is not supported for installation on DCA systems.
  • Greenplum for Kubernetes is not yet provided with this release.

The following table lists key known issues in Pivotal Greenplum 6.x.

Table 3. Key Known Issues in Pivotal Greenplum 6.x
Issue Category Description
N/A PXF Starting in 6.x, Greenplum does not bundle cURL and instead loads the system-provided library. PXF requires cURL version 7.29.0 or newer. The officially-supported cURL for the CentOS 6.x and Red Hat Enterprise Linux 6.x operating systems is version 7.19.*. Greenplum Database 6 does not support running PXF on CentOS 6.x or RHEL 6.x due to this limitation.

Workaround: Upgrade the operating system of your Greenplum Database 6 hosts to CentOS 7+ or RHEL 7+, which provides a cURL version suitable to run PXF.

29703 Loading Data from External Tables Due to limitations in the Greenplum Database external table framework, Greenplum Database cannot log the following types of errors that it encounters while loading data:
  • data type parsing errors
  • unexpected value type errors
  • data type conversion errors
  • errors returned by native and user-defined functions
LOG ERRORS returns error information for data exceptions only. When it encounters a parsing error, Greenplum terminates the load job, but it cannot log and propagate the error back to the user via gp_read_error_log().

Workaround: Clean the input data before loading it into Greenplum Database.

170824967 gpfidsts For Greenplum Database 6.x, a command that accesses an external table that uses the gpfdists protocol fails if the external table does not use an IP address when specifying a host system in the LOCATION clause of the external table definition.
168548176 gpbackup When using gpbackup to back up a Greenplum Database 5.7.1 or earlier 5.x release with resource groups enabled, gpbackup returns a column not found error for t6.value AS memoryauditor.
168167337 gpinitsystem When running gpinitsystem to initialize a Greenplum Database system with mirroring enabled, the utility configures the pg_hba.conf file in a manner that does not permit incremental recovery of the primary segment instances. This issue is fixed in release 6.0.1.

Also, for Greenplum Database 6.0.0, the IP addresses of the primary and mirror hosts must be within the same subnet.

164791118 PL/R PL/R installation does not work using the deprecated createlang utility, and displays the error:
createlang: language installation failed: ERROR:  
no schema has been selected to create in
Workaround: Use CREATE EXTENSION to install PL/R, as described in the documentation.
167851039 PXF The pxf cluster reset command does not reset PXF configuration on the standby master host.

Workaround: Log in to the standby master host and run pxf reset locally.

Resolved in Greenplum Database 6.0.1.

167851065 PXF PXF allows you to initialize PXF without setting PXF_CONF, and incorrectly creates a PXF configuration directory named $HOME/NOT_INITIALIZED.

Workaround: Remove the $HOME/NOT_INITIALIZED directory, and re-run pxf cluster init, this time specifying a valid PXF_CONF value.

Resolved in Greenplum Database 6.0.1.

167948506 PXF S3 object store access may fail when PXF is configured for Kerberized Hadoop.

Workaround: Perform one of the following:

  • Add the YARN principal property to the s3-site.xml file in your S3 server configuration. For example:
  • Copy yarn-site.xml from the Hadoop cluster to your S3 server configuration directory.
  • Disable token renewal per-S3-bucket by adding a mapreduce.job.hdfs-servers.token-renewal.exclude property/value block to s3-site.xml for each bucket accessed with PXF. For example:

Resolved in Greenplum Database 6.0.1.

168271005 PXF The PXF JDBC Connector fails to access any database when PXF is configured to use the MapR Hadoop distribution, and MapR libraries are present in $PXF_CONF/lib.

Resolved in Greenplum Database 6.0.1.

170202002 Greenplum-Kafka Integration Updating the METADATA:SCHEMA property and restarting a previously-run load job could cause gpkafka to re-read Kafka messages published to the topic, and load duplicate messages into Greenplum Database.
N/A Greenplum Client/Load Tools on Windows The Greenplum Database client and load tools on Windows have not been tested with Active Directory Kerberos authentication.