VMware Tanzu Greenplum Streaming Server 1.5 Release Notes

This document contains release information for VMware Tanzu Greenplum Streaming Server version 1.5. The Greenplum Streaming Server (GPSS) is included in certain Tanzu Greenplum 5.x and 6.x distributions. GPSS for Red Hat/CentOS 6 and 7 is also updated and distributed independently of Greenplum Database. To obtain the most recent version of this component, you may need to download and install the GPSS distribution from VMware Tanzu Network.

Supported Platforms

Tanzu Greenplum Streaming Server 1.5.x is compatible with these Tanzu Greenplum versions:

  • Tanzu Greenplum 5.17.0 and later
  • Tanzu Greenplum 6.0.0 and later

Release 1.5.0

Release Date: December 2, 2020

Greenplum Streaming Server 1.5.0 adds new features, includes changes, and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.0.

New and Changed Features

Greenplum Streaming Server 1.5.0 includes these new and changed features:

  • The load configuration file ERROR_LIMIT property, previously mandatory, is now optional. The default value for the property is zero (0). With this default, GPSS disables error logging and stops a load operation upon encountering the first error.
  • GPSS includes out-of-the-box Prometheus integration, enabling you to use the tool to monitor your gpss server instances. Refer to Monitoring GPSS Service Instances for more information on enabling and using this integration.
  • New configuration properties in the gpss.json server configuration file include:
    • The DebugPort configuration property. You can use this property to identify the port number on which GPSS starts a debug server for the gpss server instance. Refer to Pulling Information from the Debug Server for more information.
    • The MinTLSVersion configuration property. You use this property to specify the minimum TLS version that GPSS requests on encrypted connections.
    • The Logging configuration property block. You can use these configuration properties to set the front-end and back-end logging levels for GPSS commands. See About GPSS Logging.
    • The JobStore configuration property block. Use the configuration property in this block to specify a local directory in which GPSS maintains job status information. This allows a GPSS server instance to (re)start any in-progress jobs when the instance first starts up. See About GPSS Job Management.
    • The Monitor configuration property block. Use the configuration property in this block to enable GPSS Prometheus integration.
  • GPSS no longer generates and assigns a unique identifier as the job name when you invoke the gpsscli submit or gpsscli load commands without specifying the --name option. GPSS now assigns the base name of the load configuration file as the default job name.
  • GPSS exposes a new load configuration property for Kafka data sources named PARTITIONS. Use this property to specify the partition numbers of the topic from which you want GPSS to load Kafka messages. (This property is not supported for the Kafka version 1 configuration file format.)
  • GPSS supports specifying template parameters for load configuration file properties. When you specify the {{template_var}} value syntax in the file, GPSS substitutes template_var with a value that you specify via the -p | --property template_var=value option when you submit or load the job.
  • GPSS supports SSL encryption on the control channel between GPSS and the Greenplum Database master, and ships with an updated pq library to support this feature. See Configuring SSL for the Control Channel for configuration information.
  • The gpsscli start, stop, and remove subcommands now support a --all flag. When you specify this flag, gpsscli start starts all submitted jobs, gpsscli stop stops all running jobs, and gpsscli remove removes all stopped jobs.
  • The gpsscli submit and gpsscli load commands can now operate on one or more YAML load configuration files.
  • GPSS exposes the new SAVE_FAILING_BATCH load configuration property. When you set this property to true, GPSS also writes loading data to a backup table. When GPSS encounters expression evaluation errors, this backup table aids in the recovery of the load operation. See Redirecting Data to a Backup Table when GPSS Encounters Expression Evaluation Errors for additional information.
  • GPSS 1.5.0 introduces a new Beta feature, the version 3 load configuration file format. This format introduces a new YAML organization and keywords, and more closely aligns with the GPSS gRPC Streaming Job API. Refer to gpsscli-v3.yaml (Beta) for the version 3 syntax.
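
The template parameter feature described above can be pictured with a short Python sketch. This is not GPSS code; it only illustrates how a {{template_var}} placeholder might be replaced by a value supplied as template_var=value. The function names, the tolerance for spaces inside the braces, the error raised for a missing value, and the two-line configuration fragment are all assumptions made for illustration, and the fragment is not a complete load configuration file.

```python
import re

# Matches {{name}}. Allowing spaces inside the braces is an assumption,
# not documented GPSS behavior.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def parse_property_args(args):
    """Turn ['name=value', ...] into a dict, splitting on the first '='."""
    properties = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {arg!r}")
        properties[name] = value
    return properties


def render_template(text, properties):
    """Replace each {{name}} in text with properties[name]."""
    def substitute(match):
        name = match.group(1)
        if name not in properties:
            # Hypothetical choice: fail loudly instead of leaving the placeholder.
            raise KeyError(f"no value supplied for template variable {name!r}")
        return str(properties[name])
    return _PLACEHOLDER.sub(substitute, text)


# Illustrative fragment only; not a valid load configuration file.
config_text = "DATABASE: {{dbname}}\nTOPIC: {{topic}}\n"
props = parse_property_args(["dbname=testdb", "topic=orders"])
rendered = render_template(config_text, props)
print(rendered)
```

In this sketch the values play the role of the arguments passed with -p | --property when you submit or load the job. How GPSS itself treats a placeholder that has no supplied value is not stated in these notes.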

Resolved Issues

Greenplum Streaming Server 1.5.0 resolves these issues:

  • Resolves an issue where, in some cases when GPSS reused external tables for jobs, it did not update the external table that it uses internally for load operations after the target Greenplum table definition was modified.
  • Resolves an issue where GPSS was unable to cancel a batch write operation when it encountered an error, and left a lingering session.

Deprecated Features

Deprecated features may be removed in a future release of the Greenplum Streaming Server. GPSS 1.5.x deprecates:

  • The gpkafka Version 1 configuration file format (deprecated since 1.4.0).
  • The gpkafka.yaml (versions 1 and 2) POLL block, including the POLL:BATCHSIZE and POLL:TIMEOUT properties (deprecated since 1.3.5).

Known Issues and Limitations

Greenplum Streaming Server 1.5.x has these known issues:

  • The Greenplum Streaming Server may consume a very large amount of system memory when you use it to load a huge file (hundreds of gigabytes), in some cases causing the Linux kernel to kill the GPSS server process. Do not use GPSS to load very large files; instead, use gpfdist.
  • Due to limitations in the Greenplum Database external table framework, GPSS cannot log a data type conversion error that it encounters while evaluating a mapping expression. For example, if you use the expression EXPRESSION: (jdata->>'id')::int in your load configuration file, and the content of jdata->>'id' is a string that includes non-integer characters, the evaluation fails and GPSS terminates the load job. GPSS cannot log and propagate the error back to the user via gp_read_error_log().
    Workarounds for Kafka:
    • Set the SAVE_FAILING_BATCH load configuration property to true, and then manually load any data batch that included expression errors.
    • Skip the bad Kafka message by specifying a --force-reset-xxx flag on the job start or load command.
    • Correct the message and publish it to another Kafka topic before loading it into Greenplum Database.
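
As an aid to the last workaround, the following Python sketch shows one way to screen JSON messages before publishing them, so that values that would likely fail an (jdata->>'id')::int cast are set aside for correction. It is not part of GPSS. The function names and sample messages are hypothetical, and the check only approximates the database's integer input rules (optional sign, ASCII digits, surrounding whitespace, 4-byte range); it does not reproduce every edge case of JSON number formatting.

```python
import json
import re

# Text that a 4-byte integer cast is assumed to accept: optional surrounding
# whitespace, optional sign, ASCII digits only.
_INT_TEXT = re.compile(r"^\s*[+-]?[0-9]+\s*$")
_INT4_MIN, _INT4_MAX = -2**31, 2**31 - 1


def id_casts_to_int(message, key="id"):
    """Return True if the message's `key` value looks castable to an integer."""
    try:
        record = json.loads(message)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    value = record.get(key) if isinstance(record, dict) else None
    if value is None:
        # A missing or null value yields SQL NULL, which casts without error.
        return True
    # ->> returns strings as-is and other JSON values as their text form.
    text = value if isinstance(value, str) else json.dumps(value)
    if not _INT_TEXT.match(text):
        return False
    return _INT4_MIN <= int(text.strip()) <= _INT4_MAX


def split_messages(messages):
    """Partition messages into (loadable, needs_correction)."""
    good, bad = [], []
    for message in messages:
        (good if id_casts_to_int(message) else bad).append(message)
    return good, bad


sample = ['{"id": "42"}', '{"id": "4x2"}', '{"id": 7}', '{"id": "99999999999"}']
good, bad = split_messages(sample)
print(good)
print(bad)
```

Messages in the second list would be corrected and republished; the first list can be published to the topic that GPSS loads from.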