VMware Tanzu Greenplum Streaming Server 1.5 Release Notes

This document contains pertinent release information about the VMware Tanzu Greenplum Streaming Server version 1.5 release. The Greenplum Streaming Server (GPSS) is included in certain Tanzu Greenplum 5.x and 6.x distributions. GPSS for Red Hat/CentOS 6 and 7 is also updated and distributed independently of Greenplum Database. You may need to download and install the GPSS distribution from VMware Tanzu Network to obtain the most recent version of this component.

Supported Platforms

Tanzu Greenplum Streaming Server 1.5.x is compatible with these Tanzu Greenplum versions:

  • Tanzu Greenplum 5.17.0 and later
  • Tanzu Greenplum 6.0.0 and later
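As an illustration, the compatibility rules above can be expressed as a version check. The following Python sketch also encodes the Greenplum 5.26+ or 6.6+ requirement for the persistent error log feature described in the 1.5.0 notes. The function names are hypothetical and are not part of GPSS. The sketch assumes plain major.minor.patch version strings and only considers the 5.x and 6.x lines listed here.

```python
# Minimal sketch: check a Greenplum version string against the version
# requirements stated in these release notes. The helper names are
# hypothetical; GPSS does not expose them.


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'major.minor.patch'; missing parts default to 0."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def supports_gpss_15(version: str) -> bool:
    """GPSS 1.5.x: Greenplum 5.17.0 and later in 5.x, or 6.0.0 and later."""
    v = parse_version(version)
    if v[0] == 5:
        return v >= (5, 17, 0)
    # Assumption: only the 5.x and 6.x lines listed in these notes count.
    return v[0] == 6


def supports_persistent_error_log(version: str) -> bool:
    """Persistent error log: Greenplum 5.26+ or 6.6+."""
    v = parse_version(version)
    if v[0] == 5:
        return v >= (5, 26, 0)
    if v[0] == 6:
        return v >= (6, 6, 0)
    return False
```

For example, with this sketch Greenplum 5.20.0 supports GPSS 1.5.x but not the persistent error log, while 6.6.0 supports both.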

Release 1.5.3

Release Date: April 15, 2021

Greenplum Streaming Server 1.5.3 resolves an issue.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.3.

Resolved Issues

Greenplum Streaming Server 1.5.3 resolves this issue:

31357
Resolves an issue where GPSS did not correctly handle CUSTOM_OPTION properties specified in a load configuration file. GPSS now supports using the NAME and PARAMSTR properties to specify a custom formatter user-defined function.

Release 1.5.2

Release Date: March 5, 2021

Greenplum Streaming Server 1.5.2 resolves several issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.2.

Changed Features

Greenplum Streaming Server 1.5.2 includes this change:

  • GPSS no longer includes the end time in its output error hints. Resolved issue 31287 provides more information.

Resolved Issues

Greenplum Streaming Server 1.5.2 resolves these issues:

N/A
Resolves an issue where GPSS logged the message execInsert and err: nil because it did not check for an error before logging.
31287
Resolves an issue where GPSS did not always display the correct end time in the output error hint. GPSS no longer includes the end time in the hint.
177153850
Resolves an issue where a GPSS query returned a syntax error from Greenplum Database because MATCH COLUMNS was empty. GPSS now requires and checks that this field includes at least one column when you submit a load job that specifies UPDATE or MERGE mode.
177133400
Resolves an issue where GPSS stopped a Kafka job unexpectedly and did not return an error when it encountered a batch that contained only a control message.
177077055
Resolves an issue where the --all option was incorrectly displayed in the help output of the gpsscli load command.
177077007
GPSS consumed a large amount of memory caching Kafka messages when it ran many concurrent jobs that read from multiple partitions. This issue is resolved; when you do not explicitly configure the librdkafka queued.max.messages.kbytes property, GPSS now uses a less aggressive default value for it.
177014072
Resolves an issue where GPSS incorrectly returned the error gpkafka load show job progress fail, err: job progress is nil when it failed to start a Kafka job. GPSS now returns the more meaningful error gpkafka load start job failed in this situation.
176842005
Resolves an issue where GPSS submitted a job with the wrong name when a gpsscli load *.yaml command operated on more than one load job.
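The MATCH_COLUMNS check described in resolved issue 177153850 can be sketched as follows. This Python sketch is illustrative only: it models the OUTPUT block of a load configuration file as a dict, assumes that MODE defaults to INSERT when it is omitted, and validate_output() is a hypothetical helper rather than GPSS code.

```python
# Minimal sketch of the validation described in resolved issue 177153850:
# UPDATE and MERGE modes require at least one MATCH_COLUMNS entry.
# The dict stands in for the OUTPUT block of a load configuration file;
# validate_output() is a hypothetical helper, not part of GPSS.


def validate_output(output: dict) -> list[str]:
    """Return a list of validation errors (empty when the block is valid)."""
    errors = []
    # Assumption: MODE defaults to INSERT when it is not specified.
    mode = str(output.get("MODE", "INSERT")).upper()
    match_columns = output.get("MATCH_COLUMNS") or []
    if mode in ("UPDATE", "MERGE") and not match_columns:
        errors.append(
            f"MATCH_COLUMNS must name at least one column in {mode} mode"
        )
    return errors
```

Rejecting the job at submit time surfaces the problem as a clear configuration error instead of a syntax error returned later by Greenplum Database.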

Release 1.5.1

Release Date: February 5, 2021

Greenplum Streaming Server 1.5.1 includes changes and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.1.

Changed Features

Greenplum Streaming Server 1.5.1 includes these changes:

  • Version 1.5.1 is the first standalone GPSS release that includes a .deb installation package for Ubuntu 18.04 LTS systems.
  • The gpsscli subcommands now consistently return zero (0) on success and non-zero when GPSS encounters an error.
  • GPSS improves the error message that it returns when it encounters a mismatched extension or formatter version.
  • GPSS bundles a patched version of the libserdes library to fix an issue that can arise when the SCHEMA_REGISTRY_ADDRS property value includes a trailing slash. See resolved issue 31137.
  • GPSS now registers the gp_read_persistent_error_log() function when you register the GPSS extension in a database. Resolved issue 31201 provides more information.
  • The progress log file name format has changed; the new format retains the complete job name rather than truncating it to 8 characters.

Resolved Issues

Greenplum Streaming Server 1.5.1 resolves these issues:

31201
Resolves an issue where GPSS returned a permission denied for language c error when it attempted, at runtime, to register an internal function as the Greenplum Database user that started GPSS, and this user did not have the privileges required to create such functions. GPSS now registers this internal function when you create the GPSS extension in a database.
31137
Due to a bug in the dependent libserdes library, GPSS did not correctly handle a trailing slash in the first address of a SCHEMA_REGISTRY_ADDRS list. This issue is resolved; GPSS 1.5.1 bundles a patched version of the libserdes library that handles such addresses.
176136800
Resolves an issue where GPSS returned an error when it parsed the SAVE_FAILING_BATCH property in a (deprecated) version 1 load configuration file, a format that does not support this property. GPSS now displays a warning message when it encounters a property that the version 1 configuration file format does not support.
176068963
GPSS reported an offset gap when it read Kafka messages using the read_committed isolation level, the job was restarted, and the topic retention period had expired. This issue is resolved; GPSS now records control message offsets.
175867685
Resolves an issue where the -i | --edit-in-place option was displayed in the help output of subcommands that did not support the option. GPSS now correctly displays the option only for the gpsscli convert command.
175867670
Resolves an issue where the gpsscli subcommands did not return consistent values. gpsscli now returns zero (0) on success and non-zero on failure.
N/A
Resolves an issue where GPSS did not correctly validate a filesource.yaml load configuration file before submitting the job.
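Resolved issue 176068963 involves Kafka transactions. With the read_committed isolation level, transaction control records (commit and abort markers) occupy offsets in the partition but are never delivered to the consumer as messages, so the delivered offsets can skip numbers even when no data is missing. The following Python sketch, which is not the GPSS implementation, shows why recording control message offsets prevents such a skip from being reported as a gap. find_gaps() is a hypothetical helper, and the sketch ignores messages from aborted transactions, which a read_committed consumer also skips.

```python
# Minimal sketch: detect offset gaps in a Kafka partition when some
# offsets belong to transaction control records (commit/abort markers)
# that a read_committed consumer never receives as messages.
# find_gaps() is a hypothetical helper, not a GPSS or librdkafka API.


def find_gaps(
    start: int, data_offsets: list[int], control_offsets: set[int]
) -> list[int]:
    """Return offsets from start through the last data offset that are
    neither delivered data messages nor known control records."""
    seen = set(data_offsets) | control_offsets
    end = max(data_offsets, default=start - 1)
    return [o for o in range(start, end + 1) if o not in seen]
```

For example, if offset 3 holds a commit marker, find_gaps(0, [0, 1, 2, 4, 5], {3}) reports no gap, while omitting the control offset would flag offset 3.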

Release 1.5.0

Release Date: December 2, 2020

Greenplum Streaming Server 1.5.0 adds new features, includes changes, and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.0.

New and Changed Features

Greenplum Streaming Server 1.5.0 includes these new and changed features:

  • The load configuration file ERROR_LIMIT property, previously mandatory, is now optional. The default value for the property is zero (0). When ERROR_LIMIT is 0, GPSS disables error logging and stops a load operation when it encounters the first error.
  • GPSS includes out-of-the-box Prometheus integration, enabling you to use Prometheus to monitor your gpss server instances. Refer to Monitoring GPSS Service Instances for more information on enabling and using this integration.
  • New configuration properties in the gpss.json server configuration file include:
    • The DebugPort configuration property. You can use this property to identify the port number on which GPSS starts a debug server for the gpss server instance. Refer to Pulling Information from the Debug Server for more information.
    • The MinTLSVersion configuration property. You use this property to specify the minimum TLS version that GPSS requests on encrypted connections.
    • The Logging configuration property block. You can use these configuration properties to set the front-end and back-end logging levels for GPSS commands. See About GPSS Logging.
    • The JobStore configuration property block. Use the configuration property in this block to specify a local directory in which GPSS maintains job status information. This allows a GPSS server instance to (re)start any in-progress jobs when the instance first starts up. See About GPSS Job Management.
    • The Monitor configuration property block. You use the properties in this block to enable GPSS Prometheus integration.
  • GPSS no longer generates and assigns a unique identifier as the job name when you invoke the gpsscli submit or gpsscli load commands without specifying the --name option. GPSS now assigns the base name of the load configuration file as the default job name.
  • GPSS exposes a new load configuration property for Kafka data sources named PARTITIONS. Use this property to specify the numbers of the topic partitions from which you want GPSS to load Kafka messages. (This property is not supported for the Kafka version 1 configuration file format.)
  • GPSS supports specifying template parameters for load configuration file properties. When you specify the {{template_var}} value syntax in the file, GPSS substitutes template_var with a value that you specify via the -p | --property template_var=value option when you submit or load the job.
  • GPSS supports SSL encryption on the control channel between GPSS and the Greenplum Database master, and ships with an updated pq library to support this feature. See Configuring SSL for the Control Channel for configuration information.
  • The gpsscli start, stop, and remove subcommands now support a --all flag. When you specify this flag, gpsscli start starts all submitted jobs, gpsscli stop stops all running jobs, and gpsscli remove removes all stopped jobs.
  • The gpsscli submit and gpsscli load commands can now operate on one or more YAML load configuration files.
  • GPSS exposes the new SAVE_FAILING_BATCH load configuration property. When you set this property to true, GPSS also writes the data that it loads to a backup table. When GPSS encounters expression evaluation errors, you can use this backup table to recover the load operation. See Redirecting Data to a Backup Table when GPSS Encounters Expression Evaluation Errors for additional information. (This property is not supported for the Kafka version 1 configuration file format.)
  • GPSS 1.5.0 introduces a new Beta feature, the version 3 load configuration file format. This format introduces a new YAML organization and keywords, and more closely aligns with the GPSS gRPC Streaming Job API. Refer to gpsscli-v3.yaml (Beta) for the version 3 syntax.
  • GPSS 1.5.0 supports the persistent error log feature of Greenplum Database when you run GPSS against Greenplum Database version 5.26+ or 6.6+. For more details about the persistent error log, refer to the CREATE EXTERNAL TABLE SQL reference page in the Greenplum Database documentation.
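The template parameter substitution described above can be sketched in Python as follows. This is a minimal illustration of the {{template_var}} and -p template_var=value behavior, not the GPSS parser: render_template() and parse_property() are hypothetical helpers, and the sketch assumes that a variable without a supplied value is an error and that whitespace inside the braces is allowed.

```python
import re

# Minimal sketch of {{template_var}} substitution in a load configuration
# file. render_template() and parse_property() are hypothetical helpers;
# the GPSS parser may differ (for example, in whitespace handling inside
# the braces or in how it reports a variable with no supplied value).

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def parse_property(arg: str) -> tuple[str, str]:
    """Split a 'template_var=value' argument (as passed to -p) into a pair."""
    name, sep, value = arg.partition("=")
    if not sep or not name:
        raise ValueError(f"expected template_var=value, got {arg!r}")
    return name, value


def render_template(text: str, props: dict[str, str]) -> str:
    """Replace each {{name}} with props[name]; fail on unknown names."""

    def substitute(m: re.Match) -> str:
        name = m.group(1)
        if name not in props:
            raise KeyError(f"no value supplied for template variable {name!r}")
        return props[name]

    return PLACEHOLDER.sub(substitute, text)
```

With this sketch, rendering the line TOPIC: {{topic}} with the property topic=orders yields TOPIC: orders. Splitting only on the first = lets a value itself contain an equals sign.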

Resolved Issues

Greenplum Streaming Server 1.5.0 resolves these issues:

30332
In some cases when GPSS reused external tables for jobs, it did not update the external table that it uses internally for load operations when the target Greenplum table definition was modified.
171299427
Resolves an issue where GPSS was unable to cancel a batch write operation when it encountered an error, and left a lingering session.

Deprecated Features

Deprecated features may be removed in a future release of the Greenplum Streaming Server. GPSS 1.5.x deprecates:

  • The gpkafka Version 1 configuration file format (deprecated since 1.4.0).
  • The gpkafka.yaml (versions 1 and 2) POLL block, including the POLL:BATCHSIZE and POLL:TIMEOUT properties (deprecated since 1.3.5).

Known Issues and Limitations

Greenplum Streaming Server 1.5.x has these known issues:

N/A
The SAVE_FAILING_BATCH and PARTITIONS configuration properties are not supported when you use the version 1 configuration file format to load data.
N/A
The Greenplum Streaming Server may consume a very large amount of system memory when you use it to load a huge file (hundreds of GBs), in some cases causing the Linux kernel to kill the GPSS server process. Do not use GPSS to load very large files; use gpfdist instead.
30503
Due to limitations in the Greenplum Database external table framework, GPSS cannot log a data type conversion error that it encounters while evaluating a mapping expression. For example, if you use the expression EXPRESSION: (jdata->>'id')::int in your load configuration file, and the content of jdata->>'id' is a string that includes non-integer characters, the evaluation fails and GPSS terminates the load job. GPSS cannot log and propagate the error back to the user via gp_read_error_log().
Workarounds for Kafka:
  • Set the SAVE_FAILING_BATCH load configuration property to true, and then manually load any data batch that included expression errors.
  • Skip the bad Kafka message by specifying a --force-reset-xxx flag on the job start or load command.
  • Correct the message and publish it to another Kafka topic before loading it into Greenplum Database.
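To apply the third workaround, you first need to find the messages that would fail the mapping expression. The following Python sketch approximates that check for the example expression (jdata->>'id')::int, assuming each Kafka message payload is a JSON object. It does not connect to Kafka, and split_messages() and id_casts_to_int() are hypothetical helpers. The regular expression and range check approximate the PostgreSQL int4 input rules (optional surrounding whitespace and sign, ASCII digits, 32-bit range). A missing or null id yields NULL, which casts without error.

```python
import json
import re

# Minimal sketch: find message payloads whose "id" field would fail the
# mapping expression (jdata->>'id')::int, so that they can be corrected
# and republished to another topic. The checks approximate PostgreSQL
# int4 input rules; these helpers are hypothetical, not part of GPSS.

INT4_TEXT = re.compile(r"\s*[+-]?[0-9]+\s*", re.ASCII)
INT4_MIN, INT4_MAX = -(2**31), 2**31 - 1


def id_casts_to_int(payload: str) -> bool:
    """Approximate whether (jdata->>'id')::int succeeds for one payload."""
    value = json.loads(payload).get("id")
    if value is None:
        return True  # missing key or JSON null: ->> yields NULL
    if isinstance(value, bool):
        return False  # ->> yields 'true' or 'false' (check before int)
    if isinstance(value, int):
        return INT4_MIN <= value <= INT4_MAX
    if isinstance(value, str) and INT4_TEXT.fullmatch(value):
        return INT4_MIN <= int(value) <= INT4_MAX
    return False  # floats, objects, arrays, non-numeric strings


def split_messages(payloads: list[str]) -> tuple[list[str], list[str]]:
    """Partition payloads into (loadable, needs_correction)."""
    good, bad = [], []
    for p in payloads:
        (good if id_casts_to_int(p) else bad).append(p)
    return good, bad
```

The isinstance check for bool comes before the int check because Python treats bool as a subclass of int.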