The Pivotal Greenplum Streaming Server (GPSS) is included in the Pivotal Greenplum Database distribution. Starting with version 1.3.1, GPSS for Red Hat/CentOS 6 and 7 is also updated and distributed independently of Greenplum Database. You may need to download and install the standalone GPSS distribution to obtain the most recent version of this component.
Pivotal Greenplum Streaming Server is compatible with these Greenplum Database versions:
- Pivotal Greenplum Database 5.17.0 and later
- Pivotal Greenplum Database 6.0.0 and later
Release Date: December 19, 2019
Greenplum Streaming Server version 1.3.1 is the first standalone release of GPSS. GPSS 1.3.1 is also included in the Greenplum Database version 5.24 and 6.2 distributions.
Greenplum Streaming Server 1.3.1 is a maintenance release that resolves several issues.
Greenplum Streaming Server 1.3.1 resolves these issues:
- In some cases, reading from Kafka using the default MINIMAL_INTERVAL (0 seconds) caused GPSS to consume a large amount of CPU resources, even when no new messages existed in the Kafka topic. This issue is resolved.
- 169807372, 169831558: GPSS 1.3.0 did not recognize internal history tables that were created with GPSS 1.2.6 and earlier. In some cases, this caused GPSS to load duplicate messages into Greenplum Database. This issue is resolved.
Release Date: November 1, 2019
Greenplum Streaming Server version 1.3.0 is included in the Greenplum Database version 5.23 and 6.1 distributions.
Greenplum Streaming Server 1.3.0 is a minor release that includes new and changed features and resolves several issues.
New and Changed Features
Greenplum Streaming Server 1.3.0 includes these new and changed features:
- GPSS now supports log rotation, utilizing a mechanism that you can easily integrate with the Linux logrotate system. See Managing GPSS Log Files for more information.
- GPSS has added the new INPUT:FILTER load configuration property. This property enables you to specify a filter that GPSS applies to Kafka input data before loading it into Greenplum Database.
- GPSS displays job progress by partition when you provide the --partition flag to the gpsscli progress command.
- GPSS enables you to load Kafka data that was emitted since a specific timestamp into Greenplum Database. To use this feature, you provide the --force-reset-timestamp flag when you run gpsscli load, gpsscli start, or gpkafka load.
- GPSS now supports update and merge operations on data stored in a Greenplum Database table. The load configuration file accepts MODE, MATCH_COLUMNS, UPDATE_COLUMNS, and UPDATE_CONDITION property values to direct these operations. Example: Merging Data from Kafka into Greenplum Using the Streaming Server provides an example merge scenario.
- GPSS supports Kerberos authentication to both Kafka and Greenplum Database.
- GPSS supports SSL encryption between GPSS and Kafka.
- GPSS supports SSL encryption on the data channel between GPSS and Greenplum Database.
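As a rough mental model of the update and merge operations listed above, the following Python sketch applies a batch of incoming rows to an in-memory table. It is not GPSS code. The function `apply_batch` and the list-of-dicts row representation are hypothetical, and the semantics shown (update rows that match on the MATCH_COLUMNS values, insert unmatched rows only in MERGE mode, an optional condition that gates each update) are an assumption inferred from the property names, not a statement of how GPSS implements them. In a real load configuration file UPDATE_CONDITION is a property value; here it is modeled as a Python callable for illustration only.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def apply_batch(
    table: List[Row],
    batch: List[Row],
    mode: str,
    match_columns: List[str],
    update_columns: List[str],
    update_condition: Optional[Callable[[Row, Row], bool]] = None,
) -> List[Row]:
    """Hypothetical model of MODE UPDATE / MERGE; returns a new table."""
    if mode not in ("UPDATE", "MERGE"):
        raise ValueError(f"unsupported mode: {mode}")
    result = [dict(row) for row in table]  # copy so the input is untouched
    for incoming in batch:
        key = tuple(incoming[c] for c in match_columns)
        matches = [r for r in result
                   if tuple(r[c] for c in match_columns) == key]
        if matches:
            for existing in matches:
                # Assumed: the condition sees the existing and incoming rows.
                if update_condition is None or update_condition(existing, incoming):
                    for col in update_columns:
                        existing[col] = incoming[col]
        elif mode == "MERGE":
            # Assumed: only MERGE inserts rows that match nothing.
            result.append(dict(incoming))
    return result


table = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
batch = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]

updated = apply_batch(table, batch, "UPDATE", ["id"], ["amount"])
merged = apply_batch(table, batch, "MERGE", ["id"], ["amount"])
```

In this model, `updated` changes only the row with `id` 2, while `merged` also appends the row with `id` 3.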
Greenplum Streaming Server 1.3.0 resolves these issues:
- In some situations, specifying the --force-reset-earliest flag when loading data failed to read from the correct offset. This problem has been fixed. (Using the --force-reset-xxx flags outside of an offset mismatch scenario is discouraged.)
- GPSS did not save error data to the external table error log when it encountered an incorrectly-formatted JSON or Avro message. This issue has been fixed; invoking gp_read_error_log() on the external table now displays the offending data.
- GPSS incorrectly treated Kafka jobs that specified the same Kafka topic and Greenplum output schema name and output table name, but different database names, as the same job. This issue has been resolved. GPSS now includes the Greenplum database name when constructing a job definition.
Greenplum Streaming Server 1.3.x has these known issues:
- Due to a regression in GPSS 1.3.0, GPSS no longer immediately dispatches Kafka data to Greenplum Database as it receives the data. GPSS now buffers and sends a batch of data to Greenplum as specified by the job COMMIT configuration, or when an application invokes the Close service.
- Updating the METADATA:SCHEMA property and restarting a previously-run load job could cause gpkafka to re-read Kafka messages published to the topic, and load duplicate messages into Greenplum Database.
- When loading Kafka data into Greenplum Database in UPDATE and MERGE modes, GPSS requires that a MAPPING exist for each column name identified in the MATCH_COLUMNS and UPDATE_COLUMNS lists.
- GPSS version 1.3.0 does not recognize internal history tables that were created with GPSS 1.2.6 and earlier. If you re-submit a load job that was originally initiated with the GPSS from a Greenplum Database 6.0.x or 5.22 or earlier distribution, or from the Greenplum 6.0.x Clients Package, GPSS will read Kafka messages starting from the earliest available offset in the topic. This may cause GPSS to load duplicate messages into Greenplum Database.
Workaround: Do not upgrade to Greenplum Database 6.1 or 5.23; wait for a Greenplum or GPSS release that includes GPSS 1.3.1 or later.
Resolved in GPSS 1.3.1.
- In some cases, reading from Kafka using the default MINIMAL_INTERVAL (0 seconds) causes GPSS to consume a large amount of CPU resources, even when no new messages exist in the Kafka topic.
Workaround: Specify a MINIMAL_INTERVAL in the load configuration YAML file when you submit the job; for example, specify a value of 2000 (2 seconds) or 10000 (10 seconds).
Resolved in GPSS 1.3.1.
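Two of the known issues above (the MAPPING requirement for UPDATE and MERGE modes, and the default MINIMAL_INTERVAL on GPSS 1.3.0) can be checked before you submit a job. The following Python sketch inspects an already-parsed load configuration for both. The helper name `check_load_config` is hypothetical, and the dictionary layout it expects (a top-level `KAFKA` block containing `OUTPUT` and `COMMIT`, with `MAPPING` entries identified by `NAME`) is an assumption about the load configuration file; adjust the key paths to match your own file.

```python
from typing import Dict, List


def check_load_config(config: Dict) -> List[str]:
    """Return warnings for two GPSS 1.3.x known issues (hypothetical helper)."""
    warnings = []
    kafka = config.get("KAFKA") or {}
    output = kafka.get("OUTPUT") or {}
    commit = kafka.get("COMMIT") or {}

    # A MINIMAL_INTERVAL of 0 (the default) may cause high CPU use on 1.3.0.
    if int(commit.get("MINIMAL_INTERVAL") or 0) <= 0:
        warnings.append(
            "COMMIT:MINIMAL_INTERVAL is 0 or unset; consider 2000 or more")

    # UPDATE and MERGE modes need a MAPPING for every match/update column.
    if str(output.get("MODE") or "").upper() in ("UPDATE", "MERGE"):
        mapped = {m.get("NAME") for m in output.get("MAPPING") or []}
        for prop in ("MATCH_COLUMNS", "UPDATE_COLUMNS"):
            for column in output.get(prop) or []:
                if column not in mapped:
                    warnings.append(f"{prop} column '{column}' has no MAPPING")
    return warnings


# Illustrative configuration (assumed layout): no MINIMAL_INTERVAL is set,
# and the UPDATE_COLUMNS entry "amount" has no corresponding MAPPING.
config = {
    "KAFKA": {
        "OUTPUT": {
            "MODE": "MERGE",
            "MATCH_COLUMNS": ["id"],
            "UPDATE_COLUMNS": ["amount"],
            "MAPPING": [{"NAME": "id", "EXPRESSION": "(j->>'id')::int"}],
        },
        "COMMIT": {"MAX_ROW": 1000},
    }
}
warnings = check_load_config(config)
for message in warnings:
    print(message)
```

For this sample configuration the helper reports one warning for the missing MINIMAL_INTERVAL and one for the unmapped `amount` column. The interval check matters only on GPSS 1.3.0, since the CPU issue is resolved in 1.3.1.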