This topic presents best practices to follow when you use the Greenplum Streaming Server Kafka Integration.
Choosing a Commit Threshold
gpkafka supports two mechanisms to control how and when it commits data to Greenplum Database: a time period and a number of rows. You can set either or both of the MINIMAL_INTERVAL and MAX_ROW properties in the Kafka load configuration file.
For best results, test several MINIMAL_INTERVAL values under a representative message load to determine which one works best in your environment.
When message flow is heavy, GPSS may receive and buffer many messages during the MINIMAL_INTERVAL time period. In this situation, also setting MAX_ROW may help limit memory usage.
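As a rough illustration, the fragment below sketches how both thresholds might appear together in a load configuration file. It assumes the version 1 or version 2 file layout, in which the COMMIT block sits under KAFKA, and it assumes that MINIMAL_INTERVAL is expressed in milliseconds. The values shown are placeholders for experimentation, not recommendations; check the configuration file reference for your GPSS version before relying on this layout.

```yaml
# Illustrative fragment only; the rest of the load configuration is omitted.
KAFKA:
   COMMIT:
      # Assumed placeholder: commit once this many rows have been buffered.
      MAX_ROW: 10000
      # Assumed placeholder, taken to be milliseconds: time-based commit threshold.
      MINIMAL_INTERVAL: 2000
```

With both properties set, a heavy message flow is expected to trigger a commit on the row count before the time period elapses, while a light flow is committed on the time period.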