Configuring the Connector
You use the Apache NiFi user interface to configure a dataflow that uses the Greenplum Connector for Apache NiFi
PutGreenplumRecord processor to load record-oriented data from any source into Greenplum Database.
The PutGreenplumRecord processor accepts record-based FlowFiles and sends the data to the Greenplum Streaming Server, which writes it to Greenplum Database. When you configure the processor, you must identify the type and instance of the
RecordReader that corresponds to the format of the data contained in incoming FlowFiles, the Greenplum connection specifics, and the Greenplum schema and table.
The default load mode for the Connector is to insert data into Greenplum. You can configure the processor to merge or update data instead, and configuration properties for field-to-column translation and mapping allow you to further specify these operations.
You configure the
PutGreenplumRecord processor via the Configure Processor dialog. This dialog includes SETTINGS, SCHEDULING, PROPERTIES, and COMMENTS tabs.
The SETTINGS tab specifies FlowFile routing and timeouts for the processor. You can also use this tab to change the name of the processor and enable/disable the processor.
Settings Tab in the Apache NiFi User Guide describes the configuration options on this tab.
The SCHEDULING tab specifies the scheduling strategy, run schedule, and concurrency options for the processor.
When you set Concurrent Tasks to a value greater than one, the processor runs with the specified number of threads. The single
PutGreenplumRecord processor instance processes multiple FlowFiles concurrently, each managed by its own session.
Scheduling Tab in the Apache NiFi User Guide describes the configuration properties on this tab.
The PROPERTIES tab of the Configure Processor dialog identifies the
PutGreenplumRecord processor configuration properties.
The Connector uses default values for many of the
PutGreenplumRecord properties. You must set the
Record Reader, Greenplum Adapter, and
Table Name property values.
PutGreenplumRecord processor configuration properties are listed and further described in the table and topics below:
|Property Name||Description||Default Value|
|Record Reader||The controller service that deserializes the input FlowFile. Required.||(none)|
|Greenplum Adapter||The controller service that identifies and manages the Greenplum Database and Greenplum Streaming Server connection parameters. Required.||(none)|
|Schema Name||The name of the Greenplum Database schema in which the target table resides. Required.||public|
|Table Name||The name of the target Greenplum table in which to load the data. Required.||(none)|
|Operation Type||The type of load operation: INSERT, UPDATE, or MERGE. Required.||INSERT|
|Match Columns||The Greenplum table columns to match with the FlowFile record data. Required for the UPDATE and MERGE operation types.||(none)|
|Translate Field Names||Boolean value that specifies if the Connector translates input FlowFile field names to Greenplum table column names.||true|
|Unmatched Field Behavior||Specifies the Connector's behavior when an incoming FlowFile record has a field that does not map to a column in the Greenplum table.||Ignore Unmatched Fields|
|Unmatched Column Behavior||Specifies the Connector's behavior when an incoming FlowFile record does not have a field mapping for every one of the Greenplum table columns.||Fail on Unmatched Columns|
|Rollback On Failure||Boolean value that specifies whether or not the Connector should roll back when it encounters an error processing a FlowFile.||false|
|Maximum Record Batch Size||Specifies the maximum number of records in each batch of data that the Connector will write to Greenplum. The Connector stores the batch in memory until it reaches this size.||0 (write all records in a single transaction)|
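As a rough illustration of how the defaults in the table combine with the values you set, the following Python sketch overlays user-supplied properties on the documented defaults and checks the required properties that have no default. The `DEFAULTS` dictionary, the `effective_properties` helper, and the sample service names are hypothetical; they model the table only and are not part of the Connector or of NiFi.

```python
from typing import Dict

# Defaults copied from the property table; properties without a default are omitted.
DEFAULTS: Dict[str, str] = {
    "Schema Name": "public",
    "Operation Type": "INSERT",
    "Translate Field Names": "true",
    "Unmatched Field Behavior": "Ignore Unmatched Fields",
    "Unmatched Column Behavior": "Fail on Unmatched Columns",
    "Rollback On Failure": "false",
    "Maximum Record Batch Size": "0",
}

# Required properties that the table lists without a default value.
REQUIRED_NO_DEFAULT = ("Record Reader", "Greenplum Adapter", "Table Name")


def effective_properties(user_values: Dict[str, str]) -> Dict[str, str]:
    """Overlay user-supplied values on the defaults (hypothetical helper)."""
    missing = [name for name in REQUIRED_NO_DEFAULT if not user_values.get(name)]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    merged = dict(DEFAULTS)
    merged.update(user_values)
    return merged


if __name__ == "__main__":
    props = effective_properties({
        "Record Reader": "my-record-reader",      # placeholder service name
        "Greenplum Adapter": "my-adapter-service",  # placeholder service name
        "Table Name": "orders",
    })
    for name, value in sorted(props.items()):
        print(f"{name} = {value}")
```

Setting only the three properties without defaults yields an INSERT into the public schema with the remaining defaults applied.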
The Connector supports inserting, merging, and updating records from a FlowFile into a Greenplum Database table. You use the
Operation Type property to specify the load mode:
|INSERT||Insert records as new rows into the Greenplum table (the default mode).|
|Use operation.type Attribute||Obtain the load mode from an operation.type FlowFile attribute.|
When the Operation Type is UPDATE or MERGE, you must specify one or more
Match Columns, a comma-separated list of column names that uniquely identifies a row in the Greenplum table. The Connector ignores the
Match Columns property when the
Operation Type is INSERT.
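The relationship between Operation Type and Match Columns can be sketched in Python as follows. The `resolve_match_columns` helper is hypothetical and only mirrors the rule described above: a comma-separated list that is required for UPDATE and MERGE and ignored for INSERT. It does not reflect how the Connector itself validates the property.

```python
from typing import List

VALID_OPERATIONS = {"INSERT", "UPDATE", "MERGE"}


def resolve_match_columns(operation_type: str, match_columns: str) -> List[str]:
    """Return the match columns that apply to the given operation type."""
    op = operation_type.strip()
    if op not in VALID_OPERATIONS:
        raise ValueError(f"unsupported operation type: {operation_type!r}")
    if op == "INSERT":
        # Match Columns is ignored for INSERT.
        return []
    # Split the comma-separated list and drop empty entries.
    columns = [c.strip() for c in match_columns.split(",") if c.strip()]
    if not columns:
        raise ValueError(f"{op} requires at least one match column")
    return columns


if __name__ == "__main__":
    print(resolve_match_columns("MERGE", "id, region"))
    print(resolve_match_columns("INSERT", "id"))
```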
The Connector exposes properties that allow you to choose how you want the Connector to map FlowFile record fields to Greenplum Database table columns.
The Translate Field Names property is a Boolean value that specifies if the Connector translates field names in the FlowFile record into column names in the Greenplum table. The default value is
true; the processor uses case-insensitive matching and ignores underscores when it translates field names into column names. When the value is
false, the FlowFile field names must match the Greenplum table column names exactly, or the column value will not be updated.
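To make the translation rule concrete, here is a minimal Python sketch of case-insensitive matching that ignores underscores. The `normalize` and `map_fields_to_columns` helpers are hypothetical, and the exact normalization the Connector applies internally is an assumption based on the description above.

```python
from typing import Dict, Optional, Sequence


def normalize(name: str) -> str:
    """Assumed normalization: drop underscores and ignore case."""
    return name.replace("_", "").lower()


def map_fields_to_columns(fields: Sequence[str],
                          columns: Sequence[str],
                          translate: bool = True) -> Dict[str, Optional[str]]:
    """Map each record field to a table column, or to None if nothing matches."""
    if translate:
        lookup = {normalize(c): c for c in columns}
        return {f: lookup.get(normalize(f)) for f in fields}
    # Without translation, names must match exactly.
    exact = set(columns)
    return {f: (f if f in exact else None) for f in fields}


if __name__ == "__main__":
    fields = ["CustomerID", "order_total"]
    columns = ["customer_id", "ordertotal"]
    print(map_fields_to_columns(fields, columns, translate=True))
    print(map_fields_to_columns(fields, columns, translate=False))
```

With translation on, both sample fields find a column; with translation off, neither does, because the names differ in case or underscores.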
When an incoming FlowFile record has a field that does not map to any of the columns in the Greenplum table, set the
Unmatched Field Behavior property to specify how the Connector should handle the situation:
- Ignore Unmatched Fields - (the default) The Connector ignores any field in the FlowFile record that cannot be mapped to a column in the Greenplum table.
- Fail on Unmatched Fields - The Connector routes the FlowFile to the failure relationship when the record has any field that cannot be mapped to a column in the table.
- Reference Parameter
If an incoming FlowFile record does not have a field mapping for every one of the columns in the Greenplum table, set the
Unmatched Column Behavior property to specify how the Connector should handle the situation:
- Ignore Unmatched Columns - The Connector assumes that a column in the table that does not have a matching field in the record is not required.
- Warn on Unmatched Columns - The Connector assumes that a column in the table that does not have a matching field in the record is not required, and the Connector logs a warning.
- Fail on Unmatched Columns - (the default) A flow fails when a column exists in the table and there is no matching field in the record. The Connector also logs an error.
- Reference Parameter
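The two unmatched-name behaviors can be modeled together in a short Python sketch. The `check_mapping` function and the `UnmatchedError` exception are hypothetical, and the sketch compares names exactly for simplicity; it illustrates the documented options and is not the Connector's implementation.

```python
from typing import List, Sequence


class UnmatchedError(Exception):
    """Raised when the configured behavior is to fail on a mismatch."""


def check_mapping(fields: Sequence[str],
                  columns: Sequence[str],
                  field_behavior: str = "Ignore Unmatched Fields",
                  column_behavior: str = "Fail on Unmatched Columns") -> List[str]:
    """Apply both behaviors and return any warnings that would be logged."""
    unmatched_fields = [f for f in fields if f not in columns]
    unmatched_columns = [c for c in columns if c not in fields]
    warnings: List[str] = []

    if unmatched_fields and field_behavior == "Fail on Unmatched Fields":
        raise UnmatchedError(f"fields with no column: {unmatched_fields}")

    if unmatched_columns:
        if column_behavior == "Fail on Unmatched Columns":
            raise UnmatchedError(f"columns with no field: {unmatched_columns}")
        if column_behavior == "Warn on Unmatched Columns":
            warnings.append(f"columns with no field: {unmatched_columns}")
    return warnings


if __name__ == "__main__":
    # An extra field is ignored by default.
    print(check_mapping(["id", "extra"], ["id"]))
    # A missing column produces a warning when configured to warn.
    print(check_mapping(["id"], ["id", "note"],
                        column_behavior="Warn on Unmatched Columns"))
```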
The Connector distinguishes between the transient and the non-recoverable errors that it encounters. Transient errors are those that may succeed on a later retry, such as a failed connection attempt to Greenplum Database. Non-recoverable errors persist across retries; for example, a FlowFile that contains bad input data continues to fail when retried.
The Connector applies success or failure at the FlowFile level. That is, the Connector considers a write operation successful if all records in a single FlowFile are written to the Greenplum Database table with no errors. If a single record in the FlowFile fails to write for some reason (say the data is malformed), none of the records in the FlowFile are written to Greenplum, and the Connector considers the operation failed.
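The all-or-nothing behavior at the FlowFile level can be pictured with a small Python sketch that stages records and only makes them visible if every record is acceptable. The `write_flowfile` function, the in-memory `table` list, and the `is_valid` callback are hypothetical stand-ins; the real write goes through the Greenplum Streaming Server.

```python
from typing import Callable, Iterable, List


def write_flowfile(records: Iterable[dict],
                   table: List[dict],
                   is_valid: Callable[[dict], bool]) -> bool:
    """Append all records to `table`, or none of them if any record is invalid."""
    staged: List[dict] = []
    for record in records:
        if not is_valid(record):
            # One bad record fails the whole FlowFile; nothing is written.
            return False
        staged.append(record)
    table.extend(staged)  # "commit": all records become visible together
    return True


if __name__ == "__main__":
    table: List[dict] = []
    has_id = lambda r: isinstance(r.get("id"), int)
    print(write_flowfile([{"id": 1}, {"id": "bad"}], table, has_id), table)
    print(write_flowfile([{"id": 1}, {"id": 2}], table, has_id), table)
```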
Rollback On Failure is a boolean property that specifies whether or not the Connector rolls back the NiFi session when it encounters a failure processing a FlowFile.
The default Rollback On Failure setting is
false. When the Connector encounters an error while processing a FlowFile, it routes the FlowFile to the
failure or retry relationship based on the error type, and the processor continues processing the next FlowFile.
When Rollback On Failure is
true, the Connector:
- Stops further processing a FlowFile when it encounters an error,
- Rolls back the NiFi session; this penalizes the FlowFile and returns it to the incoming queue, and
- Continues processing the next FlowFile.
The rolled back FlowFile may be processed repeatedly by the Connector until it is processed successfully or removed by other means.
Be sure to set an adequate SETTINGS
Yield Duration for the processor to avoid retrying too frequently.
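The decision the Rollback On Failure setting controls can be summarized in a hedged Python sketch. The `Outcome` enum, the `TransientError` class, and the `decide` function are hypothetical. The sketch also assumes that transient errors go to the retry relationship and all other errors go to a failure relationship, which is an interpretation of "based on the error type" rather than something this section states explicitly.

```python
from enum import Enum
from typing import Optional


class TransientError(Exception):
    """An error that may succeed on a later retry (hypothetical class)."""


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RETRY = "retry"
    ROLLBACK = "roll back session, penalize FlowFile, return it to the queue"


def decide(error: Optional[Exception], rollback_on_failure: bool) -> Outcome:
    """Model what happens to one FlowFile after processing."""
    if error is None:
        return Outcome.SUCCESS
    if rollback_on_failure:
        # The FlowFile goes back to the incoming queue and may be retried repeatedly.
        return Outcome.ROLLBACK
    if isinstance(error, TransientError):
        return Outcome.RETRY  # assumption: transient errors are retried
    return Outcome.FAILURE    # assumption: other errors are routed to failure


if __name__ == "__main__":
    print(decide(None, False))
    print(decide(TransientError("connection refused"), False))
    print(decide(ValueError("malformed record"), False))
    print(decide(ValueError("malformed record"), True))
```

Because a rolled-back FlowFile returns to the queue, a record with bad data would take the rollback branch every time, which is why the yield setting matters when Rollback On Failure is true.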
For each FlowFile it receives, the Connector:
- Opens and prepares the table for writing,
- Performs one or more writes, and
- Closes/commits the write.
The maximum number of records in a write call that the Connector makes to the Greenplum Streaming Server is determined by the
Maximum Record Batch Size that you specify for the processor.
The default value is zero (0); there is no limit on the batch size, and the Connector accumulates all FlowFile content in memory before it writes to Greenplum in a single transaction.
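The effect of Maximum Record Batch Size on the number of write calls can be sketched with a simple Python generator. The `batches` function is hypothetical; it only shows how a size of zero yields a single batch containing every record, while a positive size splits the records into several writes.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], max_batch_size: int = 0) -> Iterator[List[T]]:
    """Group records into batches; 0 means one batch with all records."""
    if max_batch_size < 0:
        raise ValueError("max_batch_size must be zero or positive")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        # Emit the batch once it reaches the configured size (never when size is 0).
        if max_batch_size and len(batch) >= max_batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    print(list(batches(range(5), max_batch_size=2)))
    print(list(batches(range(5))))
```

A smaller batch size bounds how many records are held in memory at once, at the cost of more write calls per FlowFile.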