Overview of Greenplum Database High Availability
A Greenplum Database system can be made highly available by providing a fault-tolerant hardware platform, by enabling Greenplum Database high-availability features, and by performing regular monitoring and maintenance procedures to ensure the health of all system components.
Hardware components will eventually fail, whether due to normal wear or an unexpected circumstance. Loss of power can lead to temporarily unavailable components. A system can be made highly available by providing redundant standbys for components that can fail so that services can continue uninterrupted when a failure does occur. In some cases, the cost of redundancy is higher than users’ tolerance for interruption in service. When this is the case, the goal is to ensure that full service is able to be restored, and can be restored within an expected timeframe.
With Greenplum Database, fault tolerance and data availability is achieved with:
- Hardware level RAID storage protection
- Data storage checksums
- Greenplum segment mirroring
- Master mirroring
- Dual clusters
- Database backup and restore
A best practice Greenplum Database deployment uses hardware level RAID to provide high performance redundancy for single disk failure without having to go into the database level fault tolerance. This provides a lower level of redundancy at the disk level.
Greenplum Database uses checksums to verify that data loaded from disk to memory has not been corrupted on the file system.
Greenplum Database has two kinds of storage for user data: heap and append-optimized. Both storage models use checksums to verify data read from the file system and, with the default settings, they handle checksum verification errors in a similar way.
Greenplum Database master and segment database processes update data on pages in the memory they manage. When a memory page is updated and flushed to disk, checksums are computed and saved with the page. When a page is later retrieved from disk, the checksums are verified and the page is only permitted to enter managed memory if the verification succeeds. A failed checksum verification is an indication of corruption in the file system and causes Greenplum Database to generate an error, cancelling the transaction.
The default checksum settings provide the best level of protection from undetected disk corruption propagating into the database and to mirror segments.
Heap checksum support is enabled by default when the Greenplum Database cluster is initialized with the
gpinitsystem management utility. Although it is strongly discouraged, a cluster can be initialized without heap checksum support by setting the
HEAP_CHECKSUM parameter to off in the
gpinitsystem cluster configuration file. See gpinitsystem.
Once initialized, it is not possible to change heap checksum support for a cluster without reinitializing the system and reloading databases.
You can check the read-only server configuration parameter data_checksums to see if heap checksums are enabled in a cluster:
$ gpconfig -s data_checksums
When a Greenplum Database cluster starts up, the
gpstart utility checks that heap checksums are consistently enabled or disabled on the master and all segments. If there are any differences, the cluster fails to start. See gpstart.
In cases where it is necessary to ignore heap checksum verification errors so that data can be recovered, setting the ignore_checksum_failure system configuration parameter to on causes Greenplum Database to issue a warning when a heap checksum verification fails, but the page is then permitted to load into managed memory. If the page is updated and saved to disk, the corrupted data could be replicated to the mirror segment. Because this can lead to data loss, setting
ignore_checksum_failure to on should only be done to enable data recovery.
For append-optimized storage, checksum support is one of several storage options set at the time an append-optimized table is created with the
CREATE TABLE command. The default storage options are specified in the
gp_default_storage_options server configuration parameter. The
checksum storage option is enabled by default and disabling it is strongly discouraged.
If you choose to disable checksums for an append-optimized table, you can either
- change the
gp_default_storage_optionsconfiguration parameter to include
checksum=falsebefore creating the table, or
- add the
checksum=falseoption to the
WITH storage\_optionsclause of the
Note that the
CREATE TABLE statement allows you to set storage options, including checksums, for individual partition files.
Greenplum Database stores data in multiple segment instances, each of which is a Greenplum Database PostgreSQL instance. The data for each table is spread between the segments based on the distribution policy that is defined for the table in the DDL at the time the table is created. When segment mirroring is enabled, for each segment instance there is a primary and mirror pair. The mirror segment is kept up to date with the primary segment using Write-Ahead Logging (WAL)-based streaming replication. See Overview of Segment Mirroring.
The mirror instance for each segment is usually initialized with the
gpinitsystem utility or the
gpexpand utility. As a best practice, the mirror runs on a different host than the primary instance to protect from a single machine failure. There are different strategies for assigning mirrors to hosts. When choosing the layout of the primaries and mirrors, it is important to consider the failure scenarios to ensure that processing skew is minimized in the case of a single machine failure.
There are two master instances in a highly available cluster, a primary and a standby. As with segments, the master and standby should be deployed on different hosts so that the cluster can tolerate a single host failure. Clients connect to the primary master and queries can be run only on the primary master. The standby master is kept up to date with the primary master using Write-Ahead Logging (WAL)-based streaming replication. See Overview of Master Mirroring.
If the master fails, the administrator runs the
gpactivatestandby utility to have the standby master take over as the new primary master. You can configure a virtual IP address for the master and standby so that client programs do not have to switch to a different network address when the current master changes. If the master host fails, the virtual IP address can be swapped to the actual acting master.
An additional level of redundancy can be provided by maintaining two Greenplum Database clusters, both storing the same data.
Two methods for keeping data synchronized on dual clusters are “dual ETL” and “backup/restore.”
Dual ETL provides a complete standby cluster with the same data as the primary cluster. ETL (extract, transform, and load) refers to the process of cleansing, transforming, validating, and loading incoming data into a data warehouse. With dual ETL, this process is run twice in parallel, once on each cluster, and is validated each time. It also allows data to be queried on both clusters, doubling the query throughput. Applications can take advantage of both clusters and also ensure that the ETL is successful and validated on both clusters.
To maintain a dual cluster with the backup/restore method, create backups of the primary cluster and restore them on the secondary cluster. This method takes longer to synchronize data on the secondary cluster than the dual ETL strategy, but requires less application logic to be developed. Populating a second cluster with backups is ideal in use cases where data modifications and ETL are performed daily or less frequently.
Making regular backups of the databases is recommended except in cases where the database can be easily regenerated from the source data. Backups should be taken to protect from operational, software, and hardware errors.
Use the gpbackup utility to backup Greenplum databases.
gpbackup performs the backup in parallel across segments, so backup performance scales up as hardware is added to the cluster.
When designing a backup strategy, a primary concern is where to store the backup data. The data each segment manages can be backed up on the segment’s local storage, but should not be stored there permanently—the backup reduces disk space available to the segment and, more importantly, a hardware failure could simultaneously destroy the segment’s live data and the backup. After performing a backup, the backup files should be moved from the primary cluster to separate, safe storage. Alternatively, the backup can be made directly to separate storage.
Using a Greenplum Database storage plugin with the
gprestore utilities, you can send a backup to, or retrieve a backup from a remote location or a storage appliance. Greenplum Database storage plugins support connecting to locations including Amazon Simple Storage Service (Amazon S3) locations and Dell EMC Data Domain storage appliances.
Using the Backup/Restore Storage Plugin API you can create a custom plugin that the
gprestore utilities can use to integrate a custom backup storage system with the Greenplum Database.
For information about using
gprestore, see Parallel Backup with gpbackup and gprestore.