gpupgrade Initialize Phase

During this phase you run the gpupgrade initialize command. This phase prepares the source cluster for the upgrade and initializes the target cluster. Before proceeding, ensure you have reviewed and completed the pre-upgrade phase tasks.

Perform the initialize phase during a scheduled downtime. Users should receive sufficient notice that the Greenplum Database cluster will be offline for an extended period. Send a maintenance notice a week or more before you plan to start the execute phase, and then a reminder notice before you begin.

The following table summarizes the cluster state before and after gpupgrade initialize:

| Segment | Source (Before Initialize) | Target (Before Initialize) | Source (After Initialize) | Target (After Initialize) |
|---|---|---|---|---|
| Master | UP | Nonexistent | UP | Initialized but DOWN |
| Standby | UP | Nonexistent | UP | Nonexistent |
| Primaries | UP | Nonexistent | UP | Initialized but DOWN |
| Mirrors | UP | Nonexistent | UP | Nonexistent |

Preparing to Initialize the Upgrade

**IMPORTANT** The minimum supported Greenplum Database 5.x source version is 5.28. Upgrade the source cluster to 5.28 or the latest 5.x release before proceeding.

  1. Check whether gpupgrade initialize has been run before by verifying whether the $HOME/.gpupgrade directory already exists on any of the Greenplum hosts. If it does, clean up the environment by running gpupgrade revert on the master host; the command removes all previously created gpupgrade directories from all source Greenplum hosts.
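
    A quick way to check every host at once is with gpssh. This is a minimal sketch; the hostfile name all_hosts (listing every Greenplum host) is an illustrative assumption:

    $ gpssh -f all_hosts -e 'ls -d ~/.gpupgrade'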

  2. Ensure that the source Greenplum cluster is in a healthy state, with the standby master and mirror segments in their preferred roles. If they are not, gpupgrade initialize fails during the consistency checks. For further details, see gpstate. Verify the cluster state:

    $ gpstate -e
    

    For incremental recovery, run:

    $ gprecoverseg -a
    

    For full recovery:

    $ gprecoverseg -F
    

    To rebalance:

    $ gprecoverseg -r
    
  3. Review and generate the gpupgrade migration scripts. The gpupgrade utility package includes SQL migration scripts to help resolve potential migration issues from Greenplum 5.x to 6.x.

During the gpupgrade initialize maintenance window, run the gpupgrade-migration-sql-executor.bash pre-initialize script before proceeding with the gpupgrade initialize command. For more details on the migration scripts, see gpupgrade Migration Scripts.
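
For illustration only, an invocation might look like the following sketch. The argument list shown here (source $GPHOME path, source master port, the directory holding the generated scripts, and the phase name) is an assumption based on typical usage; consult the script's usage message for the exact syntax in your gpupgrade release:

$ gpupgrade-migration-sql-executor.bash /usr/local/greenplum-db-5.28.6 5432 \
      /home/gpadmin/gpupgrade-migration-scripts pre-initialize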

Running gpupgrade Initialize

Edit the gpupgrade configuration file

The gpupgrade initialize command requires a configuration file as input. Review the example gpupgrade_config file in the directory where you extracted the downloaded gpupgrade utility.

Copy the example file to the /home/gpadmin/ directory and edit it according to your environment:

$ cp /usr/local/bin/greenplum/gpupgrade/gpupgrade_config /home/gpadmin/

The following table summarizes the file parameters. source_gphome, target_gphome, and source_master_port are blank and must be filled in with values for your environment. The remaining parameters are commented out and have default values. Change these values as necessary for your upgrade scenario.

| Parameter | Description |
|---|---|
| source_gphome | Path to the $GPHOME directory of the source Greenplum installation. |
| target_gphome | Path to the $GPHOME directory of the target Greenplum installation. |
| mode | copy or link; copy is the default. Uncomment to use link mode. |
| disk_free_ratio | Ratio of free disk space required to run gpupgrade, based on the mode. Ranges from 0.0 to 1.0. Default is 0.6 (60% free space) for copy mode and 0.2 (20%) for link mode. |
| use_hba_hostnames | Whether to use host names or IP addresses in gpinitsystem and other utilities. Should match the HBA_HOSTNAMES parameter set in the gpinitsystem_config file used when the Greenplum Database system was initialized. Defaults to false. For more information about HBA_HOSTNAMES, see gpinitsystem. |
| source_master_port | The master port of the source Greenplum cluster. Provide your source cluster's value. |
| temp_port_range | The set of ports to use when initializing the target cluster. Default is 50432-65535; allocation starts at 50432. |
| hub_port | The port that the gpupgrade hub process uses on the master host. Default is 7527. |
| agent_port | The port that the gpupgrade agent processes use on the segment hosts. Default is 6416. |

See the gpupgrade_config file reference page for further details.
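
For reference, a filled-in configuration might look like the following sketch. The installation paths and port are illustrative values only, and the key = value layout assumes the format of the shipped example file:

source_gphome = /usr/local/greenplum-db-5.28.6
target_gphome = /usr/local/greenplum-db-6.19.0
source_master_port = 5432

# Optional parameters; defaults shown, uncomment to override.
# mode = copy
# disk_free_ratio = 0.6
# use_hba_hostnames = false
# temp_port_range = 50432-65535
# hub_port = 7527
# agent_port = 6416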

WARNING: If using link mode and the source Greenplum cluster does not have a standby master and mirrors, gpupgrade generates a warning:
The source cluster does not have standby and/or mirror segments.
After "gpupgrade execute" has been run, there will be no way to
return the cluster to its original state using "gpupgrade revert".
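
Before committing to link mode, you can confirm whether the source cluster has a standby master and mirrors, for example with gpstate:

$ gpstate -f    # display standby master details
$ gpstate -m    # display mirror segment instances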

Run Initialize

Run:

gpupgrade initialize --file | -f PATH/TO/gpupgrade_config [--verbose | -v] [--automatic | -a]

Where:

  • --file | -f specifies the configuration file location.
  • --verbose | -v enables verbose output.
  • --automatic | -a suppresses the summary and confirmation dialog.

For example:

$ gpupgrade initialize --file /home/gpadmin/gpupgrade/gpupgrade_configuration_file --verbose

The utility displays a summary message and waits for user confirmation before proceeding:

You are about to initialize a major-version upgrade of Greenplum.
This should be done only during a downtime window.

...

Before proceeding, ensure the following have occurred:
 - Take a backup of the source Greenplum cluster
 - [Generate] and [execute] the data migration "start" scripts
 - Run gpcheckcat to ensure the source catalog has no inconsistencies
 - Run gpstate -e to ensure the source cluster's segments are up and in preferred roles

To skip this summary, use the --automatic | -a  flag.

Continue with gpupgrade initialize?  Yy|Nn:
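
To run non-interactively, pass the --automatic flag described above; for example:

$ gpupgrade initialize --file /home/gpadmin/gpupgrade_config --automatic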

The utility proceeds through several background steps and displays its progress on the screen:

Initialize in progress.

Starting gpupgrade hub process...                                  [COMPLETE]
Saving source cluster configuration...                             [COMPLETE]
Starting gpupgrade agent processes...                              [COMPLETE]
Checking disk space...                                             [COMPLETE]
Generating target cluster configuration...                         [COMPLETE]
Creating target cluster...                                         [COMPLETE]
Stopping target cluster...                                         [COMPLETE]
Backing up target master...                                        [COMPLETE]
Running pg_upgrade checks...                                       [COMPLETE]

Initialize completed successfully.

NEXT ACTIONS
------------
To proceed with the upgrade, run "gpupgrade execute"
followed by "gpupgrade finalize".

To return the cluster to its original state, run "gpupgrade revert".

The status of each step can be COMPLETE, FAILED, SKIPPED, or IN PROGRESS. SKIPPED indicates that the command has been run before and the step has already been executed.

These steps are further described below:

  • Creating directories: Creates gpupgrade state directories, used only by the gpupgrade utility. They reside under $HOME/.gpupgrade/. The log files for the individual gpupgrade commands are created in the $HOME/gpAdminLogs/gpupgrade directory, for example $HOME/gpAdminLogs/gpupgrade/initialize.log.
  • Generating upgrade configuration: Collects the source cluster configuration details and generates gpupgrade state files to hold the source configuration.
  • Starting gpupgrade hub process: Starts up the gpupgrade hub process on the master node.
  • Retrieving source cluster configuration: Populates the gpupgrade state files with the source cluster details.
  • Starting gpupgrade agent processes: Starts up agents on the standby master and segment hosts.
  • Checking disk space: Checks for available disk space. The default requirement is 60% free disk space for copy mode and 20% for link mode. The requirement can be changed with the disk_free_ratio configuration parameter.
  • Generating target cluster configuration: Populates the gpupgrade state files with the target cluster details.
  • Creating target cluster: Initializes the target master and segment hosts, in order to run pg_upgrade on the postgres instances. See Creating Target Cluster Directories for a description of the target cluster data directories.
  • Stopping target cluster: Shuts down the target cluster.
  • Backing up target master: Creates a backup copy of the target master, to be used during execute if any issues occur.
  • Running pg_upgrade checks: Runs a thorough list of Greenplum Database checks, see Initialize Phase pg_upgrade Checks.

The gpupgrade initialize command will create the $HOME/.gpupgrade directory on each host.

To resolve any [FAILED] steps, review the on-screen error messages and recommendations, the server log files in the $HOME/gpAdminLogs directory (including the gpupgrade initialize log file in the gpAdminLogs/gpupgrade/ directory), and contact the VMware Greenplum team supporting you during the upgrade.
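
For example, to review the most recent initialize messages on the master host:

$ tail -n 50 ~/gpAdminLogs/gpupgrade/initialize.log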

Creating Target Cluster Directories

When the gpupgrade initialize command creates the target Greenplum cluster, it creates data directories for the target master instance and primary segment instances on the master and segment hosts, alongside the source cluster data directories. This applies to both copy and link modes.

The target cluster data directory names have this format:

<segment-prefix>.<hash-code>.<content-id>

Where:

  • <segment-prefix> is the segment prefix string specified when the source Greenplum Database system was initialized. This is typically gpseg.
  • <hash-code> is a 10-character string generated by gpupgrade. The hash code is the same for all segment data directories belonging to the new target Greenplum cluster. In addition to distinguishing target directories from the source data directories, the unique hash code tags all data directories belonging to the current gpupgrade instance.
  • <content-id> is the database content id for the segment. The master instance content id is always -1. The primary segment content ids are numbered consecutively starting from 0.

For example, if the $MASTER_DATA_DIRECTORY environment variable value is /data/master/gpseg-1/, the data directory for the target master is /data/master/gpseg.AAAAAAAAAA.-1, where AAAAAAAAAA is the hash code gpupgrade generated for this target cluster. Primary segment data directories for the target cluster are located on the same host and at the same path as their source cluster counterparts. If the first primary segment for the source cluster is on host sdw1 in the directory /data/primary/gpseg0, the target cluster segment directory is on the same host at /data/primary/gpseg.AAAAAAAAAA.0.
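
Continuing this example, you could list the target master data directory with a shell glob; the hash code shown in the output is a placeholder:

$ ls -d /data/master/gpseg.*.-1
/data/master/gpseg.AAAAAAAAAA.-1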

When the gpupgrade finalize command has completed, the source cluster data directories are renamed to:

<segment-prefix>.<hash-code>.<content-id>.old

and the target cluster data directory names are renamed to the original source directory names:

<segment-prefix><content-id>
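
Continuing the earlier example, after finalize the first primary segment's directories on host sdw1 would be (hash code is a placeholder):

/data/primary/gpseg.AAAAAAAAAA.0.old    (former source data directory)
/data/primary/gpseg0                    (target data directory, now active)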

Troubleshooting the Initialize Phase

The gpupgrade hub process runs on the Greenplum Database master host and logs messages to the gpAdminLogs/gpupgrade/initialize.log file, in the gpadmin user’s home directory.

Could not create the gpAdminLogs/gpupgrade/initialize.log file
Make sure you are logged in as gpadmin and that all files in the gpAdminLogs and .gpupgrade directories are owned by, and writable by, the gpadmin user.
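
One way to inspect and, if needed, repair ownership; a sketch, with the chown run as a user that has privileges to change ownership:

$ ls -ld ~gpadmin/gpAdminLogs ~gpadmin/.gpupgrade
$ chown -R gpadmin:gpadmin ~gpadmin/gpAdminLogs ~gpadmin/.gpupgrade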

Missing extensions
If your Greenplum 5.x cluster has extensions installed, such as MADlib, PL/Container, or PostGIS, the gpupgrade initialize checks fail until you reinstall the missing extensions on the target Greenplum Database.
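
Extensions are typically installed with the gppkg utility; a hypothetical example, where the package file name is a placeholder:

$ gppkg -i madlib-1.x-gp6-rhel7-x86_64.gppkg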

Missing source cluster configuration
Ensure that the source Greenplum Database is running before running gpupgrade initialize.
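
For example, to check whether the source cluster is running and start it if it is not:

$ gpstate
$ gpstart -a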

Next Steps

After successfully running gpupgrade initialize, continue with the gpupgrade Execute Phase, or run gpupgrade revert to return the cluster to its original state.