gpupgrade Pre-upgrade Phase

This topic covers the required preparation steps before running the gpupgrade utility commands. Review these steps early in the process to help you understand the time needed to prepare the source cluster for a successful upgrade.

**IMPORTANT** The minimum supported Greenplum Database 5.x source version is 5.28. Upgrade the source cluster to 5.28 or the latest available 5.x version.

Pre-upgrade Checklist

Review the following pre-upgrade checklist, preferably a few weeks before the upgrade project.

Prepare the Source Cluster

Certain components of Greenplum 5.x cannot be upgraded by the gpupgrade utility. There are also certain configuration items that are not supported from 5.x to 6.x. Follow the recommendations below to prepare the source Greenplum cluster.

Upgrade the source Greenplum Cluster
Upgrade the source Greenplum cluster from your current 5.x version to the latest 5.x version that you downloaded as part of the pre-upgrade checklist (see Download Required Packages). For upgrade instructions, see the release notes of the version you are upgrading to, for example the Tanzu Greenplum 5.28.0 Release Notes.
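
For example, you can confirm the source cluster version before and after the minor upgrade with a read-only query (the database name below is illustrative):

$ psql -d postgres -c "SELECT version();"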

Remove Optional Greenplum Software
Uninstall any previously installed extensions or packages, such as MADlib, PXF, PL/Container, or GPCC. gpupgrade does not upgrade installed extensions. You can re-install the optional software on the target Greenplum system after the upgrade. See Uninstalling Optional Greenplum Software for more information.

Generate the Migration scripts
The gpupgrade utility package includes bash and SQL migration scripts to help resolve potential migration issues from Greenplum 5.x to 6.x. Review the process and generate the SQL script files so that they are ready to execute during the gpupgrade initialize maintenance window. For more details on the migration scripts, see gpupgrade Migration Scripts.
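
As an illustration only, generating and reviewing the scripts typically looks like the following; the script path, name, and output directory shown here are placeholders, so confirm the exact invocation in gpupgrade Migration Scripts for your release:

$ # Placeholder script name and arguments -- see gpupgrade Migration Scripts
$ # for the exact command shipped with your gpupgrade release.
$ <path-to-migration-scripts>/migration_generator_sql.bash $GPHOME $PGPORT /home/gpadmin/gpupgrade_migration_sql
$ ls /home/gpadmin/gpupgrade_migration_sql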

Review pg_upgrade consistency checks
Review the pg_upgrade Consistency Checks that are run during gpupgrade initialize and gpupgrade execute, and check the source cluster against each one. If any of the scenarios apply, perform the resolution before continuing with the upgrade process. To validate the source cluster against some of these checks, see gpupgrade Migration Scripts.

Remove custom objects/functions
Note any custom shared object libraries (.so files) and user-defined functions (UDFs); they must be removed before the upgrade and re-installed on the target cluster. gpupgrade does not copy any custom libraries, functions, or utilities to the target environment.
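
One way to inventory C-language UDFs and the shared libraries they reference is to query the catalog in each database; this read-only query is a sketch, and the database name is a placeholder:

$ psql -d <database> -c "SELECT p.proname, p.probin FROM pg_proc p JOIN pg_language l ON l.oid = p.prolang WHERE l.lanname = 'c';"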

Perform catalog health check
Run gpcheckcat to ensure that the source catalog is in a healthy state. See the gpcheckcat reference page for further details.
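
For example, to run the catalog checks against all databases in the source cluster (confirm the option list against the gpcheckcat reference page for your Greenplum 5.x version):

$ gpcheckcat -A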

Prepare test queries
Prepare test queries you can use after gpupgrade execute and during the post-upgrade phase, to test and verify that the new installation runs as expected. Your test queries should not create new tables or data.
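
A minimal read-only smoke test might look like the following; the database, table, and predicate names are placeholders for objects in your own workload:

$ psql -d <database> -c "SELECT count(*) FROM <schema>.<table>;"
$ psql -d <database> -c "EXPLAIN SELECT * FROM <schema>.<table> WHERE <predicate>;"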

Review hub and agent ports
During gpupgrade initialize, the utility starts the hub and agent processes on the hosts. The master host hub port defaults to 7527 and the segment hosts agent port defaults to 6416. Ensure that there is no firewall blocking these ports so the hub and agents can communicate with each other. Refer to gpupgrade Architecture for the hub and agent roles during gpupgrade. If the reserved hub and agent ports are already used by any of your applications, assign a different port in the gpupgrade configuration file. See Edit the gpupgrade Configuration file.
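
For example, a quick way to verify that nothing is already listening on the default ports on a host (the port values are the defaults noted above):

$ ss -lnt | grep -wE ':(7527|6416)' || echo "hub and agent ports are free on this host"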

Download Required Packages

Download the gpupgrade utility from VMware Tanzu Network. Copy the downloaded utility to all hosts of the source Greenplum Database cluster.

Download the latest 5.x and 6.x Greenplum Database packages. Copy the downloaded .rpm or .deb files to all hosts in your development environment. Install Greenplum 5 using the latest Greenplum 5.x installation guidelines. To install Greenplum 6, see Install the Target Version of Greenplum Database.
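
For example, to copy the downloaded packages to all hosts, you can use gpscp from the source Greenplum 5.x installation with a hostfile listing every host (the hostfile path below is illustrative), or plain scp in a loop:

$ gpscp -f /home/gpadmin/hostfile_all ./greenplum-db-<version>-<platform>.rpm =:/home/gpadmin/
$ gpscp -f /home/gpadmin/hostfile_all ./gpupgrade.<version>.el7.x86_64.rpm =:/home/gpadmin/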

Install the gpupgrade Utility

To install the gpupgrade utility on all hosts of the source Greenplum cluster:

  1. Download the gpupgrade file to a location of your choice.

  2. Use yum install (requires root permission) to install in the default /usr/local/bin/ location. Alternatively, use rpm as gpadmin to install in a user-specified location.

    $ sudo yum install gpupgrade.<version>.el7.x86_64.rpm
    

    or

    $ rpm --prefix=<USER_DIRECTORY> -ivh gpupgrade.<version>.el7.x86_64.rpm
    
  3. (Optional) Copy the gpupgrade configuration file to the gpadmin home directory:

    $ cp /usr/local/bin/greenplum/gpupgrade/gpupgrade_config /home/gpadmin/

  4. (Optional) Install bash completion for ease of use:

    $ sudo yum install bash-completion
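
After installing the package on every host, you can verify that the utility resolves on the PATH; for example:

$ which gpupgrade
$ gpupgrade version    # subcommand name assumed -- run gpupgrade without arguments to list the available commands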

gpupgrade Log Files

gpupgrade initialize creates the $HOME/.gpupgrade directory on each host to save the gpupgrade state.

The log files for each gpupgrade command are saved in the gpAdminLogs/gpupgrade/ directory in the gpadmin user’s home directory, for example ~/gpAdminLogs/gpupgrade/initialize.log or ~/gpAdminLogs/gpupgrade/execute.log.
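
For example, to follow the progress of a step from a second terminal on the master host:

$ tail -f ~/gpAdminLogs/gpupgrade/initialize.log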

Install the Target Version of Greenplum Database

Note: During the gpupgrade Beta program, upgrade to the latest Greenplum Database 5.x first, before proceeding with the Greenplum 6.x installation.

Install the target Greenplum Database package on each Greenplum system host, using the system’s package manager software. For example, for RHEL/CentOS systems, execute the yum command with sudo (or as root):

$ sudo yum install ./greenplum-db-<version>-<platform>.rpm

Change the owner and group of the installed files to gpadmin:

$ sudo chown -R gpadmin:gpadmin /usr/local/greenplum*

gpupgrade supports two upgrade modes, link and copy, with copy being the default. The amount of disk space the upgrade requires depends on the mode you select. Set your selection in the gpupgrade configuration file before running gpupgrade initialize, as shown in the example after the mode descriptions below.

Copy

This is the default option and has the following characteristics:

  • The data files are copied from source to target cluster and then modified.

  • During an upgrade, the original primary and mirror files remain untouched, so manually recovering the source cluster is easier and faster. If the upgrade fails during gpupgrade execute, the source cluster can simply be pointed back to the original primaries and mirrors and brought back up.

  • It is slower since it copies the source data stores to the target cluster.

  • It requires more free disk space (60%) than link mode.

Link

You need to manually specify link mode in the gpupgrade configuration file.

It has the following characteristics:

  • It creates hard links from the target cluster data stores to the source cluster data stores. It then modifies the target data files in place.

  • It’s faster than copy mode, as it does not copy any data from source to target cluster.

  • It requires less free disk space (20%). This space is used to recreate the catalog, which is not hard-linked.

WARNING: In link mode, gpupgrade generates a warning if the source Greenplum cluster does not have a standby master host and mirror segments:

    The source cluster does not have standby and/or mirror segments.
    After "gpupgrade execute" has been run, there will be no way to
    return the cluster to its original state using "gpupgrade revert".
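
Once you have chosen a mode, selecting it is a one-line edit in your copy of the gpupgrade configuration file before you run gpupgrade initialize. The parameter names in this sketch are assumptions based on the shipped gpupgrade_config template, so confirm them against the comments in the file itself:

# In /home/gpadmin/gpupgrade_config (parameter names assumed -- check the template's comments)
mode = link
disk_free_ratio = 0.2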

Next Steps

After reviewing this pre-upgrade topic and downloading and installing all of the required software, continue with the gpupgrade Initialize Phase.