Migration Scripts

About the Migration Scripts

Using the optional migration scripts speeds up the upgrade process before you start the gpupgrade initialize phase. The gpupgrade migration scripts are two types, bash and SQL.

The migration bash scripts,gpupgrade-migration-sql-generator.bash and gpupgrade-migration-sql-executor.bash, identify catalog inconsistencies between source and target cluster, and generate the SQL scripts to resolve them. The gpupgrade-migration-sql-generator.bash can be run outside your upgrade downtime window.

The migration SQL scripts are run by gpupgrade-migration-sql-executor.bash during the upgrade downtime window, and resolve a subset of the issues that are found when the gpupgrade initialize command runs the pg_upgrade check command.

The gpupgrade-migration-sql-executor.bash output directory has three subdirectories:

  • pre-initialize - Run the SQL scripts in the pre-initialize subdirectory just before you run the gpupgrade initialize command. Execute these scripts as the first step after you start the downtime scheduled to perform the upgrade.
  • post-finalize - The SQL scripts in the post-finalize subdirectory are run after gpupgrade finalize has finished. These scripts perform tasks such as creating indexes that were dropped by the pre-initialize scripts.
  • post-revert - The SQL scripts in the post-revert subdirectory reverse the changes made by the pre-initialize scripts and are to be executed only if you revert the upgrade. Run them after the gpupgrade revert command has finished.

The gpupgrade-migration-sql-executor.bash script runs all of the SQL scripts in the directory you specify: pre-initialize, post-finalize, or post-revert. See Executing the SQL Migration Scripts.

If you have used yum install to install gpupgrade, the bash scripts are located in /usr/local/bin and the SQL scripts in /usr/local/bin/greenplum/gpupgrade/data-migration-scripts. If you used rpm -prefix=<USER-DIRECTORY> during gpupgrade installation, the bash scripts are in /<USER-DIRECTORY> and the SQL scripts in /<USER-DIRECTORY>/greenplum/gpupgrade/data-migration_scripts.

The `gpupgrade-migration-sql-executor.bash` script must be run during the upgrade downtime window.

Migration issues that are not handled with the generated SQL scripts will be caught by the pg_ugprade check command when you run gpupgrade initialize, and you may have to manually resolve them. See gpupgrade Initialize pg_upgrade Checks for information about the issues that pg_upgrade check detects and potential workarounds.

Issues resolved by the SQL Migration Scripts

Currently, the data migration scripts detect these issues:

  • external tables used in partitioning

  • indexes on partitioned tables

  • tsquery data type. Greenplum Database 5.x contains the tsquery data type. The internal representation of this data type changed between Greenplum Database 5.x and 6.x, so the script changes the tsquery data type to text before proceeding with the upgrade.

  • gphdfs external tables. The gphdfs external table protocol is removed in Greenplum Database 6. These tables must be converted to use PXF. The gpupgrade-migration-sql-generator.bash script generates pre-initialize and post-revert SQL scripts. The pre-initialize scripts drop external gphdfs tables from the databases. The post-revert script recreates them.

  • gphdfs role. The gpupgrade-migration-sql-generator.bash script generates a pre-initialize script that drops the gphdfs role and a post-revert script to restore the role.

  • invalid name user columns. The gpupgrade-migration-sql-generator.bash script generates pre-initialize and post-revert scripts. Because the internal alignment of the name data type changed between the source and target Greenplum versions, user tables with name data type columns cannot be upgraded. The generated pre-initialize SQL script alters the data type of such columns to varchar(63). However, the script cannot handle some cases, such as a name column used as a distribution key or partitioning key, or a name column used as an index key. The post-revert script alters the varchar(63) columns back to name columns.

  • abstime, reltime, tinterval data types. Columns of these data types may not be used in a table’s distribution key. The gpupgrade-migration-sql-generator.bash script generates pre-initialize and post-revert scripts to alter the column type.

  • unique or primary key contraints The gpupgrade-migration-sql-generator.bash script generates pre-initialize, post-finalize, and post-revert SQL scripts. Unique and primary key constraints must be dropped before upgrading and recreated after upgrading. The pre-initialize scripts drop the constraints. The post-finalize scripts recreate them on the source cluster after finalization. The post-revert scripts recreate constraints on the source cluster. The scripts cannot handle the case where a constraint is found on a child of a heap table. In this case you must back up the heap table and recreate it after the upgrade is completed. To restore constraints on child tables as they were in the source cluster, you must create the constraints manually.

Generating the SQL Migration Scripts

Run the gpupgrade-migration-sql-generator.bash script, to generate the SQL scripts:

$ gpupgrade-migration-sql-generator.bash <GPHOME> <PGPORT> <OUTPUT_DIR> [--help | -h]

Where:

  • <GPHOME> is the path to the source Greenplum Database installation directory.
  • <PGPORT> is the source Greenplum system port number.
  • <OUTPUT_DIR> is the user-defined directory where the SQL scripts are created. If the directory does not exist, it is created. If the directory already exists, all existing .sql and .sh files are removed from the pre-initialize, post-finalize, and post-revert subdirectories.
  • -h or --help display help information, for example:

    $ gpupgrade-migration-sql-generator.bash -h
    
    Identifies catalog inconsistencies between the source and target Greenplum versions
    and generates SQL scripts to resolve them. This command should be run prior to "gpupgrade".
    
    Usage: gpupgrade-migration-sql-generator.bash <GPHOME> <PGPORT> <OUTPUT_DIR>
         <GPHOME>     : the path to the source Greenplum installation directory
         <PGPORT>     : the source Greenplum system port number
         <OUTPUT_DIR> : the user-defined directory where the SQL scripts are created
    
    The output directory structure is:
         <output directory>
         + pre-initialize  drop and alter objects prior to "gpupgrade initialize"
         + post-finalize   restore and recreate objects following "gpupgrade finalize"
         + post-revert     restore objects following "gpupgrade revert"
    
    After running gpupgrade-migration-sql-generator.bash, run gpupgrade-migration-sql-executor.bash.
    Run gpupgrade-migration-sql-executor.bash -h for more information.
    

Example

Run the gpupgrade-migration-sql-generator.bash script.

$ gpupgrade-migration-sql-generator.bash $GPHOME 5432 /home/gpadmin/migration-output
Output files are located in: /home/gpadmin/migration-output/pre-initialize
Output files are located in: /home/gpadmin/migration-output/post-finalize
Output files are located in: /home/gpadmin/migration-output/post-revert

You can review the SQL scripts in the output directory to see what changes will be applied when you execute the scripts.

Executing the SQL Migration Scripts

Run gpupgrade-migration-sql-executor.bash to execute the generated pre-initialize, post-finalize, or post-revert SQL scripts. The shell script runs all SQL scripts in the specified directory and its subdirectories, so it is important to specify the pre-initialize, post-finalize, or post-revert subdirectory.

$ gpupgrade-migration-sql-executor.bash <GPHOME> <PGPORT> <INPUT_DIR>

Where:

  • <GPHOME> is the path to the source Greenplum Database installation directory.
  • <PGPORT> is the source Greenplum System port number.
  • <INPUT_DIR> is the directory containing the scripts you want to execute. This is the path to the pre-initialize, post-finalize, or post-revert subdirectory in <OUTPUT_DIR>. Do not specify the <OUTPUT_DIR> without a subdirectory.

After you execute the generated SQL scripts using gpupgrade-migration-sql-generator.bash you can regenerate SQL scripts into the same <OUTPUT_DIR>. Regenerating the scripts in the same output directory will archive the previous run scripts in <OUTPUT_DIR>/archive/. The scripts will have to differentiate each run.

Examples:

  • Run the pre-initialize migration scripts. Caution: Only run the pre-initialize scripts after you have started the gpupgrade downtime window.

    $ gpupgrade-migration-sql-executor.bash $GPHOME 5432 /home/gpadmin/migration-output/pre-initialize
    
  • If you choose to revert the upgrade: After the gpupgrade revert command has completed, run the generated post-revert SQL scripts.

    $ gpupgrade-migration-sql-executor.bash $GPHOME 5432 /home/gpadmin/migration-output/post-revert
    
  • If you choose to finalize the upgrade: After the gpupgrade finalize command has completed, run the post-finalize migration scripts.

    $ gpupgrade-migration-sql-executor.bash $GPHOME 5432 /home/gpadmin/migration-output/post-finalize