I have written a Python program that restores a Greenplum backup in parallel onto a cluster with a different number of segment instances. The script performs all the required steps described in this Pivotal documentation.
- Restoring a backup in a DCA with a different configuration
  - Old method
    - Restoring a pg_dump backup (single thread)
    - ~350 GB/hour (max)
    - Backup size: 29 TB
    - Restore time: ~84 hours
  - New method
    - Taking a parallel backup using gpcrondump
    - Restoring multiple backup files in parallel
    - 1.8 TB/hour
    - Backup size: 29 TB
    - Restore time: ~16 hours
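The restore times above follow directly from backup size divided by throughput. Here is a minimal sketch of that arithmetic; the helper name `restore_hours` and the choice of 1024 GB per TB are my assumptions, not something taken from the script.

```python
def restore_hours(backup_tb: float, rate_gb_per_hour: float, gb_per_tb: int = 1024) -> float:
    """Estimate restore time in hours from backup size (TB) and throughput (GB/hour)."""
    return backup_tb * gb_per_tb / rate_gb_per_hour


# Figures quoted in the list above.
old_hours = restore_hours(29, 350)         # single-threaded pg_dump restore
new_hours = restore_hours(29, 1.8 * 1024)  # parallel restore at 1.8 TB/hour

print(f"old: {old_hours:.1f} h, new: {new_hours:.1f} h, "
      f"speedup: {old_hours / new_hours:.1f}x")
```

Depending on whether a terabyte is counted as 1000 GB or 1024 GB, the old method works out to roughly 83 to 85 hours, which matches the ~84 hours quoted.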
Point to note
- All the uncompressed backup files should be in a single directory. (Data Domain is the best option for this.)
To test this script, I created a sample function. It takes two arguments, a string and an integer, and internally runs pg_sleep on the second (integer) argument.
    CREATE FUNCTION test_pg_sleep(TEXT, INTEGER) RETURNS VOID AS $$
    DECLARE
        name ALIAS for $1;
        delay ALIAS for $2;
    BEGIN
        PERFORM pg_sleep(delay);
    END;
    $$ LANGUAGE 'plpgsql' STRICT;
Below are the backup files, which contain calls to the function above. I have passed various sleep values to the function.
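To reproduce a similar test, dummy backup files could be generated along these lines. This is only a sketch: the helper `make_sleep_statements`, the label format, and the file name in the usage comment are hypothetical, and the real test files may have looked different.

```python
def make_sleep_statements(delays, label="file"):
    """Build one SELECT per delay that calls test_pg_sleep(label, seconds)."""
    return [
        f"SELECT test_pg_sleep('{label}_{i}', {int(delay)});"
        for i, delay in enumerate(delays, start=1)
    ]


# Hypothetical usage: write a fake "segment" backup file with three sleeps.
# with open("fake_segment_backup.sql", "w") as handle:
#     handle.write("\n".join(make_sleep_statements([5, 10, 15], "seg0")) + "\n")
```

Giving each file different delays makes it easy to see in pg_stat_activity which files are being restored at the same time.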
Below are the two sessions: in the first session I’m running the restore script, and in the second I’m monitoring the pg_stat_activity view.
Below is a live recording of the two sessions above. You may want to switch to fullscreen for a better view.
This option specifies the target database. If the target database doesn’t exist in the environment, the script exits immediately.
This option specifies the target host. The default value is
This option specifies the backup key generated by the gpcrondump utility. The key is used to fetch the list of backup files.
This option specifies the number of parallel processes to run. The default value is 1.
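One way to cap the number of concurrent restores is a fixed-size worker pool in which each worker feeds one backup file to psql. The sketch below is an assumption about how this could be done, not the actual script: the function names are made up, and only the standard psql flags `-d`, `-h` and `-f` are used.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_psql_command(database, host, backup_file):
    """Command line that replays one uncompressed backup file through psql."""
    return ["psql", "-d", database, "-h", host, "-f", backup_file]


def restore_file(database, host, backup_file, runner=subprocess.run):
    """Restore a single file and return (file, exit code)."""
    result = runner(build_psql_command(database, host, backup_file),
                    capture_output=True, text=True)
    return backup_file, result.returncode


def restore_in_parallel(database, host, backup_files, parallel=1,
                        runner=subprocess.run):
    """Restore files with at most `parallel` psql processes running at once."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = [pool.submit(restore_file, database, host, name, runner)
                   for name in backup_files]
        # Results come back in submission order, one tuple per file.
        return [future.result() for future in futures]
```

Threads are enough here because the heavy work happens in the psql child processes; each thread just waits for its process to finish. The `runner` parameter exists only so the function can be exercised without a database.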
This option specifies the directory that contains the backup files.
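The options described above could be wired up with argparse roughly as follows. The flag names are hypothetical and chosen only for illustration, and making three of them required is my assumption. The host default is left unset on purpose, since its value is not given here.

```python
import argparse


def build_parser():
    """Command-line interface mirroring the options described above."""
    parser = argparse.ArgumentParser(
        description="Parallel restore of a gpcrondump backup (illustrative sketch)")
    # Flag names are hypothetical; only the meaning of each option comes from the post.
    parser.add_argument("--database", required=True,
                        help="target database; must already exist")
    parser.add_argument("--host", default=None,
                        help="target host (the script's real default is not shown)")
    parser.add_argument("--backup-key", required=True,
                        help="backup key generated by gpcrondump")
    parser.add_argument("--parallel", type=int, default=1,
                        help="number of parallel processes (default: 1)")
    parser.add_argument("--directory", required=True,
                        help="directory holding the uncompressed backup files")
    return parser


# Hypothetical usage:
# args = build_parser().parse_args()
```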
This script stores its logs in /home/gpadmin/gpAdminLogs and generates multiple log files.
This is the main log file, which records the script's progress.
This is the standard log file for the master backup file restore.
This is the error log file for the master backup file restore. Always check this log after a restore.
This is the standard log file for the segment backup file restore.
This is the error log file for the segment backup file restore. Always check this log after a restore.
This is the standard log file for the post-data backup file restore.
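Checking the error logs after a restore can be partly automated. The sketch below simply reports which of the given log files are non-empty; it assumes that an error log stays empty when nothing goes wrong, which may not hold for the real script. The helper name and the file names in the usage comment are hypothetical.

```python
from pathlib import Path


def non_empty_logs(paths):
    """Return the paths (as strings) of existing log files that have content."""
    return [
        str(path)
        for path in map(Path, paths)
        if path.is_file() and path.stat().st_size > 0
    ]


# Hypothetical usage with made-up file names:
# suspicious = non_empty_logs([
#     "/home/gpadmin/gpAdminLogs/master_restore_error.log",
#     "/home/gpadmin/gpAdminLogs/segment_restore_error.log",
# ])
```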
I’m not sharing this script here and I have my reasons for it. I’ll try to share it here ASAP.