Setup Continuous Archiving and Point-in-Time Recovery (PITR) with PostgreSQL DB.
PostgreSQL is a powerful open-source relational database management system. It is widely used in various industries and applications due to its reliability, security, and scalability. One of its most useful features is Point-In-Time-Recovery (PITR), which allows you to recover your database to a specific point in time. PITR is essential in disaster recovery scenarios because it can minimize data loss.
Here are the steps to set up PITR with PostgreSQL:
Enable Archiving In the postgresql.conf file, set the archive_mode configuration parameter to on. This will enable archiving.
Configure Archive Command
In the postgresql.conf file, set the archive_command configuration parameter to a command that will copy the WAL files to a safe location. For example, you can use the following command:
# PostgreSQL Version
PostgreSQL 13.10
# directory structure, I used in this tutorial
/u01 - PostgreSQL Data Directory Mount
/u02 - Wal Archives Mount
/backup - Backups Mount
# data dir
/u01/h1_dev_db/data
# create directory for wal_archives
mkdir -p /u02/hl_dev_db/wal_archive
chown postgres:postgres -R /u02/hl_dev_db/wal_archive
vim /u01/h1_dev_db/data/postgresql.conf
...
archive_mode = on
archive_command = 'cp %p /u02/hl_dev_db/wal_archive/%f'
...
This command will copy the WAL file to /u02/hl_dev_db/wal_archive/ directory.
Test PITR
- Create a Full Backup Take a base backup of your database using the following command:
export DATESTAMP=$(date +%Y-%m-%d)
mkdir -p /backup/hl_dev_db_full_bkp_$DATESTAMP
chown postgres:postgres -R /backup
# Create full backup using pg_basebackup
pg_basebackup -D /backup/hl_dev_db_full_bkp_2023-03-09 -Ft -z -P -Xs
# -F format = TAR ( t )
# -z compression = gzip
# -P Progress
# -Xs wal method = stream
This will create a backup folder at /backup/hl_dev_db_full_bkp_2023–03–09.
- Generate Dummy Data
# Get Current Timestamp
psql -U postgres -c "SELECT CURRENT_TIMESTAMP;"
current_timestamp
----------------------------------
2023-03-08 15:43:15.644345+05:30
# NOTE: We started the work on db at 15:43
# Generate Some Data
---dummy_data_01.sh
set -e
set -x
psql -v ON_ERROR_STOP=1 -U postgres <<-EOSQL
/* employees table */
CREATE TABLE EMPLOYEES (id SERIAL PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50), email VARCHAR(50), mobile_no BIGINT, date_of_birth DATE);
/* random series function */
CREATE FUNCTION get_random_string() RETURNS TEXT LANGUAGE SQL AS \$\$ SELECT STRING_AGG ( SUBSTR ( 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', CEIL (RANDOM() * 52)::integer, 1), '') FROM GENERATE_SERIES(1, 10)\$\$;
/* insert record function */
CREATE FUNCTION insert_record() RETURNS VOID LANGUAGE PLPGSQL AS \$\$DECLARE first_name TEXT= INITCAP(get_random_string());
DECLARE last_name TEXT= INITCAP(get_random_string());
DECLARE email TEXT= LOWER(CONCAT(first_name, '.', last_name, '@gmail.com'));
DECLARE mobile_no BIGINT=CAST(1000000000 + FLOOR(RANDOM() * 9000000000) AS BIGINT);
DECLARE date_of_birth DATE= CAST( NOW() - INTERVAL '100 year' * RANDOM() AS DATE);
BEGIN
INSERT INTO EMPLOYEES (first_name, last_name, email, mobile_no, date_of_birth) VALUES (first_name, last_name, email, mobile_no, date_of_birth);
END;
\$\$;
/* generate 1000000 rows of data */
SELECT insert_record() FROM GENERATE_SERIES(1, 1000000);
EOSQL
---
bash dummy_data_01.sh
# Get Counts
psql -U postgres -c "SELECT COUNT(*) from EMPLOYEES;"
count
---------
1000000
(1 row)
# DB SIZE
psql -U postgres -c "SELECT pg_size_pretty( pg_database_size('postgres') );"
133MB
# Relation SIZE
psql -U postgres -c "SELECT pg_size_pretty( pg_total_relation_size('employees'));"
126MB
psql -U postgres -c "SELECT CURRENT_TIMESTAMP;"
current_timestamp
----------------------------------
2023-03-08 15:44:13.541959+05:30
# NOTE: At 15:45 we only have employees table with size 126MB
# Now create a new database and data to it
psql -U postgres -c "create database test_db;"
# dummy_data_02
psql -U postgres test_db -c "CREATE TABLE peoples (
id BIGINT GENERATED ALWAYS AS IDENTITY,
PRIMARY KEY(id),
hash_firstname TEXT NOT NULL,
hash_lastname TEXT NOT NULL,
gender VARCHAR(6) NOT NULL CHECK (gender IN ('male', 'female'))
);
INSERT INTO peoples (hash_firstname, hash_lastname, gender)
SELECT md5(RANDOM()::TEXT), md5(RANDOM()::TEXT), CASE WHEN RANDOM() < 0.5 THEN 'male' ELSE 'female' END FROM generate_series(1, 1000000);"
psql -U postgres test_db -c "SELECT COUNT(*) from peoples;"
count
---------
1000000
(1 row)
psql -U postgres -c "SELECT CURRENT_TIME;"
current_time
-----------------------
15:46:05.012252+05:30
(1 row)
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
test_db | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | | 137 MB | pg_default |
postgres | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | | 133 MB | pg_default | default administrative connection database
template0 | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | =c/postgres +| 7777 kB | pg_default | unmodifiable empty database
| | | | | postgres=CTc/postgres | | |
template1 | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | =c/postgres +| 7777 kB | pg_default | default template for new databases
| | | | | postgres=CTc/postgres | | |
(4 rows)
- Database Recovery
CASE1: Restore the Recovery Target Time
# create new pg_data dir
mkdir -p /u01/hl_dev_db_recovery/data
chown postgres:postgres -R /u01/hl_dev_db_recovery/data
chmod 0750 -R /u01/hl_dev_db_recovery/data
# Untar the backup
tar -xvf /backup/hl_dev_db_full_bkp_2023-03-09/base.tar.gz -C /u01/hl_dev_db_recovery/data/
tar -xvf /backup/hl_dev_db_full_bkp_2023-03-09/pg_wal.tar.gz -C /u01/hl_dev_db_recovery/data/pg_wal/
# For Instance, you want to restore the database for Timestamp 15:45
# at 15:45 : we only have employees table in postgres db
# add the recovery settings to postgresql
# Adding recovery settings
vim /u01/hl_dev_db_recovery/data/postgresql.conf
---
restore_command = 'cp /u02/hl_dev_db/wal_archive/%f "%p"'
recovery_target_time = '2023-03-08 15:45:00'
---
# start the database
pg_ctl start -D /u01/hl_dev_db_recovery/data -w -t 300 -l logfile
# check logs for recovery process
tail -200f /u01/hl_dev_db_recovery/data/log/postgresql-Thu.log
# check database
psql -U postgres -c "\l+"
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
postgres | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | | 133 MB | pg_default | default administrative connection database
template0 | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | =c/postgres +| 7777 kB | pg_default | unmodifiable empty database
| | | | | postgres=CTc/postgres | | |
template1 | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | =c/postgres +| 7777 kB | pg_default | default template for new databases
| | | | | postgres=CTc/postgres | | |
(4 rows)
# Get Counts
psql -U postgres -c "SELECT COUNT(*) from EMPLOYEES;"
count
---------
1000000
(1 row)
# DB SIZE
psql -U postgres -c "SELECT pg_size_pretty( pg_database_size('postgres') );"
133MB
# Relation SIZE
psql -U postgres -c "SELECT pg_size_pretty( pg_total_relation_size('employees'));"
126MB
CASE2 : Restore to Recovery Target
# stop database
pg_ctl stop -D /u01/hl_dev_db_recovery/data
# change time in postgresql.conf
vim /u01/hl_dev_db_recovery/data/postgresql.conf
---
recovery_target = 'immediate'
---
# start the database
pg_ctl start -D /u01/hl_dev_db_recovery/data -w -t 300 -l logfile
# check logs for recovery process
tail -200f /u01/hl_dev_db_recovery/data/log/postgresql-Thu.log
# check counts
psql -U postgres test_db -c "SELECT COUNT(*) from peoples;"
count
---------
1000000
(1 row)
# check database
psql -U postgres -c "\l+"
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
test_db | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | | 137 MB | pg_default |
postgres | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | | 133 MB | pg_default | default administrative connection database
template0 | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | =c/postgres +| 7777 kB | pg_default | unmodifiable empty database
| | | | | postgres=CTc/postgres | | |
template1 | postgres | UTF8 | en_IN.UTF-8 | en_IN.UTF-8 | =c/postgres +| 7777 kB | pg_default | default template for new databases
| | | | | postgres=CTc/postgres | | |
(4 rows)
That’s it! You have now set up PITR with PostgreSQL. This feature is very helpful in case of a disaster. You can recover your database to a specific point in time and minimize the data loss.
Find something useful? Hold down the 👏 to support and help others find this article. Thanks for reading!!