其工作原理跟Oracle备库一样, Postgresql数据库也支持将所有操作记录到日志文件(write-ahead log, WAL), 支持定时归档日志文件, 传输到备库服务器上, 然后在备库服务器上应用归档日志, 连续进行基于时间点的恢复(Point-In-Time Recovery), 这种高可用功能被称为warm standby或log shipping.
1. 设置服务器之间ssh认证, 用于scp拷归档日志
2. 主库pgdb01开启日志归档/传输功能
archive_mode:on表示开启归档功能,off表示关闭. 重启后此参数才能生效
3. 检查日志传输情况
4. 备份主库, 恢复到备库服务器上
备库pgdb02停止数据库实例
pg_ctl stop -m fast
删除data目录
rm -rf /var/lib/pgsql/data
pgdb01备份主库, 传输至备库
psql -c "checkpoint;select pg_start_backup('dbbackup');"
tar -C /var/lib/pgsql/ -zcf - data |ssh pgdb02 "tar -C /var/lib/pgsql/ -zxf -"
ssh pgdb02 'cd /var/lib/pgsql/data && rm -rf pg_log/* && find pg_xlog/ -type f -exec rm -f {} \;'
psql -c "select pg_stop_backup();"
$ psql -c "checkpoint;select pg_start_backup('dbbackup');"
pg_start_backup
-----------------
2/820000B0
(1 row)
$ tar -C /var/lib/pgsql/ -zcf - data |ssh pgdb02 "tar -C /var/lib/pgsql/ -zxf -"
$ ssh pgdb02 'cd /var/lib/pgsql/data && rm -rf pg_log/* && find pg_xlog/ -type f -exec rm -f {} \;'
$ psql -c "select pg_stop_backup();"
pg_stop_backup
----------------
2/82000118
(1 row)
备份产生的.backup文件记录了备份信息
$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 65624
-rw------- 1 postgres postgres 16777216 Oct 12 07:56 000000010000000200000082
-rw------- 1 postgres postgres 244 Oct 12 07:56 000000010000000200000082.000000B0.backup
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 000000010000000200000083
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000084
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000085
drwxr-xr-x 2 postgres postgres 4096 Oct 12 07:56 archive_status
/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 12 07:56 000000010000000200000082.000000B0.backup.done
-rw------- 1 postgres postgres 0 Oct 12 07:56 000000010000000200000082.done
$ cat /var/lib/pgsql/data/pg_xlog/000000010000000200000082.000000B0.backup
START WAL LOCATION: 2/820000B0 (file 000000010000000200000082)
STOP WAL LOCATION: 2/82000118 (file 000000010000000200000082)
CHECKPOINT LOCATION: 2/820000B0
START TIME: 2009-10-12 07:55:13 UTC
LABEL: dbbackup
STOP TIME: 2009-10-12 07:56:28 UTC
表示是从000000B0这个位置开始备份的, 解释见24.3.2. Making a Base Backup
5. 配置备库
备库上不需要归档, 所以修改备库配置文件, 关闭归档
sed -i '/^archive_/s/^/#/g' /var/lib/pgsql/data/postgresql.conf
安装postgresql contrib软件包, 使用其中的pg_standby辅助工具
sudo rpm -Uvh http://ftp9.us.postgresql.org/pub/mirrors/postgresql/binary/v8.3.7/linux/rpms/redhat/rhel-5-x86_64/postgresql-contrib-8.3.7-1PGDG.rhel5.x86_64.rpm
编辑恢复配置文件recovery.conf
cat >/var/lib/pgsql/data/recovery.conf <<'EOF'
restore_command = 'pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5432 /var/lib/pgsql/archlog %f %p %r 2>>pg_log/standby.log.$(date +%%Y%%m%%d)'
EOF
-l: 创建归档日志的符号链接, 而不是拷贝
-d: 打开调试信息
-s 2: 每2秒检查一次是否有新的归档日志
-t /tmp/pgsql.trigger.5432: 一旦发现存在触发器文件, 将终止恢复过程, 然后打开备库
各选项解释见F.23. pg_standby
启动备库
pg_ctl start
日志显示已恢复了所有归档日志, 等待着下一个日志
Trigger file : /tmp/pgsql.trigger.5432
Waiting for WAL file : 00000001.history
WAL file path : /var/lib/pgsql/archlog/00000001.history
Restoring to... : pg_xlog/RECOVERYHISTORY
Sleep interval : 2 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/archlog/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history : 000000000000000000000000 and later
running restore : OK
Trigger file : /tmp/pgsql.trigger.5432
Waiting for WAL file : 000000010000000200000082.000000B0.backup
WAL file path : /var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup
Restoring to... : pg_xlog/RECOVERYHISTORY
Sleep interval : 2 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup" "pg_xlog/RECOVERYHISTORY"
Keep archive history : 000000000000000000000000 and later
running restore : OK
Trigger file : /tmp/pgsql.trigger.5432
Waiting for WAL file : 000000010000000200000082
WAL file path : /var/lib/pgsql/archlog/000000010000000200000082
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 2 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000082" "pg_xlog/RECOVERYXLOG"
Keep archive history : 000000000000000000000000 and later
running restore : OK
Trigger file : /tmp/pgsql.trigger.5432
Waiting for WAL file : 000000010000000200000083
WAL file path : /var/lib/pgsql/archlog/000000010000000200000083
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 2 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000083" "pg_xlog/RECOVERYXLOG"
Keep archive history : 000000010000000200000082 and later
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
...
...
...
$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 8
drwxr-xr-x 2 postgres postgres 4096 Oct 12 07:55 archive_status
lrwxrwxrwx 1 postgres postgres 63 Oct 12 08:11 RECOVERYHISTORY -> /var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup
/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
[postgres@pgdb02.cloud101.bqa4.blurdev.com ~]$ ls -lR /var/lib/pgsql/archlog/
/var/lib/pgsql/archlog/:
total 65620
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 00000001000000020000007F
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000080
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000081
-rw------- 1 postgres postgres 16777216 Oct 12 07:56 000000010000000200000082
-rw------- 1 postgres postgres 244 Oct 12 07:56 000000010000000200000082.000000B0.backup
pg_xlog目录下RECOVERYHISTORY文件不知有什么用
archlog目录下归档文件由postgresql维护, 成功恢复后, 会自动删除一些旧的
6. 打开备库
备库处在恢复状态时不能进行任何操作, 必须打开数据库后才能查询SQL
$ psql
psql: FATAL: the database system is starting up
当pg_standby检测到触发器文件, 就会终止恢复过程, 打开数据库
touch /tmp/pgsql.trigger.5432
日志:
...
...
...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...trigger file found
Trigger file : /tmp/pgsql.trigger.5432
Waiting for WAL file : 000000010000000200000085
WAL file path : /var/lib/pgsql/archlog/000000010000000200000085
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 2 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000085" "pg_xlog/RECOVERYXLOG"
Keep archive history : 000000000000000000000000 and later
running restore : OK
Trigger file : /tmp/pgsql.trigger.5432
Waiting for WAL file : 00000002.history
WAL file path : /var/lib/pgsql/archlog/00000002.history
Restoring to... : pg_xlog/RECOVERYHISTORY
Sleep interval : 2 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/archlog/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history : 000000000000000000000000 and later
running restore : OK
Trigger file : /tmp/pgsql.trigger.5432
Waiting for WAL file : 00000001.history
WAL file path : /var/lib/pgsql/archlog/00000001.history
Restoring to... : pg_xlog/RECOVERYHISTORY
Sleep interval : 2 seconds
Max wait interval : 0 forever
Command for restore : ln -s -f "/var/lib/pgsql/archlog/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history : 000000000000000000000000 and later
running restore : OK
恢复过程停止后, recovery.conf被自动改名为recovery.done
$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 16412
lrwxrwxrwx 1 postgres postgres 47 Oct 12 09:28 000000020000000200000085 -> /var/lib/pgsql/archlog/000000010000000200000085
-rw------- 1 postgres postgres 16777216 Oct 12 09:28 000000020000000200000086
-rw------- 1 postgres postgres 74 Oct 12 09:28 00000002.history
drwxr-xr-x 2 postgres postgres 4096 Oct 12 09:28 archive_status
/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 12 09:28 00000002.history.ready
$ ls -lR /var/lib/pgsql/data/recovery.*
-rw-r--r-- 1 postgres postgres 142 Oct 12 08:11 /var/lib/pgsql/data/recovery.done
要注意的是, 一旦打开就无法再次进入恢复状态, 只能重建备库
外部链接:
Chapter 25. High Availability, Load Balancing, and Replication
Chapter 24. Backup and Restore
Simple HA with PostgreSQL Point-In-Time Recovery
F.23. pg_standby
Using pg_standby for high availability of Postgresql
7. 记录级别的日志传送(Record-based Log Shipping)
前面实现的是文件级别的日志传送, 以文件为单位进行恢复, 即使设置了定时强制归档, 恢复窗口也太大.
Postgresql支持记录级别的日志传送, 见24.4.4. Record-based Log Shipping, 但没提供相应的工具. 所幸的是Skype公司开发了一套SkyTools, 给广大数据库管理员带去了福音, 其中一款工具walmgr可以方便的创建/管理备库, 并实现了记录级别的日志传送.
8. 安装SkyTools
主备库服务器上安装SkyTools
rpm -Uvh http://yum.pgsqlrpms.org/8.3/redhat/rhel-5-x86_64/compat-postgresql-libs-4-1PGDG.rhel5.x86_64.rpm
yum -y install python-psycopg2.x86_64
rpm -Uvh http://yum.pgsqlrpms.org/8.3/redhat/rhel-5-x86_64/skytools-2.1.10-1.rhel5.x86_64.rpm
chmod +x /usr/bin/*.py
9. 配置walmgr
说明见walmgr(1)
主库:
mkdir -p /var/lib/pgsql/walmgr/logs
cat >/var/lib/pgsql/walmgr/master.ini <<'EOF'
[wal-master]
job_name = pgdb01.cloud101_walgmr_master
logfile = /var/lib/pgsql/walmgr/logs/master.log
pidfile = /var/lib/pgsql/walmgr/logs/master.pid
master_db = dbname=template1
master_data = /var/lib/pgsql/data
master_config = /var/lib/pgsql/data/postgresql.conf
slave = pgdb02:/var/lib/pgsql/walmgr/shipping
slave_config = /var/lib/pgsql/walmgr/slave.ini
completed_wals = %(slave)s/logs.complete
partial_wals = %(slave)s/logs.partial
full_backup = %(slave)s/data.master
config_backup = %(slave)s/config.backup
loop_delay = 10.0
use_xlog_functions = 1
compression = 1
EOF
备库:
mkdir -p /var/lib/pgsql/walmgr/{logs,shipping/{logs.{complete,partial},data.master,config.backup}}
cat >/var/lib/pgsql/walmgr/slave.ini <<'EOF'
[wal-slave]
job_name = pgdb02.cloud101_walgmr_slave
logfile = /var/lib/pgsql/walmgr/logs/slave.log
slave_data = /var/lib/pgsql/data
slave_stop_cmd = sudo /etc/init.d/postgresql stop
slave_start_cmd = sudo /etc/init.d/postgresql start
slave_config_dir = /var/lib/pgsql/data
slave = /var/lib/pgsql/walmgr/shipping
completed_wals = %(slave)s/logs.complete
partial_wals = %(slave)s/logs.partial
full_backup = %(slave)s/data.master
config_backup = %(slave)s/config.backup
keep_backups = 5
EOF
10. 配置主库
$ walmgr.py /var/lib/pgsql/walmgr/master.ini setup
2009-10-12 10:09:58,499 29592 INFO Configuring WAL archiving
2009-10-12 10:09:58,502 29592 WARNING database must be restarted to enable archiving
2009-10-12 10:09:58,506 29592 INFO Sending SIGHUP to postmaster
2009-10-12 10:09:59,222 29592 INFO Done
运行完setup,配置文件增加了如下2条配置项
archive_mode = 'on'
archive_command = '/usr/bin/walmgr.py /var/lib/pgsql/walmgr/master.ini xarchive %p %f'
重启使修改生效
pg_ctl restart -m fast
11. 备份主库
$ walmgr.py /var/lib/pgsql/walmgr/master.ini backup
2009-10-12 10:27:05,706 18300 INFO Backup lock obtained.
2009-10-12 10:27:05,706 18300 INFO got SystemExit(0), exiting
2009-10-12 10:27:05,741 29813 INFO Execute SQL: select pg_start_backup('FullBackup'); [dbname=template1]
2009-10-12 10:27:07,696 29813 INFO Checking tablespaces
2009-10-12 10:27:08,776 29813 INFO Backup conf files from /var/lib/pgsql/data
2009-10-12 10:27:09,268 18323 INFO First useful WAL file is: 00000001000000020000008B
2009-10-12 10:27:09,744 18329 INFO Backup lock released.
2009-10-12 10:27:09,785 29813 INFO Full backup successful
数据库备份到了备库服务器指定目录上
数据文件放在/var/lib/pgsql/walmgr/shipping/data.master
配置文件放在/var/lib/pgsql/walmgr/shipping/config.backup
日志:
2009-10-12 10:27:05,741 29813 INFO Execute SQL: select pg_start_backup('FullBackup'); [dbname=template1]
2009-10-12 10:27:07,696 29813 INFO Checking tablespaces
2009-10-12 10:27:08,776 29813 INFO Backup conf files from /var/lib/pgsql/data
2009-10-12 10:27:09,785 29813 INFO Full backup successful
2009-10-12 10:27:09,852 29825 INFO {count: 1, duration: 0.387887001038}
2009-10-12 10:27:10,724 29829 INFO {count: 1, duration: 0.725953102112}
2009-10-12 10:27:11,272 29832 INFO {count: 1, duration: 0.396477937698}
2009-10-12 10:27:12,056 29835 INFO {count: 1, duration: 0.628087997437}
12. 启用备库
$ walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master
2009-10-12 10:28:32,363 18411 INFO Stopping postmaster: sudo /etc/init.d/postgresql stop
Stopping postgresql service: [ OK ]
2009-10-12 10:28:35,535 18411 INFO Move /var/lib/pgsql/data to /var/lib/pgsql/data.2
2009-10-12 10:28:35,536 18411 INFO Copy /var/lib/pgsql/walmgr/shipping/data.master to /var/lib/pgsql/data
2009-10-12 10:28:36,001 18411 INFO Write /var/lib/pgsql/data/recovery.conf
2009-10-12 10:28:36,003 18411 INFO Restoring configuration files
2009-10-12 10:28:36,007 18411 INFO Starting postmaster: sudo /etc/init.d/postgresql start
Starting postgresql service: [ OK ]
这样备库就建好了
日志:
2009-10-12 10:28:32,363 18411 INFO Stopping postmaster: sudo /etc/init.d/postgresql stop
2009-10-12 10:28:35,535 18411 INFO Move /var/lib/pgsql/data to /var/lib/pgsql/data.2
2009-10-12 10:28:35,536 18411 INFO Copy /var/lib/pgsql/walmgr/shipping/data.master to /var/lib/pgsql/data
2009-10-12 10:28:36,001 18411 INFO Write /var/lib/pgsql/data/recovery.conf
2009-10-12 10:28:36,003 18411 INFO Restoring configuration files
2009-10-12 10:28:36,007 18411 INFO Starting postmaster: sudo /etc/init.d/postgresql start
2009-10-12 10:28:37,094 18511 INFO 00000001.history: not found, ignoring
2009-10-12 10:28:37,094 18511 INFO got SystemExit(1), exiting
2009-10-12 10:28:37,233 18512 INFO 00000001000000020000008B.00000020.backup: Found
2009-10-12 10:28:37,243 18512 INFO {count: 1}
2009-10-12 10:28:37,387 18514 INFO 00000001000000020000008B: Found
2009-10-12 10:28:37,489 18514 INFO {count: 1}
13. 手工同步
在主库上运行
$ walmgr.py /var/lib/pgsql/walmgr/master.ini sync
2009-10-12 10:30:29,591 29862 INFO sent to slave: 00000001000000020000008B @ 0 +16777216
14. 打开备库
$ walmgr.py /var/lib/pgsql/walmgr/slave.ini boot
2009-10-12 10:31:38,781 18534 INFO Stopping recovery mode
日志:
2009-10-12 10:31:38,781 18534 INFO Stopping recovery mode
2009-10-12 10:31:39,485 18516 INFO 00000001000000020000008C: not found, stopping
2009-10-12 10:31:39,485 18516 INFO got SystemExit(1), exiting
2009-10-12 10:31:39,628 18536 INFO 00000001000000020000008B: Found
2009-10-12 10:31:39,737 18536 INFO {count: 1}
2009-10-12 10:31:39,881 18538 INFO 00000002.history: not found, ignoring
2009-10-12 10:31:39,882 18538 INFO got SystemExit(1), exiting
2009-10-12 10:31:40,024 18539 INFO 00000001.history: not found, ignoring
2009-10-12 10:31:40,024 18539 INFO got SystemExit(1), exiting
15. 恢复备库
备库打开后无法切换回恢复状态, 必须重新从备份重建. 操作同步骤12
walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master
16. 自动定时同步
主库上运行
nohup walmgr.py /var/lib/pgsql/walmgr/master.ini syncdaemon >>/var/lib/pgsql/walmgr/logs/syncdaemon.log 2>&1 </dev/null &
按loop_delay指定的时间间隔同步
17. 切换主备库
假设pgdb01是主库, pgdb02是备库, 切换主备库角色
1) 按新的主备库角色编辑配置文件
pgdb02:
mkdir -p /var/lib/pgsql/walmgr/logs
cat >/var/lib/pgsql/walmgr/master.ini <<'EOF'
[wal-master]
job_name = pgdb02.cloud101_walgmr_master
logfile = /var/lib/pgsql/walmgr/logs/master.log
pidfile = /var/lib/pgsql/walmgr/logs/master.pid
master_db = dbname=template1
master_data = /var/lib/pgsql/data
master_config = /var/lib/pgsql/data/postgresql.conf
slave = pgdb01:/var/lib/pgsql/walmgr/shipping
slave_config = /var/lib/pgsql/walmgr/slave.ini
completed_wals = %(slave)s/logs.complete
partial_wals = %(slave)s/logs.partial
full_backup = %(slave)s/data.master
config_backup = %(slave)s/config.backup
loop_delay = 10.0
use_xlog_functions = 1
compression = 1
EOF
pgdb01:
mkdir -p /var/lib/pgsql/walmgr/{logs,shipping/{logs.{complete,partial},data.master,config.backup}}
cat >/var/lib/pgsql/walmgr/slave.ini <<'EOF'
[wal-slave]
job_name = pgdb01.cloud101_walgmr_slave
logfile = /var/lib/pgsql/walmgr/logs/slave.log
slave_data = /var/lib/pgsql/data
slave_stop_cmd = sudo /etc/init.d/postgresql stop
slave_start_cmd = sudo /etc/init.d/postgresql start
slave_config_dir = /var/lib/pgsql/data
slave = /var/lib/pgsql/walmgr/shipping
completed_wals = %(slave)s/logs.complete
partial_wals = %(slave)s/logs.partial
full_backup = %(slave)s/data.master
config_backup = %(slave)s/config.backup
keep_backups = 5
EOF
2) 停止应用程序, 将应用程序的数据库配置指向pgdb02
3) pgdb01强制同步一次
walmgr.py /var/lib/pgsql/walmgr/master.ini sync
4) 停止pgdb01的后台同步进程
kill $(cat /var/lib/pgsql/walmgr/logs/master.pid)
5) 打开pgdb02
walmgr.py /var/lib/pgsql/walmgr/slave.ini boot
6) 配置pgdb02, 启用归档
walmgr.py /var/lib/pgsql/walmgr/master.ini setup
pg_ctl restart -m fast
7) 现在pgdb02是主库, 启动应用程序, 连接pgdb02
8) 备份pgdb02
walmgr.py /var/lib/pgsql/walmgr/master.ini backup
9) 恢复到pgdb01
walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master
10) pgdb02启用后台自动定时同步
nohup walmgr.py /var/lib/pgsql/walmgr/master.ini syncdaemon >>/var/lib/pgsql/walmgr/logs/syncdaemon.log 2>&1 </dev/null &
11) 完成
外部链接:
walmgr
walmgr(1)
Testing PostgreSQL replication solutions: Log shipping with walmgr
Core Services 2.0 skytools setup
18. 其它复制/集群/高可用方案
介绍:
Replication, Clustering, and Connection Pooling
PostgreSQL Replication Solutions - Presentation
PostgreSQL Replication
The Four Horsemen of Replication 1 2 3
Getting Smart about the New World of PostgreSQL Replication
本人认为比较简单好用的几种:
- Warm Standby: 基于日志的异步复制, 复制时备库不可用
- Hot Standby: 备库可读. postgresql当前版本不支持此功能, 8.5版本开始支持
- Mammoth Replicator: Postgresql的修改版, 支持主从异步复制, , 使用上有很多限制, 比如只支持复制一个数据库(schema)
- Tungsten Replicator: 支持主从复制, 基于日志分析, 性能很差, 目前只支持Mysql,Oracle复制, 貌似很有前途, 值得一用
链接:
Continuent Tungsten Replicator Guide
The Scale-Out Blog
- RubyRep: Ruby脚本编写的, 支持Mysql和Postgresql, 跟据主键或唯一索引比较2个数据库中表的不同, 进行同步. 可以实现主从或主主复制. 估计性能不行
例子:Yet Another PostgreSQL Replication Tool - RubyRep
19. Postgresql数据库下载地址
http://www.postgresql.org/ftp/
https://projects.commandprompt.com/public/pgcore/wiki/direct_download
-fin-