Locations of visitors to this page

Friday, October 30, 2009

how to resync a broken master-master replication

how to resync a broken master-master replication

[still in draft]

db01:3306 and db03:3306 are setting up with master-master replicaition.

in db03 the slave process is accidentally stopped for quite a long time that caused by a wrong configuration, after resolving the problem, dual master replication seems to start working, but unfortunately binary logs were delete older than x days, many tables has got out of date or even lost in db03, recreating the slave db is not acceptable, we have to find a more efficient way to make thier data resync again.


1. stop replication from db03 to db01
login db01
show slave status\G
stop slave;
show slave status\G


2. dump sql script for recreating table structure later
login db01
DB_LIST=$(mysql -u root --batch --skip-column-names -e 'show databases;'|sed '/^information_schema$\|^mysql$\|^test$/d'|tr '\n' ' ')
mysqldump -u root --no-data --skip-add-drop-table --databases $DB_LIST >$HOME/db01-3306-db.sql

login db03
DB_LIST=$(mysql -u root --batch --skip-column-names -e 'show databases;'|sed '/^information_schema$\|^mysql$\|^test$/d'|tr '\n' ' ')
mysqldump -u root --no-data --skip-add-drop-table --databases $DB_LIST >$HOME/db03-3306-db.sql


3. recreate lost tables
login db03
scp db01:db01-3306-db.sql $HOME/
mysql -u root --force <$HOME/db01-3306-db.sql

login db01
scp db03:db03-3306-db.sql $HOME/
mysql -u root --force <$HOME/db03-3306-db.sql


4. solve slave sql issues in db03, if any
login db03
show slave status\G
set global sql_slave_skip_counter=1;
start slave;
show slave status\G

now db01 and db03 are actually working in master-slave replication.


5. check data consistency between db01 and db03
run mk-table-checksum against db01, it will execute sql queries to calculate crc checksum of tables in db01, then in db02 the same queries will run too, and calculate checksum in db02, finally we will get checksums of both databases.

mk-table-checksum --ignore-databases=mysql,test --replicate mysql.checksum --empty-replicate-table --create-replicate-table h=db01,P=3306,u=webuser


6. sync table records from db01 to db03
run mk-table-sync with dry-run or print option, see what will happen in db03, it will not actually take any actions.

mk-table-sync --dry-run --ignore-databases=mysql,test --sync-to-master --replicate=mysql.checksum h=db03,P=3306,u=webuser
mk-table-sync --print --ignore-databases=mysql,test --sync-to-master --replicate=mysql.checksum h=db03,P=3306,u=webuser

actually sync table data
mk-table-sync --execute --ignore-databases=mysql,test --sync-to-master --replicate=mysql.checksum h=db03,P=3306,u=webuser


7. known issue
"Can't make changes on the master: no unique index exists at /usr/bin/mk-table-sync line 3775. while doing dbtest.t"
solution: create unique index on the table
it is not a perfect solution, because there may be no suitable columns for defining an unique index, or there may be duplicate records in the table.


8. check to see if all data in both databases are identical
mk-table-checksum --ignore-databases=mysql,test -P 3306 -u webuser db01 db03|mk-checksum-filter


9. start replication from db03 to db01
login db01
show slave status\G
start slave;
show slave status\G


10. external links:

mk-table-checksum
mk-table-sync
How to sync tables in master-master MySQL replication
MySQL Master-Master replication table sync
Maatkit Options for Restoring a Slave or Master
Tip of the Day — Checking Your Tables — Part I
Sync up a MySQL slave with mk-table-sync
mk-table-sync and small tables



-fin-

Tuesday, October 27, 2009

how to generate sequence in bash

how to generate sequence in bash
BASH脚本如何产生序号


1. BASH FOR循环结构(Looping Constructs)
$ for ((a=1;a<=5;a++));do echo $a;done
1
2
3
4
5
这种用法只能在FOR循环中使用


2. BASH大括号扩展机制(Brace Expansion)
$ echo {1..10}
1 2 3 4 5 6 7 8 9 10
$ echo {z..a}
z y x w v u t s r q p o n m l k j i h g f e d c b a
bash第4版增加了功能,可指定步长

开头补零:
$ printf "%02d " {6..12}
06 07 08 09 10 11 12 


3. seq命令
seq命令是属于coreutils包的一个工具, 用于生成一组数字序列号, 如:
$ seq 1 2 10
1
3
5
7
9

开头补零:
$ seq -f %02.0f 6 12
06
07
08
09
10
11
12


4. awk
$ awk 'BEGIN {for (i=1;i<=5;i++) print i}'
1
2
3
4
5


5. jot
BSD系统中提供, 可以产生序列,随机数
jot - UNIX Power Tools, 2nd Edition
Using BSD jot
jot (BSD)
JOT(1)



-fin-

Monday, October 26, 2009

monitoring io statistics by process

monitoring io statistics by process
监控进程的IO统计信息


1. I/O accounting
Linux内核2.6.20增加了IO审计功能, 要使用这个功能需要首先在编译内核时打开TASK_DELAY_ACCT和TASK_IO_ACCOUNTING参数
RedHat Linux在内核2.6.18-144之后也加入了该功能, 并在RHEL5.4版本(kernel 2.6.18-164)中发布, 可以直接使用

一些基于IO审计的监控工具:
1)IOtop
Iotop is a Python program with a top like UI used to show of behalf of which process is the I / O going on. It requires Python ≥ 2.5 and a Linux kernel ≥ 2.6.20 with the TASK_DELAY_ACCT and TASK_IO_ACCOUNTING options enabled. It requires Python ≥ 2.5 and ≥ Linux kernel 2.6.20 with the TASK_DELAY_ACCT and TASK_IO_ACCOUNTING options enabled.

2)pidstat(SYSSTAT工具之一)
Report I/O statistics (kernels 2.6.20 and later only)

3)Dstat: Versatile resource statistics tool
0.6.7 版本后支持显示topio,topbio

4)htop - an interactive process viewer for Linux
显示RBYTES,WBYTES,IO_RATE等列

5)collectl - Process I/O Stats
2.4.0版本之后支持显示io统计信息, 见Process I/O Stats

外部链接
Red Hat backported I/O accounting to RHEL5
Red Hat Enterprise Linux 5.4 - Release Notes


2. block_dump
设置内核参数block_dump为非零值, 内核将报告所有磁盘读写操作和脏块刷新的信息

下载iodump脚本, 用于分析处理内核报告
wget http://maatkit.googlecode.com/svn/trunk/util/iodump

打开block_dump
echo 1 > /proc/sys/vm/block_dump

通过iodump分析dmesg输出的内核报告
while true; do sleep 1; dmesg -c; done | perl iodump

运行一会儿, 按Ctrl-c停止输出, 得到类似下面的结果
# while true; do sleep 1; dmesg -c; done | perl iodump
# Caught SIGINT.
TASK                   PID      TOTAL       READ      WRITE      DIRTY DEVICES
kjournald              478       1294          0       1294          0 sda2
pdflush              21422        857          0        857          0 sda5, sdb6
kjournald             1752        365          0        365          0 sda5
kjournald             1758        174          0        174          0 sdb6
kjournald             1756         52          0         52          0 sdb5
syslogd              26752         28          0         28          0 sdb6
firefox              12811         27         27          0          0 sdb5
perl                  3003         25         25          0          0 sda2
process_perfdat       3026         12         12          0          0 sda5
bash                 14749          3          3          0          0 sda2
firefox              12654          3          3          0          0 sdb5
bash                  3004          2          2          0          0 sda2
bash                  3017          1          1          0          0 sda2
squid                25555          1          1          0          0 sda2
check_nrpe            3001          1          1          0          0 sda2


最后关闭block_dump
echo 0 > /proc/sys/vm/block_dump

外部链接:
block_dump
How to find per-process I/O statistics on Linux
Monitoring filesystem activity under Linux with block_dump
How can I record what process or kernel activity is using the disk in GNU/Linux?
How to identify what processes are generating IO Wait load.



3. blktrace
支持block trace的内核版本:
– Patch for Linux 2.6.14rc3
(or later, up to 2.6.17)
– Linux 2.6.17 (or later) – built in

安装
yum -y install blktrace

挂载debugfs文件系统
mount -t debugfs debugfs /sys/kernel/debug

运行blktrace
blktrace -d /dev/sda2 -o - | blkparse -i - -s

运行一段时间后, CTRL-C退出, 显示统计信息:
...
...
...
  8,2    0     1576     0.618648628   478  U   N [kjournald] 1
  8,2    0     1577     0.618669180     0  C   W 375869 + 128 [0]
  8,2    0     1578     0.618763920   478  Q   W 376045 + 8 [kjournald]
  8,2    0     1579     0.618765468   478  G   W 376045 + 8 [kjournald]
  8,2    0     1580     0.618766125   478  P   N [kjournald]
  8,2    0     1581     0.618766440   478  I   W 376045 + 8 [kjournald]
  8,2    0     1582     0.618767278   478  U   N [kjournald] 1
  8,2    0     1583     0.618768643   478  D   W 376045 + 8 [kjournald]
  8,2    0     1584     0.618824993     0  C   W 376045 + 8 [0]
blktrace (5609)
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        2,        8KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        2,       12KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 IO unplugs:             0               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0
kjournald (478)
 Reads Queued:           0,        0KiB  Writes Queued:         330,    1,320KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:       42,    1,320KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        4,       76KiB
 Read Merges:            0,        0KiB  Write Merges:          288,    1,152KiB
 IO unplugs:            12               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0
nagios (5574)
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        0,        0KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        1,       16KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 IO unplugs:             0               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0
pdflush (776)
 Reads Queued:           0,        0KiB  Writes Queued:         126,      504KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:       33,      132KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:       33,      136KiB
 Read Merges:            0,        0KiB  Write Merges:            1,        4KiB
 IO unplugs:             0               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0
swapper (0)
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:      131,      536KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:      233,    2,404KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 IO unplugs:             0               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0

CPU0 (8,2):
 Reads Queued:           0,        0KiB  Writes Queued:         456,    1,824KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:      208,    1,996KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:      273,    2,644KiB
 Read Merges:            0,        0KiB  Write Merges:          289,    1,156KiB
 Read depth:             0               Write depth:            20
 IO unplugs:            11               Timer unplugs:           0
CPU2 (8,2):
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        0,        0KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 Read depth:             0               Write depth:            20
 IO unplugs:             1               Timer unplugs:           0

Total (8,2):
 Reads Queued:           0,        0KiB  Writes Queued:         456,    1,824KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:      208,    1,996KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:      273,    2,644KiB
 Read Merges:            0,        0KiB  Write Merges:          289,    1,156KiB
 IO unplugs:            12               Timer unplugs:           0

Throughput (R/W): 0KiB/s / 4,278KiB/s
Events (8,2): 1,585 entries
Skips: 0 forward (0 -   0.0%)

最后卸载debug文件系统
umount /sys/kernel/debug

外部链接:
Block I/O Layer Tracing: blktrace
blktrace User Guide
Tracing I/O usage on Linux


4. atop
对内核源码打补丁, 能够支持显示每个进程的IO


5. systemtap
例子见http://bpineau.livejournal.com/


6. DTrace
Solaris专用
iotop脚本见Top Ten DTrace (D) Scripts
iotop - display top disk I/O events by process.


7. 其它
pio and topio: for solaris,hp-ux,windows
Psio + other disk I/O by process tools: for solaris only



-fin-

Tuesday, October 20, 2009

setting up pitrtools

setting up pitrtools

pitrtools也是一种管理postgresql备库的工具集, 由2个python编写的小脚本组成
cmd_archiver, 调用rsync复制归档日志文件, 支持向多个备用库服务器传输文件, 这里简单介绍了传输文件的流程PITRTools: Multiple slave support
cmd_standby, 将数据库从主库备份到备库, 调用pg_standby在备库上应用归档日志


1. 测试环境:
primary(master) db: pgdb01
standby(slave) db: pgdb02
db version: Postgresql 8.3.7
os version: CentOS 5.3


2. 配置ssh登录
略, 见前篇博客setting up postgresql warm standby


3. 安装Pitrtools
登录pgdb01,pgdb02, 下载安装
wget -O - http://files.commandprompt.com/pitrtools/pitrtools-1.2.tar.bz2|tar -C $HOME -jxf -
chmod +x $HOME/pitrtools-1.2/{cmd_archiver,cmd_standby}


4. 主库配置archiver

编辑配置文件
cat >$HOME/pitrtools-1.2/cmd_archiver.ini <<EOF
[DEFAULT]
state: online
pgdata: /var/lib/pgsql/data
r_archivedir: /var/lib/pgsql/archive
l_archivedir: /var/lib/pgsql/archive
rsync_bin: /usr/bin/rsync
rsync_version = 2
slaves: pgdb02
user: postgres
timeout: 10
notify_ok: echo OK
notify_warning:  echo WARNING
notify_critical: echo CRITICAL
debug: on
ssh_debug: off
EOF

初始化建归档目录
$HOME/pitrtools-1.2/cmd_archiver -C $HOME/pitrtools-1.2/cmd_archiver.ini -I
We are initializing queues, one moment.

NOTICE: init_env_func()
NOTICE: generate_slave_list_func()
NOTICE: Your slaves are: ['pgdb02']


为备库建归档日志目录
ssh pgdb02 "mkdir -p /var/lib/pgsql/archive"

为cmd_standby创建辅助函数
psql -U postgres <$HOME/pitrtools-1.2/cmd_standby.sql


5. 主库打开归档模式
sed -i "
\$a \
archive_mode = 'on'\\
archive_command = '\$HOME/pitrtools-1.2/cmd_archiver -C \$HOME/pitrtools-1.2/cmd_archiver.ini -F %p \>\>\$HOME/pitrtools-1.2/cmd_archiver.\$(date \"+%%Y%%m%%d\").log'
/^archive_timeout/s/^/#/g
/^archive_mode\|^archive_command/d
" /var/lib/pgsql/data/postgresql.conf

pg_ctl restart -m fast


6. 备库配置standby
cat >$HOME/pitrtools-1.2/cmd_standby.ini <<EOF
[DEFAULT]
pgversion: 8.3
numarchives: 10
ssh: /usr/bin/ssh
rsync: /usr/bin/rsync
pg_standby: /usr/bin/pg_standby
pg_ctl: /usr/bin/pg_ctl
r_psql: /usr/bin/psql
port: 5432
master_public_ip: pgdb01
master_local_ip: 127.0.0.1
user: postgres
debug: off
ssh_timeout: 30
archivedir: /var/lib/pgsql/archive
pgdata: /var/lib/pgsql/data
postgresql_conf: $HOME/pitrtools-1.2/postgresql.conf
pg_hba_conf: $HOME/pitrtools-1.2/pg_hba.conf
notify_critical: echo critical.
notify_warning: echo warning.
notify_ok: echo ok.
action_failover: echo failover.
EOF

初始化创建备库目录
scp pgdb01:/var/lib/pgsql/data/postgresql.conf $HOME/pitrtools-1.2/
scp pgdb01:/var/lib/pgsql/data/pg_hba.conf $HOME/pitrtools-1.2/
sed -i '/^archive_/s/^/#/g' $HOME/pitrtools-1.2/postgresql.conf
rm -rf /var/lib/pgsql/data
$HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -I


7. 备份主库到备库
$HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -B


8. 启动备库
cp -p $HOME/pitrtools-1.2/{postgresql.conf,pg_hba.conf} /var/lib/pgsql/data/
$HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -S
至此备库配置完成


9. 测试主库是否连通
$ $HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -P
SUCCESS: Master returned: 1


10. 打开备库
$ $HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -F999
waiting for server to shut down... done
server stopped
server starting
failover.
NOTICE: Statistics are not replicated in warm standy mode.
HINT: Execute ANALYZE on your databases


11. 总结
同Walmgr相比, pitrtools用户必须手工创建一些目录和修改配置文件, 不够自动化, 不能保留主库备份, 程序也写得略显粗糙, 不够专业, 有不少有待完善的地方




外部链接:
Pitr Made Easy



-fin-

Tuesday, October 13, 2009

setting up postgresql warm standby

setting up postgresql warm standby
配置Postgresql备库

其工作原理跟Oracle备库一样, Postgresql数据库也支持将所有操作记录到日志文件(write-ahead log, WAL), 支持定时归档日志文件, 传输到备库服务器上, 然后在备库服务器上应用归档日志, 连续进行基于时间点的恢复(Point-In-Time Recovery), 这种高可用功能被称为warm standby或log shipping.


测试环境:
主库服务器: pgdb01
备库服务器: pgdb02
数据库版本: Postgresql 8.3.7
操作系统版本: CentOS 5.3

1. 设置服务器之间ssh认证, 用于scp拷归档日志
pgdb01上生成公私密钥文件, 配置认证信息, 并将同样的密钥文件和配置信息复制到pgdb02
ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
[ -f ~/.ssh/authorized_keys ] && sed -i '/'$(whoami)'@'$(hostname)'/d' ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub >>~/.ssh/authorized_keys
scp -pr $HOME/.ssh pgdb02:

测试一下
$ ssh pgdb02 "hostname -s"
pgdb02
$ ssh pgdb02 "ssh pgdb01 'hostname -s'"
pgdb01


2. 主库pgdb01开启日志归档/传输功能
备库pgdb02上创建归档日志存放目录:
mkdir -p /var/lib/pgsql/archlog/

编辑主库pgdb01配置文件
vi /var/lib/pgsql/data/postgresql.conf
增加如下配置项
archive_mode = on
archive_command = 'scp -C -o ConnectTimeout=30 "%p" pgdb02:/var/lib/pgsql/archlog/"%f"'
archive_timeout = 1800
archive_mode:on表示开启归档功能,off表示关闭. 重启后此参数才能生效
archive_command:归档命令, 调用scp拷贝归档文件至备库
archive_timeout:表示每隔多少秒强制归档一次

重启数据库, 使配置生效
pg_ctl restart -m fast
或用immediate参数停
pg_ctl restart -m immediate


3. 检查日志传输情况
调用pg_switch_xlog手工切换几次日志
$ psql -c "select pg_switch_xlog();"
 pg_switch_xlog
----------------
 2/21000088
(1 row)

如果当前日志中没有任何事务, pg_switch_xlog不会切换日志, 见9.23. System Administration Functions

查看pg_xlog目录
$ psql -c "select pg_current_xlog_location();"
 pg_current_xlog_location
--------------------------
 2/82000000
(1 row)

$ ls -lR /var/lib/pgsql/data/pg_xlog
/var/lib/pgsql/data/pg_xlog:
total 49216
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 00000001000000020000007F
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000080
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000081
drwxr-xr-x 2 postgres postgres     4096 Oct 12 07:32 archive_status

/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 12 07:27 00000001000000020000007F.done
-rw------- 1 postgres postgres 0 Oct 12 07:31 000000010000000200000080.done
-rw------- 1 postgres postgres 0 Oct 12 07:32 000000010000000200000081.done
archive_status目录下.done文件表示已传输成功日志的编号, 由postgresql维护,自动删除

查看pgdb02, 归档日志已传到指定目录
$ ls -l /var/lib/pgsql/archlog/
total 49212
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 00000001000000020000007F
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000080
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000081



4. 备份主库, 恢复到备库服务器上
备库pgdb02停止数据库实例
pg_ctl stop -m fast
删除data目录
rm -rf /var/lib/pgsql/data

pgdb01备份主库, 传输至备库
psql -c "checkpoint;select pg_start_backup('dbbackup');"
tar -C /var/lib/pgsql/ -zcf - data |ssh pgdb02 "tar -C /var/lib/pgsql/ -zxf -"
ssh pgdb02 'cd /var/lib/pgsql/data && rm -rf pg_log/* && find pg_xlog/ -type f -exec rm -f {} \;'
psql -c "select pg_stop_backup();"
$ psql -c "checkpoint;select pg_start_backup('dbbackup');"
 pg_start_backup
-----------------
 2/820000B0
(1 row)

$ tar -C /var/lib/pgsql/ -zcf - data |ssh pgdb02 "tar -C /var/lib/pgsql/ -zxf -"
$ ssh pgdb02 'cd /var/lib/pgsql/data && rm -rf pg_log/* && find pg_xlog/ -type f -exec rm -f {} \;'
$ psql -c "select pg_stop_backup();"
 pg_stop_backup
----------------
 2/82000118
(1 row)


备份产生的.backup文件记录了备份信息
$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 65624
-rw------- 1 postgres postgres 16777216 Oct 12 07:56 000000010000000200000082
-rw------- 1 postgres postgres      244 Oct 12 07:56 000000010000000200000082.000000B0.backup
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 000000010000000200000083
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000084
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000085
drwxr-xr-x 2 postgres postgres     4096 Oct 12 07:56 archive_status

/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 12 07:56 000000010000000200000082.000000B0.backup.done
-rw------- 1 postgres postgres 0 Oct 12 07:56 000000010000000200000082.done
$ cat /var/lib/pgsql/data/pg_xlog/000000010000000200000082.000000B0.backup
START WAL LOCATION: 2/820000B0 (file 000000010000000200000082)
STOP WAL LOCATION: 2/82000118 (file 000000010000000200000082)
CHECKPOINT LOCATION: 2/820000B0
START TIME: 2009-10-12 07:55:13 UTC
LABEL: dbbackup
STOP TIME: 2009-10-12 07:56:28 UTC
表示是从000000B0这个位置开始备份的, 解释见24.3.2. Making a Base Backup


5. 配置备库
备库上不需要归档, 所以修改备库配置文件, 关闭归档
sed -i '/^archive_/s/^/#/g' /var/lib/pgsql/data/postgresql.conf

安装postgresql contrib软件包, 使用其中的pg_standby辅助工具
sudo rpm -Uvh http://ftp9.us.postgresql.org/pub/mirrors/postgresql/binary/v8.3.7/linux/rpms/redhat/rhel-5-x86_64/postgresql-contrib-8.3.7-1PGDG.rhel5.x86_64.rpm

编辑恢复配置文件recovery.conf
cat >/var/lib/pgsql/data/recovery.conf <<'EOF'
restore_command = 'pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5432 /var/lib/pgsql/archlog %f %p %r 2>>pg_log/standby.log.$(date +%%Y%%m%%d)'
EOF
-l: 创建归档日志的符号链接, 而不是拷贝
-d: 打开调试信息
-s 2: 每2秒检查一次是否有新的归档日志
-t /tmp/pgsql.trigger.5432: 一旦发现存在触发器文件, 将终止恢复过程, 然后打开备库
各选项解释见F.23. pg_standby

启动备库
pg_ctl start

日志显示已恢复了所有归档日志, 等待着下一个日志
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 00000001.history
WAL file path           : /var/lib/pgsql/archlog/00000001.history
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 000000010000000200000082.000000B0.backup
WAL file path           : /var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 000000010000000200000082
WAL file path           : /var/lib/pgsql/archlog/000000010000000200000082
Restoring to...         : pg_xlog/RECOVERYXLOG
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000082" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK

Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 000000010000000200000083
WAL file path           : /var/lib/pgsql/archlog/000000010000000200000083
Restoring to...         : pg_xlog/RECOVERYXLOG
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000083" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 000000010000000200000082 and later
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
...
...
...

$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 8
drwxr-xr-x 2 postgres postgres 4096 Oct 12 07:55 archive_status
lrwxrwxrwx 1 postgres postgres   63 Oct 12 08:11 RECOVERYHISTORY -> /var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup

/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
[postgres@pgdb02.cloud101.bqa4.blurdev.com ~]$ ls -lR /var/lib/pgsql/archlog/
/var/lib/pgsql/archlog/:
total 65620
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 00000001000000020000007F
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000080
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000081
-rw------- 1 postgres postgres 16777216 Oct 12 07:56 000000010000000200000082
-rw------- 1 postgres postgres      244 Oct 12 07:56 000000010000000200000082.000000B0.backup
pg_xlog目录下RECOVERYHISTORY文件不知有什么用
archlog目录下归档文件由postgresql维护, 成功恢复后, 会自动删除一些旧的


6. 打开备库
备库处在恢复状态时不能进行任何操作, 必须打开数据库后才能查询SQL
$ psql
psql: FATAL:  the database system is starting up


当pg_standby检测到触发器文件, 就会终止恢复过程, 打开数据库
touch /tmp/pgsql.trigger.5432
日志:
...
...
...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...trigger file found

Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 000000010000000200000085
WAL file path           : /var/lib/pgsql/archlog/000000010000000200000085
Restoring to...         : pg_xlog/RECOVERYXLOG
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000085" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK

Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 00000002.history
WAL file path           : /var/lib/pgsql/archlog/00000002.history
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 00000001.history
WAL file path           : /var/lib/pgsql/archlog/00000001.history
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK

恢复过程停止后, recovery.conf被自动改名为recovery.done
$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 16412
lrwxrwxrwx 1 postgres postgres       47 Oct 12 09:28 000000020000000200000085 -> /var/lib/pgsql/archlog/000000010000000200000085
-rw------- 1 postgres postgres 16777216 Oct 12 09:28 000000020000000200000086
-rw------- 1 postgres postgres       74 Oct 12 09:28 00000002.history
drwxr-xr-x 2 postgres postgres     4096 Oct 12 09:28 archive_status

/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 12 09:28 00000002.history.ready
$ ls -lR /var/lib/pgsql/data/recovery.*
-rw-r--r-- 1 postgres postgres 142 Oct 12 08:11 /var/lib/pgsql/data/recovery.done

要注意的是, 一旦打开就无法再次进入恢复状态, 只能重建备库



外部链接:
Chapter 25. High Availability, Load Balancing, and Replication
Chapter 24. Backup and Restore
Simple HA with PostgreSQL Point-In-Time Recovery
F.23. pg_standby
Using pg_standby for high availability of Postgresql



7. 记录级别的日志传送(Record-based Log Shipping)
前面实现的是文件级别的日志传送, 以文件为单位进行恢复, 即使设置了定时强制归档, 恢复窗口也太大.
Postgresql支持记录级别的日志传送, 见24.4.4. Record-based Log Shipping, 但没提供相应的工具. 所幸的是Skype公司开发了一套SkyTools, 给广大数据库管理员带去了福音, 其中一款工具walmgr可以方便的创建/管理备库, 并实现了记录级别的日志传送.


8. 安装SkyTools
主备库服务器上安装SkyTools
rpm -Uvh http://yum.pgsqlrpms.org/8.3/redhat/rhel-5-x86_64/compat-postgresql-libs-4-1PGDG.rhel5.x86_64.rpm
yum -y install python-psycopg2.x86_64
rpm -Uvh http://yum.pgsqlrpms.org/8.3/redhat/rhel-5-x86_64/skytools-2.1.10-1.rhel5.x86_64.rpm
chmod +x /usr/bin/*.py


9. 配置walmgr
说明见walmgr(1)

主库:
mkdir -p /var/lib/pgsql/walmgr/logs
cat >/var/lib/pgsql/walmgr/master.ini <<'EOF'
[wal-master]
job_name             = pgdb01.cloud101_walgmr_master
logfile              = /var/lib/pgsql/walmgr/logs/master.log
pidfile              = /var/lib/pgsql/walmgr/logs/master.pid
master_db            = dbname=template1
master_data          = /var/lib/pgsql/data
master_config        = /var/lib/pgsql/data/postgresql.conf
slave                = pgdb02:/var/lib/pgsql/walmgr/shipping
slave_config         = /var/lib/pgsql/walmgr/slave.ini
completed_wals       = %(slave)s/logs.complete
partial_wals         = %(slave)s/logs.partial
full_backup          = %(slave)s/data.master
config_backup        = %(slave)s/config.backup
loop_delay           = 10.0
use_xlog_functions   = 1
compression          = 1
EOF

备库:
mkdir -p /var/lib/pgsql/walmgr/{logs,shipping/{logs.{complete,partial},data.master,config.backup}}
cat >/var/lib/pgsql/walmgr/slave.ini <<'EOF'
[wal-slave]
job_name             = pgdb02.cloud101_walgmr_slave
logfile              = /var/lib/pgsql/walmgr/logs/slave.log
slave_data           = /var/lib/pgsql/data
slave_stop_cmd       = sudo /etc/init.d/postgresql stop
slave_start_cmd      = sudo /etc/init.d/postgresql start
slave_config_dir     = /var/lib/pgsql/data
slave                = /var/lib/pgsql/walmgr/shipping
completed_wals       = %(slave)s/logs.complete
partial_wals         = %(slave)s/logs.partial
full_backup          = %(slave)s/data.master
config_backup        = %(slave)s/config.backup
keep_backups         = 5
EOF


10. 配置主库
$ walmgr.py /var/lib/pgsql/walmgr/master.ini setup
2009-10-12 10:09:58,499 29592 INFO Configuring WAL archiving
2009-10-12 10:09:58,502 29592 WARNING database must be restarted to enable archiving
2009-10-12 10:09:58,506 29592 INFO Sending SIGHUP to postmaster
2009-10-12 10:09:59,222 29592 INFO Done

运行完setup,配置文件增加了如下2条配置项
archive_mode = 'on'
archive_command = '/usr/bin/walmgr.py /var/lib/pgsql/walmgr/master.ini xarchive %p %f'

重启使修改生效
pg_ctl restart -m fast


11. 备份主库
$ walmgr.py /var/lib/pgsql/walmgr/master.ini backup
2009-10-12 10:27:05,706 18300 INFO Backup lock obtained.
2009-10-12 10:27:05,706 18300 INFO got SystemExit(0), exiting
2009-10-12 10:27:05,741 29813 INFO Execute SQL: select pg_start_backup('FullBackup'); [dbname=template1]
2009-10-12 10:27:07,696 29813 INFO Checking tablespaces
2009-10-12 10:27:08,776 29813 INFO Backup conf files from /var/lib/pgsql/data
2009-10-12 10:27:09,268 18323 INFO First useful WAL file is: 00000001000000020000008B
2009-10-12 10:27:09,744 18329 INFO Backup lock released.
2009-10-12 10:27:09,785 29813 INFO Full backup successful
数据库备份到了备库服务器指定目录上
数据文件放在/var/lib/pgsql/walmgr/shipping/data.master
配置文件放在/var/lib/pgsql/walmgr/shipping/config.backup

日志:
2009-10-12 10:27:05,741 29813 INFO Execute SQL: select pg_start_backup('FullBackup'); [dbname=template1]
2009-10-12 10:27:07,696 29813 INFO Checking tablespaces
2009-10-12 10:27:08,776 29813 INFO Backup conf files from /var/lib/pgsql/data
2009-10-12 10:27:09,785 29813 INFO Full backup successful
2009-10-12 10:27:09,852 29825 INFO {count: 1, duration: 0.387887001038}
2009-10-12 10:27:10,724 29829 INFO {count: 1, duration: 0.725953102112}
2009-10-12 10:27:11,272 29832 INFO {count: 1, duration: 0.396477937698}
2009-10-12 10:27:12,056 29835 INFO {count: 1, duration: 0.628087997437}


12. 启用备库
$ walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master
2009-10-12 10:28:32,363 18411 INFO Stopping postmaster: sudo /etc/init.d/postgresql stop
Stopping postgresql service:                               [  OK  ]
2009-10-12 10:28:35,535 18411 INFO Move /var/lib/pgsql/data to /var/lib/pgsql/data.2
2009-10-12 10:28:35,536 18411 INFO Copy /var/lib/pgsql/walmgr/shipping/data.master to /var/lib/pgsql/data
2009-10-12 10:28:36,001 18411 INFO Write /var/lib/pgsql/data/recovery.conf
2009-10-12 10:28:36,003 18411 INFO Restoring configuration files
2009-10-12 10:28:36,007 18411 INFO Starting postmaster: sudo /etc/init.d/postgresql start
Starting postgresql service:                               [  OK  ]
这样备库就建好了

日志:
2009-10-12 10:28:32,363 18411 INFO Stopping postmaster: sudo /etc/init.d/postgresql stop
2009-10-12 10:28:35,535 18411 INFO Move /var/lib/pgsql/data to /var/lib/pgsql/data.2
2009-10-12 10:28:35,536 18411 INFO Copy /var/lib/pgsql/walmgr/shipping/data.master to /var/lib/pgsql/data
2009-10-12 10:28:36,001 18411 INFO Write /var/lib/pgsql/data/recovery.conf
2009-10-12 10:28:36,003 18411 INFO Restoring configuration files
2009-10-12 10:28:36,007 18411 INFO Starting postmaster: sudo /etc/init.d/postgresql start
2009-10-12 10:28:37,094 18511 INFO 00000001.history: not found, ignoring
2009-10-12 10:28:37,094 18511 INFO got SystemExit(1), exiting
2009-10-12 10:28:37,233 18512 INFO 00000001000000020000008B.00000020.backup: Found
2009-10-12 10:28:37,243 18512 INFO {count: 1}
2009-10-12 10:28:37,387 18514 INFO 00000001000000020000008B: Found
2009-10-12 10:28:37,489 18514 INFO {count: 1}


13. 手工同步
在主库上运行
$ walmgr.py /var/lib/pgsql/walmgr/master.ini sync
2009-10-12 10:30:29,591 29862 INFO sent to slave: 00000001000000020000008B @ 0 +16777216


14. 打开备库
$ walmgr.py /var/lib/pgsql/walmgr/slave.ini boot
2009-10-12 10:31:38,781 18534 INFO Stopping recovery mode

日志:
2009-10-12 10:31:38,781 18534 INFO Stopping recovery mode
2009-10-12 10:31:39,485 18516 INFO 00000001000000020000008C: not found, stopping
2009-10-12 10:31:39,485 18516 INFO got SystemExit(1), exiting
2009-10-12 10:31:39,628 18536 INFO 00000001000000020000008B: Found
2009-10-12 10:31:39,737 18536 INFO {count: 1}
2009-10-12 10:31:39,881 18538 INFO 00000002.history: not found, ignoring
2009-10-12 10:31:39,882 18538 INFO got SystemExit(1), exiting
2009-10-12 10:31:40,024 18539 INFO 00000001.history: not found, ignoring
2009-10-12 10:31:40,024 18539 INFO got SystemExit(1), exiting


15. 恢复备库
备库打开后无法切换回恢复状态, 必须重新从备份重建. 操作同步骤12
walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master


16. 自动定时同步
主库上运行
nohup walmgr.py /var/lib/pgsql/walmgr/master.ini syncdaemon >>/var/lib/pgsql/walmgr/logs/syncdaemon.log 2>&1 </dev/null &
按loop_delay指定的时间间隔同步


17. 切换主备库
假设pgdb01是主库, pgdb02是备库, 切换主备库角色

1) 按新的主备库角色编辑配置文件
pgdb02:
mkdir -p /var/lib/pgsql/walmgr/logs
cat >/var/lib/pgsql/walmgr/master.ini <<'EOF'
[wal-master]
job_name             = pgdb02.cloud101_walgmr_master
logfile              = /var/lib/pgsql/walmgr/logs/master.log
pidfile              = /var/lib/pgsql/walmgr/logs/master.pid
master_db            = dbname=template1
master_data          = /var/lib/pgsql/data
master_config        = /var/lib/pgsql/data/postgresql.conf
slave                = pgdb01:/var/lib/pgsql/walmgr/shipping
slave_config         = /var/lib/pgsql/walmgr/slave.ini
completed_wals       = %(slave)s/logs.complete
partial_wals         = %(slave)s/logs.partial
full_backup          = %(slave)s/data.master
config_backup        = %(slave)s/config.backup
loop_delay           = 10.0
use_xlog_functions   = 1
compression          = 1
EOF

pgdb01:
mkdir -p /var/lib/pgsql/walmgr/{logs,shipping/{logs.{complete,partial},data.master,config.backup}}
cat >/var/lib/pgsql/walmgr/slave.ini <<'EOF'
[wal-slave]
job_name             = pgdb01.cloud101_walgmr_slave
logfile              = /var/lib/pgsql/walmgr/logs/slave.log
slave_data           = /var/lib/pgsql/data
slave_stop_cmd       = sudo /etc/init.d/postgresql stop
slave_start_cmd      = sudo /etc/init.d/postgresql start
slave_config_dir     = /var/lib/pgsql/data
slave                = /var/lib/pgsql/walmgr/shipping
completed_wals       = %(slave)s/logs.complete
partial_wals         = %(slave)s/logs.partial
full_backup          = %(slave)s/data.master
config_backup        = %(slave)s/config.backup
keep_backups         = 5
EOF

2) 停止应用程序, 将应用程序的数据库配置指向pgdb02

3) pgdb01强制同步一次
walmgr.py /var/lib/pgsql/walmgr/master.ini sync

4) 停止pgdb01的后台同步进程
kill $(cat /var/lib/pgsql/walmgr/logs/master.pid)

5) 打开pgdb02
walmgr.py /var/lib/pgsql/walmgr/slave.ini boot

6) 配置pgdb02, 启用归档
walmgr.py /var/lib/pgsql/walmgr/master.ini setup
pg_ctl restart -m fast

7) 现在pgdb02是主库, 启动应用程序, 连接pgdb02

8) 备份pgdb02
walmgr.py /var/lib/pgsql/walmgr/master.ini backup

9) 恢复到pgdb01
walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master

10) pgdb02启用后台自动定时同步
nohup walmgr.py /var/lib/pgsql/walmgr/master.ini syncdaemon >>/var/lib/pgsql/walmgr/logs/syncdaemon.log 2>&1 </dev/null &

11) 完成



外部链接:
walmgr
walmgr(1)
Testing PostgreSQL replication solutions: Log shipping with walmgr
Core Services 2.0 skytools setup



18. 其它复制/集群/高可用方案

介绍:
Replication, Clustering, and Connection Pooling
PostgreSQL Replication Solutions - Presentation
PostgreSQL Replication
The Four Horsemen of Replication 1 2 3
Getting Smart about the New World of PostgreSQL Replication

本人认为比较简单好用的几种:
  1. Warm Standby: 基于日志的异步复制, 复制时备库不可用
  2. Hot Standby: 备库可读. postgresql当前版本不支持此功能, 8.5版本开始支持
  3. Mammoth Replicator: Postgresql的修改版, 支持主从异步复制, , 使用上有很多限制, 比如只支持复制一个数据库(schema)
  4. Tungsten Replicator: 支持主从复制, 基于日志分析, 性能很差, 目前只支持Mysql,Oracle复制, 貌似很有前途, 值得一用
    链接:
    Continuent Tungsten Replicator Guide
    The Scale-Out Blog
  5. RubyRep: Ruby脚本编写的, 支持Mysql和Postgresql, 跟据主键或唯一索引比较2个数据库中表的不同, 进行同步. 可以实现主从或主主复制. 估计性能不行
    例子:Yet Another PostgreSQL Replication Tool - RubyRep


19. Postgresql数据库下载地址
http://www.postgresql.org/ftp/
https://projects.commandprompt.com/public/pgcore/wiki/direct_download



-fin-

Monday, October 12, 2009

sort file by length of lines

sort file by length of lines
按每行长度对文件内容排序

比如文件包含一些人名, 想按名称长度进行排序
cat >myfile <<EOF
Andre Anichanov
Andrew Mogrelia
Antonio Vivaldi
Antoni Wit
Capella Istropolitana
Christian Kohn
Cologne Chamber Orchestra
Daniel Rothert
Daniel Taylor
Einar Steen-Nokleberg
Elena Papandreou
Eteri Andjaparidze
EOF

awk得到每行的长度, 按长度排序后显示输出
awk '{print length,$0}' myfile|sort -rn|cut -d" " -f2-
或者将管道写在print后
awk '{print length,$0|"sort -rn|cut -d\" \" -f2-"}' myfile

运行结果
Cologne Chamber Orchestra
Einar Steen-Nokleberg
Capella Istropolitana
Eteri Andjaparidze
Elena Papandreou
Antonio Vivaldi
Andrew Mogrelia
Andre Anichanov
Daniel Rothert
Christian Kohn
Daniel Taylor
Antoni Wit



-fin-

Website Analytics

Followers