
Thursday, December 24, 2009

questions about database link


Q1:
I have two production databases. Some tables in database A are made available to database B through database links; B's application connects to B and queries those A-side tables through the links. The problem: a check showed that B's connection count is rather high, and on closer inspection more than 50 of the connections come from database A. The connections stay established even when the linked tables are not in use; only 8 tables are accessed through the database link, yet there are 50 connections. Shouldn't there be no database link connection at all when the application is not accessing those tables?


A1:
1. Database link connections are persistent; idle ones are generally not released.
2. They are tied to sessions; different sessions do not share database link connections.
3. The open_links parameter can be tuned to control the number of open link connections.

Q2:
Damn, is this going to cause trouble? On production database A there is a synonym for a table in database B, created over a database link, and A keeps inserting into that synonym. The SQL below has piled up more than 50 identical connections. The inserts have all stopped now, but the processes are still there. I just checked open_links and it is 4 everywhere, so why are there so many connections, and how do I control them? The inserting processes should have exited and released their connections long ago; when exactly is the connection between the two databases released?
INSERT INTO "SM_SEND_SM_LIST" ("SERIALNO","SERVICEID","SMCONTENT","SENDTARGET","PRIORITY","RCOMPLETETIMEBEGIN","RCOMPLETETIMEEND","RCOMPLETEHOURBEGIN","RCOMPLETEHOUREND","REQUESTTIME","ROADBY","SENDTARGETDESC","FEEVALUE","LINKID","PAD1","PAD2","PAD3","PAD4","PAD5") VALUES (:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17,:18,:19)

I just found out that it is the application inserting into this synonym, and the application goes through a WebLogic connection pool. Could it be that the pool never releases its connections, so the database link connections are never released either?
Could this grow to several hundred or more?

A2:
1. Your WebLogic pool has created 50 sessions to the database. Each application request is handed a random connection from the pool, possibly any of the 50, so over time all 50 sessions may get used and each one ends up holding a database link connection, hence 50 connections.
2. When the program exits, the pool does not release its connections; the sessions stay alive, so the database link connection count does not change.
3. As long as the number of sessions is fixed, the number of database link connections is bounded by it, so it will not grow to several hundred.
4. open_links limits the number of database link connections each session can open; there is no instance-wide limit.


Q3:
1. I actually have 7 applications, and each WebLogic instance is configured with 50 pool connections.
2. Most of the inserts turn out to be the same SQL statement, using bind variables.
3. Is there really no other way to bring the connection count down?
4. open_links does not seem to help here: each process opens exactly one connection, and when it finishes the process goes away but the connection does not. In this scenario a single process could never produce multiple connections anyway.

I am considering moving this table from database B to database A. The inserts on A may come from different processes, but the process on B that scans the table should always be the same one, so there would then be only one database link connection.

A3:
1. Do you have 7 WebLogic instances, each with a 50-connection pool? Or 7 applications sharing one WebLogic with 50 connections in total?
Case 1: with 7 WebLogic instances, if every pooled connection has used the database link, you would see up to 350 connections.
Case 2: if every pooled connection has used the database link, you would see at most 50 connections.
In either case, if not every pooled connection has used the link, the observed count will be lower.

2. That is irrelevant: even if n tables are accessed over the link, there is still only 1 connection per session.

3.
1) On the local database side the only limit is open_links.
2) As said before, the connection is released when the session exits.
3) Point 1 gives the maximum possible connection count.
4) ALTER SESSION CLOSE DATABASE LINK can close a connection manually, but doing so is pointless here: it will simply be re-created on the next access.

4. There are as many link connections as database link connections actually used, with the maximum per session determined by open_links.
The program (process) exits, but the session created by the connection pool does not, so the connection does not go away.
The pool created 50 connections (sessions) to the database, and every session has used the database link, which is why you see 50 connections.
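
To inspect which database link connections the current session holds, and to close one by hand, Oracle's V$DBLINK view and the ALTER SESSION command mentioned above can be used (a sketch; "mylink" is a placeholder link name):

```sql
-- Open database link connections held by the current session
SELECT db_link, logged_on, open_cursors, in_transaction
FROM   v$dblink;

-- Close an idle link in this session; a COMMIT or ROLLBACK is required
-- first if a remote transaction is still open. The link is re-opened
-- automatically on the next access, so this only reclaims idle connections.
ALTER SESSION CLOSE DATABASE LINK mylink;
```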





External links:
30 Managing a Distributed Database
OPEN_LINKS



-fin-

nmap ping scan



1. Symptom
There are 3 servers, all running iptables with ICMP and SSH access allowed:
server1: 192.168.18.95
server2: 192.168.51.72
server3: 192.168.51.71

Scanning server1's ssh port with nmap succeeds:
$ nmap server1 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 07:47 UTC
Interesting ports on server1 (192.168.18.95):
PORT   STATE SERVICE
22/tcp open  ssh

Nmap finished: 1 IP address (1 host up) scanned in 0.134 seconds

Scanning server2, however, fails:
$ nmap server2 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 07:48 UTC
Note: Host seems down. If it is really up, but blocking our ping probes, try -P0
Nmap finished: 1 IP address (0 hosts up) scanned in 3.001 seconds
Yet ping reaches server2:
$ ping server2 -c 3
PING server2 (192.168.51.72) 56(84) bytes of data.
64 bytes from server2 (192.168.51.72): icmp_seq=1 ttl=63 time=0.314 ms
64 bytes from server2 (192.168.51.72): icmp_seq=2 ttl=63 time=0.214 ms
64 bytes from server2 (192.168.51.72): icmp_seq=3 ttl=63 time=0.247 ms

--- server2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.214/0.258/0.314/0.043 ms
And its SSH port is accessible:
$ ssh server2 uptime
username@server2's password:
 07:50:01 up 14 days,  8:02,  1 user,  load average: 0.00, 0.00, 0.00


After flushing the iptables rules on server2, the nmap scan succeeds:
# iptables -F
$ nmap server2 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 07:49 UTC
Interesting ports on server2 (192.168.51.72):
PORT   STATE SERVICE
22/tcp open  ssh

Nmap finished: 1 IP address (1 host up) scanned in 0.104 seconds

With iptables re-enabled on server2, the nmap scan run as root also succeeds:
# nmap server2 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 07:50 UTC
Interesting ports on server2 (192.168.51.72):
PORT   STATE SERVICE
22/tcp open  ssh

Nmap finished: 1 IP address (1 host up) scanned in 0.120 seconds

Scanning server2 from server3 as root also succeeds:
# nmap server2 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 08:10 UTC
Interesting ports on server2 (192.168.51.72):
PORT   STATE SERVICE
22/tcp open  ssh
MAC Address: 00:16:C0:A8:33:47 (Semtech)

Nmap finished: 1 IP address (1 host up) scanned in 0.141 seconds

iptables rules on server1:
*nat
...
-A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8080
-A OUTPUT -o lo -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8080
...
*filter
...
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 8080 -m state --state NEW -j ACCEPT
...
-A RH-Firewall-1-INPUT -j DROP

iptables rules on server2:
*filter
...
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 8080 -m state --state NEW -j ACCEPT
...
-A RH-Firewall-1-INPUT -j DROP


2. Investigation
Turn on nmap's debug output and trace the scans with strace.
1) server1:
$ nmap -v -d9 --packet-trace -n server1 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 07:52 UTC
The max # of sockets we are using is: 0
--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 1000, min 100, max 10000
  scan-delay: TCP 1000, UDP 1000
  parallelism: min 0, max 0
  max-retries: 10, host-timeout: 0
---------------------------------------------
READ selected for machine 192.168.18.95
Machine 192.168.18.95 MIGHT actually be listening on probe port 80
Hostupdate called for machine 192.168.18.95 state UNKNOWN/COMBO -> HOST_UP (trynum 0, dotimeadj: yes time: 727)
Timeout vals: srtt: -1 rttvar: -1 to: 1000000 delta 798 ==> srtt: 798 rttvar: 5000 to: 100000
Finished block: srtt: 798 rttvar: 5000 timeout: 100000 block_tries: 1 up_this_block: 1 down_this_block: 0 group_sz: 1
massping done:  num_hosts: 1  num_responses: 1
Initiating Connect() Scan against 192.168.18.95 [1 port] at 07:52
CONN (0.1040s) TCP localhost > 192.168.18.95:22 => Operation now in progress
**TIMING STATS**: IP, probes active/freshportsleft/retry_stack/outstanding/retranwait/onbench, cwnd/ccthresh/delay, timeout/srtt/rttvar/
   Groupstats (1/1 incomplete): 1/*/*/*/*/* 10.00/50/* 1000000/-1/-1
   192.168.18.95: 1/0/0/1/0/0 10.00/50/0 100000/798/5000
Discovered open port 22/tcp on 192.168.18.95
Changing ping technique for 192.168.18.95 to TCP
Timeout vals: srtt: 798 rttvar: 5000 to: 100000 delta -381 ==> srtt: 750 rttvar: 3845 to: 100000
Timeout vals: srtt: -1 rttvar: -1 to: 1000000 delta 417 ==> srtt: 417 rttvar: 5000 to: 100000
The Connect() Scan took 0.00s to scan 1 total ports.
Host 192.168.18.95 appears to be up ... good.
Interesting ports on 192.168.18.95:
Fetchfile found /usr/share/nmap/nmap-services

PORT   STATE SERVICE
22/tcp open  ssh
Final times for host: srtt: 750 rttvar: 3845  to: 100000

Nmap finished: 1 IP address (1 host up) scanned in 0.116 seconds

2) server2:
$ nmap -v -d9 --packet-trace -n server2 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 07:53 UTC
The max # of sockets we are using is: 0
--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 1000, min 100, max 10000
  scan-delay: TCP 1000, UDP 1000
  parallelism: min 0, max 0
  max-retries: 10, host-timeout: 0
---------------------------------------------
Finished block: srtt: -1 rttvar: -1 timeout: 1000000 block_tries: 2 up_this_block: 0 down_this_block: 0 group_sz: 1
massping done:  num_hosts: 1  num_responses: 0
Note: Host seems down. If it is really up, but blocking our ping probes, try -P0
Nmap finished: 1 IP address (0 hosts up) scanned in 3.002 seconds
Comparing with 1): when scanning server1, nmap connected to port 80 on server1.

$ strace nmap server2 -p22
...
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.51.72")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, [3], [3], [3], {1, 0})        = 0 (Timeout)
close(3)                                = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.51.72")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, [3], [3], [3], {1, 0})        = 0 (Timeout)
close(3)                                = 0
...
write(1, "Note: Host seems down. If it is "..., 81Note: Host seems down. If it is really up, but blocking our ping probes, try -P0
) = 81
...
nmap connects to port 80 on server2; after 2 timeouts it reports 'Host seems down.' and exits.

3) Scanning server2 as root:
# nmap -v -d9 --packet-trace -n server2 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 07:53 UTC
The max # of sockets we are using is: 0
--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 1000, min 100, max 10000
  scan-delay: TCP 1000, UDP 1000
  parallelism: min 0, max 0
  max-retries: 10, host-timeout: 0
---------------------------------------------
Packet capture filter (device bond0): (icmp and dst host 192.168.11.18) or ((tcp or udp) and dst host 192.168.11.18 and ( dst port 34806 or dst port 34807 or dst port 34808 or dst port 34809 or dst port 34810))
SENT (0.0100s) ICMP 192.168.11.18 > 192.168.51.72 Echo request (type=8/code=0) ttl=52 id=10191 iplen=28
SENT (0.0100s) TCP 192.168.11.18:34810 > 192.168.51.72:80 A ttl=40 id=58342 iplen=40 seq=1674883102 win=1024 ack=2782179358
RCVD (0.0100s) ICMP 192.168.51.72 > 192.168.11.18 Echo reply (type=0/code=0) ttl=63 id=43696 iplen=28
We got a ping packet back from 192.168.51.72: id = 56906 seq = 21184 checksum = 52980
Hostupdate called for machine 192.168.51.72 state UNKNOWN/COMBO -> HOST_UP (trynum 0, dotimeadj: yes time: 446)
Timeout vals: srtt: -1 rttvar: -1 to: 1000000 delta 382 ==> srtt: 382 rttvar: 5000 to: 100000
Finished block: srtt: 382 rttvar: 5000 timeout: 100000 block_tries: 1 up_this_block: 1 down_this_block: 0 group_sz: 1
massping done:  num_hosts: 1  num_responses: 1
Initiating SYN Stealth Scan against 192.168.51.72 [1 port] at 07:53
Pcap filter: dst host 192.168.11.18 and (icmp or (tcp and (src host 192.168.51.72)))
Packet capture filter (device bond0): dst host 192.168.11.18 and (icmp or (tcp and (src host 192.168.51.72)))
SENT (0.1240s) TCP 192.168.11.18:34786 > 192.168.51.72:22 S ttl=51 id=27748 iplen=44 seq=3810228640 win=4096
**TIMING STATS**: IP, probes active/freshportsleft/retry_stack/outstanding/retranwait/onbench, cwnd/ccthresh/delay, timeout/srtt/rttvar/
   Groupstats (1/1 incomplete): 1/*/*/*/*/* 10.00/50/* 1000000/-1/-1
   192.168.51.72: 1/0/0/1/0/0 10.00/50/0 100000/382/5000
RCVD (0.1240s) TCP 192.168.51.72:22 > 192.168.11.18:34786 SA ttl=63 id=0 iplen=44 seq=3799856132 win=5840 ack=3810228641
Discovered open port 22/tcp on 192.168.51.72
Changing ping technique for 192.168.51.72 to TCP
Timeout vals: srtt: 382 rttvar: 5000 to: 100000 delta 44 ==> srtt: 387 rttvar: 3761 to: 100000
Timeout vals: srtt: -1 rttvar: -1 to: 1000000 delta 426 ==> srtt: 426 rttvar: 5000 to: 100000
The SYN Stealth Scan took 0.01s to scan 1 total ports.
Host 192.168.51.72 appears to be up ... good.
Interesting ports on 192.168.51.72:
Fetchfile found /usr/share/nmap/nmap-services

PORT   STATE SERVICE
22/tcp open  ssh
Final times for host: srtt: 387 rttvar: 3761  to: 100000

Nmap finished: 1 IP address (1 host up) scanned in 0.141 seconds
               Raw packets sent: 3 (112B) | Rcvd: 2 (92B)
nmap sends an ICMP echo request and a TCP ACK packet to port 80.

# strace nmap server2 -p22
...
socket(PF_INET, SOCK_RAW, IPPROTO_RAW)  = 4
setsockopt(4, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
setsockopt(4, SOL_IP, IP_HDRINCL, [1], 4) = 0
socket(PF_INET, SOCK_RAW, IPPROTO_RAW)  = 5
setsockopt(5, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
setsockopt(5, SOL_IP, IP_HDRINCL, [1], 4) = 0
socket(PF_PACKET, SOCK_RAW, 768)        = 6
ioctl(6, SIOCGIFINDEX, {ifr_name="lo", ifr_index=1}) = 0
ioctl(6, SIOCGIFHWADDR, {ifr_name="bond0", ifr_hwaddr=00:14:22:12:39:2b}) = 0
ioctl(6, SIOCGIFINDEX, {ifr_name="bond0", ifr_index=5}) = 0
bind(6, {sa_family=AF_PACKET, proto=0x03, if5, pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
getsockopt(6, SOL_SOCKET, SO_ERROR, [17179869184], [4]) = 0
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 7
ioctl(7, SIOCGIFADDR, {ifr_name="bond0", ifr_addr={AF_INET, inet_addr("192.168.11.18")}}) = 0
ioctl(7, SIOCGIFNETMASK, {ifr_name="bond0", ifr_netmask={AF_INET, inet_addr("255.255.255.0")}}) = 0
close(7)                                = 0
brk(0x17164000)                         = 0x17164000
setsockopt(6, SOL_SOCKET, SO_ATTACH_FILTER, "\1\0\0\0\0\0\0\0\20)h\0\0\0\0\0", 16) = 0
fcntl(6, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
recvfrom(6, "\0\220\301\224x\377\177\0\0\220\301\224x\377\177\0\0p\301\224x\377\177\0\0\360\255\23\27\0\0\0"..., 1, MSG_TRUNC, NULL, NULL) = 114
recvfrom(6, "\0\220\301\224x\377\177\0\0\220\301\224x\377\177\0\0p\301\224x\377\177\0\0\360\255\23\27\0\0\0"..., 1, MSG_TRUNC, NULL, NULL) = 114
...
recvfrom(6, "\0\220\301\224x\377\177\0\0\220\301\224x\377\177\0\0p\301\224x\377\177\0\0\360\255\23\27\0\0\0"..., 1, MSG_TRUNC, NULL, NULL) = 146
recvfrom(6, 0x7fff7894c11f, 1, 32, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
fcntl(6, F_SETFL, O_RDWR)               = 0
setsockopt(6, SOL_SOCKET, SO_ATTACH_FILTER, "\231\0\224x\377\177\0\0\320\312\23\27\0\0\0\0", 16) = 0
sendto(5, "E\0\0\34\360Y\0\0-\1\335\334\300\250\v\22\300\2503H\10\0*\347\244\t)\17", 28, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.51.72")}, 16) = 28
sendto(4, "E\0\0(\2\275\0\0;\6\275h\300\250\v\22\300\2503H\367\205\0Pu\303\312^\312\303\312^"..., 40, 0, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.51.72")}, 16) = 40
select(7, [6], NULL, NULL, {0, 20000})  = 1 (in [6], left {0, 20000})
recvfrom(6, "\0\24\"\0229+\0\23\200\6\301A\10\0E\0\0\34\252\274\0\0?\1\21z\300\2503H\300\250"..., 104, MSG_TRUNC, {sa_family=AF_PACKET, proto=0x800, if5, pkttype=PACKET_HOST, addr(6)={1, 00138006c141}, [18]) = 60
ioctl(6, SIOCGSTAMP, 0x7fff78948b20)    = 0
close(4)                                = 0
close(5)                                = 0
close(6)                                = 0
...
nmap creates raw sockets and sends two packets to server2: the ICMP echo request (sin_port=0) and a TCP packet to port 80.

4) server2 with iptables disabled:
$ strace nmap server2 -p22
...
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.51.72")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, [3], [3], [3], {1, 0})        = 2 (in [3], out [3], left {1, 0})
recvfrom(3, 0x7ffffaa7bed0, 255, 0, 0, 0) = -1 ECONNREFUSED (Connection refused)
close(3)                                = 0
...
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL)                       = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("192.168.51.72")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(4, [3], [3], [3], {0, 99000})    = 1 (out [3], left {0, 99000})
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(3, "", 0, 0, NULL, 0)            = 0
getpeername(3, {sa_family=AF_INET, sin_port=htons(22), sin_addr=inet_addr("192.168.51.72")}, [17179869200]) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(34707), sin_addr=inet_addr("192.168.11.18")}, [68719476752]) = 0
close(3)                                = 0
...
nmap connects to port 80 on server2; the connection is refused once (Connection refused), but instead of exiting nmap goes on to scan port 22.

5) Scanning server2 from server3 as root:
# nmap -v -d9 --packet-trace -n server2 -p22

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2009-12-24 08:10 UTC
The max # of sockets we are using is: 0
--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 1000, min 100, max 10000
  scan-delay: TCP 1000, UDP 1000
  parallelism: min 0, max 0
  max-retries: 10, host-timeout: 0
---------------------------------------------
Initiating ARP Ping Scan against 192.168.51.72 [1 port] at 08:10
Pcap filter: arp and ether dst host 00:16:C0:A8:33:46
Packet capture filter (device eth0): arp and ether dst host 00:16:C0:A8:33:46
SENT (0.0140s) ARP who-has 192.168.51.72 tell 192.168.51.71
**TIMING STATS**: IP, probes active/freshportsleft/retry_stack/outstanding/retranwait/onbench, cwnd/ccthresh/delay, timeout/srtt/rttvar/
   Groupstats (1/1 incomplete): 1/*/*/*/*/* 10.00/50/* 100000/-1/-1
   192.168.51.72: 1/0/0/1/0/0 10.00/50/0 100000/-1/-1
RCVD (0.0150s) ARP reply 192.168.51.72 is-at 00:16:C0:A8:33:47
Timeout vals: srtt: -1 rttvar: -1 to: 100000 delta 364 ==> srtt: 364 rttvar: 5000 to: 100000
Timeout vals: srtt: -1 rttvar: -1 to: 100000 delta 364 ==> srtt: 364 rttvar: 5000 to: 100000
The ARP Ping Scan took 0.01s to scan 1 total hosts.
Initiating SYN Stealth Scan against 192.168.51.72 [1 port] at 08:10
Pcap filter: dst host 192.168.51.71 and (icmp or (tcp and (src host 192.168.51.72)))
Packet capture filter (device eth0): dst host 192.168.51.71 and (icmp or (tcp and (src host 192.168.51.72)))
SENT (0.0380s) TCP 192.168.51.71:35569 > 192.168.51.72:22 S ttl=57 id=4614 iplen=44 seq=1639929488 win=2048
**TIMING STATS**: IP, probes active/freshportsleft/retry_stack/outstanding/retranwait/onbench, cwnd/ccthresh/delay, timeout/srtt/rttvar/
   Groupstats (1/1 incomplete): 1/*/*/*/*/* 10.00/50/* 1000000/-1/-1
   192.168.51.72: 1/0/0/1/0/0 10.00/50/0 100000/364/5000
RCVD (0.0390s) TCP 192.168.51.72:22 > 192.168.51.71:35569 SA ttl=64 id=0 iplen=44 seq=474347775 win=5840 ack=1639929489
Discovered open port 22/tcp on 192.168.51.72
Changing ping technique for 192.168.51.72 to TCP
Timeout vals: srtt: 364 rttvar: 5000 to: 100000 delta 9 ==> srtt: 365 rttvar: 3752 to: 100000
Timeout vals: srtt: -1 rttvar: -1 to: 1000000 delta 373 ==> srtt: 373 rttvar: 5000 to: 100000
The SYN Stealth Scan took 0.01s to scan 1 total ports.
Fetchfile found /usr/share/nmap/nmap-mac-prefixes

Host 192.168.51.72 appears to be up ... good.
Interesting ports on 192.168.51.72:
Fetchfile found /usr/share/nmap/nmap-services

PORT   STATE SERVICE
22/tcp open  ssh
MAC Address: 00:16:C0:A8:33:47 (Semtech)
Final times for host: srtt: 365 rttvar: 3752  to: 100000

Nmap finished: 1 IP address (1 host up) scanned in 0.142 seconds
               Raw packets sent: 2 (86B) | Rcvd: 2 (86B)
nmap uses an ARP ping scan.


3. Analysis
Consult the nmap manual:
$ man nmap
...
HOST DISCOVERY
       One of the very first steps in any network reconnaissance mission is to reduce a (sometimes huge) set of IP ranges into a list of active or interesting hosts. Scanning every port of every single IP address is slow and usually unnecessary. Of course what makes a host interesting depends greatly on the scan purposes. Network administrators may only be interested in hosts running a certain service, while security auditors may care about every single device with an IP address. An administrator may be comfortable using just an ICMP ping to locate hosts on his internal network, while an external penetration tester may use a diverse set of dozens of probes in an attempt to evade firewall restrictions.

       Because host discovery needs are so diverse, Nmap offers a wide variety of options for customizing the techniques used. Host discovery is sometimes called ping scan, but it goes well beyond the simple ICMP echo request packets associated with the ubiquitous ping tool. Users can skip the ping step entirely with a list scan (-sL) or by disabling ping (-P0), or engage the network with arbitrary combinations of multi-port TCP SYN/ACK, UDP, and ICMP probes. The goal of these probes is to solicit responses which demonstrate that an IP address is actually active (is being used by a host or network device). On many networks, only a small percentage of IP addresses are active at any given time. This is particularly common with RFC1918-blessed private address space such as 10.0.0.0/8. That network has 16 million IPs, but I have seen it used by companies with less than a thousand machines. Host discovery can find those machines in a sparsely allocated sea of IP addresses.

       If no host discovery options are given, Nmap sends a TCP ACK packet destined for port 80 and an ICMP Echo Request query to each target machine. An exception to this is that an ARP scan is used for any targets which are on a local ethernet network. For unprivileged UNIX shell users, a SYN packet is sent instead of the ack using the connect() system call. These defaults are equivalent to the -PA -PE options. This host discovery is often sufficient when scanning local networks, but a more comprehensive set of discovery probes is recommended for security auditing.
...
       -sP (Ping Scan)
              This option tells Nmap to only perform a ping scan (host discovery), then print out the available hosts that responded to the scan. No further testing (such as port scanning or OS detection) is performed. This is one step more intrusive than the list scan, and can often be used for the same purposes. It allows light reconnaissance of a target network without attracting much attention. Knowing how many hosts are up is more valuable to attackers than the list provided by list scan of every single IP and host name.

              Systems administrators often find this option valuable as well. It can easily be used to count available machines on a network or monitor server availability. This is often called a ping sweep, and is more reliable than pinging the broadcast address because many hosts do not reply to broadcast queries.

              The -sP option sends an ICMP echo request and a TCP packet to port 80 by default. When executed by an unprivileged user, a SYN packet is sent (using a connect() call) to port 80 on the target. When a privileged user tries to scan targets on a local ethernet network, ARP requests (-PR) are used unless --send-ip was specified. The -sP option can be combined with any of the discovery probe types (the -P* options, excluding -P0) for greater flexibility. If any of those probe type and port number options are used, the default probes (ACK and echo request) are overridden. When strict firewalls are in place between the source host running Nmap and the target network, using those advanced techniques is recommended. Otherwise hosts could be missed when the firewall drops probes or their responses.
...
Before probing services and ports, nmap by default first checks whether the host is alive at all; this is called host discovery, or ping scanning.
If host discovery fails, the port scan that would follow is skipped.
There are several host discovery techniques. By default nmap sends a TCP ACK packet to port 80 of the host plus an ICMP echo request; for targets on the same ethernet LAN it uses an ARP scan instead. Unprivileged UNIX users cannot send a raw ACK packet, so nmap sends a SYN via the connect() system call instead.


4. Conclusions
1) Only privileged users can create raw sockets and therefore use ARP, ICMP and ACK scans.

2) An unprivileged user scans server1:
nmap uses connect() to send a SYN to port 80. A Tomcat service is running on server1 and port 80 is reachable (iptables redirects it to 8080), so host discovery succeeds.

3) An unprivileged user scans server2 with iptables enabled:
nmap uses connect() to send a SYN to port 80. iptables on server2 blocks access to port 80, the probe times out, and nmap concludes the host is unreachable, so host discovery fails.

4) An unprivileged user scans server2 with iptables disabled:
nmap uses connect() to send a SYN to port 80. Nothing is listening on port 80 on server2, so the connection is refused; a refusal proves the host is reachable, so host discovery succeeds.

5) A privileged user scans server2 with iptables enabled:
nmap sends an ICMP echo request over a raw socket; iptables on server2 allows ICMP, so host discovery succeeds.

6) An unprivileged user pings server2 with iptables enabled:
the ping command is setuid root, so it runs with root privileges and can create a raw socket for its ICMP probe, which is why the host answers ping:
$ ls -l $(which ping)
-rwsr-xr-x 1 root root 37280 Mar 14  2007 /bin/ping

7) A privileged user scans server2 from server3 with iptables enabled:
server2 and server3 are on the same subnet, so nmap uses an ARP scan. ARP belongs to layer 2 (the data link layer), which iptables cannot filter, so host discovery succeeds.
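
Given these conclusions, an unprivileged scan through a firewall that drops the port-80 probe can either skip host discovery altogether, as nmap's own hint suggests, or probe a port the firewall does allow (flag spellings are from nmap 4.x; later releases rename -P0 to -Pn):

```shell
# Skip host discovery and scan the port directly
$ nmap -P0 server2 -p22

# Or replace the default probes with a TCP SYN ping against port 22,
# which this firewall permits
$ nmap -PS22 server2 -p22
```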



External links:
Chapter 15. Nmap Reference Guide
The Art of Port Scanning - by Fyodor





-fin-

Thursday, December 17, 2009

How to download oracle patches from CLI



Oracle's FTP download site was shut down on November 6, 2009 and no longer provides service; patches can only be downloaded through the My Oracle Support web site:
$ ftp updates.oracle.com
Connected to updates.oraclegha.com.
421-*********************** Downtime Notice ************************
421-
421-This service was retired as of November 06, 2009.
421-
421-****************************************************************
421
ftp>

Patches can, however, still be downloaded over HTTPS with a command line tool.
The patch URL format is:

https://updates.oracle.com/Orion/Services/download/?aru=<ARU number>
The ARU (Automated Release Updates) number can be found on the My Oracle Support site:
1) Go to the My Oracle Support site
oraclepatch-01-home.png
2) Sign in
oraclepatch-02-signin.png
3) Select "Patches & Updates"
oraclepatch-03-tab2.png
4) Search for the patch
oraclepatch-04-advsearch.png
oraclepatch-05-search.png
5) Click the row and, from the pop-up menu, choose "View Read Me"
oraclepatch-06-viewreadme.png
6) Read the ARU number from the readme document's URL
oraclepatch-07-readme.png

For example, the download URL for the 10.2.0.4 Patch Set is:
https://updates.oracle.com/Orion/Services/download/?aru=10029612
It can then be downloaded with wget; the login user name and password must also be supplied on the command line, e.g.:
wget --http-user="myusername" --http-password="mypassword" "https://updates.oracle.com/Orion/Services/download/?aru=10029612"
For password-protected patches, add the patch_password parameter to the URL (quote the URL so the shell does not interpret the &), e.g.:
https://updates.oracle.com/Orion/Services/download/?aru=12345678&patch_password=nosuchpasswd


Common wget problems:
1. File name too long
wget reports "File name too long" and cannot write the file:
Resolving oracle-updates.oracle.com... 68.142.116.74, 68.142.116.73
Connecting to oracle-updates.oracle.com|68.142.116.74|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 111,654 (109K) [application/zip]
p8524113_10204_Linux-x86-64.zip?download_url=http:%2F%2Fupdates.oracle.com%2FARULink%2FDownload%2Fprocess_form&userid=o-wenxie@motorola.com&email=wenxie@motorola.com&record_download_url=http:%2F%2Fupdates.oracle.com%2FARULink%2FPatchDownload%2Fpack_stats&timestamp=1260979159&patch_password=NO_PASS&context=A@10+H@aaru203.oracle.com+P@&aru=11423647&patch_file=p8524113_10204_Linux-x86-64.zip&ip_address=123.127.66.13&file_id=27511963&use_proxy=false&id=ZWtXM1l3WXFjVUlEZ2YwVDFUZ1ZOQTpHOWEvMlhXSTlRTDJRL2RKbXMvejJB: File name too long

Cannot write to `p8524113_10204_Linux-x86-64.zip?download_url=http:%2F%2Fupdates.oracle.com%2FARULink%2FDownload%2Fprocess_form&userid=o-wenxie@motorola.com&email=wenxie@motorola.com&record_download_url=http:%2F%2Fupdates.oracle.com%2FARULink%2FPatchDownload%2Fpack_stats&timestamp=1260979159&patch_password=NO_PASS&context=A@10+H@aaru203.oracle.com+P@&aru=11423647&patch_file=p8524113_10204_Linux-x86-64.zip&ip_address=123.127.66.13&file_id=27511963&use_proxy=false&id=ZWtXM1l3WXFjVUlEZ2YwVDFUZ1ZOQTpHOWEvMlhXSTlRTDJRL2RKbXMvejJB' (File name too long).

The output file name must be specified explicitly; it can be read from wget's own output.
The file name format is:
p<patch number>_<database version>_<platform>.zip
e.g.:
p6810189_10204_Linux-x86-64.zip
It can also be found on My Oracle Support:
1) Go to the My Oracle Support site
2) Sign in
3) Select "Patches & Updates"
4) Search for the patch
5) Click the patch number at the start of the row
oraclepatch-11-detail.png
6) The patch details are shown; click "Show File Details"
oraclepatch-12-filedetail.png
7) The file name is displayed
oraclepatch-13-filename.png

Add the -O option to the wget command to specify the output file name:
wget --http-user="myusername" --http-password="mypassword" -O "p6810189_10204_Linux-x86-64.zip" "https://updates.oracle.com/Orion/Services/download/?aru=11423647"
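
The file name can also be extracted from the redirect URL programmatically, since its query string carries a patch_file parameter; a bash sketch (the sample URL below is shortened from the wget log above and purely illustrative):

```shell
# Extract the patch_file=... value from a download URL's query string
url='http://oracle-updates.oracle.com/ARUConnect/p6810189_10204_Linux-x86-64.zip?timestamp=1260979145&patch_file=p6810189_10204_Linux-x86-64.zip&aru=10029612'

name=${url##*patch_file=}   # drop everything up to and including 'patch_file='
name=${name%%&*}            # drop everything from the next '&' on
echo "$name"                # -> p6810189_10204_Linux-x86-64.zip
```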

2. Resuming with -c fails
wget reports 304 Not Modified:
Location: https://updates.oracle.com/Orion/Services/download/?aru=10029612 [following]
--15:59:04--  https://updates.oracle.com/Orion/Services/download/?aru=10029612
Connecting to updates.oracle.com|141.146.44.51|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://oracle-updates.oracle.com/ARUConnect/p6810189_10204_Linux-x86-64.zip?download_url=http%3A%2F%2Fupdates.oracle.com%2FARULink%2FDownload%2Fprocess_form&userid=o-wenxie%40motorola.com&record_download_url=http%3A%2F%2Fupdates.oracle.com%2FARULink%2FPatchDownload%2Fpack_stats&email=wenxie%40motorola.com&timestamp=1260979145&patch_password=NO_PASS&context=A%4010%2BH%40aaru204.oracle.com%2BP%40&ip_address=123.127.66.8&patch_file=p6810189_10204_Linux-x86-64.zip&aru=10029612&file_id=22503408&use_proxy=false&id=QUViSEhzNWVGZFlpMGZvVjBlRnZwQTppeWJXRk9Mc2xiVitPU2F1UTNrd21R [following]
--15:59:05--  http://oracle-updates.oracle.com/ARUConnect/p6810189_10204_Linux-x86-64.zip?download_url=http%3A%2F%2Fupdates.oracle.com%2FARULink%2FDownload%2Fprocess_form&userid=o-wenxie%40motorola.com&record_download_url=http%3A%2F%2Fupdates.oracle.com%2FARULink%2FPatchDownload%2Fpack_stats&email=wenxie%40motorola.com&timestamp=1260979145&patch_password=NO_PASS&context=A%4010%2BH%40aaru204.oracle.com%2BP%40&ip_address=123.127.66.8&patch_file=p6810189_10204_Linux-x86-64.zip&aru=10029612&file_id=22503408&use_proxy=false&id=QUViSEhzNWVGZFlpMGZvVjBlRnZwQTppeWJXRk9Mc2xiVitPU2F1UTNrd21R
Resolving oracle-updates.oracle.com... 68.142.116.74, 68.142.116.73
Connecting to oracle-updates.oracle.com|68.142.116.74|:80... connected.
HTTP request sent, awaiting response... 304 Not Modified
15:59:05 ERROR 304: Not Modified.

Again, the fix is to specify the output file name:
wget --http-user="myusername" --http-password="mypassword" -c -O "p6810189_10204_Linux-x86-64.zip" "https://updates.oracle.com/Orion/Services/download/?aru=11423647"

3. HTTPS certificate errors
Add the --no-check-certificate option.


External links:
Classic MetaLink and My Oracle Support Transition Information Page
Retrieving Oracle patches with wget
Automated 'wget' patch downloads: issue resolution



-fin-

Monday, November 23, 2009

trim spaces from a shell string

how to remove leading and trailing spaces from a shell variable?


var="   hello world!    "
echo "var=\"$var\""
var="   hello world!    "


1. echo (note: the unquoted expansion also collapses runs of internal whitespace and is subject to globbing)
newvar=$(echo $var)
echo "newvar=\"$newvar\""
newvar="hello world!"


2. Parameter Substitution
newvar=${var#"${var%%[![:blank:]]*}"}
newvar=${newvar%"${newvar##*[![:blank:]]}"}
echo "newvar=\"$newvar\""
newvar="hello world!"


3. Pattern Matching
shopt -s extglob
newvar=${var##+( )}
newvar=${newvar%%+( )}
shopt -u extglob
echo "newvar=\"$newvar\""
newvar="hello world!"



4. sed
newvar=$(echo "$var"|sed 's/\(^[[:blank:]]\+\|[[:blank:]]\+$\)//g')
echo "newvar=\"$newvar\""
newvar="hello world!"


5. awk
newvar=$(echo "$var"|awk 'gsub(/^[[:blank:]]+|[[:blank:]]+$/,"")')
echo "newvar=\"$newvar\""
newvar="hello world!"
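
The parameter-substitution method (2) can be packaged as a reusable function; a small sketch (the function name trim is ours):

```shell
# trim: strip leading and trailing blanks from $1 and print the result
trim() {
    local s=$1
    s=${s#"${s%%[![:blank:]]*}"}   # remove leading blanks
    s=${s%"${s##*[![:blank:]]}"}   # remove trailing blanks
    printf '%s\n' "$s"
}

trim "   hello world!    "   # -> hello world!
```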



-fin-

Friday, October 30, 2009

how to resync a broken master-master replication


[still in draft]

db01:3306 and db03:3306 are set up with master-master replication.

On db03 the slave process was accidentally left stopped for quite a long time because of a wrong configuration. After the problem was fixed, dual-master replication seemed to start working again, but unfortunately binary logs older than x days had been deleted; many tables on db03 had gone out of date or were even missing. Recreating the slave database was not acceptable, so we had to find a more efficient way to resync their data.


1. stop replication from db03 to db01
login db01
show slave status\G
stop slave;
show slave status\G


2. dump SQL scripts for recreating the table structures later
login db01
DB_LIST=$(mysql -u root --batch --skip-column-names -e 'show databases;'|sed '/^information_schema$\|^mysql$\|^test$/d'|tr '\n' ' ')
mysqldump -u root --no-data --skip-add-drop-table --databases $DB_LIST >$HOME/db01-3306-db.sql

login db03
DB_LIST=$(mysql -u root --batch --skip-column-names -e 'show databases;'|sed '/^information_schema$\|^mysql$\|^test$/d'|tr '\n' ' ')
mysqldump -u root --no-data --skip-add-drop-table --databases $DB_LIST >$HOME/db03-3306-db.sql


3. recreate lost tables
login db03
scp db01:db01-3306-db.sql $HOME/
mysql -u root --force <$HOME/db01-3306-db.sql

login db01
scp db03:db03-3306-db.sql $HOME/
mysql -u root --force <$HOME/db03-3306-db.sql


4. solve slave sql issues in db03, if any
login db03
show slave status\G
set global sql_slave_skip_counter=1;
start slave;
show slave status\G

now db01 and db03 are actually working in master-slave replication.


5. check data consistency between db01 and db03
Run mk-table-checksum against db01. It executes SQL queries that calculate a CRC checksum for each table on db01; through replication the same queries also run on db03 and calculate the checksums there, so in the end we get checksums from both databases.

mk-table-checksum --ignore-databases=mysql,test --replicate mysql.checksum --empty-replicate-table --create-replicate-table h=db01,P=3306,u=webuser


6. sync table records from db01 to db03
Run mk-table-sync with the --dry-run or --print option first to see what would happen on db03; neither option actually makes any changes.

mk-table-sync --dry-run --ignore-databases=mysql,test --sync-to-master --replicate=mysql.checksum h=db03,P=3306,u=webuser
mk-table-sync --print --ignore-databases=mysql,test --sync-to-master --replicate=mysql.checksum h=db03,P=3306,u=webuser

Actually sync the table data:
mk-table-sync --execute --ignore-databases=mysql,test --sync-to-master --replicate=mysql.checksum h=db03,P=3306,u=webuser


7. known issue
"Can't make changes on the master: no unique index exists at /usr/bin/mk-table-sync line 3775. while doing dbtest.t"
solution: create a unique index on the table.
This is not a perfect solution: there may be no suitable columns for defining a unique index, or the table may contain duplicate records.


8. check that the data in both databases is identical
mk-table-checksum --ignore-databases=mysql,test -P 3306 -u webuser db01 db03|mk-checksum-filter


9. start replication from db03 to db01
login db01
show slave status\G
start slave;
show slave status\G


10. external links:

mk-table-checksum
mk-table-sync
How to sync tables in master-master MySQL replication
MySQL Master-Master replication table sync
Maatkit Options for Restoring a Slave or Master
Tip of the Day — Checking Your Tables — Part I
Sync up a MySQL slave with mk-table-sync
mk-table-sync and small tables



-fin-

Tuesday, October 27, 2009

how to generate sequence in bash

How to generate sequence numbers in bash scripts


1. Bash for loop (Looping Constructs)
$ for ((a=1;a<=5;a++));do echo $a;done
1
2
3
4
5
This form can only be used inside a for loop


2. Bash Brace Expansion
$ echo {1..10}
1 2 3 4 5 6 7 8 9 10
$ echo {z..a}
z y x w v u t s r q p o n m l k j i h g f e d c b a
Bash version 4 added the ability to specify a step increment
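Assuming bash ≥ 4, the step form is {start..end..step}:

```shell
# bash 4+: brace expansion with an increment
echo {1..10..2}
```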

With leading zeros:
$ printf "%02d " {6..12}
06 07 08 09 10 11 12 


3. The seq command
seq is a tool from the coreutils package that generates numeric sequences, e.g.:
$ seq 1 2 10
1
3
5
7
9

With leading zeros:
$ seq -f %02.0f 6 12
06
07
08
09
10
11
12
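Besides the -f format string, GNU seq has a -w option that equalizes widths by padding with leading zeros:

```shell
# -w pads all numbers to the same width with leading zeros
seq -w 6 12
```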


4. awk
$ awk 'BEGIN {for (i=1;i<=5;i++) print i}'
1
2
3
4
5
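awk can also produce zero-padded sequences via printf:

```shell
# zero-padded sequence with awk's printf
awk 'BEGIN {for (i=6;i<=12;i++) printf "%02d\n", i}'
```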


5. jot
Available on BSD systems; it can generate sequences and random numbers
jot - UNIX Power Tools, 2nd Edition
Using BSD jot
jot (BSD)
JOT(1)



-fin-

Monday, October 26, 2009

monitoring io statistics by process

Monitoring per-process I/O statistics


1. I/O accounting
Linux kernel 2.6.20 added I/O accounting; to use it, the kernel must first be compiled with the TASK_DELAY_ACCT and TASK_IO_ACCOUNTING options enabled
Red Hat also backported the feature after kernel 2.6.18-144 and released it in RHEL 5.4 (kernel 2.6.18-164), where it can be used directly

Some monitoring tools based on I/O accounting:
1)IOtop
Iotop is a Python program with a top-like UI that shows on behalf of which process the I/O is going on. It requires Python ≥ 2.5 and a Linux kernel ≥ 2.6.20 with the TASK_DELAY_ACCT and TASK_IO_ACCOUNTING options enabled.

2)pidstat (one of the SYSSTAT tools)
Report I/O statistics (kernels 2.6.20 and later only)

3)Dstat: Versatile resource statistics tool
Versions after 0.6.7 can display topio and topbio

4)htop - an interactive process viewer for Linux
Displays columns such as RBYTES, WBYTES, and IO_RATE

5)collectl - Process I/O Stats
Versions after 2.4.0 can display I/O statistics; see Process I/O Stats

External links:
Red Hat backported I/O accounting to RHEL5
Red Hat Enterprise Linux 5.4 - Release Notes


2. block_dump
Set the kernel parameter block_dump to a non-zero value and the kernel will report every disk read/write operation and dirty-block flush

Download the iodump script, which analyzes the kernel reports:
wget http://maatkit.googlecode.com/svn/trunk/util/iodump

Enable block_dump:
echo 1 > /proc/sys/vm/block_dump

Analyze the kernel reports from dmesg with iodump:
while true; do sleep 1; dmesg -c; done | perl iodump

Let it run for a while, then press Ctrl-C to stop the output; you get results like the following:
# while true; do sleep 1; dmesg -c; done | perl iodump
# Caught SIGINT.
TASK                   PID      TOTAL       READ      WRITE      DIRTY DEVICES
kjournald              478       1294          0       1294          0 sda2
pdflush              21422        857          0        857          0 sda5, sdb6
kjournald             1752        365          0        365          0 sda5
kjournald             1758        174          0        174          0 sdb6
kjournald             1756         52          0         52          0 sdb5
syslogd              26752         28          0         28          0 sdb6
firefox              12811         27         27          0          0 sdb5
perl                  3003         25         25          0          0 sda2
process_perfdat       3026         12         12          0          0 sda5
bash                 14749          3          3          0          0 sda2
firefox              12654          3          3          0          0 sdb5
bash                  3004          2          2          0          0 sda2
bash                  3017          1          1          0          0 sda2
squid                25555          1          1          0          0 sda2
check_nrpe            3001          1          1          0          0 sda2


Finally, disable block_dump:
echo 0 > /proc/sys/vm/block_dump
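For reference, the dmesg lines that iodump aggregates look roughly like the sample below (the exact block_dump message format varies by kernel version; these lines are made up for illustration). A toy awk version of the per-task counting:

```shell
# Count block_dump-style events per task name from sample dmesg lines.
printf '%s\n' \
  'kjournald(478): WRITE block 375869 on sda2' \
  'kjournald(478): WRITE block 376045 on sda2' \
  'firefox(12811): READ block 99172 on sdb5' \
| awk -F'[(:]' '{count[$1]++} END {for (t in count) print t, count[t]}' \
| sort
```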

External links:
block_dump
How to find per-process I/O statistics on Linux
Monitoring filesystem activity under Linux with block_dump
How can I record what process or kernel activity is using the disk in GNU/Linux?
How to identify what processes are generating IO Wait load.



3. blktrace
Kernel versions with block trace support:
– Linux 2.6.14-rc3 with a patch (or later, up to 2.6.17)
– Linux 2.6.17 (or later): built in

Install:
yum -y install blktrace

Mount the debugfs filesystem:
mount -t debugfs debugfs /sys/kernel/debug

Run blktrace:
blktrace -d /dev/sda2 -o - | blkparse -i - -s

After it runs for a while, press Ctrl-C to exit; statistics are displayed:
...
...
...
  8,2    0     1576     0.618648628   478  U   N [kjournald] 1
  8,2    0     1577     0.618669180     0  C   W 375869 + 128 [0]
  8,2    0     1578     0.618763920   478  Q   W 376045 + 8 [kjournald]
  8,2    0     1579     0.618765468   478  G   W 376045 + 8 [kjournald]
  8,2    0     1580     0.618766125   478  P   N [kjournald]
  8,2    0     1581     0.618766440   478  I   W 376045 + 8 [kjournald]
  8,2    0     1582     0.618767278   478  U   N [kjournald] 1
  8,2    0     1583     0.618768643   478  D   W 376045 + 8 [kjournald]
  8,2    0     1584     0.618824993     0  C   W 376045 + 8 [0]
blktrace (5609)
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        2,        8KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        2,       12KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 IO unplugs:             0               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0
kjournald (478)
 Reads Queued:           0,        0KiB  Writes Queued:         330,    1,320KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:       42,    1,320KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        4,       76KiB
 Read Merges:            0,        0KiB  Write Merges:          288,    1,152KiB
 IO unplugs:            12               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0
nagios (5574)
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        0,        0KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        1,       16KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 IO unplugs:             0               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0
pdflush (776)
 Reads Queued:           0,        0KiB  Writes Queued:         126,      504KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:       33,      132KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:       33,      136KiB
 Read Merges:            0,        0KiB  Write Merges:            1,        4KiB
 IO unplugs:             0               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0
swapper (0)
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:      131,      536KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:      233,    2,404KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 IO unplugs:             0               Timer unplugs:           0
 Allocation wait:        0               Allocation wait:         0
 Dispatch wait:          0               Dispatch wait:           0
 Completion wait:        0               Completion wait:         0

CPU0 (8,2):
 Reads Queued:           0,        0KiB  Writes Queued:         456,    1,824KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:      208,    1,996KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:      273,    2,644KiB
 Read Merges:            0,        0KiB  Write Merges:          289,    1,156KiB
 Read depth:             0               Write depth:            20
 IO unplugs:            11               Timer unplugs:           0
CPU2 (8,2):
 Reads Queued:           0,        0KiB  Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:        0,        0KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB  Write Merges:            0,        0KiB
 Read depth:             0               Write depth:            20
 IO unplugs:             1               Timer unplugs:           0

Total (8,2):
 Reads Queued:           0,        0KiB  Writes Queued:         456,    1,824KiB
 Read Dispatches:        0,        0KiB  Write Dispatches:      208,    1,996KiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:        0,        0KiB  Writes Completed:      273,    2,644KiB
 Read Merges:            0,        0KiB  Write Merges:          289,    1,156KiB
 IO unplugs:            12               Timer unplugs:           0

Throughput (R/W): 0KiB/s / 4,278KiB/s
Events (8,2): 1,585 entries
Skips: 0 forward (0 -   0.0%)

Finally, unmount the debugfs filesystem:
umount /sys/kernel/debug

External links:
Block I/O Layer Tracing: blktrace
blktrace User Guide
Tracing I/O usage on Linux


4. atop
With a patch to the kernel source, atop can display per-process I/O


5. systemtap
See http://bpineau.livejournal.com/ for examples


6. DTrace
Solaris only
For the iotop script, see Top Ten DTrace (D) Scripts
iotop - display top disk I/O events by process.


7. Others
pio and topio: for solaris,hp-ux,windows
Psio + other disk I/O by process tools: for solaris only



-fin-

Tuesday, October 20, 2009

setting up pitrtools


pitrtools is another tool set for managing postgresql standby databases, made up of two small scripts written in python:
cmd_archiver calls rsync to copy archived log files and can ship them to multiple standby servers; the shipping workflow is briefly described in PITRTools: Multiple slave support
cmd_standby backs up the database from the primary to the standby and calls pg_standby to apply the archived logs on the standby


1. Test environment:
primary(master) db: pgdb01
standby(slave) db: pgdb02
db version: Postgresql 8.3.7
os version: CentOS 5.3


2. Set up ssh login
Omitted here; see the earlier post setting up postgresql warm standby


3. Install pitrtools
Log in to pgdb01 and pgdb02, then download and install:
wget -O - http://files.commandprompt.com/pitrtools/pitrtools-1.2.tar.bz2|tar -C $HOME -jxf -
chmod +x $HOME/pitrtools-1.2/{cmd_archiver,cmd_standby}


4. Configure the archiver on the primary

Edit the config file:
cat >$HOME/pitrtools-1.2/cmd_archiver.ini <<EOF
[DEFAULT]
state: online
pgdata: /var/lib/pgsql/data
r_archivedir: /var/lib/pgsql/archive
l_archivedir: /var/lib/pgsql/archive
rsync_bin: /usr/bin/rsync
rsync_version = 2
slaves: pgdb02
user: postgres
timeout: 10
notify_ok: echo OK
notify_warning:  echo WARNING
notify_critical: echo CRITICAL
debug: on
ssh_debug: off
EOF

Initialize and create the archive directories:
$HOME/pitrtools-1.2/cmd_archiver -C $HOME/pitrtools-1.2/cmd_archiver.ini -I
We are initializing queues, one moment.

NOTICE: init_env_func()
NOTICE: generate_slave_list_func()
NOTICE: Your slaves are: ['pgdb02']


Create the archive log directory for the standby:
ssh pgdb02 "mkdir -p /var/lib/pgsql/archive"

Create the helper functions for cmd_standby:
psql -U postgres <$HOME/pitrtools-1.2/cmd_standby.sql


5. Enable archive mode on the primary
sed -i "
\$a \
archive_mode = 'on'\\
archive_command = '\$HOME/pitrtools-1.2/cmd_archiver -C \$HOME/pitrtools-1.2/cmd_archiver.ini -F %p \>\>\$HOME/pitrtools-1.2/cmd_archiver.\$(date \"+%%Y%%m%%d\").log'
/^archive_timeout/s/^/#/g
/^archive_mode\|^archive_command/d
" /var/lib/pgsql/data/postgresql.conf

pg_ctl restart -m fast


6. Configure the standby
cat >$HOME/pitrtools-1.2/cmd_standby.ini <<EOF
[DEFAULT]
pgversion: 8.3
numarchives: 10
ssh: /usr/bin/ssh
rsync: /usr/bin/rsync
pg_standby: /usr/bin/pg_standby
pg_ctl: /usr/bin/pg_ctl
r_psql: /usr/bin/psql
port: 5432
master_public_ip: pgdb01
master_local_ip: 127.0.0.1
user: postgres
debug: off
ssh_timeout: 30
archivedir: /var/lib/pgsql/archive
pgdata: /var/lib/pgsql/data
postgresql_conf: $HOME/pitrtools-1.2/postgresql.conf
pg_hba_conf: $HOME/pitrtools-1.2/pg_hba.conf
notify_critical: echo critical.
notify_warning: echo warning.
notify_ok: echo ok.
action_failover: echo failover.
EOF

Initialize and create the standby directories:
scp pgdb01:/var/lib/pgsql/data/postgresql.conf $HOME/pitrtools-1.2/
scp pgdb01:/var/lib/pgsql/data/pg_hba.conf $HOME/pitrtools-1.2/
sed -i '/^archive_/s/^/#/g' $HOME/pitrtools-1.2/postgresql.conf
rm -rf /var/lib/pgsql/data
$HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -I


7. Back up the primary to the standby
$HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -B


8. Start the standby
cp -p $HOME/pitrtools-1.2/{postgresql.conf,pg_hba.conf} /var/lib/pgsql/data/
$HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -S
The standby setup is now complete.


9. Test connectivity to the primary
$ $HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -P
SUCCESS: Master returned: 1


10. Activate the standby
$ $HOME/pitrtools-1.2/cmd_standby -C $HOME/pitrtools-1.2/cmd_standby.ini -F999
waiting for server to shut down... done
server stopped
server starting
failover.
NOTICE: Statistics are not replicated in warm standy mode.
HINT: Execute ANALYZE on your databases


11. Summary
Compared with walmgr, pitrtools requires you to create some directories and edit configuration files by hand, so it is less automated; it cannot keep backups of the primary; and the scripts themselves are somewhat rough and unpolished, with plenty of room for improvement




External links:
Pitr Made Easy



-fin-

Tuesday, October 13, 2009

setting up postgresql warm standby

Setting up a Postgresql standby database

The principle is the same as an Oracle standby database: Postgresql records all operations in a log file (the write-ahead log, WAL), can archive log files on a schedule and ship them to a standby server, and the standby then applies the archived logs in a continuous point-in-time recovery. This high-availability feature is called warm standby or log shipping.


Test environment:
primary server: pgdb01
standby server: pgdb02
database version: Postgresql 8.3.7
OS version: CentOS 5.3

1. Set up ssh authentication between the servers, used by scp to copy archived logs
Generate the key pair on pgdb01, set up the authentication info, and copy the same key files and configuration to pgdb02:
ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
[ -f ~/.ssh/authorized_keys ] && sed -i '/'$(whoami)'@'$(hostname)'/d' ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub >>~/.ssh/authorized_keys
scp -pr $HOME/.ssh pgdb02:

Test it:
$ ssh pgdb02 "hostname -s"
pgdb02
$ ssh pgdb02 "ssh pgdb01 'hostname -s'"
pgdb01


2. Enable log archiving/shipping on the primary pgdb01
Create the archive log directory on the standby pgdb02:
mkdir -p /var/lib/pgsql/archlog/

Edit the config file on the primary pgdb01
vi /var/lib/pgsql/data/postgresql.conf
and add the following settings:
archive_mode = on
archive_command = 'scp -C -o ConnectTimeout=30 "%p" pgdb02:/var/lib/pgsql/archlog/"%f"'
archive_timeout = 1800
archive_mode: on enables archiving, off disables it; the parameter only takes effect after a restart
archive_command: the archive command; here scp copies the archived file to the standby
archive_timeout: force an archive every this many seconds
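The contract of archive_command is worth spelling out: copy %p (the path of the WAL segment) into the archive under the name %f, and return non-zero on failure so postgresql will retry; the docs also recommend refusing to overwrite an existing archive file. A minimal local sketch (the function name and the cp-based copy are mine; the real setup above uses scp):

```shell
# archive_one WALPATH WALNAME ARCHDIR: archive one WAL segment,
# refusing to clobber a segment that is already archived.
archive_one() {
  local p=$1 f=$2 dir=$3
  [ -e "$dir/$f" ] && return 1   # never overwrite an archived segment
  cp "$p" "$dir/$f"
}

# demo with a throwaway file standing in for a WAL segment
tmp=$(mktemp -d)
echo data > "$tmp/000000010000000200000099"
mkdir "$tmp/arch"
archive_one "$tmp/000000010000000200000099" 000000010000000200000099 "$tmp/arch" && echo archived
```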

Restart the database for the settings to take effect
pg_ctl restart -m fast
or stop with the immediate option:
pg_ctl restart -m immediate


3. Check log shipping
Call pg_switch_xlog to switch logs a few times by hand:
$ psql -c "select pg_switch_xlog();"
 pg_switch_xlog
----------------
 2/21000088
(1 row)

If the current log contains no transactions, pg_switch_xlog will not switch logs; see 9.23. System Administration Functions

Look at the pg_xlog directory:
$ psql -c "select pg_current_xlog_location();"
 pg_current_xlog_location
--------------------------
 2/82000000
(1 row)

$ ls -lR /var/lib/pgsql/data/pg_xlog
/var/lib/pgsql/data/pg_xlog:
total 49216
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 00000001000000020000007F
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000080
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000081
drwxr-xr-x 2 postgres postgres     4096 Oct 12 07:32 archive_status

/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 12 07:27 00000001000000020000007F.done
-rw------- 1 postgres postgres 0 Oct 12 07:31 000000010000000200000080.done
-rw------- 1 postgres postgres 0 Oct 12 07:32 000000010000000200000081.done
The .done files in the archive_status directory mark the logs that have been shipped successfully; postgresql maintains them and deletes them automatically

On pgdb02, the archived logs have arrived in the target directory:
$ ls -l /var/lib/pgsql/archlog/
total 49212
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 00000001000000020000007F
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000080
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000081



4. Back up the primary and restore it onto the standby server
Stop the database instance on the standby pgdb02:
pg_ctl stop -m fast
Delete the data directory:
rm -rf /var/lib/pgsql/data

On pgdb01, back up the primary and ship it to the standby:
psql -c "checkpoint;select pg_start_backup('dbbackup');"
tar -C /var/lib/pgsql/ -zcf - data |ssh pgdb02 "tar -C /var/lib/pgsql/ -zxf -"
ssh pgdb02 'cd /var/lib/pgsql/data && rm -rf pg_log/* && find pg_xlog/ -type f -exec rm -f {} \;'
psql -c "select pg_stop_backup();"
$ psql -c "checkpoint;select pg_start_backup('dbbackup');"
 pg_start_backup
-----------------
 2/820000B0
(1 row)

$ tar -C /var/lib/pgsql/ -zcf - data |ssh pgdb02 "tar -C /var/lib/pgsql/ -zxf -"
$ ssh pgdb02 'cd /var/lib/pgsql/data && rm -rf pg_log/* && find pg_xlog/ -type f -exec rm -f {} \;'
$ psql -c "select pg_stop_backup();"
 pg_stop_backup
----------------
 2/82000118
(1 row)


The .backup file produced by the backup records the backup details:
$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 65624
-rw------- 1 postgres postgres 16777216 Oct 12 07:56 000000010000000200000082
-rw------- 1 postgres postgres      244 Oct 12 07:56 000000010000000200000082.000000B0.backup
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 000000010000000200000083
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000084
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000085
drwxr-xr-x 2 postgres postgres     4096 Oct 12 07:56 archive_status

/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 12 07:56 000000010000000200000082.000000B0.backup.done
-rw------- 1 postgres postgres 0 Oct 12 07:56 000000010000000200000082.done
$ cat /var/lib/pgsql/data/pg_xlog/000000010000000200000082.000000B0.backup
START WAL LOCATION: 2/820000B0 (file 000000010000000200000082)
STOP WAL LOCATION: 2/82000118 (file 000000010000000200000082)
CHECKPOINT LOCATION: 2/820000B0
START TIME: 2009-10-12 07:55:13 UTC
LABEL: dbbackup
STOP TIME: 2009-10-12 07:56:28 UTC
It shows the backup started from position 000000B0; see 24.3.2. Making a Base Backup for the explanation


5. Configure the standby
The standby does not need archiving, so edit its config file to turn archiving off:
sed -i '/^archive_/s/^/#/g' /var/lib/pgsql/data/postgresql.conf

Install the postgresql contrib package to get the pg_standby helper tool:
sudo rpm -Uvh http://ftp9.us.postgresql.org/pub/mirrors/postgresql/binary/v8.3.7/linux/rpms/redhat/rhel-5-x86_64/postgresql-contrib-8.3.7-1PGDG.rhel5.x86_64.rpm

Edit the recovery config file recovery.conf:
cat >/var/lib/pgsql/data/recovery.conf <<'EOF'
restore_command = 'pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5432 /var/lib/pgsql/archlog %f %p %r 2>>pg_log/standby.log.$(date +%%Y%%m%%d)'
EOF
-l: create symbolic links to the archived logs instead of copying them
-d: enable debug output
-s 2: check for a new archived log every 2 seconds
-t /tmp/pgsql.trigger.5432: once this trigger file is found, stop the recovery process and open the standby
See F.23. pg_standby for an explanation of the options

Start the standby:
pg_ctl start

The log shows all archived logs have been restored and pg_standby is waiting for the next one:
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 00000001.history
WAL file path           : /var/lib/pgsql/archlog/00000001.history
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 000000010000000200000082.000000B0.backup
WAL file path           : /var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 000000010000000200000082
WAL file path           : /var/lib/pgsql/archlog/000000010000000200000082
Restoring to...         : pg_xlog/RECOVERYXLOG
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000082" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK

Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 000000010000000200000083
WAL file path           : /var/lib/pgsql/archlog/000000010000000200000083
Restoring to...         : pg_xlog/RECOVERYXLOG
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000083" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 000000010000000200000082 and later
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
...
...
...

$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 8
drwxr-xr-x 2 postgres postgres 4096 Oct 12 07:55 archive_status
lrwxrwxrwx 1 postgres postgres   63 Oct 12 08:11 RECOVERYHISTORY -> /var/lib/pgsql/archlog/000000010000000200000082.000000B0.backup

/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
[postgres@pgdb02.cloud101.bqa4.blurdev.com ~]$ ls -lR /var/lib/pgsql/archlog/
/var/lib/pgsql/archlog/:
total 65620
-rw------- 1 postgres postgres 16777216 Oct 12 07:27 00000001000000020000007F
-rw------- 1 postgres postgres 16777216 Oct 12 07:31 000000010000000200000080
-rw------- 1 postgres postgres 16777216 Oct 12 07:32 000000010000000200000081
-rw------- 1 postgres postgres 16777216 Oct 12 07:56 000000010000000200000082
-rw------- 1 postgres postgres      244 Oct 12 07:56 000000010000000200000082.000000B0.backup
I do not know what the RECOVERYHISTORY file under pg_xlog is for
The archived files in the archlog directory are maintained by postgresql; after a successful restore, some older ones are deleted automatically


6. Open the standby
While in recovery the standby cannot serve any operations; the database must be opened before SQL queries are possible:
$ psql
psql: FATAL:  the database system is starting up


When pg_standby detects the trigger file, it terminates the recovery process and opens the database:
touch /tmp/pgsql.trigger.5432
Log:
...
...
...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...trigger file found

Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 000000010000000200000085
WAL file path           : /var/lib/pgsql/archlog/000000010000000200000085
Restoring to...         : pg_xlog/RECOVERYXLOG
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/000000010000000200000085" "pg_xlog/RECOVERYXLOG"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK

Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 00000002.history
WAL file path           : /var/lib/pgsql/archlog/00000002.history
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/00000002.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK
Trigger file            : /tmp/pgsql.trigger.5432
Waiting for WAL file    : 00000001.history
WAL file path           : /var/lib/pgsql/archlog/00000001.history
Restoring to...         : pg_xlog/RECOVERYHISTORY
Sleep interval          : 2 seconds
Max wait interval       : 0 forever
Command for restore     : ln -s -f "/var/lib/pgsql/archlog/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore         : OK

After the recovery process stops, recovery.conf is automatically renamed to recovery.done:
$ ls -lR /var/lib/pgsql/data/pg_xlog/
/var/lib/pgsql/data/pg_xlog/:
total 16412
lrwxrwxrwx 1 postgres postgres       47 Oct 12 09:28 000000020000000200000085 -> /var/lib/pgsql/archlog/000000010000000200000085
-rw------- 1 postgres postgres 16777216 Oct 12 09:28 000000020000000200000086
-rw------- 1 postgres postgres       74 Oct 12 09:28 00000002.history
drwxr-xr-x 2 postgres postgres     4096 Oct 12 09:28 archive_status

/var/lib/pgsql/data/pg_xlog/archive_status:
total 0
-rw------- 1 postgres postgres 0 Oct 12 09:28 00000002.history.ready
$ ls -lR /var/lib/pgsql/data/recovery.*
-rw-r--r-- 1 postgres postgres 142 Oct 12 08:11 /var/lib/pgsql/data/recovery.done

Note that once opened, the standby cannot enter recovery again; the only option is to rebuild it



External links:
Chapter 25. High Availability, Load Balancing, and Replication
Chapter 24. Backup and Restore
Simple HA with PostgreSQL Point-In-Time Recovery
F.23. pg_standby
Using pg_standby for high availability of Postgresql



7. Record-based Log Shipping
What we set up above is file-based log shipping, which recovers one file at a time; even with a forced archive interval, the recovery window is too large.
Postgresql supports record-based log shipping (see 24.4.4. Record-based Log Shipping) but provides no tool for it. Fortunately, Skype developed SkyTools, a boon for DBAs; its walmgr tool makes it easy to create and manage a standby and implements record-based log shipping.


8. Install SkyTools
Install SkyTools on both the primary and standby servers:
rpm -Uvh http://yum.pgsqlrpms.org/8.3/redhat/rhel-5-x86_64/compat-postgresql-libs-4-1PGDG.rhel5.x86_64.rpm
yum -y install python-psycopg2.x86_64
rpm -Uvh http://yum.pgsqlrpms.org/8.3/redhat/rhel-5-x86_64/skytools-2.1.10-1.rhel5.x86_64.rpm
chmod +x /usr/bin/*.py


9. Configure walmgr
See walmgr(1) for details

Primary:
mkdir -p /var/lib/pgsql/walmgr/logs
cat >/var/lib/pgsql/walmgr/master.ini <<'EOF'
[wal-master]
job_name             = pgdb01.cloud101_walgmr_master
logfile              = /var/lib/pgsql/walmgr/logs/master.log
pidfile              = /var/lib/pgsql/walmgr/logs/master.pid
master_db            = dbname=template1
master_data          = /var/lib/pgsql/data
master_config        = /var/lib/pgsql/data/postgresql.conf
slave                = pgdb02:/var/lib/pgsql/walmgr/shipping
slave_config         = /var/lib/pgsql/walmgr/slave.ini
completed_wals       = %(slave)s/logs.complete
partial_wals         = %(slave)s/logs.partial
full_backup          = %(slave)s/data.master
config_backup        = %(slave)s/config.backup
loop_delay           = 10.0
use_xlog_functions   = 1
compression          = 1
EOF

Standby:
mkdir -p /var/lib/pgsql/walmgr/{logs,shipping/{logs.{complete,partial},data.master,config.backup}}
cat >/var/lib/pgsql/walmgr/slave.ini <<'EOF'
[wal-slave]
job_name             = pgdb02.cloud101_walgmr_slave
logfile              = /var/lib/pgsql/walmgr/logs/slave.log
slave_data           = /var/lib/pgsql/data
slave_stop_cmd       = sudo /etc/init.d/postgresql stop
slave_start_cmd      = sudo /etc/init.d/postgresql start
slave_config_dir     = /var/lib/pgsql/data
slave                = /var/lib/pgsql/walmgr/shipping
completed_wals       = %(slave)s/logs.complete
partial_wals         = %(slave)s/logs.partial
full_backup          = %(slave)s/data.master
config_backup        = %(slave)s/config.backup
keep_backups         = 5
EOF


10. Configure the primary
$ walmgr.py /var/lib/pgsql/walmgr/master.ini setup
2009-10-12 10:09:58,499 29592 INFO Configuring WAL archiving
2009-10-12 10:09:58,502 29592 WARNING database must be restarted to enable archiving
2009-10-12 10:09:58,506 29592 INFO Sending SIGHUP to postmaster
2009-10-12 10:09:59,222 29592 INFO Done

After setup has run, the following two settings are added to the config file:
archive_mode = 'on'
archive_command = '/usr/bin/walmgr.py /var/lib/pgsql/walmgr/master.ini xarchive %p %f'

Restart for the changes to take effect:
pg_ctl restart -m fast


11. Back up the primary
$ walmgr.py /var/lib/pgsql/walmgr/master.ini backup
2009-10-12 10:27:05,706 18300 INFO Backup lock obtained.
2009-10-12 10:27:05,706 18300 INFO got SystemExit(0), exiting
2009-10-12 10:27:05,741 29813 INFO Execute SQL: select pg_start_backup('FullBackup'); [dbname=template1]
2009-10-12 10:27:07,696 29813 INFO Checking tablespaces
2009-10-12 10:27:08,776 29813 INFO Backup conf files from /var/lib/pgsql/data
2009-10-12 10:27:09,268 18323 INFO First useful WAL file is: 00000001000000020000008B
2009-10-12 10:27:09,744 18329 INFO Backup lock released.
2009-10-12 10:27:09,785 29813 INFO Full backup successful
The database has been backed up to the specified directories on the standby server:
data files in /var/lib/pgsql/walmgr/shipping/data.master
config files in /var/lib/pgsql/walmgr/shipping/config.backup

Log:
2009-10-12 10:27:05,741 29813 INFO Execute SQL: select pg_start_backup('FullBackup'); [dbname=template1]
2009-10-12 10:27:07,696 29813 INFO Checking tablespaces
2009-10-12 10:27:08,776 29813 INFO Backup conf files from /var/lib/pgsql/data
2009-10-12 10:27:09,785 29813 INFO Full backup successful
2009-10-12 10:27:09,852 29825 INFO {count: 1, duration: 0.387887001038}
2009-10-12 10:27:10,724 29829 INFO {count: 1, duration: 0.725953102112}
2009-10-12 10:27:11,272 29832 INFO {count: 1, duration: 0.396477937698}
2009-10-12 10:27:12,056 29835 INFO {count: 1, duration: 0.628087997437}


12. Restore the standby
$ walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master
2009-10-12 10:28:32,363 18411 INFO Stopping postmaster: sudo /etc/init.d/postgresql stop
Stopping postgresql service:                               [  OK  ]
2009-10-12 10:28:35,535 18411 INFO Move /var/lib/pgsql/data to /var/lib/pgsql/data.2
2009-10-12 10:28:35,536 18411 INFO Copy /var/lib/pgsql/walmgr/shipping/data.master to /var/lib/pgsql/data
2009-10-12 10:28:36,001 18411 INFO Write /var/lib/pgsql/data/recovery.conf
2009-10-12 10:28:36,003 18411 INFO Restoring configuration files
2009-10-12 10:28:36,007 18411 INFO Starting postmaster: sudo /etc/init.d/postgresql start
Starting postgresql service:                               [  OK  ]
The standby is now built.

Log:
2009-10-12 10:28:32,363 18411 INFO Stopping postmaster: sudo /etc/init.d/postgresql stop
2009-10-12 10:28:35,535 18411 INFO Move /var/lib/pgsql/data to /var/lib/pgsql/data.2
2009-10-12 10:28:35,536 18411 INFO Copy /var/lib/pgsql/walmgr/shipping/data.master to /var/lib/pgsql/data
2009-10-12 10:28:36,001 18411 INFO Write /var/lib/pgsql/data/recovery.conf
2009-10-12 10:28:36,003 18411 INFO Restoring configuration files
2009-10-12 10:28:36,007 18411 INFO Starting postmaster: sudo /etc/init.d/postgresql start
2009-10-12 10:28:37,094 18511 INFO 00000001.history: not found, ignoring
2009-10-12 10:28:37,094 18511 INFO got SystemExit(1), exiting
2009-10-12 10:28:37,233 18512 INFO 00000001000000020000008B.00000020.backup: Found
2009-10-12 10:28:37,243 18512 INFO {count: 1}
2009-10-12 10:28:37,387 18514 INFO 00000001000000020000008B: Found
2009-10-12 10:28:37,489 18514 INFO {count: 1}


13. Manual sync
Run on the master:
$ walmgr.py /var/lib/pgsql/walmgr/master.ini sync
2009-10-12 10:30:29,591 29862 INFO sent to slave: 00000001000000020000008B @ 0 +16777216


14. Boot the standby
$ walmgr.py /var/lib/pgsql/walmgr/slave.ini boot
2009-10-12 10:31:38,781 18534 INFO Stopping recovery mode

Log:
2009-10-12 10:31:38,781 18534 INFO Stopping recovery mode
2009-10-12 10:31:39,485 18516 INFO 00000001000000020000008C: not found, stopping
2009-10-12 10:31:39,485 18516 INFO got SystemExit(1), exiting
2009-10-12 10:31:39,628 18536 INFO 00000001000000020000008B: Found
2009-10-12 10:31:39,737 18536 INFO {count: 1}
2009-10-12 10:31:39,881 18538 INFO 00000002.history: not found, ignoring
2009-10-12 10:31:39,882 18538 INFO got SystemExit(1), exiting
2009-10-12 10:31:40,024 18539 INFO 00000001.history: not found, ignoring
2009-10-12 10:31:40,024 18539 INFO got SystemExit(1), exiting


15. Rebuilding the standby
Once the standby has been booted it cannot switch back to recovery mode; it must be rebuilt from the backup. The procedure is the same as step 12:
walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master


16. Automatic periodic sync
Run on the master:
nohup walmgr.py /var/lib/pgsql/walmgr/master.ini syncdaemon >>/var/lib/pgsql/walmgr/logs/syncdaemon.log 2>&1 </dev/null &
Syncs at the interval specified by loop_delay.


17. Switching master and standby roles
Assume pgdb01 is the master and pgdb02 the standby; we swap their roles.

1) Edit the configuration files to reflect the new roles
pgdb02:
mkdir -p /var/lib/pgsql/walmgr/logs
cat >/var/lib/pgsql/walmgr/master.ini <<'EOF'
[wal-master]
job_name             = pgdb02.cloud101_walgmr_master
logfile              = /var/lib/pgsql/walmgr/logs/master.log
pidfile              = /var/lib/pgsql/walmgr/logs/master.pid
master_db            = dbname=template1
master_data          = /var/lib/pgsql/data
master_config        = /var/lib/pgsql/data/postgresql.conf
slave                = pgdb01:/var/lib/pgsql/walmgr/shipping
slave_config         = /var/lib/pgsql/walmgr/slave.ini
completed_wals       = %(slave)s/logs.complete
partial_wals         = %(slave)s/logs.partial
full_backup          = %(slave)s/data.master
config_backup        = %(slave)s/config.backup
loop_delay           = 10.0
use_xlog_functions   = 1
compression          = 1
EOF

pgdb01:
mkdir -p /var/lib/pgsql/walmgr/{logs,shipping/{logs.{complete,partial},data.master,config.backup}}
cat >/var/lib/pgsql/walmgr/slave.ini <<'EOF'
[wal-slave]
job_name             = pgdb01.cloud101_walgmr_slave
logfile              = /var/lib/pgsql/walmgr/logs/slave.log
slave_data           = /var/lib/pgsql/data
slave_stop_cmd       = sudo /etc/init.d/postgresql stop
slave_start_cmd      = sudo /etc/init.d/postgresql start
slave_config_dir     = /var/lib/pgsql/data
slave                = /var/lib/pgsql/walmgr/shipping
completed_wals       = %(slave)s/logs.complete
partial_wals         = %(slave)s/logs.partial
full_backup          = %(slave)s/data.master
config_backup        = %(slave)s/config.backup
keep_backups         = 5
EOF

2) Stop the application and point its database configuration at pgdb02

3) Force a final sync from pgdb01
walmgr.py /var/lib/pgsql/walmgr/master.ini sync

4) Stop the background sync daemon on pgdb01
kill $(cat /var/lib/pgsql/walmgr/logs/master.pid)

5) Boot pgdb02
walmgr.py /var/lib/pgsql/walmgr/slave.ini boot

6) Configure pgdb02 and enable WAL archiving
walmgr.py /var/lib/pgsql/walmgr/master.ini setup
pg_ctl restart -m fast

7) pgdb02 is now the master; start the application and connect it to pgdb02

8) Take a full backup of pgdb02
walmgr.py /var/lib/pgsql/walmgr/master.ini backup

9) Restore it to pgdb01
walmgr.py /var/lib/pgsql/walmgr/slave.ini restore data.master

10) Start the background sync daemon on pgdb02
nohup walmgr.py /var/lib/pgsql/walmgr/master.ini syncdaemon >>/var/lib/pgsql/walmgr/logs/syncdaemon.log 2>&1 </dev/null &

11) Done



External links:
walmgr
walmgr(1)
Testing PostgreSQL replication solutions: Log shipping with walmgr
Core Services 2.0 skytools setup



18. Other replication/clustering/high-availability solutions

Overviews:
Replication, Clustering, and Connection Pooling
PostgreSQL Replication Solutions - Presentation
PostgreSQL Replication
The Four Horsemen of Replication 1 2 3
Getting Smart about the New World of PostgreSQL Replication

A few that I consider relatively simple and usable:
  1. Warm Standby: log-based asynchronous replication; the standby is unavailable while it is replicating
  2. Hot Standby: the standby is readable. The current PostgreSQL release does not support this; support starts with version 8.5
  3. Mammoth Replicator: a modified build of PostgreSQL supporting asynchronous master-slave replication; it has many usage restrictions, e.g. it can replicate only a single database (schema)
  4. Tungsten Replicator: master-slave replication based on log analysis; performance is poor and currently only MySQL and Oracle are supported, but it looks promising and is worth trying
    Links:
    Continuent Tungsten Replicator Guide
    The Scale-Out Blog
  5. RubyRep: written in Ruby; supports MySQL and PostgreSQL. It compares tables in two databases by primary key or unique index and synchronizes the differences; can do master-slave or master-master replication. Performance is probably poor
    Example: Yet Another PostgreSQL Replication Tool - RubyRep


19. PostgreSQL download locations
http://www.postgresql.org/ftp/
https://projects.commandprompt.com/public/pgcore/wiki/direct_download



-fin-

Monday, October 12, 2009

sort file by length of lines

sort file by length of lines
sort the contents of a file by the length of each line

Suppose a file contains some names and we want to sort them by name length:
cat >myfile <<EOF
Andre Anichanov
Andrew Mogrelia
Antonio Vivaldi
Antoni Wit
Capella Istropolitana
Christian Kohn
Cologne Chamber Orchestra
Daniel Rothert
Daniel Taylor
Einar Steen-Nokleberg
Elena Papandreou
Eteri Andjaparidze
EOF

Use awk to print each line's length in front of it, sort numerically in reverse, then cut the length off:
awk '{print length,$0}' myfile|sort -rn|cut -d" " -f2-
Alternatively, put the pipeline after print:
awk '{print length,$0|"sort -rn|cut -d\" \" -f2-"}' myfile

Result:
Cologne Chamber Orchestra
Einar Steen-Nokleberg
Capella Istropolitana
Eteri Andjaparidze
Elena Papandreou
Antonio Vivaldi
Andrew Mogrelia
Andre Anichanov
Daniel Rothert
Christian Kohn
Daniel Taylor
Antoni Wit
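The same idea can be sketched in Python (a minimal illustration, not part of the original post). One difference worth noting: Python's sort is stable, so equal-length lines keep their input order, whereas `sort -rn` makes no such guarantee without `--stable`.

```python
# Sort lines by length, longest first.
# sorted() is stable, so equal-length lines keep their input order.
def sort_by_length(lines):
    return sorted(lines, key=len, reverse=True)

names = ["Antoni Wit", "Capella Istropolitana", "Daniel Taylor"]
for name in sort_by_length(names):
    print(name)
```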



-fin-

Wednesday, September 23, 2009

introducing maatkit - parallel dump

introducing maatkit - parallel dump
maatkit introduction - parallel dump

Maatkit is a set of command-line tools for MySQL, open-source software developed by Percona.
The company also develops the XtraDB storage engine, the XtraBackup hot-backup tool, and Percona, an enhanced, patched build of MySQL.

The toolkit includes:
  • mk-archiver purge table rows or archive them to another table or a file
  • mk-audit analyze MySQL configuration, schema, and operations, and generate a report
  • mk-checksum-filter a filter for mk-table-checksum output
  • mk-deadlock-logger log InnoDB deadlock information
  • mk-duplicate-key-checker find duplicate or redundant foreign keys and indexes
  • mk-fifo-split split a file into several parts, writing them to a FIFO pipe (is this useful?)
  • mk-find find tables matching given rules, then run actions on them
  • mk-heartbeat monitor replication lag between databases
  • mk-log-player split and replay slow query logs
  • mk-parallel-dump multi-threaded dump
  • mk-parallel-restore multi-threaded restore
  • mk-profile-compact compact the output of mk-query-profiler
  • mk-query-digest analyze query logs
  • mk-query-profiler query performance profiler
  • mk-show-grants display user grants
  • mk-slave-delay keep a slave a set amount of time behind its master
  • mk-slave-find find and display the tree hierarchy of slaves
  • mk-slave-move move a slave within the replication hierarchy (what is this for?)
  • mk-slave-prefetch run SELECT statements on a slave so the data is prefetched into memory
  • mk-slave-restart watch for errors on a slave and restart replication
  • mk-table-checksum quickly check whether two tables hold identical data; useful for verifying master/slave consistency
  • mk-table-sync find and fix data differences between two tables on different servers
  • mk-upgrade compare statement results across two databases
  • mk-visual-explain display query execution plans as a tree



1. Installation
Download and install the RPM package from http://maatkit.googlecode.com/:
yum -y install perl-TermReadKey.x86_64
rpm -Uvh http://maatkit.googlecode.com/files/maatkit-4623-1.noarch.rpm

It depends on the following packages:
# rpm -q --requires maatkit
/usr/bin/env
perl(DBD::mysql) >= 1.0
perl(DBI)
perl(DBI) >= 1.13
perl(Data::Dumper)
perl(Digest::MD5)
perl(English)
perl(Exporter)
perl(File::Basename)
perl(File::Find)
perl(File::Spec)
perl(File::Temp)
perl(Getopt::Long)
perl(IO::File)
perl(List::Util)
perl(POSIX)
perl(Socket)
perl(Term::ReadKey) >= 2.10
perl(Time::HiRes)
perl(Time::Local)
perl(constant)
perl(sigtrap)
perl(strict)
perl(warnings)
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1


2. Parallel dump and restore with mk-parallel-dump and mk-parallel-restore
For usage see:
mk-parallel-dump - Dump sets of MySQL tables in parallel.
mk-parallel-restore - Load files into MySQL in parallel.

1) Create a test table
Create a test table with 3 million rows:
mysql --socket=/var/lib/mysql/data_3306/mysql.sock

set autocommit=0;
drop database if exists dbtest;
create database dbtest;
use dbtest;
drop table if exists t1;
create table t1 (
  id int(9) not null auto_increment,
  name varchar(20) not null,
  age int(3) not null,
  notes varchar(100),
  primary key (id),
  index ind_t1_name (name)
);
truncate table t1;
insert into t1 (name, age, notes)
select conv(floor(rand() * 99999999999999), 10, 36), floor(1+rand()*(100-1)), md5(rand())
  from information_schema.COLUMNS a
       , information_schema.COLUMNS b
       , information_schema.COLUMNS c
 limit 3000000;
commit;
The table is a bit over 300 MB:
mysql> insert into t1 (name, age, notes)
    -> select conv(floor(rand() * 99999999999999), 10, 36), floor(1+rand()*(100-1)), md5(rand())
    ->   from information_schema.COLUMNS a
    ->        , information_schema.COLUMNS b
    ->        , information_schema.COLUMNS c
    ->  limit 3000000;
Query OK, 3000000 rows affected (3 min 30.61 sec)
Records: 3000000  Duplicates: 0  Warnings: 0

mysql> commit;
Query OK, 0 rows affected (0.07 sec)
mysql> show table status like 't1';
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+----------------------+
| Name | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment              |
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+----------------------+
| t1   | InnoDB |      10 | Compact    | 3000249 |             76 |   228294656 |               0 |     86654976 |         0 |        6000000 | 2009-09-23 05:26:34 | NULL        | NULL       | utf8_general_ci |     NULL |                | InnoDB free: 4096 kB |
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+----------------------+
1 row in set (0.08 sec)

mysql> system ls -l /var/lib/mysql/data_3306/dbtest
total 319820
-rw-rw---- 1 mysql mysql        61 Sep 23 04:17 db.opt
-rw-rw---- 1 mysql mysql      8646 Sep 23 05:26 t1.frm
-rw-rw---- 1 mysql mysql 327155712 Sep 23 05:30 t1.ibd
mysql>

2) Dump the table with both mk-parallel-dump and mysqldump for comparison
Dump with mysqldump:
# mkdir -p $HOME/backup
# cd $HOME/backup && rm -rf *
# time mysqldump --socket=/var/lib/mysql/data_3306/mysql.sock --opt dbtest t1 >dbback-dbtest-t1.sql

real    0m11.316s
user    0m2.348s
sys     0m0.472s
# ls -l dbback-dbtest-t1.sql
-rw-r--r-- 1 root root 179090589 Sep 23 05:31 dbback-dbtest-t1.sql

Dump with mk-parallel-dump, writing the files under $HOME/backup/pdump1, without compression and without recording the binlog position:
# time mk-parallel-dump --socket=/var/lib/mysql/data_3306/mysql.sock --base-dir=$HOME/backup/pdump1 --no-gzip --nobin-log-position --tables="dbtest.t1"
     default:              1 tables,     1 chunks,     1 successes,  0 failures,  11.40 wall-clock time,  11.31 dump time

real    0m11.608s
user    0m2.176s
sys     0m0.556s
# find pdump1 -ls
7162198    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:33 pdump1
7162199    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:33 pdump1/default
7162200    4 drwxr-xr-x   2 root     root         4096 Sep 23 05:33 pdump1/default/dbtest
7162201 175076 -rw-r--r--   1 root     root     179096039 Sep 23 05:33 pdump1/default/dbtest/t1.000000.sql

Dump again, this time splitting into one file per (approximately) one million rows:
# time mk-parallel-dump --socket=/var/lib/mysql/data_3306/mysql.sock --base-dir=$HOME/backup/pdump2 --no-gzip --nobin-log-position --tables="dbtest.t1" --chunk-size=1000000
     default:              1 tables,     4 chunks,     4 successes,  0 failures,  10.36 wall-clock time,  15.92 dump time

real    0m10.509s
user    0m2.580s
sys     0m0.560s
# find pdump2 -ls
7162202    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:33 pdump2
7162203    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:33 pdump2/default
7162204    4 drwxr-xr-x   2 root     root         4096 Sep 23 05:33 pdump2/default/dbtest
7162205    4 -rw-r--r--   1 root     root          101 Sep 23 05:33 pdump2/default/dbtest/t1.chunks
7162208 58544 -rw-r--r--   1 root     root     59879876 Sep 23 05:33 pdump2/default/dbtest/t1.000001.sql
7162206   16 -rw-r--r--   1 root     root        16365 Sep 23 05:33 pdump2/default/dbtest/t1.000003.sql
7162209 58000 -rw-r--r--   1 root     root     59324125 Sep 23 05:33 pdump2/default/dbtest/t1.000000.sql
7162207 58544 -rw-r--r--   1 root     root     59880064 Sep 23 05:33 pdump2/default/dbtest/t1.000002.sql
# cat pdump2/default/dbtest/t1.chunks
`id` < 1999835
`id` >= 1999835 AND `id` < 3999669
`id` >= 3999669 AND `id` < 5999503
`id` >= 5999503
The .chunks file records the chunking rules.
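The chunk boundaries above come from splitting the table's id range into near-equal slices. As a rough illustration of the idea only (this is not maatkit's actual algorithm), the WHERE clauses can be derived like this:

```python
# Illustration of id-range chunking: split [min_id, max_id] into
# `chunks` ranges and emit one WHERE clause per range, in the style
# of the .chunks file. (Not maatkit's actual algorithm.)
def chunk_clauses(min_id, max_id, chunks):
    if chunks < 2:
        return ["1 = 1"]  # a single chunk covers everything
    step = (max_id - min_id) // chunks
    bounds = [min_id + step * i for i in range(1, chunks)]
    clauses = ["`id` < %d" % bounds[0]]
    for lo, hi in zip(bounds, bounds[1:]):
        clauses.append("`id` >= %d AND `id` < %d" % (lo, hi))
    clauses.append("`id` >= %d" % bounds[-1])
    return clauses

for clause in chunk_clauses(1, 6000000, 4):
    print(clause)
```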

Dump again, this time with 4 threads in parallel (the default is 2):
# time mk-parallel-dump --socket=/var/lib/mysql/data_3306/mysql.sock --base-dir=$HOME/backup/pdump3 --no-gzip --nobin-log-position --tables="dbtest.t1" --chunk-size=1000000 --threads=4
     default:              1 tables,     4 chunks,     4 successes,  0 failures,   9.37 wall-clock time,  25.29 dump time

real    0m9.529s
user    0m2.572s
sys     0m0.516s
# find pdump3 -ls
7359077    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:34 pdump3
7359078    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:34 pdump3/default
7359079    4 drwxr-xr-x   2 root     root         4096 Sep 23 05:34 pdump3/default/dbtest
7359080    4 -rw-r--r--   1 root     root          101 Sep 23 05:34 pdump3/default/dbtest/t1.chunks
7359084 58544 -rw-r--r--   1 root     root     59879876 Sep 23 05:34 pdump3/default/dbtest/t1.000001.sql
7359081   16 -rw-r--r--   1 root     root        16365 Sep 23 05:34 pdump3/default/dbtest/t1.000003.sql
7359083 58000 -rw-r--r--   1 root     root     59324125 Sep 23 05:34 pdump3/default/dbtest/t1.000000.sql
7359082 58544 -rw-r--r--   1 root     root     59880064 Sep 23 05:34 pdump3/default/dbtest/t1.000002.sql

The dump speeds are about the same, with no significant difference: this is a single-CPU system and only one table is being dumped, so chunking and multiple threads may even add overhead.
On a multi-core/multi-CPU system dumping several tables, mk-parallel-dump should be faster.

3) Compare loading with mk-parallel-restore and plain mysql
Load with mysql:
# time mysql --socket=/var/lib/mysql/data_3306/mysql.sock dbtest <dbback-dbtest-t1.sql

real    3m16.760s
user    0m1.672s
sys     0m0.156s

Load with mk-parallel-restore:
# time mk-parallel-restore --socket=/var/lib/mysql/data_3306/mysql.sock $HOME/backup/pdump1
    1 tables,     1 files,     1 successes,  0 failures, 199.75 wall-clock time, 199.75 load time

real    3m19.910s
user    0m0.232s
sys     0m0.136s

Load multiple files with mk-parallel-restore:
# mysql --socket=/var/lib/mysql/data_3306/mysql.sock -e "drop table dbtest.t1;"
# time mk-parallel-restore --socket=/var/lib/mysql/data_3306/mysql.sock $HOME/backup/pdump2
    1 tables,     4 files,     1 successes,  0 failures, 196.55 wall-clock time, 196.54 load time

real    3m16.653s
user    0m0.268s
sys     0m0.148s

Load multiple files with mk-parallel-restore, using 4 threads:
# mysql --socket=/var/lib/mysql/data_3306/mysql.sock -e "drop table dbtest.t1;"
# time mk-parallel-restore --socket=/var/lib/mysql/data_3306/mysql.sock $HOME/backup/pdump3 --threads=4
    1 tables,     4 files,     1 successes,  0 failures, 194.19 wall-clock time, 194.19 load time

real    3m14.606s
user    0m0.204s
sys     0m0.164s

Load speeds are also about the same.


Overall it feels fairly unremarkable; I'll try other tools later. The replication and log/query analysis tools are probably more useful.



External links:
Tools for MySQL - Maatkit makes MySQL - easier to manage.
maatkit - A toolkit that provides advanced functionality for MySQL
mysql-parallel-dump test


-fin-

Monday, September 21, 2009

DML Error Logging

DML Error Logging

Question: In a stored procedure, an UPDATE modifies 200,000 rows. When the statement fails partway through because some value exceeds a column's length, how can the exception handler identify which row caused the failure?

This doesn't show which row overflowed. Can you pinpoint the offending row??
SQL> CREATE OR REPLACE PROCEDURE test111 AS
  2 
  3    v_sqlcode number;
  4    v_sqlerrm varchar2(100);
  5  BEGIN
  6 
  7 
  8  update test set aa=aa||'string' ;
  9      commit;
 10  EXCEPTION
 11      when others then
 12      v_sqlcode:=Sqlcode;
 13      v_sqlerrm:=Sqlerrm;
 14       rollback;
 15  DBMS_OUTPUT.PUT_LINE('AAA' || SQLCODE || SQLERRM || 'START');
 16 
 17  END;
 18  /
 
Procedure created.
 
SQL> set serveroutput on
SQL> exec test111;
AAA-12899ORA-12899: value too large for column "HR"."TEST"."AA" (actual: 11,
maximum: 10)START

Answer: DML Error Logging can do what you want.

Normally, if even a single row raises an error while a SQL statement runs, the statement stops immediately and its changes are rolled back, so the whole modification fails; the problem is especially acute when one statement processes a large batch of rows.
10gR2 added DML error logging: for certain errors the statement is not aborted; instead the error information is automatically recorded into a designated table and processing continues.
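The behavior is easy to mimic in application code. Here is a minimal Python/sqlite3 sketch of the same pattern, purely as an analogy (Oracle's LOG ERRORS does all of this server-side, inside a single statement): try each row, divert failures into a side table, and keep going.

```python
import sqlite3

# Emulate the LOG ERRORS pattern: try each row, divert failures to an
# error table, and keep going instead of aborting the whole batch.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer primary key check (a <= 9))")
conn.execute("create table err_t (errmsg text, a integer)")

for a in range(1, 13):
    try:
        conn.execute("insert into t (a) values (?)", (a,))
    except sqlite3.IntegrityError as e:
        conn.execute("insert into err_t values (?, ?)", (str(e), a))

print(conn.execute("select count(*) from t").fetchone()[0])      # 9 rows inserted
print(conn.execute("select count(*) from err_t").fetchone()[0])  # 3 errors logged
```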

1. Example
Create a test table:
set serveroutput on size unlimited
set pages 50000 line 130
drop table t purge;
drop table err$_t purge;
create table t(a number(1) primary key, b char);

Next, create a table to record the errors, either automatically with Oracle's DBMS_ERRLOG package or by hand:
exec dbms_errlog.create_error_log('t');
The default name is ERR$_ plus the first 25 characters of the base table name:
SQL> select * from tab;

TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
T                              TABLE
ERR$_T                         TABLE

SQL> desc ERR$_T
 Name                                                                    Null?    Type
 ----------------------------------------------------------------------- -------- ------------------------------------------------
 ORA_ERR_NUMBER$                                                                  NUMBER
 ORA_ERR_MESG$                                                                    VARCHAR2(2000)
 ORA_ERR_ROWID$                                                                   ROWID
 ORA_ERR_OPTYP$                                                                   VARCHAR2(2)
 ORA_ERR_TAG$                                                                     VARCHAR2(2000)
 A                                                                                VARCHAR2(4000)
 B                                                                                VARCHAR2(4000)

SQL>
ORA_ERR_NUMBER$ error number
ORA_ERR_MESG$ error message
ORA_ERR_ROWID$ rowid of the failing row (update and delete only)
ORA_ERR_OPTYP$ operation type: I insert, U update, D delete
ORA_ERR_TAG$ user-defined tag
(If you create the table by hand, it must include the columns above)
The remaining columns mirror the base table's columns; they are typed varchar2(4000) and store the failing row's values (for the type mappings see Table 15-2 Error Logging Table Column Data Types)

First insert test data the plain way:
SQL> insert into t (a) select level from dual connect by level <= 12;
insert into t (a) select level from dual connect by level <= 12
                               *
ERROR at line 1:
ORA-01438: value larger than specified precision allowed for this column


Column a holds only a single digit, so it cannot exceed 9; inserting 10 raised the error.

Now add the LOG ERRORS clause:
SQL> insert into t (a) select level from dual connect by level <= 12 log errors reject limit unlimited;

9 rows created.

SQL> select * from t;

         A B
---------- -
         1
         2
         3
         4
         5
         6
         7
         8
         9

9 rows selected.

SQL> col ORA_ERR_MESG$ for a50
SQL> col ORA_ERR_TAG$ for a10
SQL> col ORA_ERR_ROWID$ for a10
SQL> col A for a10
SQL> col b for a10
SQL> select * from err$_t;

ORA_ERR_NUMBER$ ORA_ERR_MESG$                                      ORA_ERR_RO OR ORA_ERR_TA A          B
--------------- -------------------------------------------------- ---------- -- ---------- ---------- ----------
           1438 ORA-01438: value larger than specified precision a            I             10
                llowed for this column

           1438 ORA-01438: value larger than specified precision a            I             11
                llowed for this column

           1438 ORA-01438: value larger than specified precision a            I             12
                llowed for this column


The statement succeeded, and the failing rows were logged to err$_t.



2. Syntax
The error logging syntax is:
LOG ERRORS [INTO [schema.]table]
[ (simple_expression) ]
[ REJECT LIMIT {integer|UNLIMITED} ]

The INTO clause is optional; the default table name is err$_ plus the first 25 characters of the base table name.
simple_expression is an expression yielding a string that is inserted as a tag into the ORA_ERR_TAG$ column.
REJECT LIMIT is the maximum number of errors allowed; beyond that the statement raises an exception. A value of 0 means errors are not logged (but see below).


3. The REJECT LIMIT clause
Test results on 10.2.0.4 differ somewhat from what the 10gR2 documentation says.

Even in the default case an error is logged, but only one:
SQL> truncate table t;

Table truncated.

SQL> truncate table err$_t;

Table truncated.

SQL> insert into t (a) select level from dual connect by level <= 13 log errors;
insert into t (a) select level from dual connect by level <= 13 log errors
                               *
ERROR at line 1:
ORA-01438: value larger than specified precision allowed for this column


SQL> select * from err$_t;

ORA_ERR_NUMBER$ ORA_ERR_MESG$                                                          ORA_ERR_RO OR ORA_ERR_TA A
--------------- ---------------------------------------------------------------------- ---------- -- ---------- ----------
B
----------
           1438 ORA-01438: value larger than specified precision allowed for this colu            I             10
                mn



SQL>

If you specify a limit of n, n+1 errors are logged:
SQL> truncate table t;

Table truncated.

SQL> truncate table err$_t;

Table truncated.

SQL> insert into t (a) select level from dual connect by level <= 13 log errors reject limit 2;
insert into t (a) select level from dual connect by level <= 13 log errors reject limit 2
                               *
ERROR at line 1:
ORA-01438: value larger than specified precision allowed for this column


SQL> select * from err$_t;

ORA_ERR_NUMBER$ ORA_ERR_MESG$                                                          ORA_ERR_RO OR ORA_ERR_TA A
--------------- ---------------------------------------------------------------------- ---------- -- ---------- ----------
B
----------
           1438 ORA-01438: value larger than specified precision allowed for this colu            I             10
                mn


           1438 ORA-01438: value larger than specified precision allowed for this colu            I             11
                mn


           1438 ORA-01438: value larger than specified precision allowed for this colu            I             12
                mn



SQL>

The 11gR1 documentation appears to have corrected this:
This subclause indicates the maximum number of errors that can be encountered before the INSERT statement terminates and rolls back. You can also specify UNLIMITED. The default reject limit is zero, which means that upon encountering the first error, the error is logged and the statement rolls back. For parallel DML operations, the reject limit is applied to each parallel server.

The 11gR2 documentation:
If REJECT LIMIT X had been specified, the statement would have failed with the error message of error X=1. The error message can be different for different reject limits. In the case of a failing statement, only the DML statement is rolled back, not the insertion into the DML error logging table. The error logging table will contain X+1 rows.


4. The error logging table
The error logging table is never cleaned up automatically, and it is populated in an autonomous transaction: if the REJECT LIMIT is exceeded, the DML statement rolls back but the rows in the error logging table remain.
The error logging table may belong to a different user than the one running the DML statement; the executing user must have insert privilege on it.


5. Restrictions
Errors are logged for:
  • column values that are too large
  • violations of NOT NULL, unique, referential, or check constraints
  • exceptions raised by triggers
  • data type conversion errors
  • partition mapping errors
  • certain MERGE operation errors (e.g. ORA-30926: Unable to get a stable set of rows for MERGE operation.)

Errors are NOT logged for:
  • violated deferred constraints
  • out-of-space errors
  • unique constraint or unique index violations raised by direct-path operations (insert or merge)
  • unique constraint or unique index violations raised by updates (update or merge)




External links:
DML Error Logging
Error Logging and Handling Mechanisms
Inserting Data with DML Error Logging
38 DBMS_ERRLOG

Faster Batch Processing By Mark Rittman
10gR2 New Feature: DML Error Logging
DML Error Logging in Oracle 10g Database Release 2
Oracle DBMS_ERRLOG
Oracle DML Error Logging
dml error logging in oracle 10g release 2



-fin-