
Wednesday, September 23, 2009

introducing maatkit - parallel dump


Maatkit is a set of command-line tools for MySQL, open-source software developed by Percona.
The same company also develops the XtraDB storage engine, the XtraBackup hot-backup tool, and Percona, a patched and enhanced build of MySQL.

The toolkit includes (a quick example of how these tools are invoked follows the list):
  • mk-archiver Archive or purge table rows into another table or a file
  • mk-audit Analyze MySQL configuration, schema and operation and generate a report
  • mk-checksum-filter Filter for mk-table-checksum output
  • mk-deadlock-logger Log InnoDB deadlock information
  • mk-duplicate-key-checker Find duplicate or redundant indexes and foreign keys
  • mk-fifo-split Split a file into multiple parts and feed them to a FIFO pipe (is this useful?)
  • mk-find Find tables matching given rules, then execute an action on them
  • mk-heartbeat Monitor replication delay between databases
  • mk-log-player Split and replay slow query logs
  • mk-parallel-dump Multi-threaded dump
  • mk-parallel-restore Multi-threaded restore
  • mk-profile-compact Compact the output of mk-query-profiler
  • mk-query-digest Analyze query logs
  • mk-query-profiler Query performance profiling tool
  • mk-show-grants Show user grants
  • mk-slave-delay Keep a slave lagging behind its master by a set amount of time
  • mk-slave-find Find and display the tree hierarchy of slaves
  • mk-slave-move Move a slave within the replication hierarchy (what on earth is that for?)
  • mk-slave-prefetch Run SELECT queries on a slave ahead of time so the data is prefetched into memory
  • mk-slave-restart Watch for errors on a slave and restart replication
  • mk-table-checksum Quickly check whether two tables contain the same data; can be used to verify data consistency between a master and its slaves
  • mk-table-sync Find and fix data differences between two tables on different servers
  • mk-upgrade Compare the results of running statements against two databases
  • mk-visual-explain Display an execution plan as a tree
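
As a quick taste of how these tools are invoked, mk-query-digest, for example, just takes a slow query log on the command line and prints its report (the log path below is only an illustration; point it at your own slow log):
# mk-query-digest /var/lib/mysql/data_3306/slow.log    # path is an example only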



1. Installation
Download the RPM package from http://maatkit.googlecode.com/ and install it:
yum -y install perl-TermReadKey.x86_64
rpm -Uvh http://maatkit.googlecode.com/files/maatkit-4623-1.noarch.rpm
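
A quick way to confirm the install is to list the tools the package dropped into /usr/bin:
# rpm -ql maatkit | grep '^/usr/bin/'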

It depends on the following packages:
# rpm -q --requires maatkit
/usr/bin/env
perl(DBD::mysql) >= 1.0
perl(DBI)
perl(DBI) >= 1.13
perl(Data::Dumper)
perl(Digest::MD5)
perl(English)
perl(Exporter)
perl(File::Basename)
perl(File::Find)
perl(File::Spec)
perl(File::Temp)
perl(Getopt::Long)
perl(IO::File)
perl(List::Util)
perl(POSIX)
perl(Socket)
perl(Term::ReadKey) >= 2.10
perl(Time::HiRes)
perl(Time::Local)
perl(constant)
perl(sigtrap)
perl(strict)
perl(warnings)
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1


2. Parallel dump with mk-parallel-dump and parallel restore with mk-parallel-restore
For usage, see:
mk-parallel-dump - Dump sets of MySQL tables in parallel.
mk-parallel-restore - Load files into MySQL in parallel.
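
Each tool also has built-in help, and the full documentation is embedded in the script itself as POD, so either of these should work:
# mk-parallel-dump --help
# perldoc mk-parallel-dump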

1) Create a test table
Create a test table with 3 million rows (the INSERT below generates the rows by cross-joining information_schema.COLUMNS with itself):
mysql --socket=/var/lib/mysql/data_3306/mysql.sock

set autocommit=0;
drop database if exists dbtest;
create database dbtest;
use dbtest;
drop table if exists t1;
create table t1 (
  id int(9) not null auto_increment,
  name varchar(20) not null,
  age int(3) not null,
  notes varchar(100),
  primary key (id),
  index ind_t1_name (name)
);
truncate table t1;
insert into t1 (name, age, notes)
select conv(floor(rand() * 99999999999999), 10, 36), floor(1+rand()*(100-1)), md5(rand())
  from information_schema.COLUMNS a
       , information_schema.COLUMNS b
       , information_schema.COLUMNS c
 limit 3000000;
commit;
The table comes to a bit over 300 MB:
mysql> insert into t1 (name, age, notes)
    -> select conv(floor(rand() * 99999999999999), 10, 36), floor(1+rand()*(100-1)), md5(rand())
    ->   from information_schema.COLUMNS a
    ->        , information_schema.COLUMNS b
    ->        , information_schema.COLUMNS c
    ->  limit 3000000;
Query OK, 3000000 rows affected (3 min 30.61 sec)
Records: 3000000  Duplicates: 0  Warnings: 0

mysql> commit;
Query OK, 0 rows affected (0.07 sec)
mysql> show table status like 't1';
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+----------------------+
| Name | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment              |
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+----------------------+
| t1   | InnoDB |      10 | Compact    | 3000249 |             76 |   228294656 |               0 |     86654976 |         0 |        6000000 | 2009-09-23 05:26:34 | NULL        | NULL       | utf8_general_ci |     NULL |                | InnoDB free: 4096 kB |
+------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+----------------------+
1 row in set (0.08 sec)

mysql> system ls -l /var/lib/mysql/data_3306/dbtest
total 319820
-rw-rw---- 1 mysql mysql        61 Sep 23 04:17 db.opt
-rw-rw---- 1 mysql mysql      8646 Sep 23 05:26 t1.frm
-rw-rw---- 1 mysql mysql 327155712 Sep 23 05:30 t1.ibd
mysql>

2) Compare dumping the table with mk-parallel-dump and mysqldump
Dump the table with mysqldump:
# mkdir -p $HOME/backup
# cd $HOME/backup && rm -rf *
# time mysqldump --socket=/var/lib/mysql/data_3306/mysql.sock --opt dbtest t1 >dbback-dbtest-t1.sql

real    0m11.316s
user    0m2.348s
sys     0m0.472s
# ls -l dbback-dbtest-t1.sql
-rw-r--r-- 1 root root 179090589 Sep 23 05:31 dbback-dbtest-t1.sql

Dump the table with mk-parallel-dump, writing the output files under $HOME/backup/pdump1, without compression and without recording the binlog position:
# time mk-parallel-dump --socket=/var/lib/mysql/data_3306/mysql.sock --base-dir=$HOME/backup/pdump1 --no-gzip --nobin-log-position --tables="dbtest.t1"
     default:              1 tables,     1 chunks,     1 successes,  0 failures,  11.40 wall-clock time,  11.31 dump time

real    0m11.608s
user    0m2.176s
sys     0m0.556s
# find pdump1 -ls
7162198    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:33 pdump1
7162199    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:33 pdump1/default
7162200    4 drwxr-xr-x   2 root     root         4096 Sep 23 05:33 pdump1/default/dbtest
7162201 175076 -rw-r--r--   1 root     root     179096039 Sep 23 05:33 pdump1/default/dbtest/t1.000000.sql

Dump the table with mk-parallel-dump again, this time splitting the output into one file per (roughly) one million rows:
# time mk-parallel-dump --socket=/var/lib/mysql/data_3306/mysql.sock --base-dir=$HOME/backup/pdump2 --no-gzip --nobin-log-position --tables="dbtest.t1" --chunk-size=1000000
     default:              1 tables,     4 chunks,     4 successes,  0 failures,  10.36 wall-clock time,  15.92 dump time

real    0m10.509s
user    0m2.580s
sys     0m0.560s
# find pdump2 -ls
7162202    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:33 pdump2
7162203    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:33 pdump2/default
7162204    4 drwxr-xr-x   2 root     root         4096 Sep 23 05:33 pdump2/default/dbtest
7162205    4 -rw-r--r--   1 root     root          101 Sep 23 05:33 pdump2/default/dbtest/t1.chunks
7162208 58544 -rw-r--r--   1 root     root     59879876 Sep 23 05:33 pdump2/default/dbtest/t1.000001.sql
7162206   16 -rw-r--r--   1 root     root        16365 Sep 23 05:33 pdump2/default/dbtest/t1.000003.sql
7162209 58000 -rw-r--r--   1 root     root     59324125 Sep 23 05:33 pdump2/default/dbtest/t1.000000.sql
7162207 58544 -rw-r--r--   1 root     root     59880064 Sep 23 05:33 pdump2/default/dbtest/t1.000002.sql
# cat pdump2/default/dbtest/t1.chunks
`id` < 1999835
`id` >= 1999835 AND `id` < 3999669
`id` >= 3999669 AND `id` < 5999503
`id` >= 5999503
The .chunks file records how the table was split into chunks.
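Each line is just a WHERE condition on the primary key, so conceptually a chunk file is roughly what mysqldump would produce when restricted to that range (a sketch of the idea only, not necessarily the exact statement mk-parallel-dump runs internally; the output file name is arbitrary):
# mysqldump --socket=/var/lib/mysql/data_3306/mysql.sock --opt --where='`id` < 1999835' dbtest t1 > t1-chunk0.sql    # illustration only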

Dump the table with mk-parallel-dump once more, this time with 4 threads dumping in parallel (the default is 2 threads if not specified):
# time mk-parallel-dump --socket=/var/lib/mysql/data_3306/mysql.sock --base-dir=$HOME/backup/pdump3 --no-gzip --nobin-log-position --tables="dbtest.t1" --chunk-size=1000000 --threads=4
     default:              1 tables,     4 chunks,     4 successes,  0 failures,   9.37 wall-clock time,  25.29 dump time

real    0m9.529s
user    0m2.572s
sys     0m0.516s
# find pdump3 -ls
7359077    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:34 pdump3
7359078    4 drwxr-xr-x   3 root     root         4096 Sep 23 05:34 pdump3/default
7359079    4 drwxr-xr-x   2 root     root         4096 Sep 23 05:34 pdump3/default/dbtest
7359080    4 -rw-r--r--   1 root     root          101 Sep 23 05:34 pdump3/default/dbtest/t1.chunks
7359084 58544 -rw-r--r--   1 root     root     59879876 Sep 23 05:34 pdump3/default/dbtest/t1.000001.sql
7359081   16 -rw-r--r--   1 root     root        16365 Sep 23 05:34 pdump3/default/dbtest/t1.000003.sql
7359083 58000 -rw-r--r--   1 root     root     59324125 Sep 23 05:34 pdump3/default/dbtest/t1.000000.sql
7359082 58544 -rw-r--r--   1 root     root     59880064 Sep 23 05:34 pdump3/default/dbtest/t1.000002.sql

The dump speeds are about the same, with no significant difference: this is a single-CPU system and only one table is being dumped, so the chunking and extra threads may even add a little overhead.
On a multi-core / multi-CPU system dumping several tables, mk-parallel-dump should be noticeably faster.
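For instance, with a second table to dump (a hypothetical dbtest.t2 below), the threads would each have independent work; this sketch assumes --tables accepts a comma-separated list, which is the usual maatkit convention:
# time mk-parallel-dump --socket=/var/lib/mysql/data_3306/mysql.sock --base-dir=$HOME/backup/pdump4 --no-gzip --nobin-log-position --tables="dbtest.t1,dbtest.t2" --chunk-size=1000000 --threads=4    # dbtest.t2 is hypothetical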

3) Compare importing with plain mysql and with mk-parallel-restore
Import with mysql:
# time mysql --socket=/var/lib/mysql/data_3306/mysql.sock dbtest <dbback-dbtest-t1.sql

real    3m16.760s
user    0m1.672s
sys     0m0.156s

Import with mk-parallel-restore:
# time mk-parallel-restore --socket=/var/lib/mysql/data_3306/mysql.sock $HOME/backup/pdump1
    1 tables,     1 files,     1 successes,  0 failures, 199.75 wall-clock time, 199.75 load time

real    3m19.910s
user    0m0.232s
sys     0m0.136s

Import multiple files with mk-parallel-restore:
# mysql --socket=/var/lib/mysql/data_3306/mysql.sock -e "drop table dbtest.t1;"
# time mk-parallel-restore --socket=/var/lib/mysql/data_3306/mysql.sock $HOME/backup/pdump2
    1 tables,     4 files,     1 successes,  0 failures, 196.55 wall-clock time, 196.54 load time

real    3m16.653s
user    0m0.268s
sys     0m0.148s

Import multiple files with mk-parallel-restore, using 4 threads:
# mysql --socket=/var/lib/mysql/data_3306/mysql.sock -e "drop table dbtest.t1;"
# time mk-parallel-restore --socket=/var/lib/mysql/data_3306/mysql.sock $HOME/backup/pdump3 --threads=4
    1 tables,     4 files,     1 successes,  0 failures, 194.19 wall-clock time, 194.19 load time

real    3m14.606s
user    0m0.204s
sys     0m0.164s

Restore speeds are also all about the same.
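After each load, a quick row count is an easy way to confirm the table came back complete:
# mysql --socket=/var/lib/mysql/data_3306/mysql.sock -e "select count(*) from dbtest.t1"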


Overall it feels pretty unremarkable; I'll try some of the other tools later. The tools for replication and for log/query analysis are probably the more useful ones.



External links:
Tools for MySQL - Maatkit makes MySQL easier to manage.
maatkit - A toolkit that provides advanced functionality for MySQL
mysql-parallel-dump test


-fin-
