谢太数据库: InnoDB Index Structure

InnoDB Index Structure
InnoDB索引结构

题问:
查看下面查询语句, 字段b是数字类型, 建立了索引, 查询条件是LIKE, 为何使用了索引全扫描(type=index)?

mysql> desc t1;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| a     | int(11) | NO   | PRI | 0       |       |
| b     | int(11) | YES  | MUL | NULL    |       |
| c     | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
3 rows in set (0.01 sec)

mysql> explain select a, b from t1 where b like '50';
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t1    | index | b             | b    | 5       | NULL | 4866 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql>

回答:
这条查询语句用到了覆盖索引(covering index), 跟InnoDB索引的结构有关.

1. 存储结构
InnoDB存储引擎中, 每一个表具有一个特殊的索引,叫聚集索引(clustered index, 又叫primary index), 行数据存储在该索引节点上, 这相当于Oracle的索引组织表的结构.
聚集索引建立的规则如下:
1)如果有主键, 那么选择该主键
2)如果没有主键, 有非空列的唯一索引, 则选择第一个非空唯一索引
3)否则, InnoDB存储引擎内部为每一条记录分配一个6字节的唯一标识(row id)
以此作为行的唯一标识创建聚集索引, 每条记录的数据存储在聚集索引的叶子节点上, 所以InnoDB聚集索引就是表本身.

其它索引称为二级索引(Secondary index)
二级索引的节点不仅保存索引键值(索引字段的值), 还保存了对应行的聚集索引(primary key)的键值. 此值起到指针的作用, 查询通过它去访问聚集索引, 扫描主键(primary key)从而得到行数据.(Oracle IOT的二级索引保存的是逻辑行标识符[logical rowid]). 这意味着, 通过二级索引查询要扫描两个索引, 二级索引和聚集索引

如图所示:

.
  primary key
         o  <-------
         |          |
      -------       |
     |       |      |
     o       o      |
     |       |      |
    ---     ---     |
   |   |   |   |    |
   o   o   o   o    |
  k+r k+r k+r k+r   |
                    |
  secondary key     |
         o          |
         |          |
      -------       |
     |       |      |
     o       o      |
     |       |      |
    ---     ---     |
   |   |   |   |    |
   o   o   o   o    |
  k+p k+p k+p k+p   |
                |   |
                 ---
.
primary key: 即primary index, 即聚集索引
secondary key: 即secondary index
k+r: key value + row data, 即索引键值和行数据
k+p: key value + primary key values, 即索引键值和主键键值

2. 选择主键
由于InnoDB索引存储结构的特殊性, 因此在设计表时对主键的选择显得格外重要, 主要原则有:
1) 显式指定主键, 不要让系统自动创建
如果不指定主键, 系统会自动创建一个自增长类型的隐含的字段(row id)作为主键, 长6字节
演示如下:
建一个表, 不带主键, 查看事务锁的状态

drop table if exists test;
create table test(a int, b int, c int) engine=innodb;
insert into test values(1, 16, 17), (2, 32, 34);
-- connection 1:
set transaction isolation level serializable;
start transaction;
select * from test;
-- connection 2:
set transaction isolation level serializable;
start transaction;
update test set a = 9;
-- connection 3:
SHOW ENGINE INNODB STATUS\G

------------
TRANSACTIONS
------------
Trx id counter 0 5673
Purge done for trx's n:o < 0 5672 undo n:o < 0 0
History list length 24
Total number of lock structs in row lock hash table 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 4304, OS thread id 1162881344
MySQL thread id 20, query id 7423 localhost root
SHOW ENGINE INNODB STATUS
---TRANSACTION 0 5672, ACTIVE 3 sec, process no 4304, OS thread id 1162615104 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 368
MySQL thread id 19, query id 7422 localhost root Updating
update test set a = 9
------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 35 page no 3 n bits 72 index `GEN_CLUST_INDEX` of table `motoaccounts0/test` trx id 0 5672 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
 0: len 6; hex 000000002604; asc     & ;; 1: len 6; hex 000000001625; asc      %;; 2: len 7; hex 80000000340110; asc     4  ;; 3: len 4; hex 80000001; asc     ;; 4: len 4; hex 80000010; asc     ;; 5: len 4; hex 80000011; asc     ;;

------------------
---TRANSACTION 0 5670, ACTIVE 8 sec, process no 4304, OS thread id 1162348864
2 lock struct(s), heap size 368
MySQL thread id 18, query id 7419 localhost root

系统创建的索引的名字是GEN_CLUST_INDEX
索引结构中的字段个数(n_fields)为6, 第0个的长度是6字节(0: len 6;), 就是系统自动创建的隐含rowid; 后面两个1和2是系统内部使用的; 接下来第3,4,5代表表中的实际的字段

表如果带主键, 则是另一种情况

drop table if exists test;
create table test(a int primary key, b int, c int) engine=innodb;
insert into test values(1, 16, 17), (2, 32, 34);
-- connection 1:
set transaction isolation level serializable;
start transaction;
select * from test;
-- connection 2:
set transaction isolation level serializable;
start transaction;
update test set a = 9;
-- connection 3:
SHOW ENGINE INNODB STATUS\G

------------
TRANSACTIONS
------------
Trx id counter 0 5682
Purge done for trx's n:o < 0 5681 undo n:o < 0 0
History list length 27
Total number of lock structs in row lock hash table 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 0, not started, process no 4304, OS thread id 1162881344
MySQL thread id 20, query id 7434 localhost root
SHOW ENGINE INNODB STATUS
---TRANSACTION 0 5681, ACTIVE 2 sec, process no 4304, OS thread id 1162615104 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 368
MySQL thread id 19, query id 7433 localhost root Searching rows for update
update test set a = 9
------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 36 page no 3 n bits 72 index `PRIMARY` of table `motoaccounts0/test` trx id 0 5681 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 00000000162e; asc      .;; 2: len 7; hex 800000002d0110; asc     -  ;; 3: len 4; hex 80000010; asc     ;; 4: len 4; hex 80000011; asc     ;;

------------------
---TRANSACTION 0 5679, ACTIVE 6 sec, process no 4304, OS thread id 1162348864
2 lock struct(s), heap size 368
MySQL thread id 18, query id 7430 localhost root

索引变成PRIMARY. n_fields值减少了一个, 变为5(n_fields 5;). 第0个是主键字段(0: len 4; hex 80000001; asc ;;).

2) 因为数据是按主键顺序存储的, 如果插入记录, 主键的值是随机的, 为了保证索引树的平衡, 聚集索引将重组, 这会产生较多写操作, 并导致产生索引碎片. 所以记录应该按主键升序顺序插入, 可以选择自增长类型字段作为主键.

3) 主键数据值应该尽可能小. 因为在所有索引中都包含了主键键值, 如果主键很大, 其它二级索引也会很大, 占用更多的空间. 如果自然主键很大, 可增加一个自增长列作主键.

4) 对主键字段更新操作也会引起聚集索引重组, 导致索引节点上的行数据的移动, 因此应尽量避免主键更新.

3. 问题解答

select a, b from t1 where b like '50';

语句查询了a,b两个字段, 字段b上有二级索引, 因为二级索引节点内包含了主键, 从二级索引中可以得到主键的值, 所以不用去查询表, 只要查询此二级索引一次就可以得到a,b字段的值. 执行计划的extra信息显示出"Using index", 表明可以从索引中取得所有要查询的列, 这种方式被称为索引覆盖(index covering).

外部链接:

13.6.10. InnoDB Table and Index Structures
InnoDB Internals: InnoDB File Formats and Source Code Structure
High performance MySQL By Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy D. Zawodny, Arjen Lentz, Derek Balling

How to exploit MySQL index optimizations
Everything you need to know about designing MySQL InnoDB primary keys
Index Layouts - Pro MySQL By Michael Kruckenberg, Jay Pipes
Innodb Performance Optimization

-fin-

谢太数据库

谢太的最爱

Blog Archive

Sunday, August 23, 2009

InnoDB Index Structure

No comments:

Followers