MySQL SQL优化案例：相关子查询(dependent subquery)优化-蒲公英云

原始语句：

SELECT
  t1.*
FROM
  t_payment_bank_account_info t1
WHERE
  EXISTS (
    SELECT
      1
    FROM
      t_payment_account_dtl t2
    WHERE
      t1.account_no = t2.account_no
    AND t2.parent_account_no = '7311810182600115231'
    AND t2.txn_Date >= '2015-12-23'
    AND t2.account_no != t2.opp_acc_no
  );

执行计划

+----+--------------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+--------------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL | 4552 | Using where |
|  2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 7924 | Using where |
+----+--------------------+-------+------+---------------+------+---------+------+------+-------------+

语句在hotfix环境运行时间：14 rows in set (27.98 sec)

第一个问题：select * 语句在生产环境严格禁止，需明确指明查询字段。

第二个问题：相关子查询，尤其是使用不到索引时效率或非常低，可改写成join方式。

select t1.*
from t_payment_bank_account_info t1
join t_payment_account_dtl t2
using(account_no)
where t2.parent_account_no = '7311810182600115231'
    AND t2.txn_Date >= '2015-12-23'
    AND t2.account_no != t2.opp_acc_no
group by 需要查询的字段;

（因为join时内表中的一条记录可能跟外表中的多条记录匹配，所以最终会比使用相关子查询的方式多出一些重复的记录结果，故使用group by去重复，当然也可以使用distinct关键字，两者原理相同。如果重复值对于最终需求并没有什么影响则可以移除该从句以避免分组、排序造成的临时表和文件排序等额外开销，提高查询效率）

执行计划：

+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 4552 | Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL | 7924 | Using where; Using join buffer  |
+----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+

语句在hotfix环境运行时间：14 rows in set (2.67 sec)

第三个问题：到这里优化还没有结束，看到上述执行计划中有Using join buffer 出现，这是MySQL内部的一个优化，可大大减少join时的IO开销。但如果能在join字段上添加适当索引的话，性能还能更加显著的提升。

那么是要在t_payment_account_dtl表还是在t_payment_bank_account_info表的account_no字段添加索引呢？

可以看一下每个表中account_No字段的筛选度：

mysql> select count(*) from t_payment_account_dtl;
+----------+
| count(*) |
+----------+
|     7594 |
+----------+
1 row in set (0.04 sec)
mysql> select count(distinct(account_No)) from t_payment_account_dtl;
+-----------------------------+
| count(distinct(account_No)) |
+-----------------------------+
|                          75 |
+-----------------------------+
1 row in set (0.00 sec)
mysql> select count(distinct(account_No)) from t_payment_bank_account_info\G
*************************** 1. row ***************************
count(distinct(account_No)): 4753
1 row in set (0.00 sec)
mysql> select count(*) from t_payment_bank_account_info\G
*************************** 1. row ***************************
count(*): 4789
1 row in set (0.01 sec)

发现t_payment_bank_account_info表account_no字段筛选度较高，那么在该表添加索引。((inner)join时MySQL会自动根据索引情况选择哪个表做内部表那个表做外部表)

mysql> alter table t_payment_bank_account_info add index idx_account_no(account_no);
Query OK, 0 rows affected (0.14 sec)
Records: 0  Duplicates: 0  Warnings: 0

在看执行计划

mysql> desc select t1.*  from t_payment_bank_account_info t1 join t_payment_account_dtl t2 using(account_no) where t2.parent_account_no = '7311810182600115231'     AND t2.txn_Date >= '2015-12-23'     AND t2.account_no != t2.opp_acc_no   group by 需要查询的字段;
+----+-------------+-------+------+----------------+----------------+---------+---------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys  | key            | key_len | ref                       | rows | Extra                                        |
+----+-------------+-------+------+----------------+----------------+---------+---------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | t2    | ALL  | NULL           | NULL           | NULL    | NULL                      | 7924 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t1    | ref  | idx_account_no | idx_account_no | 99      | dcf_payment.t2.account_No |   22 | Using where                                  |
+----+-------------+-------+------+----------------+----------------+---------+---------------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)

发现可以使用刚才创建的索引，hotfix执行时间只需14 rows in set (0.01 sec)

这里MySQL选择t2做内部表t1做外部表，join阶段对于t2中的每条记录依次从t1的索引中进行查找。

在上边统计t1, t2行数的时候发现t2记录数是t1的近两倍，若果两表account_no字段上都有索引，那么使用记录数较少的表t1做内部表性能可能会更好。

我们尝试一下

mysql> alter table t_payment_account_dtl  add index idx_account_no(account_no);
mysql> desc select t1.*  from t_payment_bank_account_info t1 join t_payment_account_dtl t2 using(account_no) where t2.parent_account_no = '7311810182600115231'     AND t2.txn_Date >= '2015-12-23'     AND t2.account_no != t2.opp_acc_no   group by 需要查询的字段;
+----+-------------+-------+------+----------------+----------------+---------+---------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys  | key            | key_len | ref                       | rows | Extra                                        |
+----+-------------+-------+------+----------------+----------------+---------+---------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | t2    | ALL  | NULL           | NULL           | NULL    | NULL                      | 7924 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | t1    | ref  | idx_account_no | idx_account_no | 99      | dcf_payment.t2.account_No |   22 | Using where                                  |
+----+-------------+-------+------+----------------+----------------+---------+---------------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)