SQL Optimization Tutorial: IN and range Queries

阿新 • Published: 2020-12-05

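Preface

High Performance MySQL notes that IN() can effectively replace certain range queries and improve query efficiency, because within a composite index the columns that follow a range column cannot be used (although index condition pushdown, ICP, has to be taken into account). The MySQL optimizer expands IN() lists into n*m combinations of equality conditions, looks each one up, and merges the results. This is somewhat like UNION, but more efficient.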
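
The n*m expansion can be pictured with a short Python sketch. It only illustrates the counting and is not how MySQL implements it; the columns a and b and the function name are made up for the example.

```python
from itertools import product

def equality_ranges(in_lists):
    """Expand {column: [values]} into one equality condition per combination."""
    columns = list(in_lists)
    value_lists = [in_lists[c] for c in columns]
    return [dict(zip(columns, combo)) for combo in product(*value_lists)]

# Hypothetical query: a IN (10,20,30) AND b IN (1,2)
# 3 values * 2 values -> 6 equality ranges to look up.
ranges = equality_ranges({"a": [10, 20, 30], "b": [1, 2]})
print(len(ranges))
print(ranges[0])
```

Each additional IN() list multiplies the number of ranges, which is why long lists make estimation expensive.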

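MySQL runs into problems when there are too many IN() combinations. Query optimization can take a long time and consume a lot of memory. Newer MySQL versions skip plan evaluation once the number of combinations exceeds a certain threshold, which can keep MySQL from making good use of its indexes.

In MySQL 5.6.5 and later, that threshold is controlled by the eq_range_index_dive_limit variable. The default is 10 in 5.6 and was raised to 200 in 5.7; it can also be set manually. The 5.6 manual explains: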

The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To disable use of statistics and always use index dives, set eq_range_index_dive_limit to 0. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. eq_range_index_dive_limit is available as of MySQL 5.6.5. Before 5.6.5, the optimizer uses index dives, which is equivalent to eq_range_index_dive_limit=0.

In other words, where N is the number of equality ranges in the query:

eq_range_index_dive_limit = 0: always use index dives

0 < eq_range_index_dive_limit <= N: use index statistics

eq_range_index_dive_limit > N: use index dives
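
The three cases above can be written as a small function. This is a sketch of the rule as stated here, with N taken to be the number of equality ranges in the query; the function name is made up and the code is not taken from MySQL.

```python
def estimation_strategy(eq_range_index_dive_limit, n_equality_ranges):
    """Return the row estimation method for a query with N equality ranges."""
    if eq_range_index_dive_limit == 0:
        return "index dive"            # statistics disabled, always dive
    if eq_range_index_dive_limit <= n_equality_ranges:
        return "index statistics"      # 0 < limit <= N
    return "index dive"                # limit > N

# A query with 2 equality ranges under several settings.
for limit in (0, 2, 3, 10, 200):
    print(limit, estimation_strategy(limit, n_equality_ranges=2))
```

With a limit of 2 and two equality ranges the function returns "index statistics"; raising the limit to 3 switches it to "index dive".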

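MySQL 5.7 raised the default from 10 to 200 to keep execution plans for equality ranges (IN()) as accurate as possible, because IN() lists very often contain more than 10 values.

The official MySQL manual contains this sentence:

the optimizer can estimate the row count for each range using dives into the index or index statistics.

Roughly:

The optimizer estimates the number of rows in each range. For example, "a IN (10,20,30)" is treated as equality comparisons, so its three ranges reduce to the three single values 10, 20, and 30. The manual speaks of ranges because MySQL's "range" access method mostly performs range scans, and a single value can be seen as a special case of a range.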

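There are two estimation methods:

index dive: dive into the index and use it to count the rows;
index statistics: estimate from the statistics kept for the index.

Comparing the two:

index dive: slow, but yields an accurate value (MySQL's implementation counts the index entries that match, hence the accuracy)
index statistics: fast, but the value is not necessarily accurate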
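
The difference between the two methods shows up on a skewed index. The sketch below is a toy model under stated assumptions: the index is a sorted Python list, a dive counts the matching entries with a binary search, and the statistics method assumes every key matches the average number of rows per distinct key. The data and function names are made up.

```python
from bisect import bisect_left, bisect_right

def rows_by_index_dive(sorted_keys, values):
    """Count the index entries matching each value (slower, accurate)."""
    return sum(bisect_right(sorted_keys, v) - bisect_left(sorted_keys, v)
               for v in values)

def rows_by_index_statistics(total_rows, distinct_keys, values):
    """Assume each value matches the average rows per key (fast, approximate)."""
    return len(values) * total_rows / distinct_keys

# Hypothetical skewed index: key 10 is very common, 20 and 30 are rare.
keys = sorted([10] * 9000 + [20] * 2 + [30] * 1 + list(range(100, 1097)))
total_rows = len(keys)            # 10000 entries
distinct_keys = len(set(keys))    # 1000 distinct values

dive = rows_by_index_dive(keys, [10, 20, 30])
stats = rows_by_index_statistics(total_rows, distinct_keys, [10, 20, 30])
print(dive)
print(stats)
```

The dive reports 9003 rows while the statistics estimate is 30.0, which is the kind of gap that can mislead the optimizer when values are unevenly distributed.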

In short, **eq_range_index_dive_limit sets an upper limit on the number of conditions in the IN list; once the list reaches that value, row estimation switches from index dive to index statistics**.

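Why distinguish between the two methods?

The query optimizer uses a cost model to compute the cost of each plan and picks the cheapest one
A single-table scan needs a cost, so a single-table index scan needs one too
The single-table formula is usually: cost = number of rows * average IO cost
So whatever the scan type, the number of rows has to be estimated
When the optimizer meets an expression such as "a IN (10,20,30)" and finds an index on column a, it has to work out how many rows that index would scan in order to compute the index scan cost. That is where the two methods discussed here, index dive and index statistics, come in.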
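
As a rough illustration of the formula, the sketch below computes the cost of the same index scan under two different row estimates. The average IO cost of 1.0 and the row counts are hypothetical, and MySQL's real cost model has more terms than this.

```python
def index_scan_cost(estimated_rows, avg_io_cost):
    """cost = number of rows * average IO cost, as in the formula above."""
    return estimated_rows * avg_io_cost

AVG_IO_COST = 1.0  # hypothetical value, chosen only for the example

# The same scan costed with an optimistic and a pessimistic row estimate.
cost_low = index_scan_cost(30, AVG_IO_COST)
cost_high = index_scan_cost(9003, AVG_IO_COST)
print(cost_low, cost_high)
```

Because the cost is proportional to the row estimate, an estimate that is off by a factor of 300 moves the cost by the same factor.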

Topics

range queries and index usage
About eq_range_index_dive_limit

range queries and index usage

The SQL is:

SELECT * FROM pre_forum_post WHERE tid=7932552 AND invisible IN('0','-2') ORDER BY dateline DESC LIMIT 10;

The indexes are:

PRIMARY(tid,position)
pid(pid)
fid(tid)
displayorder(tid,invisible,dateline)
first(tid,first)
new_auth(authorid,tid)
idx_dt(dateline)
mul_test(tid,dateline,pid)

The execution plan:

root@localhost 16:08:27 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') 
 -> ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys  | key | key_len | ref | rows | Extra   |
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | pre_forum_post | range | PRIMARY,displayorder,first,mul_test,idx_1 | displayorder | 4 | NULL | 54 | Using index condition; Using filesort | 
+----+-------------+----------------+-------+-------------------------------------------+--------------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)

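The MySQL optimizer treats this as a range query, so in the displayorder index (tid,invisible,dateline) the dateline column cannot be used. In other words, the final sort has to build a temporary result set and sort within it, rather than reading the rows in index order. So we tried adding another index.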

root@localhost 16:09:06 [ultrax]> alter table pre_forum_post add index idx_1 (tid,dateline); 
Query OK,20374596 rows affected,0 warning (600.23 sec)
Records: 0 Duplicates: 0 Warnings: 0
root@localhost 16:20:22 [ultrax]> explain SELECT * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+
| 1 | SIMPLE | pre_forum_post | ref | idx_1 | idx_1 | 3 | const | 120646 | Using where | 
+----+-------------+----------------+------+---------------+-------+---------+-------+--------+-------------+
1 row in set (0.00 sec)
root@localhost 16:22:06 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
...
10 rows in set (0.40 sec)
root@localhost 16:23:55 [ultrax]> SELECT sql_no_cache * FROM pre_forum_post force index (idx_1) WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
...
10 rows in set (0.00 sec)

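The experiment shows a dramatic improvement (0.40 sec down to 0.00 sec), and it is not hard to see why. As noted above, the MySQL optimizer handles IN() as multiple combinations of lookups, so a sort or grouping has to be done on a temporary result set; even if the index contains the sort or grouping column, it does not help. The one disappointment is that the optimizer's own choice is still not reliable enough.

To sum up: when using IN() in MySQL queries, besides watching the size of the IN() list and the value of eq_range_index_dive_limit (details below), also pay attention to index usage when the SQL includes sorting, grouping, deduplication, and so on.

About eq_range_index_dive_limit

Back to the case above: why was idx_1 not picked on its own, so that a hint was needed to force it? First, look at the value of eq_range_index_dive_limit.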

root@localhost 22:38:05 [ultrax]> show variables like 'eq_range_index_dive_limit';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| eq_range_index_dive_limit | 2 | 
+---------------------------+-------+
1 row in set (0.00 sec)

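This is the 0 < eq_range_index_dive_limit <= N case described above, which uses index statistics: the limit is 2 and the query has 2 equality ranges. Next, we use OPTIMIZER_TRACE to see what actually happens.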

{
 "index": "displayorder",
 "ranges": [
  "7932552 <= tid <= 7932552 AND -2 <= invisible <= -2",
  "7932552 <= tid <= 7932552 AND 0 <= invisible <= 0"
 ],
 "index_dives_for_eq_ranges": false,
 "rowid_ordered": false,
 "using_mrr": false,
 "index_only": false,
 "rows": 54,
 "cost": 66.81,
 "chosen": true
}
// index_dives_for_eq_ranges is false, and this index ends up chosen
...
{
 "index": "idx_1",
 "ranges": [
  "7932552 <= tid <= 7932552"
 ],
 "index_dives_for_eq_ranges": true,
 "rows": 120646,
 "cost": 144776,
 "chosen": false,
 "cause": "cost"
}

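We can see that the cost of the displayorder index is 66.81, while the cost of idx_1 is 144776 (for an estimated 120646 rows), so the MySQL optimizer chose displayorder. If we set eq_range_index_dive_limit > N, will the optimizer use index dives and arrive at a more accurate execution plan?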

root@localhost 22:52:52 [ultrax]> set eq_range_index_dive_limit = 3;
Query OK,0 rows affected (0.00 sec)
root@localhost 22:55:38 [ultrax]> explain SELECT * FROM pre_forum_post WHERE tid=7932552 AND `invisible` IN('0','-2') ORDER BY dateline DESC LIMIT 10;
+----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys  | key | key_len | ref | rows | Extra |
+----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+
| 1 | SIMPLE | pre_forum_post | ref | PRIMARY,idx_1 | idx_1 | 3 | const | 120646 | Using where | 
+----+-------------+----------------+------+-------------------------------------------+-------+---------+-------+--------+-------------+
1 row in set (0.00 sec)

The optimizer_trace output is as follows:

{
 "index": "displayorder","rows": 188193,"cost": 225834,"chosen": true
}
...
{
 "index": "idx_1","chosen": true
}
...
 "cost_for_plan": 144775,"rows_for_plan": 120646,"chosen": true

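Both indexes pass the candidate index selection, and the final optimization step picks the cheapest one, idx_1. This is how the value of eq_range_index_dive_limit affects the MySQL optimizer's cost calculation for equality range queries, and through it the choice of index.

We can also use profiling to see how long the optimizer spends on statistics: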

index dive

+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000048 | 
| checking permissions | 0.000004 | 
| Opening tables | 0.000015 | 
| init  | 0.000044 | 
| System lock | 0.000009 | 
| optimizing | 0.000014 | 
| statistics | 0.032089 | 
| preparing | 0.000022 | 
| Sorting result | 0.000003 | 
| executing | 0.000003 | 
| Sending data | 0.000101 | 
| end  | 0.000004 | 
| query end | 0.000002 | 
| closing tables | 0.000009 | 
| freeing items | 0.000013 | 
| cleaning up | 0.000012 | 
+----------------------+----------+

index statistics

+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000045 | 
| checking permissions | 0.000003 | 
| Opening tables | 0.000014 | 
| init  | 0.000040 | 
| System lock | 0.000008 | 
| optimizing | 0.000014 | 
| statistics | 0.000086 | 
| preparing | 0.000016 | 
| Sorting result | 0.000002 | 
| executing | 0.000002 | 
| Sending data | 0.000016 | 
| Creating sort index | 0.412123 | 
| end  | 0.000012 | 
| query end | 0.000004 | 
| closing tables | 0.000013 | 
| freeing items | 0.000023 | 
| cleaning up | 0.000015 | 
+----------------------+----------+

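When eq_range_index_dive_limit is raised so that index dives are used, the optimizer's statistics stage takes noticeably longer than with index statistics, but it ends up with a more reasonable execution plan. The statistics stage takes 0.032089s vs 0.000086s, yet the total SQL execution time is about 0.03s vs 0.41s.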
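
The index choice in the two traces can be replayed with the costs they report. This is only a sketch of "pick the cheapest candidate". The idx_1 cost is not shown in the second trace, so the example uses the cost_for_plan value 144775 from that trace as a stand-in; the function and variable names are made up.

```python
def pick_index(costs):
    """Return the candidate index with the lowest reported cost."""
    return min(costs, key=costs.get)

# Costs from the trace with eq_range_index_dive_limit = 2 (index statistics).
with_statistics = {"displayorder": 66.81, "idx_1": 144776}

# Costs from the trace with eq_range_index_dive_limit = 3 (index dives);
# idx_1 uses cost_for_plan because its own cost is elided in that trace.
with_dives = {"displayorder": 225834, "idx_1": 144775}

print(pick_index(with_statistics))
print(pick_index(with_dives))
```

The first call returns displayorder and the second returns idx_1: only the row estimate for displayorder changed (54 vs 188193), and that was enough to flip the decision.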

Appendix:

How to use optimizer_trace

set optimizer_trace='enabled=on';

select * from information_schema.optimizer_trace\G

Note: it is advisable to enable optimizer_trace only at the session level, for debugging
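
The TRACE column is a large JSON document, so it can help to pull out only the candidate indexes. The sketch below searches a parsed trace for every "range_scan_alternatives" list, wherever it is nested. The sample text is an abbreviated stand-in built from the values shown in this article, and its nesting is an assumption; a real trace contains many more fields.

```python
import json

def range_scan_alternatives(node):
    """Yield each entry under any "range_scan_alternatives" key in the trace."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "range_scan_alternatives" and isinstance(value, list):
                yield from value
            else:
                yield from range_scan_alternatives(value)
    elif isinstance(node, list):
        for item in node:
            yield from range_scan_alternatives(item)

# Abbreviated, hand-written sample; not the full output of a real server.
trace_text = """
{"steps": [{"join_optimization": {"steps": [{"rows_estimation": [{"range_analysis":
  {"analyzing_range_alternatives": {"range_scan_alternatives": [
    {"index": "displayorder", "index_dives_for_eq_ranges": false,
     "rows": 54, "cost": 66.81, "chosen": true},
    {"index": "idx_1", "index_dives_for_eq_ranges": true,
     "rows": 120646, "cost": 144776, "chosen": false, "cause": "cost"}
  ]}}}]}]}}]}
"""

summary = [(alt["index"], alt.get("index_dives_for_eq_ranges"),
            alt.get("rows"), alt.get("cost"), alt.get("chosen"))
           for alt in range_scan_alternatives(json.loads(trace_text))]
for row in summary:
    print(row)
```

Searching recursively avoids depending on the exact nesting of the trace.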

How to use profile

set profiling=ON;
-- run the SQL statement here
show profiles;
show profile for query 2;
show profile block io,cpu for query 2;

Other information such as memory, swaps, context switches, and source is also available

