Mongo的SQL语句如何通过explain来优化-蒲公英云

Author:skatexg

Time:2021/08/12

mongo的SQL语句为什么慢？如何优化慢sql？（如何定位慢sql不在本文讨论范围内）

这个时候就需要根据SQL语句的执行计划来诊断了

一. Mongo的SQL执行计划介绍

Mongo的explain的常用两种模式如下：

explain(“queryPlanner”)

explain(“executionStats”)

1.1 queryPlanner的解释：

查看SQL语句执行计划：
db.docl2.find({“by”:”sdsd”}).explain(“queryPlanner”) //这个语句不是真正执行SQL语句,是explain的默认参数

queryPlanner的解释：

{
“queryPlanner”:{
“plannerVersion”:1,
“namespace”:”mongo_testdb.docl2”, //该值返回的是该query所查询的表
“indexFilterSet”:false,
“parsedQuery”:{
“by”:{
“$eq”:”sdsd”
}
},
“winningPlan”:{ //查询优化器针对该query所返回的最优执行计划的详细内容
“stage”:”FETCH”, //最优执行计划的stage，这里返回是FETCH，可以理解为通过返回的index位置去检索具体的文档
“inputStage”:{ //是上面stage的child stage，此处是IXSCAN，表示进行的是index scanning
“stage”:”IXSCAN”,
“keyPattern”:{ //所扫描的index内容，此处是”by”:1.0。
“by”:1.0
},
“indexName”:”idx_title4”, //winning plan所选用的index
“isMultiKey”:false, //是否是Multikey，此处返回是false，如果索引建立在array上，此处将是true
“multiKeyPaths”:{
“by”:[]
},
“isUnique”:true,
“isSparse”:false,
“isPartial”:false,
“indexVersion”:2,
“direction”:”forward”, //此query的查询顺序，此处是forward，如果用了.sort({by:-1})将显示backward。
“indexBounds”:{ //winning plan所扫描的索引范围
“by”:[
“[\“sdsd\“, \“sdsd\“]“
]
}
}
},
“rejectedPlans”:[]
}
}

其中Stage的意义
explain.queryPlanner.winningPlan.stage和explain.queryPlanner.winningPlan.inputStage等，有如下几类介绍

COLLSCAN: 全表扫描
IXSCAN: 索引扫描
FETCH: 根据索引去检索指定document
SHARD_MERGE:将各个分片返回数据进行merge
SORT: 表明在内存中进行了排序
LIMIT: 使用limit限制返回数
SKIP: 使用skip进行跳过
IDHACK: 针对_id进行查询
SHARDING_FILTER:通过mongos对分片数据进行查询
COUNT: 利用db.coll.explain().count()之类进行count运算
COUNTSCAN: count不使用Index进行count时的stage返回
COUNT_SCAN: count使用了Index进行count时的stage返回
SUBPLA: 未使用到索引的$or查询的stage返回
TEXT: 使用全文索引进行查询时候的stage返回
PROJECTION: 限定返回字段时候stage的返回
SORT_KEY_GENERATOR：获取每一个文档排序所用的键值

1.2executionStats的解释

查看SQL语句执行计划和统计信息
db.docl2.find({“by”:”sdsd”}).explain(“executionStats”) //这个语句真正执行SQL语句

executionStats的解释

{
“executionStats”:{
“executionSuccess”:true, //语句执行是否成功
“nReturned”:0, //查询返回的条目
“executionTimeMillis”:0, //该query的整体查询时间
“totalKeysExamined”:0, //索引扫描条目
“totalDocsExamined”:0, //文档扫描条目
“executionStages”:{
“stage”:”FETCH”,
“nReturned”:0,
“executionTimeMillisEstimate”:0, //该查询根据index去检索document获取N条具体数据的时间
“works”:1,
“advanced”:0,
“needTime”:0,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“docsExamined”:0,
“alreadyHasObj”:0,
“inputStage”:{
“stage”:”IXSCAN”,
“nReturned”:0,
“executionTimeMillisEstimate”:0, //该查询扫描N行index所用时间
“works”:1,
“advanced”:0,
“needTime”:0,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“keyPattern”:{
“by”:1.0
},
“indexName”:”idx_title4”,
“isMultiKey”:false,
“multiKeyPaths”:{
“by”:[]
},
“isUnique”:true,
“isSparse”:false,
“isPartial”:false,
“indexVersion”:2,
“direction”:”forward”,
“indexBounds”:{
“by”:[
“[\“sdsd\“, \“sdsd\“]“
]
},
“keysExamined”:0,
“seeks”:1,
“dupsTested”:0,
“dupsDropped”:0
}
},
“allPlansExecution”:[]
}
}

二.通过Explain优化mongo的SQL语句

测试数据

db.doc_zhaoxg.insert( { “a” : 1, “b” :1 ,”c”:1 })
db.doc_zhaoxg.insert( { “a” : 1, “b” :2 ,”c”:2 })
db.doc_zhaoxg.insert( { “a” : 1, “b” :3 ,”c”:3 })
db.doc_zhaoxg.insert( { “a” : 4, “b” :2 ,”c”:3 })
db.doc_zhaoxg.insert( { “a” : 4, “b” :2 ,”c”:5 })
db.doc_zhaoxg.insert( { “a” : 4, “b” :2 ,”c”:5 })
db.doc_zhaoxg.insert( { “a” : 1, “b” :9 ,”c”:1 })
db.doc_zhaoxg.insert( { “a” : 1, “b” :9 ,”c”:1 })
db.doc_zhaoxg.insert( { “a” : 1, “b” :9 ,”c”:1 })
db.doc_zhaoxg.insert( { “a” : 2, “b” :1 ,”c”:1 })

执行如下SQL语句
db.doc_zhaoxg.find({a:1,b:{$lt:3}}).sort({c:-1}).explain(“executionStats”)

executionStats信息内容如下：

{
“executionStats”:{
“executionSuccess”:true,
“nReturned”:2,
“executionTimeMillis”:0,
“totalKeysExamined”:0,
“totalDocsExamined”:10,
“executionStages”:{
“stage”:”SORT”,
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:16,
“advanced”:2,
“needTime”:13,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“sortPattern”:{
“c”:-1
},
“memUsage”:126,
“memLimit”:33554432,
“inputStage”:{
“stage”:”SORT_KEY_GENERATOR”,
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:13,
“advanced”:2,
“needTime”:10,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“inputStage”:{
“stage”:”COLLSCAN”,
“filter”:{
“$and”:[
{
“a”:{
“$eq”:1
}
},
{
“b”:{
“$lt”:3
}
}
]
},
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:12,
“advanced”:2,
“needTime”:9,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“direction”:”forward”,
“docsExamined”:10
}
}
},
“allPlansExecution”:[]
}
}

从这个执行计划的统计信息可以得到如下信息
1.这个query是全表扫描，并在把查询结果在内存中排序
2.nReturned=2 < totalDocsExamined=10 且 totalKeysExamined=0 也说明是没用到索引是全表扫描

我们期望的结果是什么?
query语句使用索引，最好可以利用索引来排序，怎么实现呢？

我们创建多字段的复合索引

db.doc_zhaoxg.createIndex({“a”:1,”b”:1,”c”:1},{“name”:’idx_a_b_c’,background:true})

执行计划如下：

{
“executionStats”:{
“executionSuccess”:true,
“nReturned”:2,
“executionTimeMillis”:1,
“totalKeysExamined”:2,
“totalDocsExamined”:2,
“executionStages”:{
“stage”:”FETCH”,
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:7,
“advanced”:2,
“needTime”:4,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“docsExamined”:2,
“alreadyHasObj”:0,
“inputStage”:{
“stage”:”SORT”,
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:6,
“advanced”:2,
“needTime”:4,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“sortPattern”:{
“c”:-1
},
“memUsage”:86,
“memLimit”:33554432,
“inputStage”:{
“stage”:”SORT_KEY_GENERATOR”,
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:4,
“advanced”:2,
“needTime”:1,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“inputStage”:{
“stage”:”IXSCAN”,
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:3,
“advanced”:2,
“needTime”:0,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“keyPattern”:{
“a”:1.0,
“b”:1.0,
“c”:1.0
},
“indexName”:”idx_a_b_c”,
“isMultiKey”:false,
“multiKeyPaths”:{
“a”:[],
“b”:[],
“c”:[]
},
“isUnique”:false,
“isSparse”:false,
“isPartial”:false,
“indexVersion”:2,
“direction”:”forward”,
“indexBounds”:{
“a”:[
“[1, 1]“
],
“b”:[
“[-inf.0, 3)”
],
“c”:[
“[MinKey, MaxKey]“
]
},
“keysExamined”:2,
“seeks”:1,
“dupsTested”:0,
“dupsDropped”:0
}
}
}
},
“allPlansExecution”:[]
}
}

可以看到：
nReturned=2 & totalDocsExamined=2 & totalKeysExamined=2 说明通过索引检索数据，虽然是回表查询数据的，是我们希望看到的
inputStage.stage=SORT 说明查询的结果在内存里排序的，如果数据量很多大的，将很耗费资源，不是我们想看到的

我们调整下索引中列的顺序
db.doc_zhaoxg.createIndex({“a”:1,”c”:1,”b”:1},{“name”:’idx_a_c_b’,background:true})

执行计划如下：

{
“executionStats”:{
“executionSuccess”:true,
“nReturned”:2,
“executionTimeMillis”:0,
“totalKeysExamined”:4,
“totalDocsExamined”:2,
“executionStages”:{
“stage”:”FETCH”,
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:6,
“advanced”:2,
“needTime”:2,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“docsExamined”:2,
“alreadyHasObj”:0,
“inputStage”:{
“stage”:”IXSCAN”,
“nReturned”:2,
“executionTimeMillisEstimate”:0,
“works”:5,
“advanced”:2,
“needTime”:2,
“needYield”:0,
“saveState”:0,
“restoreState”:0,
“isEOF”:1,
“keyPattern”:{
“a”:1.0,
“c”:1.0,
“b”:1.0
},
“indexName”:”idx_a_c_b”,
“isMultiKey”:false,
“multiKeyPaths”:{
“a”:[],
“c”:[],
“b”:[]
},
“isUnique”:false,
“isSparse”:false,
“isPartial”:false,
“indexVersion”:2,
“direction”:”backward”,
“indexBounds”:{
“a”:[
“[1, 1]“
],
“c”:[
“[MaxKey, MinKey]“
],
“b”:[
“(3, -inf.0]“
]
},
“keysExamined”:4,
“seeks”:3,
“dupsTested”:0,
“dupsDropped”:0
}
},
…….
}
}

这回看到：
nReturned=2 & totalDocsExamined=2 & totalKeysExamined=4 可以看到在索引扫描时，比上面的索引多扫描了两条记录
inputStage.stage=IXSCAN 说明查询的结果利用索引排序的，从executionTimeMillis=0也可以看到，比前面语句executionTimeMillis=1执行要快

小结：
SQL语句执行最理想的状态是

nReturned=totalKeysExamined & totalDocsExamined=0 ：使用了覆盖索引，只用到了index，无需回表文档扫描，这是最理想状态。

nReturned=totalKeysExamined=totalDocsExamined：正常index索引，没有多余index扫描与文档扫描

totalKeysExamined>nReturned=totalDocsExamined：正常index索引，会少许的多扫描几条index记录

totalKeysExamined>totalDocsExamined>nReturned：正常index索引，有很多的index扫描与文档扫描

如果有sort的时候，为了使sort不在内存中进行，可以在保证nReturned=totalDocsExamined的基础上，totalKeysExamined可以大于totalDocsExamined与nReturned，因为量级较大的时候内存排序更消耗性能。

参考：

https://mongoing.com/eshu_explain1

-———end———-