一、简介

这里简单介绍一下各个工具的使用场景,一般用mysql,redis,mongodb做存储层,hadoop,spark做大数据分析。

  • mysql适合结构化数据,类似excel表格一样定义严格的数据,用于数据量中,速度一般支持事务处理场合

  • redis适合缓存内存对象,如缓存队列,用于数据量小,速度快不支持事务处理高并发场合

  • mongodb,适合半结构化数据,如文本信息,用于数据量大,速度较快不支持事务处理场合

  • hadoop是个生态系统,上面有大数据分析很多组件,适合事后大数据分析任务

  • spark类似hadoop,偏向于内存计算,流计算,适合实时半实时大数据分析任务

移动互联网及物联网让数据呈指数增长,NoSql大数据新起后,数据存储领域发展很快,似乎方向都是向大数据,内存计算,分布式框架,平台化发展,出现不少新的方法,普通应用TB,GB级别达不到PB级别的数据存储,用mongodb,mysql就够了,hadoop,spark这类是航母一般多是大规模应用场景,多用于事后分析统计用,如电商的推荐系统分析系统。IAO

看标题,这里是不是跑题了呢,显然不是,了解一下mongodb在存储中的位置还是非常有必要的,explain 和 hint 一看就知道是从mysql借鉴过来的(猜的),实际就是检测查询语句的性能和使用强制索引

二、explain

先写入测试数据

db.test.insertMany([
{ "_id" : 1, "a" : "f1", b: "food", c: 500 },
{ "_id" : 2, "a" : "f2", b: "food", c: 100 },
{ "_id" : 3, "a" : "p1", b: "paper", c: 200 },
{ "_id" : 4, "a" : "p2", b: "paper", c: 150 },
{ "_id" : 5, "a" : "f3", b: "food", c: 300 },
{ "_id" : 6, "a" : "t1", b: "toys", c: 500 },
{ "_id" : 7, "a" : "a1", b: "apparel", c: 250 },
{ "_id" : 8, "a" : "a2", b: "apparel", c: 400 },
{ "_id" : 9, "a" : "t2", b: "toys", c: 50 },
{ "_id" : 10, "a" : "f4", b: "food", c: 75 }]);

写入成功返回值

{
"acknowledged" : true,
"insertedIds" : [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10
]
}

开始查询

> db.test.find();
{ "_id" : 1, "a" : "f1", "b" : "food", "c" : 500 }
{ "_id" : 2, "a" : "f2", "b" : "food", "c" : 100 }
{ "_id" : 3, "a" : "p1", "b" : "paper", "c" : 200 }
{ "_id" : 4, "a" : "p2", "b" : "paper", "c" : 150 }
{ "_id" : 5, "a" : "f3", "b" : "food", "c" : 300 }
{ "_id" : 6, "a" : "t1", "b" : "toys", "c" : 500 }
{ "_id" : 7, "a" : "a1", "b" : "apparel", "c" : 250 }
{ "_id" : 8, "a" : "a2", "b" : "apparel", "c" : 400 }
{ "_id" : 9, "a" : "t2", "b" : "toys", "c" : 50 }
{ "_id" : 10, "a" : "f4", "b" : "food", "c" : 75 }
> db.test.find().count();
10
> db.test.find({ c: { $gte: 100, $lte: 200 }}).count()
3
> db.test.find({ c: { $gte: 100, $lte: 200 }}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.test",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"c" : {
"$lte" : 200
}
},
{
"c" : {
"$gte" : 100
}
}
]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [
{
"c" : {
"$lte" : 200
}
},
{
"c" : {
"$gte" : 100
}
}
]
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 3,
"executionTimeMillis" : 0,
"totalKeysExamined" : 0,
"totalDocsExamined" : 10,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [
{
"c" : {
"$lte" : 200
}
},
{
"c" : {
"$gte" : 100
}
}
]
},
"nReturned" : 3,
"executionTimeMillisEstimate" : 0,
"works" : 12,
"advanced" : 3,
"needTime" : 8,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 10
}
},
"serverInfo" : {
"host" : "iZbp1g11g0cdeeq9ht9fhjZ",
"port" : 27017,
"version" : "3.4.12",
"gitVersion" : "bfde702b19c1baad532ed183a871c12630c1bbba"
},
"ok" : 1
}

看一下几个关键词

"stage" : "COLLSCAN",

"nReturned" : 3,

"totalDocsExamined" : 10,

全部扫描,不走索引,这里只是演示,所以数据量比较少,如果数据量多起来这样查询将会很慢,甚至会卡死

COLLSCAN

这个是什么意思呢? 如果你仔细一看,应该知道就是CollectionScan,就是所谓的“集合扫描”,对不对,看到集合扫描是不是就可以直接map到数据库中的table scan/heap scan呢??? 是的,这个就是所谓的性能最烂最无奈的由来。

nReturned

这个很简单,就是所谓的numReturned,就是说最后返回的num个数,从图中可以看到,就是最终返回了三条。。。

docsExamined

那这个是什么意思呢??就是documentsExamined,检查了10个documents。。。而从返回上面的nReturned。

创建索引并查询

> db.test.createIndex({ c:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.test.find({ c: { $gte: 100, $lte: 200 }}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.test",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"c" : {
"$lte" : 200
}
},
{
"c" : {
"$gte" : 100
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"c" : 1
},
"indexName" : "c_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"c" : [
"[100.0, 200.0]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 3,
"executionTimeMillis" : 0,
"totalKeysExamined" : 3,
"totalDocsExamined" : 3,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 3,
"executionTimeMillisEstimate" : 0,
"works" : 4,
"advanced" : 3,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 3,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 3,
"executionTimeMillisEstimate" : 0,
"works" : 4,
"advanced" : 3,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"c" : 1
},
"indexName" : "c_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"c" : [
"[100.0, 200.0]"
]
},
"keysExamined" : 3,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "iZbp1g11g0cdeeq9ht9fhjZ",
"port" : 27017,
"version" : "3.4.12",
"gitVersion" : "bfde702b19c1baad532ed183a871c12630c1bbba"
},
"ok" : 1
}

 再看看上面几个关键词

"stage" : "IXSCAN"

"totalDocsExamined" : 3,

瞬间就少了,这样查询时间也会大大减少

三、hint

这时一个很好玩的一个东西,就是用来force mongodb to excute special index,对吧,为了方便演示,我们做两组复合索引,比如这次我们在c和b上构建一下:

创建索引

> db.test.createIndex({ c:1,b:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
> db.test.createIndex({ b:1,c:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"numIndexesAfter" : 4,
"ok" : 1
}

  hint查询

> db.test.find({ c: { $gte: 100, $lte: 200 },b:"food"}).hint({c:1,b:1}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.test",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"b" : {
"$eq" : "food"
}
},
{
"c" : {
"$lte" : 200
}
},
{
"c" : {
"$gte" : 100
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"c" : 1,
"b" : 1
},
"indexName" : "c_1_b_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"c" : [
"[100.0, 200.0]"
],
"b" : [
"[\"food\", \"food\"]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 3,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 10,
"works" : 3,
"advanced" : 1,
"needTime" : 1,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 10,
"works" : 3,
"advanced" : 1,
"needTime" : 1,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"c" : 1,
"b" : 1
},
"indexName" : "c_1_b_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"c" : [
"[100.0, 200.0]"
],
"b" : [
"[\"food\", \"food\"]"
]
},
"keysExamined" : 3,
"seeks" : 2,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "iZbp1g11g0cdeeq9ht9fhjZ",
"port" : 27017,
"version" : "3.4.12",
"gitVersion" : "bfde702b19c1baad532ed183a871c12630c1bbba"
},
"ok" : 1
}

 正常查询

> db.test.find({ c: { $gte: 100, $lte: 200 },b:"food"}).explain("executionStats")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.test",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"b" : {
"$eq" : "food"
}
},
{
"c" : {
"$lte" : 200
}
},
{
"c" : {
"$gte" : 100
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"b" : 1,
"c" : 1
},
"indexName" : "b_1_c_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"b" : [
"[\"food\", \"food\"]"
],
"c" : [
"[100.0, 200.0]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"c" : 1,
"b" : 1
},
"indexName" : "c_1_b_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"c" : [
"[100.0, 200.0]"
],
"b" : [
"[\"food\", \"food\"]"
]
}
}
},
{
"stage" : "FETCH",
"filter" : {
"b" : {
"$eq" : "food"
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"c" : 1
},
"indexName" : "c_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"c" : [
"[100.0, 200.0]"
]
}
}
}
]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 3,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 1,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 1,
"executionTimeMillisEstimate" : 0,
"works" : 2,
"advanced" : 1,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"b" : 1,
"c" : 1
},
"indexName" : "b_1_c_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"b" : [
"[\"food\", \"food\"]"
],
"c" : [
"[100.0, 200.0]"
]
},
"keysExamined" : 1,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "iZbp1g11g0cdeeq9ht9fhjZ",
"port" : 27017,
"version" : "3.4.12",
"gitVersion" : "bfde702b19c1baad532ed183a871c12630c1bbba"
},
"ok" : 1
}

主要对比的还是:

"totalKeysExamined" : 3,
"totalDocsExamined" : 1,

"totalKeysExamined" : 1,
"totalDocsExamined" : 1,

是不是比较有意思,有时候monogdb并不会,走你想要的索引,当你创建多个联合索引的时候,情况就比较明显了

MongoDB中的explain和hint提的使用的更多相关文章

  1. mongodb之使用explain和hint性能分析和优化

    当你第一眼看到explain和hint的时候,第一个反应就是mysql中所谓的这两个关键词,确实可以看出,这个就是在mysql中借鉴过来的,既然是借鉴 过来的,我想大家都知道这两个关键字的用处,话不多 ...

  2. MongoDB的学习--explain()和hint()

    Explain 从之前的文章中,我们可以知道explain()能够提供大量与查询相关的信息.对于速度比较慢的查询来说,这是最重要的诊断工具之一.通过查看一个查询的explain()输出信息,可以知道查 ...

  3. 在MongoDB中执行查询、创建索引

    1. MongoDB中数据查询的方法 (1)find函数的使用: (2)条件操作符: (3)distinct找出给定键所有不同的值: (4)group分组: (5)游标: (6)存储过程. 文档查找 ...

  4. 在MongoDB中执行查询与创建索引

    实验目的: (1)掌握MongoDB中数据查询的方法: (2)掌握MongoDB中索引及其创建: 实验内容: 一. MongoDB中数据查询的方法: (1)find函数的使用: (2)条件操作符: a ...

  5. mongodb中的排序和索引快速学习

    在mongodb中,排序和索引其实都是十分容易的,先来小结下排序: 1 先插入些数据    db.SortTest.insert( { name : "Denis", age : ...

  6. MongoDB中聚合工具Aggregate等的介绍与使用

    Aggregate是MongoDB提供的众多工具中的比较重要的一个,类似于SQL语句中的GROUP BY.聚合工具可以让开发人员直接使用MongoDB原生的命令操作数据库中的数据,并且按照要求进行聚合 ...

  7. MongoDB中的聚合操作

    根据MongoDB的文档描述,在MongoDB的聚合操作中,有以下五个聚合命令. 其中,count.distinct和group会提供很基本的功能,至于其他的高级聚合功能(sum.average.ma ...

  8. MongoDB 大数据技术之mongodb中在嵌套子文档的文档上面建立索引

    一.给collection objectid赋自定义的值 MongoDB Enterprise > db.testid.insert({_id:{imsi:"4567890123&qu ...

  9. MongoDB 索引 和 explain 的使用

    索引基本使用 索引是对数据库表中一列或多列的值进行排序的一种结构,可以让我们查询数据库变得 更快.MongoDB 的索引几乎与传统的关系型数据库一模一样,这其中也包括一些基本的查 询优化技巧. 首先我 ...

随机推荐

  1. linux下安装openoffice

    一.环境 centos6.9 安装jdk1.6及以上 二.安装依赖 yum install libXext.x86_64 -y yum install freetype -y yum groupins ...

  2. HTML页面滚动时获取离页面顶部的距离2种实现方法

    获取离滚动页面的顶部距离有两种方法一是DOM:而是jquery,具体的实现如下,感兴趣的朋友可以尝试操作下     方法一:DOM 复制代码 代码如下: <script> window.o ...

  3. 【Java】 剑指offer(20) 表示数值的字符串

    本文参考自<剑指offer>一书,代码采用Java语言. 更多:<剑指Offer>Java实现合集   题目 请实现一个函数用来判断字符串是否表示数值(包括整数和小数).例如, ...

  4. BootstrapTable使用实例

    一.bootstrapTable简单使用: <link rel="stylesheet" href="./static/libs/bootstrap/css/boo ...

  5. VSCode从非根目录编译golang程序(转)

    1.问题提出 “习惯在项目目录里建src放源码文件,根目录里放配置文件或者别的什么,在交付时直接忽视掉src目录就行了,但vscode好像不能这样愉快的玩耍...”??? 要实现把源码放到src目录下 ...

  6. P1510 精卫填海

    P1510 精卫填海二分答案二分背包容量,判断能否满足v.判断的话就跑01背包就好了. #include<iostream> #include<cstdio> #include ...

  7. 不一样的go语言-不同的OO

    前言   go语言因为产生时代的原因,大神们在设计go时,不得不考虑业界的流行趋势(编程理念),使得go既可以面向过程编程,也可以面向对象编程.这里不探讨两者的优劣,存在即是合理,面向过程编程经久不衰 ...

  8. 获取img的高

    我们可以通过css设置图片的width,然后通过 clientWidth获取图片的宽,但是这个宽似乎是css里面定义的width值,但是对于图片的高,使用 clientHeight 来获取似乎是有些问 ...

  9. BZOJ.3238.[AHOI2013]差异(后缀自动机 树形DP/后缀数组 单调栈)

    题目链接 \(Description\) \(Solution\) len(Ti)+len(Tj)可以直接算出来,每个小于n的长度会被计算n-1次. \[\sum_{i=1}^n\sum_{j=i+1 ...

  10. python面向对象编程练习

    练习题 1.面向对象三大特性,各有什么用处,说说你的理解. 面向对象的三大特性: 1.继承:解决代码的复用性问题 2.封装:对数据属性严格控制,隔离复杂度 3.多态性:增加程序的灵活性与可扩展性 2. ...