Mapping in Elasticsearch

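Mapping is a very important concept in ES. For every field in an index, it determines which data type the field is stored as, which analyzer parses it, whether it has sub-fields, whether its value is copied to other fields (copy_to), and so on.
In short, the mapping defines the characteristics of the fields in an index.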

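ES can detect field data types automatically.
Detection rules:
whole number -> long (long integer)
string -> text (text, string)
string in a recognized special format (e.g. 2018-01-01) -> the corresponding special type (e.g. date)
literal true | false -> boolean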

1 Test searches
Test data:

PUT /test_index/test_type/1
{
  "post_date": "2018-01-01",
  "title": "my first title",
  "content": "this is my first content in this test",
  "author_id": 110
}

PUT /test_index/test_type/2
{
  "post_date": "2018-01-02",
  "title": "my second title",
  "content": "this is my second content in this test",
  "author_id": 110
}

PUT /test_index/test_type/3
{
  "post_date": "2018-01-03",
  "title": "my third title",
  "content": "this is my third content in this test",
  "author_id": 110
}

Test searches (on ES 6.3.1):

GET /test_index/test_type/_search?q=2018 # unexpected result: only one document is returned
GET /test_index/test_type/_search?q=2018-01-01 # correct result
GET /test_index/test_type/_search?q=post_date:2018-01-01 # correct result
GET /test_index/test_type/_search?q=post_date:2018 # only one document is returned
GET /test_index/test_type/_search?q=this # correct result
GET /test_index/test_type/_search?q=content:this # correct result

View the mapping to check whether the index's mapping meets your requirements:

GET /index_name/_mapping/type_name

GET /test_index/_mapping/test_type

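{
  "test_index": {                    // index name
    "mappings": {                    // the mapping starts here
      "test_type": {                 // type name
        "properties": {              // field-level configuration of the mapping
          "author_id": {             // "field name": { mapping info }; mapping info covers sub-fields, data type and analyzer
            "type": "long"           // field type: long integer
          },
          "content": {
            "type": "text",          // field type: text
            "fields": {              // sub-fields ES created automatically for this field, named parent.child; for text fields the default sub-field is keyword
              "keyword": {
                "type": "keyword",   // text type that is not analyzed at all
                "ignore_above": 256  // strings longer than 256 characters are not indexed in this sub-field
              }
            }
          },
          "post_date": {
            "type": "date"           // date type, not analyzed
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}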

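Dynamic field mapping in ES follows fixed rules:

Strings become text fields analyzed with the standard analyzer. A sub-field named xxx.keyword of type keyword, with ignore_above set to 256 characters, is always created.
Whole numbers become long.
Strings in "yyyy-MM-dd" format become date, which is not analyzed.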

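Summary:
mapping: the data structure and related configuration created, automatically or manually, for a type in an index.
dynamic mapping: ES automatically creates the index, the type and the type's mapping for us. The mapping contains each field's data type and settings such as how the field is analyzed.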

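Why are the search results inconsistent? When ES created the mapping automatically, it gave different fields different data types, and each data type is analyzed and searched differently. That is why searches without a field name (the _all field; in indices created in 6.x, a search across all fields) and searches on post_date returned results that differ from what one might expect (the differences were larger in older versions).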

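In 6.x, a date field is indexed as a single date value: 2018-01-01 is not split into the three tokens 2018, 01 and 01. A search for 2018 therefore cannot match a piece of the date. The query value is interpreted as a date instead, and in the test above only the document with post_date 2018-01-01 matched.
Dates must match exactly, while text can be matched on individual words. These two kinds of search are called exact value and full text.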
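The toy inverted index below makes the exact value / full text difference concrete. It is a Python sketch under simplifying assumptions and does not reflect how Lucene actually stores data. The date string is indexed as one term and the content is split into lowercase words. Unlike ES, the sketch never parses a query value as a date. The documents are the test data from section 1.

```python
import re
from collections import defaultdict

# Test data from section 1 (only the fields needed here).
DOCS = {
    1: {"post_date": "2018-01-01", "content": "this is my first content in this test"},
    2: {"post_date": "2018-01-02", "content": "this is my second content in this test"},
    3: {"post_date": "2018-01-03", "content": "this is my third content in this test"},
}

def build_index(docs):
    """Toy inverted index: (field, term) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # Exact value: the whole date string is a single term.
        index[("post_date", doc["post_date"])].add(doc_id)
        # Full text: the content is split into lowercase words.
        for token in re.findall(r"\w+", doc["content"].lower()):
            index[("content", token)].add(doc_id)
    return index

def search(index, field, term):
    """Return the sorted ids of documents containing exactly this term."""
    return sorted(index.get((field, term), set()))

index = build_index(DOCS)
print(search(index, "post_date", "01"))          # [] : the date was never split
print(search(index, "post_date", "2018-01-01"))  # [1]
print(search(index, "content", "this"))          # [1, 2, 3]
```

A fragment of a date such as 01 finds nothing because the date was never tokenized. Any single word of the content finds every document that contains it.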

2 Test the analysis output

GET /_analyze
{
  "analyzer" : "standard",
  "text" : "2018-01-01 my first title this is my first content in this test 110"
}

GET /_analyze
{
  "analyzer" : "standard",
  "text" : "I Love You"
}
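If no ES instance is at hand, the Python function below gives a rough idea of what the standard analyzer returns for the requests above. It is an approximation, not the real algorithm. Lucene's standard tokenizer follows the Unicode word-segmentation rules (UAX #29). This sketch only keeps runs of ASCII letters and digits, emits each CJK character as its own token, and lowercases everything.

```python
import re

# Approximation of the standard analyzer (NOT the real UAX #29 rules):
# runs of ASCII letters/digits form one token, each CJK ideograph forms
# its own token, and every token is lowercased.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")

def approx_standard(text):
    return [token.lower() for token in TOKEN_RE.findall(text)]

print(approx_standard("I Love You"))
# ['i', 'love', 'you']
print(approx_standard("2018-01-01 my first title this is my first content in this test 110"))
# ['2018', '01', '01', 'my', 'first', 'title', 'this', 'is', 'my', 'first',
#  'content', 'in', 'this', 'test', '110']
```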

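3 Core mapping data types

ES has many data types; only the commonly used ones are covered here.
String: text, keyword (these two replaced the old string type in ES 5.x)
Integer: byte, short, integer, long
Floating point: float, double
Boolean: boolean
Date: date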

4 Types assigned by dynamic mapping

true or false -> boolean
123 -> long
123.123 -> float
2018-01-01 -> date
hello world -> text (plus a keyword sub-field)
Of the types assigned automatically above, only text fields need an analyzer. The default is the standard analyzer.
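The assignment rules above can be written as a small Python function. This is a hypothetical illustration, not ES code. A plain yyyy-MM-dd regular expression stands in for ES's configurable date detection, and objects are out of scope. Arrays take the type of their first non-null value; how nulls inside arrays are treated is an assumption. Values with no content (null, [] and [null]) return None, because dynamic mapping creates no mapping for them.

```python
import re

# Assumption: a plain yyyy-MM-dd pattern stands in for ES date detection.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def detect_type(value):
    """Return the mapping dynamic mapping would roughly create, or None."""
    if isinstance(value, list):
        # Arrays use the type of their first non-null element;
        # null, [] and [null] produce no mapping at all.
        non_null = [v for v in value if v is not None]
        return detect_type(non_null[0]) if non_null else None
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_RE.match(value):
            return {"type": "date"}
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"unsupported value in this sketch: {value!r}")

for sample in [True, 123, 123.123, "2018-01-01", "hello world", None]:
    print(repr(sample), "->", detect_type(sample))
```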

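5 Custom mapping
When you create an index and type, you can specify the mapping yourself, that is, each field's type and the analyzer used for its data.
A manually created mapping can only be extended. You can add new field mappings, but you cannot modify an existing one.
For example, suppose index a has type b with a mapping for field f1. You can later add a mapping for field f2, but you cannot change the mapping of f1.
In practice, indexes are usually created manually along with all their definitions, such as settings and mappings.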
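The "add only, never modify" rule can be modelled as a merge of two dictionaries of field definitions. This hypothetical Python sketch does not reproduce ES's actual merge logic. It treats every difference in a field definition as a conflict, whereas real ES allows a few parameters of an existing field to be updated in place.

```python
import copy

def add_field_mappings(existing, new):
    """Hypothetical sketch: merge new field mappings into existing ones.

    Both arguments map field name -> definition, e.g. {"f1": {"type": "text"}}.
    New fields are added, re-sending an identical definition is accepted,
    and changing an existing field raises ValueError.
    """
    merged = copy.deepcopy(existing)
    for field, definition in new.items():
        if field in merged and merged[field] != definition:
            raise ValueError(f"cannot change the existing mapping of field {field!r}")
        merged[field] = definition
    return merged

existing = {"f1": {"type": "text"}}
print(add_field_mappings(existing, {"f2": {"type": "long"}}))  # f2 is added
try:
    add_field_mappings(existing, {"f1": {"type": "keyword"}})  # f1 would change
except ValueError as err:
    print(err)
```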

5.1 Specify the mapping when creating an index
Syntax:

PUT /test_index
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "test_type": {
      "properties": {
        "author_id" : {
          "type": "byte",
          "index": false
        },
        "title" : {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "keyword" : {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "content" : {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "post_date" : {
          "type": "date"
        }
      }
    }
  }
}

"index" - whether the field is indexed, i.e. whether it can be searched. Values: true | false
"analyzer" - the analyzer to use. ik_max_word is not built into ES; it is provided by the third-party IK analysis plugin, which must be installed first.
"type" - the field's data type

5.2 Add a new field mapping to an existing index
Syntax:

PUT /test_index/_mapping/test_type
{
  "properties" : {
    "new_field" : { "type" : "text" , "analyzer" : "standard" }
  }
}

5.3 Test the analyzers of different fields

GET /test_index/_analyze
{
  "field": "new_field",
  "text": "中华人民共和国国歌"
}

GET /test_index/_analyze
{
  "field": "content",
  "text": "中华人民共和国国歌"
}
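For the Chinese text above, the standard analyzer behind new_field produces one token per character. ik_max_word (from the IK plugin) behind content produces overlapping words of different lengths. The Python sketch below only illustrates that idea and is not the IK plugin's algorithm. The dictionary is a tiny hypothetical one and the longest-match-first enumeration is a simplification, so real IK output may differ.

```python
# Toy illustration only: NOT the IK plugin's algorithm or dictionary.
# It mimics the idea behind ik_max_word (emit every dictionary word
# found in the text) with a tiny hypothetical dictionary.
DICTIONARY = {"中华人民共和国", "中华", "华人", "人民", "共和国", "共和", "国歌"}

def max_word_segment(text, dictionary=DICTIONARY):
    tokens = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):  # longest candidates first
            word = text[start:end]
            if word in dictionary:
                tokens.append(word)
    return tokens

def per_character(text):
    # What the standard analyzer does with Chinese: one token per character.
    return list(text)

text = "中华人民共和国国歌"
print(per_character(text))
print(max_word_segment(text))
# ['中华人民共和国', '中华', '华人', '人民', '共和国', '共和', '国歌']
```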

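6 Custom analyzers
In ES you can define custom analyzers for an index. These are built on top of the analyzers ES provides and adapt them to your needs.
Example 1: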

PUT /test_analyzer
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "my_analyzer" : {
          "type" : "standard",
          "stopwords" : "_english_"
        }
      }
    }
  }
}

GET /test_analyzer/_analyze
{
  "analyzer": "my_analyzer",
  "text": "this is a test analyzer content"
}

GET /test_analyzer/_analyze
{
  "analyzer": "standard",
  "text": "this is a test analyzer content"
}
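The difference between my_analyzer and standard can be sketched in Python as the same tokenization followed by stop-word removal. The tokenizer here is a simplified stand-in for the standard tokenizer, which is an assumption. The set below is believed to match Lucene's default English stop word list, which _english_ refers to; check the ES documentation for the authoritative list.

```python
import re

# Believed to be Lucene's default English stop word list ("_english_").
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these", "they", "this",
    "to", "was", "will", "with",
}

def tokenize(text):
    # Simplified stand-in for the standard tokenizer plus lowercasing.
    return re.findall(r"[a-z0-9]+", text.lower())

def standard_analyzer(text):
    return tokenize(text)

def my_analyzer(text):
    # "type": "standard" with "stopwords": "_english_"
    return [token for token in tokenize(text) if token not in ENGLISH_STOP_WORDS]

text = "this is a test analyzer content"
print(standard_analyzer(text))  # ['this', 'is', 'a', 'test', 'analyzer', 'content']
print(my_analyzer(text))        # ['test', 'analyzer', 'content']
```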

Example 2:

PUT /test_analyzer1
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter" : {
          "type" : "mapping",
          "mappings" : [ "&=>and" ]
        }
      },
      "filter": {
        "my_stopwords_filter" : {
          "type" : "stop",
          "stopwords" : [ "the", "a" ]
        }
      },
      "analyzer" : {
        "my_second_analyzer" : {
          "type" : "custom",
          "char_filter" : "my_char_filter",
          "tokenizer" : "standard",
          "filter" : [ "lowercase", "my_stopwords_filter" ]
        }
      }
    }
  }
}

GET /test_analyzer1/_analyze
{
  "analyzer": "my_second_analyzer",
  "text": "this is a test analyzer content & it is second analyzer"
}
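A custom analyzer runs its parts in a fixed order: character filters on the raw text, then exactly one tokenizer, then the token filters in the listed order. The Python sketch below mirrors my_second_analyzer, assuming a simple regular expression is an acceptable stand-in for the standard tokenizer. The function names only echo the names used in the request.

```python
import re

# Sketch of my_second_analyzer: char filter -> tokenizer -> token filters.

def my_char_filter(text):
    # "type": "mapping", "mappings": ["&=>and"]
    return text.replace("&", "and")

def tokenizer(text):
    # Simplified stand-in for the standard tokenizer.
    return re.findall(r"[A-Za-z0-9]+", text)

def lowercase(tokens):
    return [token.lower() for token in tokens]

def my_stopwords_filter(tokens, stopwords=("the", "a")):
    return [token for token in tokens if token not in stopwords]

def my_second_analyzer(text):
    text = my_char_filter(text)                              # 1. raw string
    tokens = tokenizer(text)                                 # 2. one tokenizer
    for token_filter in (lowercase, my_stopwords_filter):    # 3. filters, in order
        tokens = token_filter(tokens)
    return tokens

print(my_second_analyzer("this is a test analyzer content & it is second analyzer"))
# ['this', 'is', 'test', 'analyzer', 'content', 'and', 'it', 'is', 'second', 'analyzer']
```

Because lowercase runs before my_stopwords_filter, a capitalized The or A is removed as well. With the two filters in the opposite order, it would survive.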

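Custom analyzers are relatively rare in commercial projects, except in specialized domains such as biopharmaceuticals, aviation or securities.

Using a custom analyzer: a custom analyzer can only be used in the index in which it is defined.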

PUT /test_analyzer/_mapping/test_type
{
  "properties": {
    "field_name" : {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}

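7 Complex mapping definitions
ES can define mappings for fields with relatively complex types. Examples include multi-value fields (a field holding several values, i.e. an array; not to be confused with what the ES documentation calls multi-fields, the sub-fields defined under "fields"), empty fields (null, or an empty array []), and object fields. These are commonly used types, not a complete list.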

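7.1 multi field (arrays)
Array data: { "tags" : [ "tag1", "tag2" ] }
An array field is mapped no differently from an ordinary field. The only requirement is that all values in the array have the same data type.
Test: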

PUT /test_index/test_type/1
{
  "tags" : [ "tag1", "tag2", "tag3" ],
  "name" : "zhangsan"
}

GET /test_index/_mapping/test_type

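Define the mapping manually (an existing index cannot be created again, so delete test_index first, e.g. with DELETE /test_index):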

PUT /test_index
{
  "mappings" : {
    "test_type" : {
      "properties" : {
        "tags" : { "type" : "text" , "analyzer" : "standard" },
        "name" : { "type" : "text" , "analyzer" : "english" }
      }
    }
  }
}

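7.2 empty field
Empty values: null, [], [null]
Suppose a document with an empty value is indexed and ES creates the mapping automatically. In that case, no mapping is created for the field that holds the empty value. Conversely, a field with any mapping definition can store empty values.
Test: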

PUT /test_index/test_type/1
{
  "name" : "zhangsan",
  "empty_field" : null
}

GET /test_index/_mapping/test_type

7.3 object field
Object data: { "address" : { "province" : "北京", "city" : "北京", "street" : "建材城西路" } }
When object data is indexed and ES creates the mapping automatically, ES defines a mapping for every field inside the object.
Test:

PUT /test_index/test_type/1
{
  "name" : "zhangsan",
  "age" : 20,
  "address" : {
    "province" : "beijing",
    "city" : "beijing",
    "street" : "jian chai cheng xi lu"
  }
}

GET /test_index/_mapping/test_type

Internally, ES stores object data in a specific flattened format. For the test document above, the stored data looks like this:

{
  "name" : "zhangsan",
  "age" : 20,
  "address.province" : "beijing",
  "address.city" : "beijing",
  "address.street" : "jian chai cheng xi lu"
}

Define the mapping manually (delete the existing test_index first):

PUT /test_index
{
  "mappings": {
    "test_type": {
      "properties": {
        "name" : {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "age" : {
          "type": "byte"
        },
        "address" : {
          "properties": {
            "province" : {
              "type" : "text",
              "analyzer" : "ik_max_word"
            },
            "city" : {
              "type" : "text",
              "analyzer" : "ik_max_word"
            },
            "street" : {
              "type" : "text",
              "analyzer" : "ik_max_word"
            }
          }
        }
      }
    }
  }
}

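More complex objects (arrays of objects): when ES creates the mapping for this kind of data automatically, it creates mapping information for the fields of the objects inside the array. In the example below, ES automatically creates mappings for the name and age fields of the objects in the emps array.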

PUT /test_index/test_type/1
{
  "dept_name" : "sales",
  "emps" : [
    { "name" : "zhangsan", "age" : 20 },
    { "name" : "lisi", "age" : 21 },
    { "name" : "wangwu", "age" : 22 }
  ]
}

GET /test_index/_mapping/test_type

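Internally, this data is also stored in its own particular format, roughly as follows (if the name values were analyzed into several terms, the emps.name array would contain more entries):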

{
  "dept_name" : "sales",
  "emps.name" : [ "zhangsan", "lisi", "wangwu" ],
  "emps.age" : [ 20, 21, 22 ]
}
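One recursive walk can reproduce both flattened formats, the one for the address object and the one for the emps array. This hypothetical Python sketch only illustrates the idea. Values are collected per dotted path without being analyzed, and a path with a single value is shown as a scalar.

```python
from collections import defaultdict

def flatten(doc):
    """Collect every leaf value under its dotted path, e.g. address.city."""
    values = defaultdict(list)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Arrays (of plain values or of objects) add to the same path.
            for item in value:
                walk(item, path)
        else:
            values[path].append(value)

    walk(doc, "")
    # Show single-valued paths as scalars, as in the stored formats above.
    return {path: vals[0] if len(vals) == 1 else vals for path, vals in values.items()}

dept = {
    "dept_name": "sales",
    "emps": [
        {"name": "zhangsan", "age": 20},
        {"name": "lisi", "age": 21},
        {"name": "wangwu", "age": 22},
    ],
}
print(flatten(dept))
# {'dept_name': 'sales', 'emps.name': ['zhangsan', 'lisi', 'wangwu'], 'emps.age': [20, 21, 22]}
```

All names and all ages end up in separate lists, so the flattened form no longer records which age belongs to which name.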

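8 The mapping's root object
The root object of a mapping is the JSON object that corresponds to a type when you set an index's mapping. It contains properties, metadata (_id, _source, _all) and settings (analyzers, etc.). The field option include_in_all was removed in 6.x, and the _all setting is removed in 7.x.
For example, in the request below, the object under test_type is the root object.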

PUT /test_index9
{
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "test_type" : {
      "properties" : {
        "post_date" : { "type" : "date" },
        "title" : { "type" : "text", "index" : false },
        "content" : { "type" : "text" , "analyzer" : "english" },
        "author_id" : { "type" : "integer" }
      },
      "_all" : { "enabled" : false },
      "_source" : { "enabled" : false }
    }
  }
}

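9 Customizing the dynamic mapping strategy
You can intervene manually in ES's dynamic mapping. You can decide whether fields that are not in the mapping may be added to the index, and how they are handled if so. For object fields, you can decide the same for fields added inside the object.
When defining a custom mapping, you can set a dynamic mapping strategy for the type, which makes the index easier to control. The possible values are:
true (default): unknown fields are mapped dynamically.
false: unknown fields are not mapped. The data is saved, but no inverted index is built for it, so it cannot be searched.
strict: unknown fields cause an error.
Example: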

PUT /test_index
{
  "mappings": {
    "test_type" : {
      "dynamic" : "strict",
      "properties": {
        "field1" : {
          "type": "text"
        },
        "field2" : {
          "type": "object",
          "dynamic" : false
        }
      }
    }
  }
}

PUT /test_index/test_type/1 # fails: field3 is not in the mapping and dynamic is strict
{
  "field1" : "aaa",
  "field3" : "bbb"
}

PUT /test_index/test_type/1 # succeeds: sub_f1 and sub_f2 are saved but not mapped
{
  "field1" : "aaa",
  "field2" : {
    "sub_f1" : "sub1",
    "sub_f2" : "sub2"
  }
}

GET /test_index/test_type/1

GET /test_index/_mapping/test_type
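You can simulate the three dynamic policies in Python to see what the two PUT requests above should do. This is a hypothetical sketch, not ES's implementation. Newly mapped fields get the placeholder type text (or object) instead of going through type detection. The sketch also assumes the whole document is kept in _source whatever the policy is. The exception class name is invented for illustration.

```python
import copy

class StrictDynamicMappingError(Exception):
    """Hypothetical: raised when an unknown field meets dynamic: strict."""

def _policy(value):
    # ES accepts booleans or the strings "true", "false" and "strict".
    return {True: "true", False: "false"}.get(value, value)

def _apply(node, obj, inherited):
    # An object's own "dynamic" setting overrides the inherited one.
    policy = _policy(node.get("dynamic", inherited))
    props = node.setdefault("properties", {})
    for name, value in obj.items():
        if name not in props:
            if policy == "strict":
                raise StrictDynamicMappingError(f"unknown field [{name}] rejected")
            if policy == "false":
                continue  # kept in _source only: not mapped, not searchable
            # Placeholder types; real ES would run its type detection here.
            props[name] = {"type": "object"} if isinstance(value, dict) else {"type": "text"}
        if isinstance(value, dict):
            _apply(props[name], value, policy)

def index_document(mapping, doc):
    """Return the mapping as it would look after indexing doc."""
    mapping = copy.deepcopy(mapping)  # leave the caller's mapping untouched
    _apply(mapping, doc, "true")      # the default policy is true
    return mapping

MAPPING = {
    "dynamic": "strict",
    "properties": {
        "field1": {"type": "text"},
        "field2": {"type": "object", "dynamic": False},
    },
}

try:
    index_document(MAPPING, {"field1": "aaa", "field3": "bbb"})
except StrictDynamicMappingError as err:
    print("first PUT rejected:", err)

after = index_document(MAPPING, {"field1": "aaa", "field2": {"sub_f1": "sub1", "sub_f2": "sub2"}})
print(after["properties"]["field2"])
```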

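Custom dynamic mapping strategies are used relatively rarely. It is hard to work out a complete, extensible structure in advance, and such a structure cannot adapt to changing business requirements.
When they are used, it is usually for fixed data structures that almost never change. One example is the information on a national ID card: name, date of birth, address, ID number, photo, issuing authority and validity period.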
