Elasticsearch completion: suggesting only from a filtered subset of data with the context suggester

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/suggester-context.html

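All the demos below are based on Elasticsearch 5.x and Python 3.x.

In a recent project I used Elasticsearch's completion feature to autocomplete searches on the author name (author) of every article (article). The article mapping looks roughly like this: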

ARTICLE = {

    'properties': {

        'id': {

            'type': 'integer',

            'index': 'not_analyzed',

        },

        'author': {

            'type': 'text',

        },

        'author_completion': {

            'type': 'completion',

        },

        'removed': {

            'type': 'boolean',

        }

    }

}

MAPPINGS = {

    'mappings': {

        'article': ARTICLE,

    }

}

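The requirement now is that articles that have been taken down, i.e. those whose removed is True, must not show up in the completion suggestions.

For the demo, first insert some data with the following code: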

#!/usr/bin/env python

# -*- coding: utf-8 -*-

from elasticsearch.helpers import bulk

from elasticsearch import Elasticsearch

ES_HOSTS = [{'host': 'localhost', 'port': 9200}, ]

ES = Elasticsearch(hosts=ES_HOSTS)

INDEX = 'test_article'

TYPE = 'article'

ARTICLE = {

    'properties': {

        'id': {

            'type': 'integer',

            'index': 'not_analyzed',

        },

        'author': {

            'type': 'text',

        },

        'author_completion': {

            'type': 'completion',

        },

        'removed': {

            'type': 'boolean',

        }

    }

}

MAPPINGS = {

    'mappings': {

        'article': ARTICLE,

    }

}

def create_index():

    """

    Create the index before inserting data

    """

    ES.indices.delete(index=INDEX, ignore=404)

    ES.indices.create(index=INDEX, body=MAPPINGS)

def insert_data():

    """

    Add test data

    :return:

    """

    test_datas = [

        {

            'id': 1,

            'author': 'tom',

            'author_completion': 'tom',

            'removed': False

        },

        {

            'id': 2,

            'author': 'tom_cat',

            'author_completion': 'tom_cat',

            'removed': True

        },

        {

            'id': 3,

            'author': 'kitty',

            'author_completion': 'kitty',

            'removed': False

        },

        {

            'id': 4,

            'author': 'tomato',

            'author_completion': 'tomato',

            'removed': False

        },

    ]

    bulk_data = []

    for data in test_datas:

        action = {

            '_index': INDEX,

            '_type': TYPE,

            '_id': data.get('id'),

            '_source': data

        }

        bulk_data.append(action)

    success, failed = bulk(client=ES, actions=bulk_data, stats_only=True)

    print('success', success, 'failed', failed)

if __name__ == '__main__':

    create_index()

    insert_data()

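The 4 test documents are inserted successfully. Next, test fetching completion suggestions for author names with the following code: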

def get_suggestions(keywords):

    body = {

        # 'size': 0,  # would stop the search hits (author, id, etc.) from being returned; left disabled here so the test shows them

        '_source': 'suggest',

        'suggest': {

            'author_prefix_suggest': {

                'prefix': keywords,

                'completion': {

                    'field': 'author_completion',

                    'size': 10,

                }

            }

        },

        # For removed articles, I naively assumed that adding the filter below would be enough

        'query': {

            'term': {

                'removed': False

            }

        }

    }

    suggest_data = ES.search(index=INDEX, doc_type=TYPE, body=body)

    return suggest_data

if __name__ == '__main__':

    # create_index()

    # insert_data()

    suggestions = get_suggestions('t')

    print(suggestions)

    """

    suggestions = {

        'took': 0,

        'timed_out': False,

        '_shards': {

            'total': 5,

            'successful': 5,

            'skipped': 0,

            'failed': 0

        },

        'hits': {

            'total': 3,

            'max_score': 0.6931472,

            'hits': [

                {'_index': 'test_article', '_type': 'article', '_id': '4', '_score': 0.6931472,

                 '_source': {}},

                {'_index': 'test_article', '_type': 'article', '_id': '1', '_score': 0.2876821,

                 '_source': {}},

                {'_index': 'test_article', '_type': 'article', '_id': '3', '_score': 0.2876821,

                 '_source': {}}]},

        'suggest': {

            'author_prefix_suggest': [{'text': 't', 'offset': 0, 'length': 1, 'options': [

                {'text': 'tom', '_index': 'test_article', '_type': 'article', '_id': '1', '_score': 1.0,

                 '_source': {}},

                {'text': 'tom_cat', '_index': 'test_article', '_type': 'article', '_id': '2', '_score': 1.0,

                 '_source': {}},

                {'text': 'tomato', '_index': 'test_article', '_type': 'article', '_id': '4', '_score': 1.0,

                 '_source': {}}]}]

        }

    }

    """

As it turns out, tom_cat, whose removed is True, is right there in the suggestions, even though the request clearly included

'query': {

            'term': {

                'removed': False

            }

        }

Yet the filter had no effect. Does Elasticsearch really not support this requirement!? Surely not...
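Comparing the two sections of the response printed above makes the mismatch explicit: the `query` clause narrowed `hits`, while the `suggest` options still contain a document that the query excluded. A minimal sketch over a trimmed copy of that response; the helper name `ids_only_in_suggest` is made up for illustration and is not part of the elasticsearch client.

```python
def ids_only_in_suggest(response, suggest_name):
    """Return the _id values that appear in suggest options but not in hits."""
    hit_ids = {hit['_id'] for hit in response['hits']['hits']}
    suggest_ids = set()
    for entry in response['suggest'][suggest_name]:
        for option in entry['options']:
            suggest_ids.add(option['_id'])
    return sorted(suggest_ids - hit_ids)


# Trimmed copy of the response printed above (only the fields used here).
sample_response = {
    'hits': {'hits': [{'_id': '4'}, {'_id': '1'}, {'_id': '3'}]},
    'suggest': {
        'author_prefix_suggest': [{
            'text': 't',
            'options': [
                {'text': 'tom', '_id': '1'},
                {'text': 'tom_cat', '_id': '2'},
                {'text': 'tomato', '_id': '4'},
            ],
        }]
    },
}

leaked = ids_only_in_suggest(sample_response, 'author_prefix_suggest')
print(leaked)  # ['2']
```

Document 2 (tom_cat) is suggested although the term query dropped it from the hits.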

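Checking the documentation gives the answer. The completion suggester considers all documents in the index regardless of the query clause; restricting suggestions to a subset is the job of the context suggester: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/suggester-context.html

With the problem identified, first rework the mapping and re-insert the test data as follows: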

#!/usr/bin/env python

# -*- coding: utf-8 -*-

from elasticsearch.helpers import bulk

from elasticsearch import Elasticsearch

ES_HOSTS = [{'host': 'localhost', 'port': 9200}, ]

ES = Elasticsearch(hosts=ES_HOSTS)

INDEX = 'test_article'

TYPE = 'article'

ARTICLE = {

    'properties': {

        'id': {

            'type': 'integer',

            'index': 'not_analyzed'

        },

        'author': {

            'type': 'text',

        },

        'author_completion': {

            'type': 'completion',

            'contexts': [  # this is the key part

                {

                    'name': 'removed_tab',

                    'type': 'category',

                    'path': 'removed'

                }

            ]

        },

        'removed': {

            'type': 'boolean',

        }

    }

}

MAPPINGS = {

    'mappings': {

        'article': ARTICLE,

    }

}

def create_index():

    """

    Create the index before inserting data

    """

    ES.indices.delete(index=INDEX, ignore=404)

    ES.indices.create(index=INDEX, body=MAPPINGS)

def insert_data():

    """

    Add test data

    :return:

    """

    test_datas = [

        {

            'id': 1,

            'author': 'tom',

            'author_completion': 'tom',

            'removed': False

        },

        {

            'id': 2,

            'author': 'tom_cat',

            'author_completion': 'tom_cat',

            'removed': True

        },

        {

            'id': 3,

            'author': 'kitty',

            'author_completion': 'kitty',

            'removed': False

        },

        {

            'id': 4,

            'author': 'tomato',

            'author_completion': 'tomato',

            'removed': False

        },

    ]

    bulk_data = []

    for data in test_datas:

        action = {

            '_index': INDEX,

            '_type': TYPE,

            '_id': data.get('id'),

            '_source': data

        }

        bulk_data.append(action)

    success, failed = bulk(client=ES, actions=bulk_data, stats_only=True)

    print('success', success, 'failed', failed)

if __name__ == '__main__':

    create_index()

    insert_data()

Duang! An unexpected problem shows up:

elasticsearch.helpers.BulkIndexError: ('4 document(s) failed to index.', [{'index': {'_index': 'test_article', '_type': 'article', '_id': '1', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Failed to parse context field [removed], only keyword and text fields are accepted'}, 'data': {'id': 1, 'author': 'tom', 'author_completion': 'tom', 'removed': False}}}, {'index': {'_index': 'test_article', '_type': 'article', '_id': '2', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Failed to parse context field [removed], only keyword and text fields are accepted'}, 'data': {'id': 2, 'author': 'tom_cat', 'author_completion': 'tom_cat', 'removed': True}}}, {'index': {'_index': 'test_article', '_type': 'article', '_id': '3', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Failed to parse context field [removed], only keyword and text fields are accepted'}, 'data': {'id': 3, 'author': 'kitty', 'author_completion': 'kitty', 'removed': False}}}, {'index': {'_index': 'test_article', '_type': 'article', '_id': '4', 'status': 400, 'error': {'type': 'illegal_argument_exception', 'reason': 'Failed to parse context field [removed], only keyword and text fields are accepted'}, 'data': {'id': 4, 'author': 'tomato', 'author_completion': 'tomato', 'removed': False}}}])

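The error means that a context path only accepts keyword and text fields, whereas removed above is a boolean. Fine, rework the mapping once more and declare removed as keyword; the test data then stores removed as the strings 'True' and 'False':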

#!/usr/bin/env python

# -*- coding: utf-8 -*-

from elasticsearch.helpers import bulk

from elasticsearch import Elasticsearch

ES_HOSTS = [{'host': 'localhost', 'port': 9200}, ]

ES = Elasticsearch(hosts=ES_HOSTS)

INDEX = 'test_article'

TYPE = 'article'

ARTICLE = {

    'properties': {

        'id': {

            'type': 'integer',

            'index': 'not_analyzed'

        },

        'author': {

            'type': 'text',

        },

        'author_completion': {

            'type': 'completion',

            'contexts': [  # this is the key part

                {

                    'name': 'removed_tab',

                    'type': 'category',

                    'path': 'removed'

                }

            ]

        },

        'removed': {

            'type': 'keyword',

        }

    }

}

MAPPINGS = {

    'mappings': {

        'article': ARTICLE,

    }

}

def create_index():

    """

    Create the index before inserting data

    """

    ES.indices.delete(index=INDEX, ignore=404)

    ES.indices.create(index=INDEX, body=MAPPINGS)

def insert_data():

    """

    Add test data

    :return:

    """

    test_datas = [

        {

            'id': 1,

            'author': 'tom',

            'author_completion': 'tom',

            'removed': 'False'

        },

        {

            'id': 2,

            'author': 'tom_cat',

            'author_completion': 'tom_cat',

            'removed': 'True'

        },

        {

            'id': 3,

            'author': 'kitty',

            'author_completion': 'kitty',

            'removed': 'False'

        },

        {

            'id': 4,

            'author': 'tomato',

            'author_completion': 'tomato',

            'removed': 'False'

        },

    ]

    bulk_data = []

    for data in test_datas:

        action = {

            '_index': INDEX,

            '_type': TYPE,

            '_id': data.get('id'),

            '_source': data

        }

        bulk_data.append(action)

    success, failed = bulk(client=ES, actions=bulk_data, stats_only=True)

    print('success', success, 'failed', failed)

if __name__ == '__main__':

    create_index()

    insert_data()
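Since the context path now points at a keyword field, the test data above stores removed as strings. If the application keeps removed as a real boolean, the conversion can happen just before indexing. A minimal sketch, assuming a hypothetical helper named `to_indexable` (it is not part of the elasticsearch client):

```python
def to_indexable(doc, context_fields=('removed',)):
    """Return a copy of doc with the given fields converted to strings."""
    converted = dict(doc)  # shallow copy, the caller's dict is left untouched
    for field in context_fields:
        if field in converted:
            # str(True) -> 'True', str(False) -> 'False'
            converted[field] = str(converted[field])
    return converted


article = {'id': 2, 'author': 'tom_cat', 'author_completion': 'tom_cat', 'removed': True}
indexable = to_indexable(article)
print(repr(indexable['removed']))  # 'True'
```

Whatever string form is chosen here has to be used unchanged in the contexts of the suggest request, because category contexts are matched as exact values.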

Mission success. A look at the index mapping confirms everything is OK.

Next, fetch the completion suggestions:

def get_suggestions(keywords):

    body = {

        'size': 0,

        '_source': 'suggest',

        'suggest': {

            'author_prefix_suggest': {

                'prefix': keywords,

                'completion': {

                    'field': 'author_completion',

                    'size': 10,

                    'contexts': {

                        'removed_tab': ['False', ]  # keep only completions whose removed is 'False'; in my testing contexts could not hold several tabs: adding e.g. 'state_tab': ['1', ] made the contexts filter stop working

                    }

                }

            }

        },

    }

    suggest_data = ES.search(index=INDEX, doc_type=TYPE, body=body)

    return suggest_data

if __name__ == '__main__':

    # create_index()

    # insert_data()

    suggestions = get_suggestions('t')

    print(suggestions)

    """

    suggestions = {

        'took': 0,

        'timed_out': False,

        '_shards': {

            'total': 5,

            'successful': 5,

            'skipped': 0, 'failed': 0

        },

        'hits': {

            'total': 0,

            'max_score': 0.0,

            'hits': []

        },

        'suggest': {

            'author_prefix_suggest': [

                {'text': 't', 'offset': 0, 'length': 1, 'options': [

                    {'text': 'tom', '_index': 'test_article', '_type': 'article', '_id': '1', '_score': 1.0,

                     '_source': {},

                     'contexts': {'removed_tab': ['False']}},

                    {'text': 'tomato', '_index': 'test_article', '_type': 'article', '_id': '4', '_score': 1.0,

                     '_source': {},

                     'contexts': {'removed_tab': ['False']}}]}]}}

        """

This time tom_cat, whose removed is 'True', has been filtered out. Done!
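For reuse, building the request body and reading the response can be pulled into two small pure functions. This is a sketch under assumptions: `build_suggest_body` and `suggestion_texts` are illustrative names, and the context value has to match the string stored at index time ('False' here).

```python
def build_suggest_body(prefix, removed='False', size=10):
    """Build a completion request restricted to one removed_tab context value."""
    return {
        'size': 0,  # no search hits are needed, only the suggest section
        'suggest': {
            'author_prefix_suggest': {
                'prefix': prefix,
                'completion': {
                    'field': 'author_completion',
                    'size': size,
                    'contexts': {'removed_tab': [removed]},
                },
            }
        },
    }


def suggestion_texts(response, name='author_prefix_suggest'):
    """Collect the suggested texts from a search response, in order."""
    texts = []
    for entry in response.get('suggest', {}).get(name, []):
        for option in entry.get('options', []):
            texts.append(option['text'])
    return texts


# Trimmed copy of the response shown above.
sample_response = {
    'suggest': {
        'author_prefix_suggest': [{
            'text': 't',
            'options': [{'text': 'tom', '_id': '1'}, {'text': 'tomato', '_id': '4'}],
        }]
    }
}

print(suggestion_texts(sample_response))  # ['tom', 'tomato']
```

With the ES client from the scripts above, the body would be passed as ES.search(index=INDEX, doc_type=TYPE, body=build_suggest_body('t')).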
