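CRUD on Elasticsearch with Python

I. The official Python interface package for Elasticsearch

  1. GitHub: https://github.com/elastic/elasticsearch-dsl-py

  2. Install: pip install elasticsearch-dsl

  3. It offers many APIs; see the documentation in the GitHub repository for usage.

II. Defining the Pipeline that writes to ES:

  1. Create the index, type, and mapping:

    This step may raise an IllegalOperation exception. If it does, open local port 9200 to check the ES version, then install versions of the Python elasticsearch and elasticsearch-dsl packages that are close to it.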
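The version check described above can be scripted. The following is a minimal sketch, assuming that "close versions" means the same major version and that the node answers on http://localhost:9200; the helper names (major, same_major, server_version, client_versions) are made up for this illustration and are not part of any library.

```python
import json
from urllib.request import urlopen


def major(version: str) -> int:
    """Return the major component of a version string such as '5.1.1'."""
    return int(version.split(".")[0])


def same_major(server: str, client: str) -> bool:
    """True when the server and a client package share a major version."""
    return major(server) == major(client)


def server_version(url: str = "http://localhost:9200") -> str:
    """Read version.number from the JSON document served at the ES root URL."""
    with urlopen(url) as resp:
        return json.load(resp)["version"]["number"]


def client_versions() -> dict:
    """Installed versions of the two client packages (needs Python 3.8+)."""
    from importlib.metadata import version
    return {name: version(name) for name in ("elasticsearch", "elasticsearch-dsl")}
```

With a node running, comparing server_version() against each value from client_versions() using same_major shows which package, if any, needs to be reinstalled at a different version.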

# _*_ encoding:utf-8 _*_
__author__ = 'LYQ'
__date__ = '2018/10/29 11:02'

from datetime import datetime

# Newer elasticsearch-dsl versions rename DocType to Document
from elasticsearch_dsl import (DocType, Date, Nested, Boolean,
                               analyzer, Completion, Keyword, Text, Integer)
from elasticsearch_dsl.connections import connections

# Connect to the local ES node; several servers can be listed here
connections.create_connection(hosts=["localhost"])


class ArticleType(DocType):
    """Define the ES mapping."""
    # Analyzed with the ik analyzer
    title = Text(analyzer="ik_max_word")
    create_date = Date()
    # Not analyzed
    url = Keyword()
    url_object_id = Keyword()
    front_image_url = Keyword()
    front_image_path = Keyword()
    praise_nums = Integer()
    fav_nums = Integer()
    comment_nums = Integer()
    tags = Text(analyzer="ik_max_word")
    content = Text(analyzer="ik_max_word")

    class Meta:
        # Define the index and the type
        index = "jobbole"
        doc_type = "article"


if __name__ == "__main__":
    # Calling init() creates the index and its mapping
    ArticleType.init()

  2. Create the corresponding item:

# Import the ES mapping defined above
from models.es import ArticleType
from w3lib.html import remove_tags


class ElasticsearchPipeline(object):
    """
    Write data to Elasticsearch. Remember to register this pipeline in settings.
    """

    def process_item(self, item, spider):
        # Instantiate the ES mapping class defined above
        articletype = ArticleType()
        articletype.title = item["title"]
        articletype.create_date = item["create_date"]
        articletype.url = item["url"]
        articletype.front_image_url = item["front_image_url"]
        if "front_image_path" in item:
            articletype.front_image_path = item["front_image_path"]
        articletype.praise_nums = item["praise_nums"]
        articletype.fav_nums = item["fav_nums"]
        articletype.comment_nums = item["comment_nums"]
        articletype.tags = item["tags"]
        # Strip HTML tags from the content before indexing
        articletype.content = remove_tags(item["content"])
        # Use url_object_id as the document id
        articletype.meta.id = item["url_object_id"]
        articletype.save()
        return item
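The pipeline's docstring says to register it in settings. A minimal sketch of that entry in a Scrapy settings.py might look like the following; the package name ArticleSpider and the priority 300 are assumptions made for this illustration, so substitute the real project path.

```python
# settings.py (sketch)
# "ArticleSpider" is a hypothetical project package name.
# The number is the pipeline's order: lower values run earlier.
ITEM_PIPELINES = {
    "ArticleSpider.pipelines.ElasticsearchPipeline": 300,
}
```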

Check port 9100 to confirm that the data was inserted successfully.

class JobboleArticleSpider(scrapy.Item):
    # ...... (elided in the original)

    def save_to_es(self):
        """Define the ES write on each item, so that items with different fields can save themselves."""
        articletype = ArticleType()
        articletype.title = self["title"]
        articletype.create_date = self["create_date"]
        articletype.url = self["url"]
        articletype.front_image_url = self["front_image_url"]
        if "front_image_path" in self:
            articletype.front_image_path = self["front_image_path"]
        articletype.praise_nums = self["praise_nums"]
        articletype.fav_nums = self["fav_nums"]
        articletype.comment_nums = self["comment_nums"]
        articletype.tags = self["tags"]
        articletype.content = remove_tags(self["content"])
        articletype.meta.id = self["url_object_id"]
        articletype.save()


class ElasticsearchPipeline(object):
    """
    Write data to Elasticsearch.
    """

    def process_item(self, item, spider):
        # Delegate to the method defined on the item
        item.save_to_es()
        return item
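Moving the write logic into save_to_es means the pipeline no longer needs to know which fields an item has. The sketch below illustrates that design with plain Python stand-ins, so it runs without Scrapy or a cluster; ArticleItem, JobItem, and the saved list are hypothetical names invented for this example.

```python
saved = []  # stands in for the Elasticsearch index in this sketch


class ArticleItem(dict):
    """Hypothetical stand-in for a scrapy.Item with article fields."""

    def save_to_es(self):
        saved.append({"kind": "article", "title": self["title"]})


class JobItem(dict):
    """Hypothetical second item type with a different set of fields."""

    def save_to_es(self):
        saved.append({"kind": "job", "salary": self["salary"]})


class ElasticsearchPipeline:
    def process_item(self, item, spider):
        # The pipeline only relies on the item exposing save_to_es()
        item.save_to_es()
        return item


pipeline = ElasticsearchPipeline()
pipeline.process_item(ArticleItem(title="Python basics"), spider=None)
pipeline.process_item(JobItem(salary="10k"), spider=None)
```

Each new item type only has to provide its own save_to_es; the pipeline class stays unchanged.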

III. Search suggestions:

  Under the hood this calls the analyze API, as follows:

GET _analyze
{
  "analyzer": "ik_max_word",
  "text"    : "Python网络基础学习"
}
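The analyze API answers with a JSON object whose "tokens" list holds one entry per token. The sketch below shows how the request body is assembled and how the token strings are pulled out of such a response. The helper names are made up for this illustration, and the sample response is hand-written in the documented shape (trimmed to the keys used here); it is not real ik_max_word output.

```python
def analyze_body(text, analyzer="ik_max_word"):
    """Build the JSON body for a GET _analyze request."""
    return {"analyzer": analyzer, "text": text}


def tokens_from_response(response):
    """Extract the token strings from an _analyze response."""
    return [entry["token"] for entry in response["tokens"]]


# Hand-written example in the response shape, not actual analyzer output
sample_response = {
    "tokens": [
        {"token": "python", "position": 0},
        {"token": "网络", "position": 1},
    ]
}

tokens = tokens_from_response(sample_response)
```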

  1. In the es file:

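from elasticsearch_dsl.analysis import CustomAnalyzer as _CustomAnalyzer


class Customanalyzer(_CustomAnalyzer):
    """Custom analyzer."""

    def get_analysis_definition(self):
        # Override this method to return an empty dict, so that no analysis
        # definition is generated for the ik_max_word analyzer
        return {}


ik_analyser = Customanalyzer("ik_max_word", filter=["lowercase"])


class ArticleType(DocType):
    """Define the ES mapping."""
    suggest = Completion(analyzer=ik_analyser)
    # ...... (elided in the original)


# Must come after ArticleType is defined, because it reads the connection alias from the class
esc = connections.create_connection(ArticleType._doc_type.using)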

Generate the data for this field

  2. In the item file:

# ......
from models.es import esc


def get_suggest(index, info_tuple):
    """Build the search-suggestion list from (string, weight) pairs."""
    used_words = set()
    suggests = []
    for text, weight in info_tuple:
        if text:
            # Call the ES analyze API to tokenize the string;
            # the response holds the token entries under the "tokens" key
            words = esc.indices.analyze(index=index, analyzer="ik_max_word",
                                        params={"filter": ["lowercase"]}, body=text)
            # Drop tokens of length 1
            analyzed_words = set(r["token"] for r in words["tokens"] if len(r["token"]) > 1)
            # Deduplicate against words already used by earlier strings
            new_words = analyzed_words - used_words
            used_words.update(new_words)
        else:
            new_words = set()
        if new_words:
            suggests.append({"input": list(new_words), "weight": weight})
    return suggests


class JobboleArticleSpider(scrapy.Item):
    # ...... (elided in the original)

    def save_to_es(self):
        articletype = ArticleType()
        # ...... (elided in the original)
        # Build the suggest field from the strings and their weights
        articletype.suggest = get_suggest(ArticleType._doc_type.index,
                                          ((articletype.title, 1), (articletype.tags, 7)))
        articletype.save()
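To see what shape the suggest field ends up with, the deduplication idea can be tried without a cluster. This is only a sketch: the call to the ES analyze API is swapped for a lowercase whitespace splitter, which is not how ik_max_word tokenizes, and build_suggests and whitespace_analyze are names made up for this illustration.

```python
def build_suggests(info_tuple, analyze):
    """Turn (text, weight) pairs into completion inputs without repeated words."""
    used_words = set()
    suggests = []
    for text, weight in info_tuple:
        # Keep tokens longer than one character; an empty text yields no tokens
        words = {w for w in analyze(text) if len(w) > 1} if text else set()
        new_words = words - used_words
        used_words |= new_words
        if new_words:
            # sorted() only makes the output order deterministic for this demo
            suggests.append({"input": sorted(new_words), "weight": weight})
    return suggests


def whitespace_analyze(text):
    """Stand-in tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


result = build_suggests(
    (("Python web basics", 1), ("python scrapy a", 7)),
    whitespace_analyze,
)
```

Here result holds two entries: the three title words with weight 1, and only "scrapy" with weight 7, because "python" was already used by the first string and "a" is too short.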
