Basic usage of Elasticsearch

Elasticsearch is a Java wrapper around Lucene, so Java must be installed beforehand.

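It is suited to full-text indexing and is easy to run as a distributed system; its core principle is the inverted index. A naive keyword search scans the articles one by one looking for the keyword. Elasticsearch instead tokenizes the content to be indexed at storage time, producing a set of terms (tags). A search looks the keyword up directly in the term index and then returns the articles attached to the matching terms, which makes searching far more efficient.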
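The inverted-index idea can be sketched in a few lines of Python. This is only a toy illustration: whitespace splitting stands in for a real analyzer, and the names build_inverted_index and search as well as the sample documents are made up for this example, not part of Elasticsearch.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace splitting stands in for a real analyzer/tokenizer.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, keyword):
    """Look the keyword up in the term index instead of scanning every document."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Elasticsearch wraps Lucene",
    2: "Lucene builds an inverted index",
    3: "Django talks to Elasticsearch",
}
index = build_inverted_index(docs)
print(search(index, "lucene"))         # [1, 2]
print(search(index, "elasticsearch"))  # [1, 3]
```

The tokenizing cost is paid once when a document is stored; each search is then a dictionary lookup rather than a scan over all texts.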

Installation

Because Java is cross-platform, Elasticsearch is too. On Linux, just download, extract, and run:

curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.tar.gz

tar -xvf elasticsearch-6.3.2.tar.gz

cd elasticsearch-6.3.2/bin

./elasticsearch

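On Windows, download the Windows package (https://www.elastic.co/downloads/elasticsearch), extract it, and run bin\elasticsearch.bat. Double-clicking elasticsearch.bat also works.

Open localhost:9200/ in a browser to see the Elasticsearch version, node details, and other information.

For Elasticsearch 6.3.1, the Python packages that talk to it are elasticsearch and elasticsearch_dsl; install them with pip install.

Head is a visualization plugin for Elasticsearch. Installation guide: https://www.cnblogs.com/hts-technology/p/8477258.html

Node.js and grunt must be installed first.

For Node.js, download the Windows .msi installer from https://nodejs.org/en/download/ . If it is already installed, node -v shows the installed version.

In the Node command prompt, run npm install -g grunt-cli, then check the version with grunt --version.

Edit the config/elasticsearch.yml file of Elasticsearch:

Append at the end: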

http.cors.enabled: true

http.cors.allow-origin: "*"

node.master: true

node.data: true

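Then uncomment network.host: 192.168.0.1 and change it to network.host: 0.0.0.0; also uncomment cluster.name, node.name, and http.port.

Download the head package from https://github.com/mobz/elasticsearch-head , either by cloning it or by downloading the zip.

Then edit Gruntfile.js in the elasticsearch-head-master directory and set hostname:'*' :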

connect: {
    server: {
        options: {
            hostname: '*',
            port: 9100,
            base: '.',
            keepalive: true
        }
    }
}

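Then, in the Node command prompt, change to the elasticsearch-head-master directory, install head with npm install, and start it with grunt server. Open localhost:9100 in a browser to see the interface.

For languages other than Java, Elasticsearch exposes a REST API for creating, reading, updating, and deleting data and for changing settings.

The Elasticsearch data model has the concepts index, type, and id. Roughly, an index corresponds to a database, a type to a table, and an id to the key that identifies a record. In newer versions, an index may contain only one type.

Syntax of common operations through the REST API:

Note: using curl on Windows needs some extra care. The URL must be in double quotes; when sending data, declare the content type with -H "Content-Type:application/json" (Elasticsearch 6.0 and later require this header on every platform for requests that carry a body); every key and value in the data must be wrapped in three double quotes; put the -X parameter before the request method; and keys and values apparently cannot contain spaces.

List all indices on the current node:

curl -X GET "http://localhost:9200/_cat/indices?v"

Show the field types of all indices: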

curl 'localhost:9200/_mapping?pretty=true'

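Create the weather index:

curl -X PUT "localhost:9200/weather"

Delete the weather index:

curl -X DELETE "localhost:9200/weather"

Add a document to an index:

curl -X PUT "localhost:9200/my_index/my_type/my_id" -H "Content-Type: application/json" -d '{"te":"test","ta":"data"}'

A PUT request against a record that already exists updates that record.

POST can also be used to add a document without specifying an id; a random id is generated:

curl -X POST "localhost:9200/my_index/my_type" -H "Content-Type: application/json" -d '{"te":"test","ta":"data"}'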
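The same indexing request can be assembled from Python with only the standard library. The sketch below builds the request without sending it, so it runs without a cluster; build_index_request is a hypothetical helper name, and the host is assumed to be the default local node.

```python
import json
from urllib.request import Request


def build_index_request(index, doc_type, doc, doc_id=None, host="http://localhost:9200"):
    """Build (but do not send) the HTTP request that indexes one document."""
    path = "/{}/{}".format(index, doc_type)
    if doc_id is not None:
        path += "/{}".format(doc_id)
    return Request(
        host + path,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        # PUT targets an explicit id; POST lets Elasticsearch generate one.
        method="PUT" if doc_id is not None else "POST",
    )


req = build_index_request("my_index", "my_type", {"te": "test", "ta": "data"}, doc_id="my_id")
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would send it to a running node. Serializing with json.dumps also avoids the shell quoting problems described for curl on Windows.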

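View a specific record:

curl -X GET "localhost:9200/my_index/my_type/1?pretty"

pretty returns the data in a readable format, the found field tells whether the lookup succeeded, and the _source field holds the original record.

Search:

curl "localhost:9200/my_index/my_type/_search?q=name:somekey"

This searches for documents whose name contains somekey. It is roughly equivalent to:

curl -XGET 'http://localhost:9200/my_index/my_type/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"match":{"name":"somekey"}}}'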

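Updating the field types of the products index:

Method one: turn the field into a multi-field. (String fields are analyzed by default, so aggregating and sorting on them is inefficient and is not allowed by default. If you need to aggregate or sort on such a field, a parameter can enable it, but the recommended approach is a multi-field: one analyzed type used for searching and one non-analyzed keyword type used for sorting and aggregations.)

curl -XPUT "localhost:9200/my_index/my_type/_mapping" -H "Content-Type: application/json" -d '{"my_type":{"properties":{"created":{"type":"text","fields":{"date":{"type":"date"}}}}}}'

Here the created field is given two types: use created when searching and the date sub-field (created.date) when sorting.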

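Method two: use reindex. This copies the data of one index (all of it or a subset) into a second index, stored according to the second index's mappings. (If index 2 does not exist, it is created on the fly with dynamically inferred mappings, because reindex does not copy the source index's settings or mappings; if it exists, the data of the original index is stored using the mappings of index 2.) After copying, delete the original index.

In products, category originally had the type text: {"mappings":{"doc":{"properties":{"category":{"type":"text"},...}}}}

Create the index with the adjusted field types:

curl -XPUT "localhost:9200/products_adjust" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "doc": {
      "properties": {
        "category": {
          "type": "text",
          "fields": {
            "keyword": {"type": "keyword", "ignore_above": 256}
          }
        },
        "img_url": {"type": "keyword"},
        "name": {"type": "text", "analyzer": "ik_smart"}
      }
    }
  }
}'

Then use reindex to copy the data into products_adjust:

curl -XPOST "localhost:9200/_reindex" -H "Content-Type: application/json" -d '{"source":{"index":"products"},"dest":{"index":"products_adjust"}}'

Finally, delete the original index.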

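Method three: use an index alias from the start, similar to a database view. When the data structure has to change later, just repoint the alias to another index.

Create the index alias:

curl -XPOST localhost:9200/_aliases -H 'Content-Type: application/json' -d '{"actions":[{"add":{"alias":"my_index","index":"my_index_v1"}}]}'

Update the index the alias points to:

curl -XPOST localhost:9200/_aliases -H 'Content-Type: application/json' -d '{"actions":[{"remove":{"alias":"my_index","index":"my_index_v1"}},{"add":{"alias":"my_index","index":"my_index_v2"}}]}'

Delete the old index:

curl -XDELETE localhost:9200/my_index_v1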
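The body of the alias switch is easy to get wrong by hand, so it can help to generate it. A minimal sketch, assuming the alias and index names used above; alias_swap_actions is a hypothetical helper, not a client library function.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases body that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"alias": alias, "index": old_index}},
            {"add": {"alias": alias, "index": new_index}},
        ]
    }


body = alias_swap_actions("my_index", "my_index_v1", "my_index_v2")
print(json.dumps(body))
```

Because both actions travel in a single request, clients that query my_index should not see a moment where the alias points at nothing.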

Using Elasticsearch in Django

Usage examples for elasticsearch_dsl: https://github.com/elastic/elasticsearch-dsl-py/tree/master/examples

Create the Elasticsearch index model file es_docs.py:

from elasticsearch_dsl import Document, Date, Long, Keyword, Float, Text, connections


class ESProduct(Document):
    # Analyzed with ik_smart for searching, plus a keyword sub-field for exact values.
    name = Text(analyzer="ik_smart", fields={'keyword': Keyword()})
    description = Text()
    price = Float()
    category = Text(fields={'cate': Keyword()})
    tags = Keyword(multi=True)

    class Index:
        name = 'products'
        settings = {
            "number_of_shards": 2,
        }


if __name__ == '__main__':
    connections.create_connection(hosts=['localhost'])
    ESProduct.init()

ESProduct.init() creates the index in Elasticsearch.

Create the data import command at app\management\commands\index_all_data.py in the app directory; it loads the data from the database into Elasticsearch:

import elasticsearch_dsl
from django.core.management import BaseCommand

from main.models import Product
from main.es_docs import ESProduct


class Command(BaseCommand):
    help = "Index all data to Elasticsearch"

    def handle(self, *args, **options):
        elasticsearch_dsl.connections.create_connection()
        for product in Product.objects.all():
            # Reuse the database primary key as the document id.
            esp = ESProduct(
                meta={'id': product.pk},
                name=product.name,
                description=product.description,
                price=product.price,
                category=product.category.name,
            )
            for tag in product.tags.all():
                esp.tags.append(tag.name)
            esp.save()

Running python manage.py index_all_data from the project root then writes the data from the database into Elasticsearch.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, connections

When creating a connection, you can pass a client explicitly:

client = Elasticsearch()
s = Search(using=client, index="decorates")

Or register a default connection:

connections.create_connection(hosts=['localhost'])
s = Search(index="decorates")

Querying Elasticsearch in views.py:

import random

from django.urls import reverse

from django.shortcuts import render

from django.views.generic import View

from elasticsearch_dsl import Search ,connections ,Q

from main.forms import SearchForm

import logging

logger=logging.getLogger("django.main")

class HomeView(View):

	def get(self,request):

		form=SearchForm(request.GET)

		logger.debug("form: %s",form )

		ctx={

		"form":form

		}

		if form.is_valid():

			connections.create_connection(hosts=["localhost"])

			name_query=form.cleaned_data["name"]

			if name_query:

				s=Search(index="products").query("match",name=name_query)

			else:

				s=Search(index="products")

			min_price=form.cleaned_data.get("min_price")

			max_price=form.cleaned_data.get("max_price")

			if min_price is not None or max_price is not None:

				price_q={'range':{"price":{}}}

				if min_price is not None:

					price_q['range']['price']["gte"]=min_price

				if max_price is not None:

					price_q['range']['price']["lte"]=max_price

				s=s.query(Q(price_q))

				# Q takes the same structure as the raw Elasticsearch query DSL JSON.

				# A() builds aggregations: a=A('terms',field='category.keyword') is equivalent to {'terms':{'field':'category.keyword'}}; attach it with s.aggs.bucket('category_terms',a)

				# A metric or a nested aggregation can also be applied to a: a.metric('clicks_per_category','sum',field='clicks').bucket('tags_per_category','terms',field='tags')

				# which is equivalent to {'aggs':{'category_terms':{'terms':{'field':'category.keyword'},'aggs':{

				#'clicks_per_category':{'sum':{'field':'clicks'}},

				#'tags_per_category':{'terms':{'field':'tags'}}

				#}}}}

			# Add the grouping (aggregation) field. field is the field to group by; "categories" is the alias of the aggregation, used later to fetch its result.

			s.aggs.bucket("categories","terms",field="category.keyword")

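			# The first argument is the (user-defined) name of the aggregation result, the second is the aggregation type, and the third is the field the aggregation operates on.

			# terms groups documents by field value and counts them; other aggregation types include avg (the mean of a numeric field).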

			if request.GET.get("category"):

				s=s.query("match",category=request.GET["category"])

				#s=s.filter('terms',category=)

			result=s.execute()

			ctx["products"]=result

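			# Aggregation results are separate from the data: the matching documents are in hits, the aggregation results are in aggregations, and the aggregated data is read from its buckets.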

			category_aggregations=list()

			for bucket in result.aggregations.categories.buckets:

				category_name=bucket.key

				doc_count=bucket.doc_count

				category_url_params=request.GET.copy()

				category_url_params["category"]=category_name

				category_url="{}?{}".format(reverse("main_home"),category_url_params.urlencode())

				category_aggregations.append({"name":category_name,"doc_count":doc_count,"url":category_url})

			ctx["category_aggs"]=category_aggregations

		if "category" in request.GET:

			remove_category_search_params=request.GET.copy()

			del remove_category_search_params["category"]

			remove_category_url="{}?{}".format(reverse("main_home"),remove_category_search_params.urlencode())

			ctx["remove_category_url"]=remove_category_url

		return render(request,"main_home.html",ctx)

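The query passed to Q(price_q) has the structure: price_q={'range':{"price":{"gte":35,"lte":70}}}

s=s.query(Q(price_q)) searches for records with 35<=price<=70.

s=s.query(Q({"match":{"site":"taobao"}})) (1)

s=s.query(Q({"match":{"goods_class":"clothes"}})) (2)

The two conditions in (1) and (2), site contains taobao and goods_class contains clothes, are combined with AND: a result must satisfy both (1) and (2).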
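In the raw query DSL, this kind of AND combination is commonly written as a bool query with must clauses. The sketch below builds such a body from plain dictionaries, so it runs without elasticsearch_dsl or a cluster; price_range_query and all_of are hypothetical helper names, and the field names are the ones used above.

```python
def price_range_query(min_price=None, max_price=None):
    """Return a range query on price, or None when no bound is given."""
    bounds = {}
    if min_price is not None:
        bounds["gte"] = min_price
    if max_price is not None:
        bounds["lte"] = max_price
    return {"range": {"price": bounds}} if bounds else None


def all_of(*queries):
    """AND the given queries together; None entries are skipped."""
    clauses = [q for q in queries if q is not None]
    return {"bool": {"must": clauses}}


query = all_of(
    {"match": {"site": "taobao"}},
    {"match": {"goods_class": "clothes"}},
    price_range_query(35, 70),
)
print(query)
```

Returning None for an empty range mirrors the view code, which only adds the price condition when at least one bound was submitted.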

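Aggregations

from elasticsearch_dsl import A

site_agg=A({"terms":{"field":"site"}})

s.aggs.bucket('sites',site_agg)

The first argument is the alias of the aggregation and is used to fetch its result. The statement groups documents by the field "site" and names the aggregated data 'sites'.

Reading the aggregation results:

for bucket in result.aggregations.sites.buckets:
    site_name=bucket.key
    doc_count=bucket.doc_count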
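The loop above reads attributes of the response object; underneath, the response is plain JSON. A minimal sketch of the same extraction on a raw dictionary follows. The sample response is hand-written in the general shape of a search response and its values are made up; terms_buckets is a hypothetical helper name.

```python
def terms_buckets(response, agg_name):
    """Return (key, doc_count) pairs for one terms aggregation in a raw search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Hand-written sample in the shape of a search response; the values are invented.
sample_response = {
    "hits": {"total": 5, "hits": []},
    "aggregations": {
        "sites": {
            "buckets": [
                {"key": "taobao", "doc_count": 3},
                {"key": "jd", "doc_count": 2},
            ]
        }
    },
}
for site_name, doc_count in terms_buckets(sample_response, "sites"):
    print(site_name, doc_count)
```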

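Elasticsearch has two kinds of search clause: query and filter.

query: besides finding the matching results, it scores how well each result matches and returns the results sorted by that score.

filter: it only selects the results that satisfy the condition, without scoring.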
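Both kinds of clause can sit in one bool query: must clauses contribute to the score, filter clauses only decide whether a document is included. A minimal sketch of such a request body built from dictionaries; scored_and_filtered is a hypothetical helper name and the field names are taken from the examples above.

```python
import json


def scored_and_filtered(must=None, filters=None):
    """Build a search body whose must clauses are scored and whose filter clauses are not."""
    bool_query = {}
    if must:
        bool_query["must"] = list(must)
    if filters:
        bool_query["filter"] = list(filters)
    return {"query": {"bool": bool_query}}


body = scored_and_filtered(
    # Scored full-text match.
    must=[{"match": {"name": "somekey"}}],
    # Yes/no condition that does not influence the score.
    filters=[{"range": {"price": {"gte": 35, "lte": 70}}}],
)
print(json.dumps(body, indent=2))
```

Conditions such as price ranges or exact categories usually belong in filter, since a relevance score carries no meaning for them.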

The results returned by a query

result=s.execute()

can be read in a loop:

for hit in result:
    print(hit.name,hit.price,hit.category)

result.hits.total gives the number of matching results.

https://elasticsearch-dsl.readthedocs.io/en/latest/

Pagination of a query (the from and size settings) is done by slicing:

s=s[10:24]

This is equivalent to from=10, size=14.
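The relation between a Python slice and the from/size pair is simple arithmetic: from is the slice start and size is stop minus start. A small sketch of that conversion; slice_to_from_size is a hypothetical helper written for illustration, not the library's own code.

```python
def slice_to_from_size(window):
    """Translate a Python slice into Elasticsearch from/size values."""
    start = window.start or 0
    if window.stop is None:
        raise ValueError("an open-ended slice has no size")
    return {"from": start, "size": max(0, window.stop - start)}


print(slice_to_from_size(slice(10, 24)))  # {'from': 10, 'size': 14}
```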

Sorting

s=s.sort('name','-price')

This sorts the results by name in ascending order and by price in descending order.

s = Search().sort(

    'category',

    '-title',

    {"price" : {"order" : "asc", "mode" : "avg"}}

)
This sorts by category, by title (in descending order), and by price in ascending order using the avg mode.
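The '-field' shorthand stands for an explicit descending sort clause. The sketch below shows that expansion on plain values; expand_sort_keys is a hypothetical helper for illustration and does not reproduce the library's internals.

```python
def expand_sort_keys(*keys):
    """Expand '-field' shorthand into explicit sort clauses; other keys pass through unchanged."""
    clauses = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            clauses.append({key[1:]: {"order": "desc"}})
        else:
            clauses.append(key)
    return clauses


print(expand_sort_keys("category", "-title", {"price": {"order": "asc", "mode": "avg"}}))
# ['category', {'title': {'order': 'desc'}}, {'price': {'order': 'asc', 'mode': 'avg'}}]
```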
