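# I. Lucene

## 1. What it is

Lucene is a full-text search toolkit written in Java and provided by Apache. It supplies the API classes needed to implement full-text retrieval and can be used to build search engine functionality.

## 2. Common search approaches

- Sequential scan: suitable when the data has a fixed structure and the volume is small.
- Full-text search: suitable when the data has no fixed structure and the volume is large, as with Baidu or Taobao.
  - Principle: extract part of the information from the unstructured data and reorganize it so that it has some structure, then search that reorganized data instead, which makes the search more efficient.
  - Inverted index: the information extracted from the unstructured data and reorganized is called the index table. Each entry holds one term and the addresses of the articles that contain it. Because the lookup goes from the term to the articles rather than from an article to its terms, it is called an inverted index.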
[Figure: how Solr search works (image not available)]
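The inverted index described above can be sketched in a few lines of plain Java. This is only an illustration of the idea, not how Lucene stores its index: the class name `InvertedIndexSketch`, the `build` helper, and the whitespace tokenizer are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Maps each term to the sorted set of document ids that contain it.
    // Tokenization here is a naive lowercase + whitespace split (an assumption for the sketch).
    static Map<String, SortedSet<Integer>> build(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : docs.get(id).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "solr is built on lucene",
                "elasticsearch is built on lucene too",
                "postman sends http requests");
        Map<String, SortedSet<Integer>> index = build(docs);
        // A lookup goes from the term to the documents, never by scanning the documents.
        System.out.println(index.get("lucene"));  // [0, 1]
        System.out.println(index.get("postman")); // [2]
    }
}
```

A query for a term is then a single map lookup, which is why this layout pays off as the number of documents grows.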

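# II. Solr

## 1. Installation

### a. Download

http://archive.apache.org/dist/lucene/solr

### b. Extract and start

Extract the archive: `tar -zxvf solr-7.7.2_linux.tgz -C /opt/module/`

Start: `bin/solr start -force` (the `-force` flag allows Solr to be started as the root user)

Restart: `bin/solr restart -force`

Stop: `bin/solr stop`

On Windows: `bin/solr.cmd start`

To get rid of the startup warnings, change two configuration files.

In `bin/solr.in.sh` (`vim bin/solr.in.sh`), set `SOLR_ULIMIT_CHECKS=false`.

Then append the following two lines to the end of `/etc/security/limits.conf` (`vim /etc/security/limits.conf`):
```
* hard nofile 65000
* soft nofile 65000
```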

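### c. Test

From a browser on the Windows host, open http://192.168.188.100:8983/solr

## 2. Solr core

A Solr core is a running instance that bundles an index with its configuration files and data. In SolrCloud, a collection is a logical index made up of one or more such cores.

Creating a Solr core

Linux:

`bin/solr create -c first -force` (creates a `first` folder under `server/solr`)

Windows:

`bin\solr.cmd create -c first`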

## 3. Chinese analyzer

### a. Copy the analyzer jar

`cp /opt/module/solr-7.7.2/contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-7.7.2.jar /opt/module/solr-7.7.2/server/solr-webapp/webapp/WEB-INF/lib/`

### b. Edit the core's configuration

`vim /opt/module/solr-7.7.2/server/solr/first/conf/managed-schema`

Add the following field type:

```
<fieldType name="text_hmm_chinese" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory"/>
    </analyzer>
</fieldType>
```

### c. Restart the Solr service

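# III. ElasticSearch

## 1. Introduction to ES

Elasticsearch is a search engine written in Java and built on Lucene. It exposes its functionality through a uniform RESTful API.

## 2. Differences between ES and Solr (for awareness)

1. Elasticsearch is newer and focuses on core functionality, with many advanced features supplied by third-party plugins. Solr has been around longer, is more mature, and has more features built in.
2. Solr accepts more data formats, such as JSON, XML, and CSV, while Elasticsearch works only with JSON.
3. Solr relies on ZooKeeper for distributed coordination, while Elasticsearch has distributed coordination built in. Elasticsearch was designed with the cloud in mind and is a common first choice for distributed deployments.
4. When only searching existing data, Solr tends to be faster. When indexing in real time, Solr can block on I/O and its query performance suffers, so Elasticsearch has a clear advantage there. As the data volume grows, Solr's search efficiency tends to drop, while Elasticsearch shows no obvious change.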

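## 3. Installation

Download from the official site:
https://www.elastic.co/products/elasticsearch

https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.12.1-linux-x86_64.tar.gz

Extract:

`tar -zxvf /opt/software/java/elasticsearch-7.12.1-linux-x86_64.tar.gz -C /opt/module/`

### Adjust the JDK

This version is meant to run on JDK 11, so the existing JDK 8 environment variables must not be picked up. Comment out the JDK-related lines in `/etc/profile` (`vim /etc/profile`):

```
#JAVA_HOME=/opt/module/jdk1.8.0_144
#CLASSPATH=.:$JAVA_HOME/lib/tools.jar
#PATH=$JAVA_HOME/bin:$PATH
#export JAVA_HOME CLASSPATH PATH
```

With `JAVA_HOME` unset, Elasticsearch falls back to the JDK bundled with its distribution.

### Allow remote access

By default ES only accepts connections from the local machine. To change that, edit `/opt/module/elasticsearch-7.12.1/config/elasticsearch.yml` and append the following lines to the end of the file:

```
network.host: 0.0.0.0
http.port: 9200
cluster.initial_master_nodes: ["node-1"]
node.name: node-1
# CentOS 6 ships kernel 2.6, while the system call filter needs kernel 3.5 or later, so disable it
bootstrap.system_call_filter: false
```

### Open file limit

Append the following lines to `/etc/security/limits.conf` (`vim /etc/security/limits.conf`):

```
es soft nofile 65535
es hard nofile 65535
es soft nproc 4096
es hard nproc 4096
```

### Virtual memory

Add the following setting to `/etc/sysctl.conf` (`vim /etc/sysctl.conf`):

```
# Maximum number of memory map areas a single process may have
vm.max_map_count=262144
```

Apply the change immediately:

`sysctl -p`

### Create a dedicated user

From ES 5 onward, Elasticsearch cannot be run as root, so create a separate account:

`adduser es`

`passwd es` (enter the password twice)

`chown -R es /opt/module/elasticsearch-7.12.1/` (makes the es user the owner of the ES folder)

`su es` (switches to the es user)

From the ES installation directory (`cd /opt/module/elasticsearch-7.12.1`), start Elasticsearch:

`bin/elasticsearch`

Test: open http://192.168.188.100:9200/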

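## 4. Using ES from Postman

Download and install Postman:
```
https://www.postman.com/downloads/
```

### Index maintenance

Create an index: send a PUT request from Postman (PUT is required here).

http://192.168.188.100:9200/first

Query an index: send a GET request to read back the index created above.

http://192.168.188.100:9200/first

List all indices: send a GET request.

http://192.168.188.100:9200/_cat/indices?v

Delete an index: send a DELETE request.

http://192.168.188.100:9200/first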
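### Creating a document
1. Send a POST request to 192.168.188.100:9200/first/_doc/1. The trailing 1 sets the document id to 1; if it is omitted, an id is generated automatically.
2. Put the JSON data in the request body: in Postman choose body → raw, switch the text format from Text to JSON, and enter the following:
```
{
    "name":"孙子兵法",
    "price":"34",
    "info":"兵者，国之大事，死生之地，存亡之道，不可不察也",
    "author":"孙武"
}
```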
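The same request can be assembled outside Postman. The sketch below uses the JDK's built-in `java.net.http` API and therefore assumes Java 11 or later; the class name `EsDocRequestSketch` and the helper `indexDocument` are made up for this example, and the JSON body is shortened. It only builds the request, so it runs without a live node.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class EsDocRequestSketch {
    // Builds (but does not send) the request that indexes one document under the given id.
    static HttpRequest indexDocument(String baseUrl, String index, String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"The Art of War\",\"price\":\"34\"}";
        HttpRequest req = indexDocument("http://192.168.188.100:9200", "first", "1", json);
        System.out.println(req.method() + " " + req.uri());
        // To send it against a running node:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(req, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

The `Content-Type: application/json` header corresponds to switching the body format from Text to JSON in Postman.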

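### Viewing documents

Send a GET request with the body set to none, since no request body is needed here.

192.168.188.100:9200/first/_doc/1   fetches the specified document

192.168.188.100:9200/first/_search   fetches all documents in the index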

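### Updating a document
1. Full update: supply data for every field of the document; all fields are overwritten. Use PUT.
  - PUT request to 192.168.188.100:9200/first/_doc/1
  - body → raw → JSON, then enter the new content:

    ```
    {
        "name":"三十六计",
        "price":"111",
        "info":"瞒天过海，暗度陈仓，偷天换日",
        "author":"孙膑"
    }
    ```
  - On success, the response contains `"result": "updated"`.
2. Partial update: only some fields of the document are changed. Use POST.
  - POST request to 192.168.188.100:9200/first/_update/1 (note that `_doc` is replaced with `_update`)
  - body → raw → JSON, then enter the fields to change. The JSON must be written in the form `{"doc": {...}}`:

    ```
    {
        "doc":{
            "name":"36鸡"
        }
    }
    ```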
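The difference between the two update styles can be modelled locally with plain maps. This is a simplified sketch of the semantics, not Elasticsearch code: the class and method names are invented for the example, and the merge is shallow, whereas Elasticsearch merges nested objects recursively.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateSemanticsSketch {
    // Full update (PUT .../_doc/1): the stored document is replaced by the request body.
    static Map<String, Object> fullUpdate(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // Partial update (POST .../_update/1): the fields under "doc" are merged into the stored document.
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "Thirty-Six Stratagems");
        stored.put("price", "111");

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("name", "36 Stratagems");

        System.out.println(partialUpdate(stored, change)); // {name=36 Stratagems, price=111}
        System.out.println(fullUpdate(stored, change));    // {name=36 Stratagems}
    }
}
```

In other words, a field left out of a full update is lost, while a field left out of a partial update keeps its old value.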
### Deleting a document

Send a DELETE request to 192.168.188.100:9200/first/_doc/1

## 5. Accessing ES from Java

(For awareness only; mastery is not required.)

### 1. Index operations

pom dependencies

```
<dependencies>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.8.0</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.9</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.11.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.13</version>
    </dependency>
</dependencies>
```

log4j2.xml in the resources directory

```
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <appenders>
        <!-- Console output -->
        <console name="Console" target="SYSTEM_OUT">
            <!-- Log line format -->
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n"/>
        </console>
    </appenders>
    <loggers>
        <root level="info"><!-- "all" would show every message -->
            <appender-ref ref="Console"/>
        </root>
    </loggers>
</configuration>
```

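Test code

```
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class EsIndex {
    RestHighLevelClient cli;

    @Before
    public void init() {
        cli = new RestHighLevelClient(
                RestClient.builder(new HttpHost("192.168.188.100", 9200, "http")));
    }

    @After
    public void end() throws Exception {
        cli.close();
    }

    @Test // create an index
    public void createIndex() throws Exception {
        // build the create-index request
        CreateIndexRequest req = new CreateIndexRequest("user");
        // create the index
        CreateIndexResponse resp = cli.indices().create(req, RequestOptions.DEFAULT);
        // whether the cluster acknowledged the request
        boolean ack = resp.isAcknowledged();
        System.out.println("Index operation acknowledged: " + ack);
    }

    @Test // read back an index
    public void searchIndex() throws Exception {
        GetIndexRequest req = new GetIndexRequest("user");
        // fetch the index information
        GetIndexResponse resp = cli.indices().get(req, RequestOptions.DEFAULT);
        System.out.println(resp.getAliases());
        System.out.println(resp.getSettings());
    }

    @Test // delete an index
    public void deleteIndex() throws Exception {
        DeleteIndexRequest req = new DeleteIndexRequest("user");
        AcknowledgedResponse resp = cli.indices().delete(req, RequestOptions.DEFAULT);
        System.out.println(resp.isAcknowledged());
    }
}
```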

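### 2. Document operations

```
import com.alibaba.fastjson.JSON;
import lombok.Data;
import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class EsDoc {
    RestHighLevelClient cli;

    @Before
    public void init() {
        cli = new RestHighLevelClient(
                RestClient.builder(new HttpHost("192.168.188.100", 9200, "http")));
    }

    @After
    public void end() throws Exception {
        cli.close();
    }

    @Test // index one document, serialized from a Java object
    public void insertData() throws Exception {
        IndexRequest req = new IndexRequest().index("user").id("1");
        User u = new User();
        u.setName("this is test name");
        u.setAge(120);
        String json = JSON.toJSONString(u);
        req.source(json, XContentType.JSON);
        IndexResponse resp = cli.index(req, RequestOptions.DEFAULT);
        System.out.println(resp.getResult());
    }

    @Test // partial update of a single field
    public void updateData() throws Exception {
        UpdateRequest req = new UpdateRequest().index("user").id("1");
        req.doc(XContentType.JSON, "name", "hello,it is test");
        UpdateResponse resp = cli.update(req, RequestOptions.DEFAULT);
        System.out.println(resp.getResult());
    }

    @Test // fetch a document by id (id 2 is created by batchInsertData)
    public void queryData() throws Exception {
        GetRequest req = new GetRequest().index("user").id("2");
        GetResponse resp = cli.get(req, RequestOptions.DEFAULT);
        System.out.println(resp.getSourceAsString());
    }

    @Test // delete a document by id
    public void deleteData() throws Exception {
        DeleteRequest req = new DeleteRequest().index("user").id("2");
        DeleteResponse resp = cli.delete(req, RequestOptions.DEFAULT);
        System.out.println(resp.toString());
    }

    @Test // index several documents in one bulk request
    public void batchInsertData() throws Exception {
        // collect the batch
        BulkRequest batchReq = new BulkRequest();
        batchReq.add(new IndexRequest().index("user").id("2").source(XContentType.JSON, "age", 33, "name", "a3"));
        batchReq.add(new IndexRequest().index("user").id("3").source(XContentType.JSON, "age", 34, "name", "a4"));
        batchReq.add(new IndexRequest().index("user").id("4").source(XContentType.JSON, "age", 35, "name", "a5"));
        batchReq.add(new IndexRequest().index("user").id("5").source(XContentType.JSON, "age", 36, "name", "a6"));
        // send the batch through the client
        BulkResponse resp = cli.bulk(batchReq, RequestOptions.DEFAULT);
        System.out.println(resp.hasFailures());
    }
}

@Data
class User {
    String name;
    int age;
}
```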

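### 3. Advanced queries

```
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class Search {
    RestHighLevelClient cli;
    SearchRequest req;
    SearchSourceBuilder sb;

    @Before
    public void init() {
        cli = new RestHighLevelClient(
                RestClient.builder(new HttpHost("192.168.188.100", 9200, "http")));
        req = new SearchRequest().indices("user");
        sb = new SearchSourceBuilder();
    }

    // Each test only configures sb; the query itself runs here, after the test body.
    @After
    public void end() throws Exception {
        req.source(sb);
        // run the query
        SearchResponse resp = cli.search(req, RequestOptions.DEFAULT);
        // read the results
        SearchHits hits = resp.getHits();
        System.out.println("Total hits: " + hits.getTotalHits());
        hits.forEach(h -> {
            System.out.println(h.getSourceAsString());
            // for highlighted output:
            //Text[] arr = h.getHighlightFields().get("name").fragments();
            //System.out.println(arr[0].string());
        });
        cli.close();
    }

    @Test // 1. match everything
    public void searchAll() {
        sb.query(QueryBuilders.matchAllQuery());
    }

    @Test // 2. query by condition
    public void searchByCondition() {
        sb.query(QueryBuilders.termQuery("name", "a5"));
    }

    @Test // 3. paging
    public void searchByPage() {
        int pageNo = 1, pageSize = 2;
        sb.from((pageNo - 1) * pageSize).size(pageSize);
    }

    @Test // 4. sort the results
    public void searchByOrder() {
        sb.sort("age", SortOrder.DESC);
    }

    @Test // 5. field filtering: argument 1 lists the fields to include; when most fields are wanted, argument 2 can list the fields to exclude instead
    public void searchByField() throws Exception {
        sb.fetchSource(new String[]{"name"}, null);
    }

    @Test // 6. multiple conditions: must / mustNot combine with logical AND, should with logical OR
    public void searchByMultiCondition() throws Exception {
        BoolQueryBuilder bq = QueryBuilders.boolQuery();
        bq.must(QueryBuilders.matchQuery("name", "a3")).mustNot(QueryBuilders.matchQuery("age", 23));
//        bq.should(QueryBuilders.matchQuery("name","a3")).should(QueryBuilders.matchQuery("age",34));
        sb.query(bq);
    }

    @Test // 7. range query: gte = greater than or equal, lte = less than or equal
    public void searchByRange() throws Exception {
        RangeQueryBuilder rq = QueryBuilders.rangeQuery("age").gte(30).lte(35);
        sb.query(rq);
    }

    @Test // 8. fuzzy query
    public void searchByLike() throws Exception {
        // fuzziness: search for the keyword a1 while allowing one character of difference, so a1, a2, a3... all match
        sb.query(QueryBuilders.fuzzyQuery("name", "a1").fuzziness(Fuzziness.ONE));
    }

    @Test // 9. highlighted query
    public void searchWithHighLight() throws Exception {
        HighlightBuilder hb = new HighlightBuilder().preTags("<font color='red'>").postTags("</font>").field("name");
        // wildcard query
        QueryBuilder tb = QueryBuilders.wildcardQuery("name", "*");
        sb.highlighter(hb).query(tb);
    }
}
```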
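What "one character of difference" means for `Fuzziness.ONE` can be illustrated with a classic edit-distance computation. The sketch below is a plain Levenshtein implementation written for this explanation, with invented names (`FuzzinessSketch`, `editDistance`, `matchesWithinOne`); it is not the code Elasticsearch runs. In particular, Elasticsearch by default also counts a swap of two adjacent characters as a single edit, which this sketch does not model.

```java
public class FuzzinessSketch {
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions that turn a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // A term is a candidate match when it is at most one edit away from the query.
    static boolean matchesWithinOne(String query, String term) {
        return editDistance(query, term) <= 1;
    }

    public static void main(String[] args) {
        for (String term : new String[]{"a1", "a3", "a10", "b7"}) {
            System.out.println(term + " -> " + matchesWithinOne("a1", term));
        }
    }
}
```

Here `a3` and `a10` are each one edit away from `a1` and therefore match, while `b7` needs two edits and does not.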

## 6. Spring Boot integration

Use Spring Data to work with ES.

### 1. pom

The Spring Boot starters take their version from the parent, so they carry no explicit `<version>`.

```
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.5.3</version>
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.12</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.41</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
</dependencies>
```

### 2. Configuration file

`spring.elasticsearch.rest.uris=http://192.168.188.100:9200`

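### 3. Test class

```
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
public class MyTest {
    @Autowired
    RestHighLevelClient client;

    @Test
    public void add() throws Exception {
        // without id(), a random id is generated
        IndexRequest req = new IndexRequest("user").id("10");
        // option 1:
        //req.source("name","a1","age","101");
        // option 2 (assumes User also has @AllArgsConstructor and @NoArgsConstructor):
        User u = new User("a10", 15);
        req.source(JSON.toJSONString(u), XContentType.JSON);
        IndexResponse resp = client.index(req, RequestOptions.DEFAULT);
        System.out.println(resp.status());
    }

    @Test
    public void search() throws Exception {
        SearchRequest req = new SearchRequest("user");
        SearchSourceBuilder sb = new SearchSourceBuilder();
        sb.query(QueryBuilders.matchAllQuery()).from(0).size(2).sort("age", SortOrder.DESC);
        req.source(sb);
        SearchResponse resp = client.search(req, RequestOptions.DEFAULT);
        SearchHit[] hits = resp.getHits().getHits();
        for (SearchHit hit : hits) {
            // as a map
//            System.out.println(hit.getSourceAsMap());
            // converted to a Java object
            System.out.println(JSON.parseObject(hit.getSourceAsString(), User.class));
        }
    }

    @Test
    public void delete() throws Exception {
        DeleteRequest req = new DeleteRequest("user", "zwk9_noBPF1Cw_Bi_j-0");
        DeleteResponse resp = client.delete(req, RequestOptions.DEFAULT);
        System.out.println(resp.status());
    }

    @Test
    public void update() throws Exception {
        UpdateRequest req = new UpdateRequest("user", "2");
        req.doc("name", "张三", "age", 1000);
        UpdateResponse resp = client.update(req, RequestOptions.DEFAULT);
        System.out.println(resp.status());
    }
}
```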
