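Spring Boot + Elasticsearch: Implementing Bulk Index Writes

When maintaining indexes with Elasticsearch, if your use case requires frequent, large-volume index writes, the maintenance approach described in the previous article is clearly inefficient; bulkIndex is recommended instead to improve throughput. The appropriate size of each bulk batch depends on your data set and your cluster configuration.

Below we build a sample project with Spring Boot and Elasticsearch, starting with the basic pom configuration.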

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>1.4</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

application.properties configuration

#elasticsearch config
spring.data.elasticsearch.cluster-name:elasticsearch
spring.data.elasticsearch.cluster-nodes:192.168.1.105:9300

#application config
server.port=8080
spring.application.name=esp-app

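We need to define a domain entity and a basic Spring Data CRUD repository class. Mark the identifier field with the @Id annotation; if no ID field is specified, your documents cannot be indexed through Spring Data Elasticsearch. You also need to specify the index name and type; the @Document annotation additionally lets us set the number of shards and replicas.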

@Data
@Document(indexName = "car_index", type = "car_type", shards = 1, replicas = 0)
public class Car implements Serializable {
    /**
     * serialVersionUID:
     * @since JDK 1.6
     */
    private static final long serialVersionUID = 1L;
    @Id
    private Long id;
    private String brand;
    private String model;
    private BigDecimal amount;

    public Car(Long id, String brand, String model, BigDecimal amount) {
        this.id = id;
        this.brand = brand;
        this.model = model;
        this.amount = amount;
    }
}
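
Because the @Id value becomes the document id, writing a second document under an id that already exists replaces the first one instead of adding a new entry. The sketch below illustrates this with a plain in-memory map; the class IdOverwriteSketch and the method indexAll are hypothetical names used only for illustration, and the map is a stand-in for an index, not an Elasticsearch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IdOverwriteSketch {

    /**
     * Stores each source under its id. A later entry with the same id
     * replaces the earlier one. Returns the number of distinct documents.
     */
    static int indexAll(long[] ids, String[] sources, Map<Long, String> index) {
        for (int i = 0; i < ids.length; i++) {
            index.put(ids[i], sources[i]);
        }
        return index.size();
    }

    public static void main(String[] args) {
        Map<Long, String> index = new LinkedHashMap<>();
        long[] ids = {1L, 2L, 1L}; // id 1 appears twice
        String[] sources = {"{\"brand\":\"A\"}", "{\"brand\":\"B\"}", "{\"brand\":\"C\"}"};
        int stored = indexAll(ids, sources, index);
        // prints: 2 documents, id 1 -> {"brand":"C"}
        System.out.println(stored + " documents, id 1 -> " + index.get(1L));
    }
}
```

This matters for the test data used later in this article: the ids are drawn at random with RandomUtils.nextLong(1, 11111), so repeated ids are to be expected, and the number of documents found in the index afterwards can be lower than the 10000 objects generated.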

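Next, define an IndexerService that uses bulk requests to write to the index. Before writing, check whether the index exists to avoid exceptions. To get a better grasp of the Java API, this example uses ElasticsearchTemplate rather than the ElasticsearchRepository from the previous article; it offers a comparatively richer feature set.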

@Service
public class IndexerService {
    private static final String CAR_INDEX_NAME = "car_index";
    private static final String CAR_INDEX_TYPE = "car_type";
    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;

    public long bulkIndex() throws Exception {
        int counter = 0;
        try {
            // Create the index first if it does not exist yet
            if (!elasticsearchTemplate.indexExists(CAR_INDEX_NAME)) {
                elasticsearchTemplate.createIndex(CAR_INDEX_NAME);
            }
            Gson gson = new Gson();
            List<IndexQuery> queries = new ArrayList<IndexQuery>();
            List<Car> cars = assembleTestData();
            for (Car car : cars) {
                IndexQuery indexQuery = new IndexQuery();
                indexQuery.setId(car.getId().toString());
                indexQuery.setSource(gson.toJson(car));
                indexQuery.setIndexName(CAR_INDEX_NAME);
                indexQuery.setType(CAR_INDEX_TYPE);
                queries.add(indexQuery);
                // Count before checking, so that the first batch holds 500 documents rather than 1
                counter++;
                // Submit one bulk request for every 500 queued documents
                if (counter % 500 == 0) {
                    elasticsearchTemplate.bulkIndex(queries);
                    queries.clear();
                    System.out.println("bulkIndex counter : " + counter);
                }
            }
            // Do not forget to submit the final, partially filled batch
            if (queries.size() > 0) {
                elasticsearchTemplate.bulkIndex(queries);
            }
            elasticsearchTemplate.refresh(CAR_INDEX_NAME);
            System.out.println("bulkIndex completed.");
        } catch (Exception e) {
            System.out.println("IndexerService.bulkIndex e;" + e.getMessage());
            throw e;
        }

        // Number of index requests submitted
        return counter;
    }

    private List<Car> assembleTestData() {
        List<Car> cars = new ArrayList<Car>();
        // Randomly generate 10000 documents for the bulk write that follows
        for (int i = 0; i < 10000; i++) {
            cars.add(new Car(RandomUtils.nextLong(1, 11111), RandomStringUtils.randomAscii(20), RandomStringUtils.randomAlphabetic(15), BigDecimal.valueOf(78000)));
        }
        return cars;
    }
}
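
The flush-every-N pattern used in bulkIndex can be looked at separately from Spring and Elasticsearch. The following is a minimal sketch under stated assumptions: the class BatchSketch, the helper submitInBatches and the Consumer sink are hypothetical names introduced here, with the sink standing in for a call such as elasticsearchTemplate.bulkIndex(queries).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchSketch {

    /**
     * Hands the items to the sink in chunks of at most batchSize elements
     * and returns the total number of items submitted.
     */
    static <T> int submitInBatches(List<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int submitted = 0;
        for (T item : items) {
            buffer.add(item);
            // Flush as soon as the buffer is full
            if (buffer.size() == batchSize) {
                sink.accept(new ArrayList<>(buffer)); // copy, so the sink may keep the list
                submitted += buffer.size();
                buffer.clear();
            }
        }
        // Flush the last, partially filled batch
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            submitted += buffer.size();
        }
        return submitted;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            docs.add(i);
        }
        List<Integer> batchSizes = new ArrayList<>();
        int total = submitInBatches(docs, 500, batch -> batchSizes.add(batch.size()));
        // prints: [500, 500, 234] total=1234
        System.out.println(batchSizes + " total=" + total);
    }
}
```

Checking the buffer size rather than a separate counter keeps every batch except the last at exactly batchSize, and it makes the batch size a parameter that can be tuned to the data set and cluster.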

The rest is fairly simple: write a RestController that accepts a request to trigger the test, or a CommandLineRunner that runs the method above when the application starts.

@SpringBootApplication
@RestController
public class ESPApplicatoin {

    public static void main(String[] args) {
        SpringApplication.run(ESPApplicatoin.class, args);
    }

    @Autowired
    IndexerService indexService;


    @RequestMapping(value = "bulkIndex",method = RequestMethod.POST)
    public void bulkIndex(){
        try {
            indexService.bulkIndex();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

CommandLineRunner implementation class:

@Component
public class AppLoader implements CommandLineRunner {
    @Autowired
    IndexerService indexerService;

    @Override
    public void run(String... strings) throws Exception {
        indexerService.bulkIndex();
    }
}

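When it finishes, open http://localhost:9200/car_index/_search/ to check whether the documents were actually indexed. Note: pay particular attention to version compatibility. With ES 5+, the Spring Data Elasticsearch approach shown here evidently cannot be used (see the compatibility table below).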

Spring Boot Version (x)	Spring Data Elasticsearch Version (y)	Elasticsearch Version (z)
x <= 1.3.5	y <= 1.3.4	z <= 1.7.2*
x >= 1.4.x	2.0.0 <=y < 5.0.0**	2.0.0 <= z < 5.0.0**

(*) - require manual change in your project pom file (solution 2.)

(**) - Next big ES release with breaking changes

Example project: https://github.com/backkoms/spring-boot-elasticsearch

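Further reading:

Spring Boot + Elasticsearch: Day-to-Day Index Maintenance

A Hands-On Microservices Architecture Case Study Based on Spring Cloud: Preface

High-Performance Dynamic Web Page Rendering with Nginx+Lua+MySQL/Redis

High-Performance Cached Data Reads with Nginx+Lua+Redis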
