elasticserch hadoop

在本地测试写入 elasticsearch:9200时成功

线上环境却报错如下

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
17/12/01 07:47:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748) 17/12/01 07:47:46 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
17/12/01 07:47:46 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at EsSpark.scala:102) failed in 0.349 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

排查思路
1线上使用的 ip 未使用 host,可能会有问题
2线上 es 集群前置了 nginx 作代理,写入地址并非直接是es 服务地址

以对 es 已知的了解2的可能性较大,可能会直接通过 es 寻找压力较小的节点

看官方文档
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html

Networkedit
es.nodes.discovery (default true)
Whether to discover the nodes within the Elasticsearch cluster or only to use the ones given in es.nodes for metadata queries. Note that this setting only applies during start-up; afterwards when reading and writing, elasticsearch-hadoop uses the target index shards (and their hosting nodes) unless es.nodes.client.only is enabled.

es.nodes.client.only (default false)
Whether to use Elasticsearch client nodes (or load-balancers). When enabled, elasticsearch-hadoop will route all its requests (after nodes discovery, if enabled) through the client nodes within the cluster. Note this typically significantly reduces the node parallelism and thus it is disabled by default. Enabling it also disables es.nodes.data.only (since a client node is a non-data node).

es.nodes.wan.only (default false)
Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.

找到几个相关信息

配置 "es.nodes.wan.only"->"true" 则服务正常运行

写入 es 部分如下

esDatas.saveToEs(Map[String, String](
"es.resource" -> "{esIndex}/{esType}",
"es.nodes" -> "ip:19200",
"es.input.json" -> "false",
"es.nodes.discovery"->"false",
"es.nodes.wan.only"->"true",
"es.write.operation" -> "upsert",
"es.mapping.exclude" -> "id,esIndex,esType",
"es.mapping.id" -> "id"
))

elasticserch-hadoop spak 网络配置异常排查的更多相关文章

  1. hadoop搭建一:虚拟机网络配置和基础(未完成)

    基于VMware 15+CentOS 7+Hadoop 2.6,hadoop的搭建主要用于个人学习,水平有限. hadoop搭建一:虚拟机网络配置和基础 hadoop搭建二:hadoop全分布搭建 h ...

  2. Hadoop(一)Centos7虚拟机网络配置

    Centos7虚拟机网络配置(桥接模式) 一 VirtualBox提供了三种工作模式,它们是bridged(桥接模式).NAT(网络地址转换模式)和host-only(主机模式). 1 桥接模式(br ...

  3. linux 的IP配置和网络问题的排查

    1.6  IP的配制, 首先要会用: ifconfig  和加相关参数如: ifconfig -a, 来查看,自己的电脑网络配制. 再次就必需要知道,默认IP配制文件的地方: cd /etc/sysc ...

  4. Hadoop集群搭建(三)~centos6.8网络配置

    安装完centos之后,进入系统,进行网络配置.主要分为五个部分: 修改虚拟机网络编辑器:配置Winodws访问虚拟机:配置centos网卡:通过网络名访问虚拟机配置网络服务. (一)虚拟机网络编辑器 ...

  5. linux网络配置、环境变量以及JDK安装(CentOS 6.5)

    由于需要搭建hadoop平台,但是苦于没有现成可用的linux服务器,只好自己下载了CentOS 6.5从头装起,安装过程中遇到了很多问题,比如网络配置.时钟同步.环境变量配置.以及各种服务的启停,还 ...

  6. 【Hadoop】Hadoop 机架感知配置、原理

    Hadoop机架感知 1.背景 Hadoop在设计时考虑到数据的安全与高效,数据文件默认在HDFS上存放三份,存储策略为本地一份, 同机架内其它某一节点上一份,不同机架的某一节点上一份. 这样如果本地 ...

  7. Vmware在NAT模式下网络配置详解

    Vmware在NAT模式下网络配置详解 Linux中的网络配置对于接触Linux不久的小白菜来说,还是小有难度的,可能是不熟悉这种与windows系列迥然不同的命令行操作,也可能是由于对Linux的结 ...

  8. Virtual Box和Linux的网络配置盲记

    近来可能在虚拟机重装了Linux的缘故,在用yum安装软件时出现错误,在提示上连接镜像网站时,都是"linux counldn't resolve host"这样的提示.我估计是l ...

  9. 初识Hadoop一,配置及启动服务

    一.Hadoop简介: Hadoop是由Apache基金会所开发的分布式系统基础架构,实现了一个分布式文件系统(Hadoop Distributed File System),简称HDFS:Hadoo ...

随机推荐

  1. 每天一点点之vue框架开发 - vue坑-This relative module was not found

    94% asset optimization ERROR Failed to compile with 1 errors This relative module was not found: * . ...

  2. POJ 1094:Sorting It All Out拓扑排序之我在这里挖了一个大大的坑

    Sorting It All Out Time Limit: 1000MS   Memory Limit: 10000K Total Submissions: 29984   Accepted: 10 ...

  3. MYSQL安装与基本操作

    http://docs.sqlalchemy.org/en/latest/    sqlalchemy文档 1.下载,下载版本太多,不知道下哪个好,别人介绍版本 进入官网-->点击最下面 DOW ...

  4. 转 SQL 的数据库 架构规范 之 58到家数据库30条军规解读

    军规适用场景:并发量大.数据量大的互联网业务 军规:介绍内容 解读:讲解原因,解读比军规更重要 一.基础规范 (1)必须使用InnoDB存储引擎 解读:支持事务.行级锁.并发性能更好.CPU及内存缓存 ...

  5. 将list等分成n份

    public static <T> Map<Integer, List<T>> spiltList(List<T> list, int num) { M ...

  6. Mac Github:第一次上传成功,解决图片不可显示,Initial commit Untracked files

    在上传之前需要先给自己的电脑安装SSH 上传成功用的是github的官方提示,直接复制去做就可以了 解决README.md中图片不可显示:图片路径到底要怎么写? https://blog.csdn.n ...

  7. JavaScript中的深浅拷贝

    深浅拷贝 在JS中,数据类型分为两类: ​ 简单数据类型:Number.Boolean.String.undefined ​ 引用数据类型:Array.Object.Function 简单数据类型通常 ...

  8. 斐波那契数列 yield 和list 生成

    def fab_demo4(max): a,n,b = 0,0,1 while n < max: yield b # 生成器走到这一步返回b,需要再次调用才能继续执行 a,b = b,a+b n ...

  9. cmake 中的 compile_commands.json 文件

    cmake 是支持多种编译方式的工具,产生多种编译工具可以使用的编译文件,例如常用的gdb. 但是对于clang 编译工具,还需要一个compile_commands.json 这个文件是由cmake ...

  10. Thread--使用condition实现顺序执行

    package condition; import java.util.concurrent.locks.Condition; import java.util.concurrent.locks.Re ...