elasticserch hadoop

在本地测试写入 elasticsearch:9200时成功

线上环境却报错如下

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
17/12/01 07:47:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748) 17/12/01 07:47:46 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
17/12/01 07:47:46 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at EsSpark.scala:102) failed in 0.349 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

排查思路
1线上使用的 ip 未使用 host,可能会有问题
2线上 es 集群前置了 nginx 作代理,写入地址并非直接是es 服务地址

以对 es 已知的了解2的可能性较大,可能会直接通过 es 寻找压力较小的节点

看官方文档
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html

Networkedit
es.nodes.discovery (default true)
Whether to discover the nodes within the Elasticsearch cluster or only to use the ones given in es.nodes for metadata queries. Note that this setting only applies during start-up; afterwards when reading and writing, elasticsearch-hadoop uses the target index shards (and their hosting nodes) unless es.nodes.client.only is enabled.

es.nodes.client.only (default false)
Whether to use Elasticsearch client nodes (or load-balancers). When enabled, elasticsearch-hadoop will route all its requests (after nodes discovery, if enabled) through the client nodes within the cluster. Note this typically significantly reduces the node parallelism and thus it is disabled by default. Enabling it also disables es.nodes.data.only (since a client node is a non-data node).

es.nodes.wan.only (default false)
Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.

找到几个相关信息

配置 "es.nodes.wan.only"->"true" 则服务正常运行

写入 es 部分如下

esDatas.saveToEs(Map[String, String](
"es.resource" -> "{esIndex}/{esType}",
"es.nodes" -> "ip:19200",
"es.input.json" -> "false",
"es.nodes.discovery"->"false",
"es.nodes.wan.only"->"true",
"es.write.operation" -> "upsert",
"es.mapping.exclude" -> "id,esIndex,esType",
"es.mapping.id" -> "id"
))

elasticserch-hadoop spak 网络配置异常排查的更多相关文章

  1. hadoop搭建一:虚拟机网络配置和基础(未完成)

    基于VMware 15+CentOS 7+Hadoop 2.6,hadoop的搭建主要用于个人学习,水平有限. hadoop搭建一:虚拟机网络配置和基础 hadoop搭建二:hadoop全分布搭建 h ...

  2. Hadoop(一)Centos7虚拟机网络配置

    Centos7虚拟机网络配置(桥接模式) 一 VirtualBox提供了三种工作模式,它们是bridged(桥接模式).NAT(网络地址转换模式)和host-only(主机模式). 1 桥接模式(br ...

  3. linux 的IP配置和网络问题的排查

    1.6  IP的配制, 首先要会用: ifconfig  和加相关参数如: ifconfig -a, 来查看,自己的电脑网络配制. 再次就必需要知道,默认IP配制文件的地方: cd /etc/sysc ...

  4. Hadoop集群搭建(三)~centos6.8网络配置

    安装完centos之后,进入系统,进行网络配置.主要分为五个部分: 修改虚拟机网络编辑器:配置Winodws访问虚拟机:配置centos网卡:通过网络名访问虚拟机配置网络服务. (一)虚拟机网络编辑器 ...

  5. linux网络配置、环境变量以及JDK安装(CentOS 6.5)

    由于需要搭建hadoop平台,但是苦于没有现成可用的linux服务器,只好自己下载了CentOS 6.5从头装起,安装过程中遇到了很多问题,比如网络配置.时钟同步.环境变量配置.以及各种服务的启停,还 ...

  6. 【Hadoop】Hadoop 机架感知配置、原理

    Hadoop机架感知 1.背景 Hadoop在设计时考虑到数据的安全与高效,数据文件默认在HDFS上存放三份,存储策略为本地一份, 同机架内其它某一节点上一份,不同机架的某一节点上一份. 这样如果本地 ...

  7. Vmware在NAT模式下网络配置详解

    Vmware在NAT模式下网络配置详解 Linux中的网络配置对于接触Linux不久的小白菜来说,还是小有难度的,可能是不熟悉这种与windows系列迥然不同的命令行操作,也可能是由于对Linux的结 ...

  8. Virtual Box和Linux的网络配置盲记

    近来可能在虚拟机重装了Linux的缘故,在用yum安装软件时出现错误,在提示上连接镜像网站时,都是"linux counldn't resolve host"这样的提示.我估计是l ...

  9. 初识Hadoop一,配置及启动服务

    一.Hadoop简介: Hadoop是由Apache基金会所开发的分布式系统基础架构,实现了一个分布式文件系统(Hadoop Distributed File System),简称HDFS:Hadoo ...

随机推荐

  1. Mysql--主库不停机搭建备库

    参考:http://blog.csdn.net/luozuolincool/article/details/38494817 mysqldump --skip-lock-tables --single ...

  2. Java实用小工具

    工具一:对Java中的List<Map<String,Object>>格式数据实现递归 /** * 递归List<Map<String,Object>> ...

  3. go 汇编

    汇编文件 go tool compile -S main.go>main.S go tool compile 可以看帮助 -N 关闭优化

  4. [Qt5] QSlider设置步长

    这是一个小问题,就是QSlider是一个滑动条控件,既然是个滑动条控件,就会想要用鼠标滚轮或者鼠标去移动它来实现某些功能,但是呢,我能说这个控件的一个属性函数设置也是比较奇怪的,它设置步长的函数有 s ...

  5. UVALive 6491 You win! 状态DP

    这个题目上周的对抗赛的,美国2013区域赛的题目,上次比赛真惨,就做出一道题,最多的也只做出两道,当时想把这题做出来,一直TLE. 这个题目用挂在Hunnu OJ的数据可以过,但UVALive上死活过 ...

  6. XML技术详解

    XML 1.XML概述 XML可扩展标记语言是一种基于文本的语言用作应用程序之间的通信模式,是一个非常有用的描述结构化信息的技术.XML工具使得转化和处理数据变得十分容易,但同样也要领域相关的标准和代 ...

  7. Python笔记_第一篇_面向过程_第一部分_4.格式化输入和输出

    开始Python编程首先要学习两类最常用(经常出现和使用)输入和输出.学习编程最重要的状态就是“人机交互”,所以这两类函数显得尤其重要. 第一部分 格式化输入 1.1   函:input 语:inpu ...

  8. 2019牛客网暑假多校训练第四场 K —number

    链接:https://ac.nowcoder.com/acm/contest/884/K来源:牛客网 题目描述 300iq loves numbers who are multiple of 300. ...

  9. python版本,执行

    01. 第一个 HelloPython 程序 1.1 Python 源程序的基本概念 Python 源程序就是一个特殊格式的文本文件,可以使用任意文本编辑软件做 Python 的开发 Python 程 ...

  10. urllib简单介绍

    # urllib简介: 1.urllib模块是Python的一个请求模块 2.Python2中是urllib和urllib2相结合实现请求的发送. Python3中统一为urllib库 3.urlli ...