elasticserch hadoop

在本地测试写入 elasticsearch:9200时成功

线上环境却报错如下

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
17/12/01 07:47:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748) 17/12/01 07:47:46 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
17/12/01 07:47:46 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at EsSpark.scala:102) failed in 0.349 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

排查思路
1线上使用的 ip 未使用 host,可能会有问题
2线上 es 集群前置了 nginx 作代理,写入地址并非直接是es 服务地址

以对 es 已知的了解2的可能性较大,可能会直接通过 es 寻找压力较小的节点

看官方文档
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html

Networkedit
es.nodes.discovery (default true)
Whether to discover the nodes within the Elasticsearch cluster or only to use the ones given in es.nodes for metadata queries. Note that this setting only applies during start-up; afterwards when reading and writing, elasticsearch-hadoop uses the target index shards (and their hosting nodes) unless es.nodes.client.only is enabled.

es.nodes.client.only (default false)
Whether to use Elasticsearch client nodes (or load-balancers). When enabled, elasticsearch-hadoop will route all its requests (after nodes discovery, if enabled) through the client nodes within the cluster. Note this typically significantly reduces the node parallelism and thus it is disabled by default. Enabling it also disables es.nodes.data.only (since a client node is a non-data node).

es.nodes.wan.only (default false)
Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.

找到几个相关信息

配置 "es.nodes.wan.only"->"true" 则服务正常运行

写入 es 部分如下

esDatas.saveToEs(Map[String, String](
"es.resource" -> "{esIndex}/{esType}",
"es.nodes" -> "ip:19200",
"es.input.json" -> "false",
"es.nodes.discovery"->"false",
"es.nodes.wan.only"->"true",
"es.write.operation" -> "upsert",
"es.mapping.exclude" -> "id,esIndex,esType",
"es.mapping.id" -> "id"
))

elasticserch-hadoop spak 网络配置异常排查的更多相关文章

  1. hadoop搭建一:虚拟机网络配置和基础(未完成)

    基于VMware 15+CentOS 7+Hadoop 2.6,hadoop的搭建主要用于个人学习,水平有限. hadoop搭建一:虚拟机网络配置和基础 hadoop搭建二:hadoop全分布搭建 h ...

  2. Hadoop(一)Centos7虚拟机网络配置

    Centos7虚拟机网络配置(桥接模式) 一 VirtualBox提供了三种工作模式,它们是bridged(桥接模式).NAT(网络地址转换模式)和host-only(主机模式). 1 桥接模式(br ...

  3. linux 的IP配置和网络问题的排查

    1.6  IP的配制, 首先要会用: ifconfig  和加相关参数如: ifconfig -a, 来查看,自己的电脑网络配制. 再次就必需要知道,默认IP配制文件的地方: cd /etc/sysc ...

  4. Hadoop集群搭建(三)~centos6.8网络配置

    安装完centos之后,进入系统,进行网络配置.主要分为五个部分: 修改虚拟机网络编辑器:配置Winodws访问虚拟机:配置centos网卡:通过网络名访问虚拟机配置网络服务. (一)虚拟机网络编辑器 ...

  5. linux网络配置、环境变量以及JDK安装(CentOS 6.5)

    由于需要搭建hadoop平台,但是苦于没有现成可用的linux服务器,只好自己下载了CentOS 6.5从头装起,安装过程中遇到了很多问题,比如网络配置.时钟同步.环境变量配置.以及各种服务的启停,还 ...

  6. 【Hadoop】Hadoop 机架感知配置、原理

    Hadoop机架感知 1.背景 Hadoop在设计时考虑到数据的安全与高效,数据文件默认在HDFS上存放三份,存储策略为本地一份, 同机架内其它某一节点上一份,不同机架的某一节点上一份. 这样如果本地 ...

  7. Vmware在NAT模式下网络配置详解

    Vmware在NAT模式下网络配置详解 Linux中的网络配置对于接触Linux不久的小白菜来说,还是小有难度的,可能是不熟悉这种与windows系列迥然不同的命令行操作,也可能是由于对Linux的结 ...

  8. Virtual Box和Linux的网络配置盲记

    近来可能在虚拟机重装了Linux的缘故,在用yum安装软件时出现错误,在提示上连接镜像网站时,都是"linux counldn't resolve host"这样的提示.我估计是l ...

  9. 初识Hadoop一,配置及启动服务

    一.Hadoop简介: Hadoop是由Apache基金会所开发的分布式系统基础架构,实现了一个分布式文件系统(Hadoop Distributed File System),简称HDFS:Hadoo ...

随机推荐

  1. js获取浏览器窗口大小

    摘抄:https://blog.csdn.net/qq_27628085/article/details/81947478 常用: JS 获取浏览器窗口大小       // 获取窗口宽度   if ...

  2. python学习---format、当前时间

    1.数字格式化   format <  :左对齐 >  :右对齐 a = “随机数是{:>4d}”.format(1)       结果是0001 2.当前时间 import dat ...

  3. HDU - 1150 Machine Schedule(二分匹配---最小点覆盖)

    题意:有两台机器A和B,A有n种工作模式(0~n-1),B有m种工作模式(0~m-1),两台机器的初始状态都是在工作模式0处.现在有k(0~k-1)个工作,(i,x,y)表示编号为i的工作可以通过机器 ...

  4. centos socket通信时 connect refused 主要是防火墙问题

    centos socket通信时 connect refused 主要是防火墙问题,可以关闭防火墙,或者开放程序中的端口

  5. GRUB&MBR引导

    (ubuntu下搜索gnome-disk可以打开磁盘管理) 简单开机过程 : ①按下电源后,计算机自检(POST),如果硬件设备(CPU.内存.硬盘.光驱.各种卡)都没有问题,BIOS会检查各个硬盘的 ...

  6. Java并发分析—ConcurrentHashMap

    LZ在 https://www.cnblogs.com/xyzyj/p/6696545.html 中简单介绍了List和Map中的常用集合,唯独没有CurrentHashMap.原因是CurrentH ...

  7. 1811 06 pygame 的继续开发

    早上看了  高数和python   好像系统没有保存  桑心啊 关于游戏背景的制作 游戏背景就是    背景在移动  而主人物还在原位置的    常常用于跑酷游戏类  背景开始绘制两张图像  一张完全 ...

  8. Jmeter接口测试之案例实战

    Jmeter是apacheg公司基于Java开发的一款开源的压力测试工具,安装Jmeter之前先安装Jdk,具体JDK安装和环境变量配置自行百度.这里不概述. 1.添加线程组 测试计划->添加- ...

  9. Webstorm、Idea双击shift弹出框解决办法

    1.Ctrl + Shift + A,输入registry 2.在弹出的记录表中,向下滚动到**“ide.suppress.double.click.handler”**并选中复选框,然后close关 ...

  10. python+Sqlite+Dataframe打造金融股票数据结构

    5. 本地数据库 很简单的用本地Sqlite查找股票数据. DataSource类,返回的是Dataframe物件.这个Dataframe物件,在之后的业务,如计算股票指标,还需要特别处理. impo ...