Troubleshooting an elasticsearch-hadoop Spark network configuration error
elasticsearch hadoop
Writing to elasticsearch:9200 succeeded in a local test,
but in production the job fails with the following error:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
17/12/01 07:47:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
17/12/01 07:47:46 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
17/12/01 07:47:46 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at EsSpark.scala:102) failed in 0.349 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
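Troubleshooting approach
1. Production connects by IP address rather than by hostname, which might cause problems.
2. In production the ES cluster sits behind an nginx proxy, so the write address is the proxy, not an ES node itself.
Based on what I know about ES, cause 2 is the more likely one: the connector may ask ES for the cluster's nodes and connect to them directly (for example, to pick less-loaded ones) instead of going only through the address it was given.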
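To make cause 2 concrete, here is a simplified Scala model of the data-node filtering step named in the stack trace (`InitializationUtils.filterNonDataNodesIfNeeded`). It is an assumption, not the connector's actual code: it supposes the connector keeps only those candidate addresses that match the published HTTP address of a data node. `NodeInfo`, `retainDataNodes` and every address used with them are hypothetical.

```scala
// Simplified model (assumption, not elasticsearch-hadoop's real implementation)
// of filtering candidate addresses down to HTTP-enabled data nodes.
object DataNodeFilterModel {

  // Hypothetical view of one cluster node: its published HTTP address and roles.
  final case class NodeInfo(publishAddress: String, isData: Boolean, httpEnabled: Boolean)

  // Keeps the candidate addresses (declared in es.nodes or discovered) that belong
  // to an HTTP-enabled data node; Left carries the error text when none survive.
  def retainDataNodes(candidates: Seq[String], cluster: Seq[NodeInfo]): Either[String, Seq[String]] = {
    val dataAddresses: Set[String] =
      cluster.filter(n => n.isData && n.httpEnabled).map(_.publishAddress).toSet
    val kept = candidates.filter(dataAddresses.contains)
    if (kept.isEmpty) Left("No data nodes with HTTP-enabled available") else Right(kept)
  }
}
```

In this model, a proxy address such as a made-up `10.0.0.5:19200` never matches any node's published address, so the only candidate is dropped and the result carries the same message as the production error; a candidate that is a data node's own address passes through.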
Checking the official documentation:
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html
Network
es.nodes.discovery (default true)
Whether to discover the nodes within the Elasticsearch cluster or only to use the ones given in es.nodes for metadata queries. Note that this setting only applies during start-up; afterwards when reading and writing, elasticsearch-hadoop uses the target index shards (and their hosting nodes) unless es.nodes.client.only is enabled.
es.nodes.client.only (default false)
Whether to use Elasticsearch client nodes (or load-balancers). When enabled, elasticsearch-hadoop will route all its requests (after nodes discovery, if enabled) through the client nodes within the cluster. Note this typically significantly reduces the node parallelism and thus it is disabled by default. Enabling it also disables es.nodes.data.only (since a client node is a non-data node).
es.nodes.wan.only (default false)
Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.
These are the relevant settings I found.
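How the three quoted settings interact can be sketched as a small Scala model. It encodes what the quoted documentation states (es.nodes.wan.only disables discovery and restricts the connector to es.nodes; es.nodes.client.only disables es.nodes.data.only) plus two labeled assumptions: es.nodes.data.only is left at its default of true, and wan-only mode also skips data-node filtering because only the declared nodes are used. `EffectiveNetwork` and `effective` are hypothetical names.

```scala
// Sketch derived from the documentation text quoted above; not the connector's code.
object NetworkSettingsModel {

  // What the connector effectively does with its node list (hypothetical type).
  final case class EffectiveNetwork(discovery: Boolean, dataNodesOnly: Boolean, declaredNodesOnly: Boolean)

  // Reads a boolean setting, falling back to its documented default when absent.
  private def flag(cfg: Map[String, String], key: String, default: Boolean): Boolean =
    cfg.get(key).map(_.trim.toBoolean).getOrElse(default)

  def effective(cfg: Map[String, String]): EffectiveNetwork = {
    val wanOnly = flag(cfg, "es.nodes.wan.only", default = false)
    val clientOnly = flag(cfg, "es.nodes.client.only", default = false)
    // Per the docs, wan.only disables discovery whatever es.nodes.discovery says.
    val discovery = !wanOnly && flag(cfg, "es.nodes.discovery", default = true)
    // Assumption: es.nodes.data.only is left at its default (true); client.only
    // disables it per the docs, and wan.only is assumed to bypass it as well.
    val dataOnly = !wanOnly && !clientOnly
    EffectiveNetwork(discovery, dataOnly, declaredNodesOnly = wanOnly)
  }
}
```

Under these assumptions, setting es.nodes.discovery to false on its own still leaves the data-node filter active, while es.nodes.wan.only = true turns off both discovery and the filter, which is consistent with wan.only being the setting that made the job work.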
With "es.nodes.wan.only" -> "true" configured, the job runs normally.
The code that writes to ES:
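import org.elasticsearch.spark._  // adds saveToEs to RDDs

esDatas.saveToEs(Map[String, String](
  "es.resource" -> "{esIndex}/{esType}",        // index and type read from each record's esIndex/esType fields
  "es.nodes" -> "ip:19200",                     // address of the nginx proxy (placeholder)
  "es.input.json" -> "false",                   // records are objects, not JSON strings
  "es.nodes.discovery" -> "false",              // do not look up other cluster nodes
  "es.nodes.wan.only" -> "true",                // talk only to es.nodes; this is what fixed the error
  "es.write.operation" -> "upsert",             // update the document if it exists, otherwise insert it
  "es.mapping.exclude" -> "id,esIndex,esType",  // keep helper fields out of the stored document
  "es.mapping.id" -> "id"                       // use the id field as the document _id
))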
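If several jobs write through the same proxy, the settings above could be collected in one place so that none of them silently omits es.nodes.wan.only. The following is a hypothetical helper (`ProxyWriteConfig` is not part of elasticsearch-hadoop); the values are copied from the snippet above and the address stays a placeholder.

```scala
// Hypothetical helper bundling the write settings used above; not a library API.
object ProxyWriteConfig {

  // Builds the write configuration for a cluster reachable only through a proxy.
  // Entries in `extra` are added last, so they override the defaults below.
  def apply(proxyAddress: String, extra: Map[String, String] = Map.empty): Map[String, String] =
    Map(
      "es.resource" -> "{esIndex}/{esType}",       // index/type taken from each record's fields
      "es.nodes" -> proxyAddress,                  // the only address the job should contact
      "es.input.json" -> "false",
      "es.nodes.discovery" -> "false",
      "es.nodes.wan.only" -> "true",               // use only es.nodes, no direct node connections
      "es.write.operation" -> "upsert",
      "es.mapping.exclude" -> "id,esIndex,esType",
      "es.mapping.id" -> "id"
    ) ++ extra
}
```

In the job this would be called as `esDatas.saveToEs(ProxyWriteConfig("ip:19200"))`, with `import org.elasticsearch.spark._` in scope; passing `extra` lets an individual job change, say, the write operation without repeating the network settings.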