elasticsearch-hadoop Spark: troubleshooting a network configuration error

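In local testing, writing to elasticsearch:9200 succeeded.

In the production environment, however, the job failed with the following error: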

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
17/12/01 07:47:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
17/12/01 07:47:46 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
17/12/01 07:47:46 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at EsSpark.scala:102) failed in 0.349 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

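Troubleshooting ideas
1. Production is addressed by IP rather than by hostname, which might be a problem.
2. The production ES cluster sits behind an nginx proxy, so the write address is not the address of the ES service itself.

From what I know about ES, cause 2 is the more likely one: the connector may bypass the configured address and look up a less-loaded node through ES directly.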
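
One way to test hypothesis 2 is to ask the cluster, through the same address the job uses, which HTTP addresses its nodes publish. The sketch below is not from the original troubleshooting session. It assumes plain HTTP without authentication, and `NodesHttpCheck`, `nodesHttpUrl`, `fetch`, and the default `localhost:9200` are placeholder names chosen for illustration. It calls the Elasticsearch nodes info API (`GET /_nodes/http`) and prints the raw JSON.

```scala
import scala.io.Source
import scala.util.{Failure, Success, Try}

object NodesHttpCheck {
  // Hypothetical helper: URL of the nodes info API, restricted to HTTP details.
  def nodesHttpUrl(esNode: String): String = s"http://$esNode/_nodes/http"

  // Fetch the response body as a string, closing the connection afterwards.
  def fetch(url: String): Try[String] = Try {
    val source = Source.fromURL(url, "UTF-8")
    try source.mkString finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Placeholder address: pass the same host:port that the job uses in es.nodes.
    val esNode = args.headOption.getOrElse("localhost:9200")
    fetch(nodesHttpUrl(esNode)) match {
      case Success(json) => println(json)
      case Failure(e)    => println(s"Request to $esNode failed: ${e.getMessage}")
    }
  }
}
```

If the `publish_address` values listed under `http` for each node are internal addresses that the Spark executors cannot reach, or none of them matches the proxy address, that would be consistent with the proxy being the cause.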

Check the official documentation:
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html

Network
es.nodes.discovery (default true)
Whether to discover the nodes within the Elasticsearch cluster or only to use the ones given in es.nodes for metadata queries. Note that this setting only applies during start-up; afterwards when reading and writing, elasticsearch-hadoop uses the target index shards (and their hosting nodes) unless es.nodes.client.only is enabled.

es.nodes.client.only (default false)
Whether to use Elasticsearch client nodes (or load-balancers). When enabled, elasticsearch-hadoop will route all its requests (after nodes discovery, if enabled) through the client nodes within the cluster. Note this typically significantly reduces the node parallelism and thus it is disabled by default. Enabling it also disables es.nodes.data.only (since a client node is a non-data node).

es.nodes.wan.only (default false)
Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.

The settings above are the relevant ones.

After setting "es.nodes.wan.only" -> "true", the job ran normally.

The code that writes to ES is as follows:

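import org.elasticsearch.spark._ // adds saveToEs to RDDs

esDatas.saveToEs(Map[String, String](
  // index and type are resolved per document from its esIndex / esType fields
  "es.resource" -> "{esIndex}/{esType}",
  // address of the nginx proxy in front of the cluster ("ip" is a placeholder)
  "es.nodes" -> "ip:19200",
  "es.input.json" -> "false",
  // implied by es.nodes.wan.only, kept explicit here
  "es.nodes.discovery" -> "false",
  // connect only through the declared es.nodes, never to discovered nodes
  "es.nodes.wan.only" -> "true",
  "es.write.operation" -> "upsert",
  // routing fields are not stored in the document body
  "es.mapping.exclude" -> "id,esIndex,esType",
  "es.mapping.id" -> "id"
))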
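
The same network options can also be set once on the Spark configuration instead of on every `saveToEs` call. The following is a minimal sketch, assuming the job builds its own `SparkConf`; `EsWanOnlyConf`, `build`, and the application name are hypothetical names used only for illustration, and the address passed in is a placeholder for the proxy address.

```scala
import org.apache.spark.SparkConf

object EsWanOnlyConf {
  // Hypothetical helper: returns a SparkConf carrying the connector's
  // network settings, so they apply to every read and write in the job.
  def build(esNodes: String): SparkConf =
    new SparkConf()
      .setAppName("es-wan-only-example") // hypothetical application name
      .set("es.nodes", esNodes)          // proxy address, e.g. "ip:19200"
      .set("es.nodes.wan.only", "true")  // connect only through es.nodes
}
```

When the options are passed on the command line instead, Spark only accepts keys that start with `spark.`, so the connector documentation describes prefixing them, for example `--conf spark.es.nodes.wan.only=true`.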
