TiDB cluster node error: node_exporter-9100.service failed

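Today, while starting the TiDB cluster, the startup failed with an error on one of the TiKV nodes: node_exporter-9100.service failed

A problem on a single node causes the whole cluster start to fail. I looked through the log files on that node but found nothing that explained the failure, so the only option left was to check the system log to see what had happened.

Sure enough, the problem was there: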

Jan 16 15:35:05 ip-172-31-26-133 systemd-logind: New session 2045 of user tidb.

Jan 16 15:35:05 ip-172-31-26-133 systemd: Started Session 2045 of user tidb.

Jan 16 15:35:05 ip-172-31-26-133 systemd: Starting Session 2045 of user tidb.

Jan 16 15:35:05 ip-172-31-26-133 ansible-stat: Invoked with checksum_algorithm=sha1 get_checksum=False follow=False path=/home/tidb/deploy get_md5=False get_mime=True get_attributes=True

Jan 16 15:35:05 ip-172-31-26-133 ansible-stat: Invoked with checksum_algorithm=sha1 get_checksum=False follow=False path=/data/data_tidb get_md5=False get_mime=True get_attributes=True

Jan 16 15:35:08 ip-172-31-26-133 ansible-systemd: Invoked with no_block=False force=None name=node_exporter-9100.service enabled=False daemon_reload=False state=started user=False masked=None

Jan 16 15:35:09 ip-172-31-26-133 ansible-wait_for: Invoked with host=172.31.26.133 send=GET /metrics HTTP/1.0#015#012#015#012 port=9100 delay=0 state=present sleep=1 timeout=300 exclude_hosts=None search_regex=200 OK path=None connect_timeout=5

Jan 16 15:35:17 ip-172-31-26-133 systemd: node_exporter-9100.service holdoff time over, scheduling restart.

Jan 16 15:35:17 ip-172-31-26-133 systemd: Started node_exporter-9100 service.

Jan 16 15:35:17 ip-172-31-26-133 systemd: Starting node_exporter-9100 service...

Jan 16 15:35:17 ip-172-31-26-133 systemd: Failed at step EXEC spawning /home/tidb/deploy/scripts/run_node_exporter.sh: No such file or directory

Jan 16 15:35:17 ip-172-31-26-133 systemd: node_exporter-9100.service: main process exited, code=exited, status=203/EXEC

Jan 16 15:35:17 ip-172-31-26-133 systemd: Unit node_exporter-9100.service entered failed state.

Jan 16 15:35:17 ip-172-31-26-133 systemd: node_exporter-9100.service failed.

Jan 16 15:35:32 ip-172-31-26-133 systemd: node_exporter-9100.service holdoff time over, scheduling restart.

Jan 16 15:35:32 ip-172-31-26-133 systemd: Started node_exporter-9100 service.

Jan 16 15:35:32 ip-172-31-26-133 systemd: Starting node_exporter-9100 service...

Jan 16 15:35:32 ip-172-31-26-133 systemd: Failed at step EXEC spawning /home/tidb/deploy/scripts/run_node_exporter.sh: No such file or directory

Jan 16 15:35:32 ip-172-31-26-133 systemd: node_exporter-9100.service: main process exited, code=exited, status=203/EXEC

Jan 16 15:35:32 ip-172-31-26-133 systemd: Unit node_exporter-9100.service entered failed state.

Jan 16 15:35:32 ip-172-31-26-133 systemd: node_exporter-9100.service failed.

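The log shows the real cause: the node_exporter service on port 9100 cannot start because its startup script is missing. systemd fails at step EXEC with "No such file or directory" and the process exits with status=203/EXEC. Comparing with the other nodes in the cluster, every node has a "deploy" directory under the tidb user's home directory, but on the failing node that directory was gone. I don't know what deleted it. The deploy directory therefore has to be recreated under the tidb user's home directory. There is no need to create it by hand; it can be done directly from the control machine. The fix is as follows: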

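1. Run this step on the control machine.

The IP after -l here is the IP of the failing node.

2. Once it succeeds, the deploy directory is visible again under the tidb user's home directory.

3. With the directory in place, the service can start. Starting the cluster again from the control machine now succeeds, which resolves the problem.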
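Before starting the cluster again, it can be worth confirming on the node that the script systemd complained about is really back. The following is a minimal sketch, not part of the original procedure: the path is taken from the error in the log above, while the helper name check_start_script is made up for illustration.

```python
import os

# Path copied from the systemd error in the log; adjust it if the deploy
# directory on your nodes lives somewhere else.
SCRIPT = "/home/tidb/deploy/scripts/run_node_exporter.sh"


def check_start_script(path):
    """Return a short status string: can systemd exec the file at `path`?"""
    # A missing file is what produces "No such file or directory" / 203/EXEC.
    if not os.path.isfile(path):
        return "missing"
    # The file exists but the current user has no execute permission on it.
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    print(f"{SCRIPT}: {check_start_script(SCRIPT)}")
```

Run it on the failing node as the tidb user. A result of "missing" means the deploy directory has not been restored yet.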

So when something goes wrong, there are two places to look: the error log of the TiDB node, and the system log.
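The system-log check can be partly automated. The sketch below pulls systemd EXEC failures out of syslog lines and reports which executable was missing. The regular expression is written against the message format seen in the log above (a "systemd: " prefix without a PID), so other systemd versions may need a different pattern. The function name find_exec_failures is hypothetical.

```python
import re

# Matches the systemd message logged when a unit's executable cannot be spawned.
EXEC_FAILURE = re.compile(
    r"systemd: Failed at step EXEC spawning (?P<path>\S+): (?P<reason>.+)$"
)


def find_exec_failures(lines):
    """Return distinct (path, reason) pairs for EXEC failures, in first-seen order."""
    seen = []
    for line in lines:
        match = EXEC_FAILURE.search(line.rstrip())
        if match:
            pair = (match.group("path"), match.group("reason"))
            # systemd retries the unit, so the same failure repeats; report it once.
            if pair not in seen:
                seen.append(pair)
    return seen


# Lines copied from the log in this post.
sample = [
    "Jan 16 15:35:17 ip-172-31-26-133 systemd: Starting node_exporter-9100 service...",
    "Jan 16 15:35:17 ip-172-31-26-133 systemd: Failed at step EXEC spawning /home/tidb/deploy/scripts/run_node_exporter.sh: No such file or directory",
    "Jan 16 15:35:32 ip-172-31-26-133 systemd: Failed at step EXEC spawning /home/tidb/deploy/scripts/run_node_exporter.sh: No such file or directory",
]

for path, reason in find_exec_failures(sample):
    print(f"{path}: {reason}")
# prints: /home/tidb/deploy/scripts/run_node_exporter.sh: No such file or directory
```

To use it on a real node, pass an open file object for the system log instead of the sample list. The location of that log depends on the distribution and is not stated in this post.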
