当kudu有tserver下线或者迁移或者修改hostname之后,旧的tserver会一直以dead状态出现,并且tserver日志中会有大量的连接重试日志,一天的错误日志会有几个G,

W0322 22:13:59.202749 16927 tablet_service.cc:290] Invalid argument: UpdateConsensus: Wrong destination UUID requested. Local UUID: e2f80a1fcf0c47f6b7f220a44d69297f. Requested UUID: 45bfb5b3e3ff41d9b1b1d2afab78d65c: from {username='kudu'} at 192.168.0.1:34724: tablet_id: "9933f18e59554ae6b5354e2a948469e9" caller_uuid: "9b164f37d04a484c8634ea86eae1b048" caller_term: 3 preceding_id { term: 2 index: 1873 } ops { id { term: 3 index: 1874 } timestamp: 6359719759241142272 op_type: NO_OP noop_request { } } dest_uuid: "45bfb5b3e3ff41d9b1b1d2afab78d65c" committed_index: 1874 all_replicated_index: 0 safe_timestamp: 6359719761707556864 last_idx_appended_to_leader: 1874

这时如果想要把这些dead状态的tserver去掉,并没有直接的命令,官方给出的方法如下:

Kudu does not currently have an automated way to remove a tablet server from a cluster permanently. Instead, use the following steps:

  • 1 Ensure the cluster is in good health using ksck. See Checking Cluster Health with ksck.

    •   首先保证集群是健康的(通过ksck命令)
  • 2 If the tablet server contains any replicas of tables with replication factor 1, these replicas must be manually moved off the tablet server prior to shutting it down. The kudu tablet change_config move_replica tool can be used for this.
    •   将dead状态的server上的副本进行迁移,如果有replication factor设置为1的数据,必须在下线前手工移动数据;
  • 3 Shut down the tablet server. After -follower_unavailable_considered_failed_sec, which defaults to 5 minutes, Kudu will begin to re-replicate the tablet server’s replicas to other servers. Wait until the process is finished. Progress can be monitored using ksck.
    •   只要tserver处于下线状态超过5分钟以上会自动进行副本迁移;
  • 4 Once all the copies are complete, ksck will continue to report the tablet server as unavailable. The cluster will otherwise operate fine without the tablet server. To completely remove it from the cluster so ksck shows the cluster as completely healthy, restart the masters. In the case of a single master, this will cause cluster downtime. With multimaster, restart the masters in sequence to avoid cluster downtime.
    •   当所有副本都迁移完之后,ksck依然会显示有tserver不可用,如果想完全去掉这些dead状态的server,需要重启master;

Do not shut down multiple tablet servers at once. To remove multiple tablet servers from the cluster, follow the above instructions for each tablet server, ensuring that the previous tablet server is removed from the cluster and ksck is healthy before shutting down the next.

最后,重启master之后在保证集群健康的前提下逐一重启tserver;

如果这样操作之后还是报错,说明可能有leader副本丢失,比如ksck报错

Tablet c58cef3f36a846b4bdf58447f77a6bcf of table 'impala::impala.test_kudu' is unavailable: 2 replica(s) not RUNNING
a46f0fd38eba4a5286098ff7fe260eb1: TS unavailable
45bfb5b3e3ff41d9b1b1d2afab78d65c: TS unavailable
9b164f37d04a484c8634ea86eae1b048 (server02:7050): RUNNING [LEADER]
All reported replicas are:
A = a46f0fd38eba4a5286098ff7fe260eb1
B = 45bfb5b3e3ff41d9b1b1d2afab78d65c
C = 9b164f37d04a484c8634ea86eae1b048
The consensus matrix is:
Config source | Replicas | Current term | Config index | Committed?
---------------+------------------------+--------------+--------------+------------
master | A B C* | | | Yes
A | [config not available] | | |
B | [config not available] | | |
C | [config not available] | | |

可用的副本可能存在同步延迟会丢失部分数据,这时如果已经确定leader副本不可恢复,则可以强制指定剩下的可用副本为leader,恢复tablet到健康状态;

The remaining replica is not the leader, so the leader replica failed as well. This means the chance of data loss is higher since the remaining replica on tserver-00 may have been lagging.

$ sudo -u kudu kudu remote_replica unsafe_change_config tserver-00:7150 <tablet-id> <tserver-00-uuid>

where <tablet-id> is e822cab6c0584bc0858219d1539a17e6 and <tserver-00-uuid> is the uuid of tserver-00,638a20403e3e4ae3b55d4d07d920e6de.

<tablet-id>为非健康的tablet,tserver-00:7150为可用副本所在的tserver,<tserver-00-uuid>为可用副本所在的tserver的uuid,这样就可以在可能丢失少量数据的情况下恢复tablet;

如果有问题的tablet非常多,可以参考如下命令:

$ kudu cluster ksck localhost|grep -e '^Tablet '|awk '{print $2}'|xargs -i echo "sudo -u kudu kudu remote_replica unsafe_change_config tserver-00:7150 {} <tserver-00-uuid>"

参考:

https://kudu.apache.org/docs/administration.html#tablet_server_decommissioning

https://kudu.apache.org/docs/administration.html#tablet_majority_down_recovery

【原创】大数据基础之Kudu(2)移除dead tsever的更多相关文章

  1. 【原创】大数据基础之Kudu(1)简介、安装、使用

    kudu 1.7 官方:https://kudu.apache.org/ 一 简介 kudu有很多概念,有分布式文件系统(HDFS),有一致性算法(Zookeeper),有Table(Hive Tab ...

  2. 【原创】大数据基础之Kudu(6)kudu tserver内存占用统计分析

    kudu tserver占用内存过高后会拒绝部分写请求,日志如下: 19/06/01 13:34:12 INFO AsyncKuduClient: Invalidating location 34b1 ...

  3. 【原创】大数据基础之Kudu(5)kudu增加或删除目录/数据盘

    kudu加减数据盘不能直接修改配置fs_data_dirs后重启,否则会报错: Check failed: _s.ok() Bad status: Already present: FS layout ...

  4. 【原创】大数据基础之Kudu(3)primary key

    关于kudu的primary key The primary key may not be changed after the table is created. You must drop and ...

  5. 【原创】大数据基础之Kudu(4)spark读写kudu

    spark2.4.3+kudu1.9 1 批量读 val df = spark.read.format("kudu") .options(Map("kudu.master ...

  6. 【原创】大数据基础之Zookeeper(2)源代码解析

    核心枚举 public enum ServerState { LOOKING, FOLLOWING, LEADING, OBSERVING; } zookeeper服务器状态:刚启动LOOKING,f ...

  7. 【原创】大数据基础之词频统计Word Count

    对文件进行词频统计,是一个大数据领域的hello word级别的应用,来看下实现有多简单: 1 Linux单机处理 egrep -o "\b[[:alpha:]]+\b" test ...

  8. 【原创】大数据基础之Impala(1)简介、安装、使用

    impala2.12 官方:http://impala.apache.org/ 一 简介 Apache Impala is the open source, native analytic datab ...

  9. 【原创】大数据基础之Flume(2)应用之kafka-kudu

    应用一:kafka数据同步到kudu 1 准备kafka topic # bin/kafka-topics.sh --zookeeper $zk:2181/kafka -create --topic ...

随机推荐

  1. 最简单易懂的Spring Security 身份认证流程讲解

    最简单易懂的Spring Security 身份认证流程讲解 导言 相信大伙对Spring Security这个框架又爱又恨,爱它的强大,恨它的繁琐,其实这是一个误区,Spring Security确 ...

  2. TensorFlow基础

    TensorFlow基础 SkySeraph  2017 Email:skyseraph00#163.com 更多精彩请直接访问SkySeraph个人站点:www.skyseraph.com Over ...

  3. 如何在.net 4.0下安装TLS1.2的支持

    原始出处:www.cnblogs.com/Charltsing/p/Net4TLS12.html 作者QQ: 564955427 最近提交请求发生错误:不支持请求的协议,研究了一下TLS1.2,发现这 ...

  4. Google Closure Compiler高级压缩混淆Javascript代码

    一.背景 前端开发中,特别是移动端,Javascript代码压缩已经成为上线必备条件. 如今主流的Js代码压缩工具主要有: 1)Uglify http://lisperator.net/uglifyj ...

  5. webpack4 学习 --- webpack和webpack-dev-server

    以前了解过webpack2, 所以对webpack 不是很陌生,就直接入主题吧.新建一个文件夹,就叫它webpack-tut吧.然后在文件中新建一个src 文件夹,存放我们的源文件,再在src 文件夹 ...

  6. git 学习(2) ----- 分支

    当我们进行程序开发的过程中,有时会产生一个新的想法,然后就想马上试验,那我们怎么办? 如果我们继续在现有的基础上进行开发,但最后想法不成功,我们还要进行版本回退?如果我们的新想法,需要很长时间才能实现 ...

  7. luogu2597-[ZJOI2012]灾难 && DAG支配树

    Description P2597 [ZJOI2012]灾难 - 洛谷 | 计算机科学教育新生态 Solution 根据题意建图, 新建一个 \(S\) 点, 连向每个没有入边的点. 定义每个点 \( ...

  8. 用UE4蓝图制作FPS_零基础学虚幻4第二季

    课时1:案例演示 05:12 课时2:工程准备 07:35 (把一个项目从一个工程移动到另一个工程) 1.新建一个空白工程,不包含初学者内容 2.选择我们要复制的工程,按右键,如下图: 复制到新工程的 ...

  9. mpvue——支持less

    安装 安装less和less-loader,我用的是淘宝源,你也可以直接npm $ cnpm install less less-loader --save 配置 打开build目录下的webpack ...

  10. bzoj 1058: [ZJOI2007]报表统计 (Treap)

    链接:https://www.lydsy.com/JudgeOnline/problem.php?id=1058 题面; 1058: [ZJOI2007]报表统计 Time Limit: 15 Sec ...