Reposted from: https://medium.com/wish-engineering/katalog-sync-reliable-integration-of-consul-and-kubernetes-ebe8aae0852a

Why use consul with Kubernetes (k8s)?

Consul is a well-known and widely used service discovery mechanism. Here at Wish, we standardized on consul as our service discovery system quite some time ago. Although k8s has a built-in service discovery mechanism, we want to keep consul as our primary one: this way, services in k8s are discoverable outside of k8s and aren't tied to a specific cluster, and when a service needs to ramp into or out of k8s it can do so gradually.

The previous solution: sidecar consul-agent

When we launched k8s we decided to add a consul-agent sidecar to each pod. In our k8s environment, each pod has a routable IP in our VPC and as such this functions pretty well. However, after having used this for several months we have noticed a few pain points:

  1. Configuration: Each service/namespace in k8s needs to have the same consul configuration (client configuration, encryption key, etc.). We largely dealt with this through jsonnet templating, but even so, we ended up having the encryption key in each namespace and a fair amount of configuration duplicated across services.
  2. Complexity: Using this sidecar approach means that we now have a full-blown consul node for each pod in k8s. For the initial migration to k8s things were generally moved over 1:1, EC2 instance → pod, but as we continued to refine our sizing etc. we ended up having significantly more pods than we had EC2 instances before. In addition, this means we effectively had N nodes participating in the consul memberlist running on the same instance or hardware.
  3. Failure modes: With a consul-agent sidecar on each pod we can run into thundering herd issues in consul failure modes due to the large number of nodes in the cluster.
  4. Noisy alerts: Consul’s memberlist expects members to be more-or-less long-lived. Deregistration of a node from the memberlist takes (by default) 72h, which means a node remains in the memberlist even after leaving intentionally. In practice this is a nuisance, as the node still shows up to consumers of consul’s service discovery (e.g. Prometheus’ consul discovery) until it drops off.
  5. Consul checks vs k8s checks: Probably the most painful issue we’ve run into is configuring consul checks. K8s itself has concepts of liveness and readiness which it uses to manage the pods themselves. In addition to this k8s readiness, we also needed to configure consul so it would add/remove the service from rotation based on the pod’s readiness. Operationally this is painful to keep in sync, as consul and k8s offer different mechanisms for health checks (a sketch of this duplication follows the list).
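
To make that duplication concrete, here is a minimal sketch of the sidecar pattern. This is not one of our actual manifests: the image versions, ports, and the /healthz endpoint are hypothetical.

apiVersion: v1
kind: Pod
metadata:
  name: my-service
spec:
  containers:
    - name: app
      image: example/my-service:latest  # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:                   # k8s's view of readiness
        httpGet:
          path: /healthz
          port: 8080
    - name: consul-agent
      image: consul:1.4.4               # hypothetical version
      args: ["agent", "-retry-join=consul.example.internal"]
      # every pod also needs the client config, gossip encryption key, etc.

The consul side then needs its own, separate check against the same endpoint, e.g. as a service definition loaded by the sidecar:

service {
  name = "my-service"
  port = 8080
  check {
    # hypothetical: mirrors the readinessProbe above by hand
    http     = "http://localhost:8080/healthz"
    interval = "10s"
  }
}

The same /healthz endpoint is wired up twice, once for the kubelet and once for consul, and nothing keeps the two definitions from drifting apart.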

Looking for alternatives: consul-k8s

At the end of last year, HashiCorp announced consul-k8s as a mechanism to sync services to/from k8s and consul. We were excited to switch to a more k8s-native mechanism for syncing state to consul, and quickly started prototyping with it. Going into it, we listed our requirements as:

  • Configuration through k8s annotations
  • Readiness sync
  • High availability with no single point of failure (SPOF)

The good

Consul-k8s offers mechanisms to sync both from k8s → consul and from consul → k8s. We have no need for consul → k8s, so we’ll focus on the k8s → consul sync, which registers k8s services in consul. This means you can configure syncing at the service level in k8s through annotations. For example (borrowed from here):

kind: Service
apiVersion: v1
metadata:
  name: my-service
  annotations:
    "consul.hashicorp.com/service-name": my-consul-service

This configuration-through-annotation both dramatically simplifies templating and is significantly easier to understand.
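
Other sync behavior is controlled the same way. As a sketch (annotation names per the consul-k8s docs at the time; verify against the version you run):

kind: Service
apiVersion: v1
metadata:
  name: my-service
  annotations:
    # explicitly opt this service into the sync
    "consul.hashicorp.com/service-sync": "true"
    "consul.hashicorp.com/service-name": my-consul-service
    # which k8s service port to register, and extra consul tags
    "consul.hashicorp.com/service-port": "http"
    "consul.hashicorp.com/service-tags": "v1,web"

Everything lives next to the Service it describes, instead of in a templated consul config per namespace.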

The bad

Unsurprisingly (since we are writing this post), we ran into some issues while testing out consul-k8s. Initially we hit some issues with multi-cluster support, but those were resolved relatively quickly. After getting a proof of concept working with multi-cluster support, we started failure-mode testing. During this testing we found 2 major issues.

In addition to those issues, we found a requirement we didn’t know we had! With the sidecar consul-agent approach, if the consul-agent was unable to join the cluster for some reason, the pod would fail and k8s would halt the deployment. Consul-k8s, however, runs as a single process for the whole cluster, asynchronously syncing state from k8s to consul. The sync flow looks like this:

  1. Kubelet starts container on Node
  2. Kubelet updates k8s API
  3. Consul-k8s notices change in k8s-api
  4. Consul-k8s pushes change to consul

This means the ability of consul-k8s to sync k8s state to consul is completely independent of the k8s pod deployments. We could easily create scenarios where an entire service completes a rolling update (with new pod IPs, etc.) without that state ever being synced to consul. In other words, service discovery could end up with 0 correct entries, leaving clients unable to connect to the service.
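
As a concrete sketch of how this bites, consider a hypothetical Deployment with a standard rolling update (illustrative, not one of our manifests):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # pods are replaced as fast as readiness allows
      maxSurge: 1
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: example/my-service:v2  # hypothetical image
          readinessProbe:               # gates the rollout, but not the consul sync
            httpGet:
              path: /healthz
              port: 8080

The rollout is gated only on pod readiness. If consul-k8s is down or lagging while all four pods are replaced, consul is left holding only the old, now-dead pod IPs.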
