Developing and Using K8S Operators
Looking at it from the application side: why have so many Operator scenarios appeared, why do so many middleware projects and vendors ship Operator-based deployment options, and what value do they provide?
As the industry has evolved, enterprise deployment environments have moved from physical machines to VMs to containers to clusters, and Kubernetes is now commonplace. As the foundation that enterprise applications run on, Kubernetes capabilities draw more and more attention. In my view, enterprise use of a cluster falls into a few directions:
1. Application deployment: use the basic capabilities of containers to deploy applications and enjoy the benefits that containerization and cluster management bring.
2. Operator applications: when deploying an application with cluster capabilities, you may need to deploy a group of resources together, run multiple deployment steps, depend on several objects, customize and transform parameters, monitor for failures, and type recovery commands by hand when something breaks. Done purely manually, that is a lot of steps. If you instead develop an Operator that manages the object's resource definitions, watches the object's lifecycle, and handles its failure events, all of this becomes much more convenient, lowering the barrier to using the component and reducing the operational burden.
3. Extending the cluster itself: the cluster's built-in capabilities target general scenarios and do not reflect the specific characteristics of your applications, so you extend or add capabilities on top of what the cluster provides, for example writing your own scheduler or building your own customized elasticity.
Since we are going to talk about Operators, we should first say what an Operator is. If you do not know the Kubernetes development background, search around or read other posts; in short, Kubernetes provides several extension frameworks. The most primitive is client-go; later came the more convenient kubebuilder (from the Kubernetes project) and operator-sdk (from CoreOS), which have gradually converged. You only need to follow the prescribed conventions. This demo makes it clear at a glance: https://github.com/kubernetes-sigs/kubebuilder/blob/master/docs/gif/kb-demo.v2.0.1.svg
Think about what developing an Operator requires. An Operator registers its own logic with the cluster, so you define your own CRD (optional), define your own watches, and define the handling logic to run when a watched event fires. Start with the simplest possible implementation: https://github.com/kubernetes/sample-controller
The controller registers event handlers (AddFunc, UpdateFunc, DeleteFunc), each of which pushes an event message onto a work queue.
Workers then consume the queue; the concrete business logic lives in the handler.
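Condensed to a skeleton, that wiring looks roughly like this (a sketch in the spirit of sample-controller rather than its literal code; the informer argument stands for a shared informer over your custom resource):

package controller

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// newQueueAndHandlers hooks the three event callbacks up to a rate-limited
// work queue; all three simply enqueue the changed object's key.
func newQueueAndHandlers(informer cache.SharedIndexInformer) workqueue.RateLimitingInterface {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	enqueue := func(obj interface{}) {
		// Store the object key ("namespace/name"), not the object itself.
		if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
			queue.Add(key)
		}
	}
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    enqueue,
		UpdateFunc: func(old, new interface{}) { enqueue(new) },
		DeleteFunc: enqueue,
	})
	return queue
}

// runWorker pops keys off the queue and hands them to the sync handler,
// retrying with backoff on failure, until the queue shuts down.
func runWorker(queue workqueue.RateLimitingInterface, syncHandler func(key string) error) {
	for {
		key, shutdown := queue.Get()
		if shutdown {
			return
		}
		if err := syncHandler(key.(string)); err != nil {
			queue.AddRateLimited(key) // retry later with backoff
			fmt.Println("sync failed:", err)
		} else {
			queue.Forget(key)
		}
		queue.Done(key)
	}
}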
The above is the barest-bones example imaginable, but it is enough to understand the full shape of a typical Operator.
The CRD definition and example usage are at https://github.com/kubernetes/sample-controller/tree/master/artifacts/examples
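On the Go side, the custom resource is just a pair of structs. This is roughly sample-controller's Foo type with the codegen markers trimmed:

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Foo is a specification for a Foo resource.
type Foo struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   FooSpec   `json:"spec"`
	Status FooStatus `json:"status"`
}

// FooSpec is the desired state declared by the user.
type FooSpec struct {
	DeploymentName string `json:"deploymentName"`
	Replicas       *int32 `json:"replicas"`
}

// FooStatus is the state the controller has observed.
type FooStatus struct {
	AvailableReplicas int32 `json:"availableReplicas"`
}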
With that covered: the Operators shipped by middleware projects are all broadly similar, so let's use two Redis Operators to see what this thing actually does.
First, Redis Cluster. Think about it: if you built one by hand, what would the core steps be?
Configure and start the servers -> CLUSTER MEET IP PORT so the nodes discover each other -> CLUSTER ADDSLOTS to assign slots manually -> CLUSTER REPLICATE to establish master/replica relationships -> detect and recover from all kinds of failures. The sketch below shows roughly what issuing those commands programmatically looks like.
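For concreteness, a rough sketch of the bootstrap commands issued from Go, which is essentially the toil an Operator automates. Assumptions: the go-redis client (github.com/go-redis/redis/v8), made-up addresses and slot ranges, a placeholder node ID, and none of the retry or failover logic a real operator needs:

package main

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
)

func main() {
	ctx := context.Background()
	// Connect to the first node; all addresses here are placeholders.
	c := redis.NewClient(&redis.Options{Addr: "10.0.0.1:6379"})
	defer c.Close()

	// CLUSTER MEET: introduce the other nodes to this one.
	for _, peer := range []string{"10.0.0.2", "10.0.0.3"} {
		if err := c.Do(ctx, "cluster", "meet", peer, "6379").Err(); err != nil {
			fmt.Println("meet failed:", err)
		}
	}

	// CLUSTER ADDSLOTS: this node takes slots 0-5460 (one third of 16384).
	args := []interface{}{"cluster", "addslots"}
	for slot := 0; slot <= 5460; slot++ {
		args = append(args, slot)
	}
	if err := c.Do(ctx, args...).Err(); err != nil {
		fmt.Println("addslots failed:", err)
	}

	// CLUSTER REPLICATE runs on the replica, pointing at the master's node ID.
	replica := redis.NewClient(&redis.Options{Addr: "10.0.0.4:6379"})
	defer replica.Close()
	if err := replica.Do(ctx, "cluster", "replicate", "<master-node-id>").Err(); err != nil {
		fmt.Println("replicate failed:", err)
	}
}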
Doing all of those steps by hand is obviously tedious; letting the cluster operate itself automatically would be far more convenient, hence Operators like this one:
https://github.com/ucloud/redis-cluster-operator
It is built on operator-sdk: https://github.com/operator-framework/operator-sdk
Deploying it per the docs fails with an error because of a version mismatch:
error: error validating "redis.kun_redisclusterbackups_crd.yaml": error validating data: [ValidationError(CustomResourceDefinition.spec): unknown field "additionalPrinterColumns" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "subresources" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec, ValidationError(CustomResourceDefinition.spec): unknown field "validation" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1.CustomResourceDefinitionSpec]; if you choose to ignore these errors, turn validation off with --validate=false
The cluster's current CustomResourceDefinition API version is apiextensions.k8s.io/v1, while the component's manifests are based on apiVersion: apiextensions.k8s.io/v1beta1, hence the failure. Now what?
Convert it? kubectl-convert? No luck: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/
Back to GitHub to dig through the issues, and sure enough there is one: https://github.com/ucloud/redis-cluster-operator/issues/95. The manifest a helpful contributor posted there works:


# Thus this yaml has been adapted for k8s v1.22 as per:
# https://kubernetes.io/docs/reference/using-api/deprecation-guide/#customresourcedefinition-v122
#
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: distributedredisclusters.redis.kun
spec:
  group: redis.kun
  names:
    kind: DistributedRedisCluster
    listKind: DistributedRedisClusterList
    plural: distributedredisclusters
    singular: distributedrediscluster
    shortNames:
      - drc
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        status: {}
      additionalPrinterColumns:
        - name: MasterSize
          jsonPath: .spec.masterSize
          description: The number of redis master node in the ensemble
          type: integer
        - name: Status
          jsonPath: .status.status
          description: The status of redis cluster
          type: string
        - name: Age
          jsonPath: .metadata.creationTimestamp
          type: date
        - name: CurrentMasters
          jsonPath: .status.numberOfMaster
          priority: 1
          description: The current master number of redis cluster
          type: integer
        - name: Images
          jsonPath: .spec.image
          priority: 1
          description: The image of redis cluster
          type: string
      schema:
        openAPIV3Schema:
          description: DistributedRedisCluster is the Schema for the distributedredisclusters API
          type: object
          properties:
            apiVersion:
              description: 'APIVersion defines the versioned schema of this representation
                of an object. Servers should convert recognized schemas to the latest
                internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources'
              type: string
            kind:
              description: 'Kind is a string value representing the REST resource this
                object represents. Servers may infer this from the endpoint the client
                submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds'
              type: string
            metadata:
              type: object
            spec:
              description: DistributedRedisClusterSpec defines the desired state of DistributedRedisCluster
              type: object
              properties:
                masterSize:
                  format: int32
                  type: integer
                  minimum: 1
                  maximum: 3
                clusterReplicas:
                  format: int32
                  type: integer
                  minimum: 1
                  maximum: 3
                serviceName:
                  type: string
                  pattern: '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'
                annotations:
                  additionalProperties:
                    type: string
                  type: object
                config:
                  additionalProperties:
                    type: string
                  type: object
                passwordSecret:
                  additionalProperties:
                    type: string
                  type: object
                storage:
                  type: object
                  properties:
                    type:
                      type: string
                    size:
                      type: string
                    class:
                      type: string
                    deleteClaim:
                      type: boolean
                image:
                  type: string
                securityContext:
                  description: 'SecurityContext defines the security options the container should be run with'
                  type: object
                  properties:
                    allowPrivilegeEscalation:
                      type: boolean
                    privileged:
                      type: boolean
                    readOnlyRootFilesystem:
                      type: boolean
                    capabilities:
                      type: object
                      properties:
                        add:
                          items:
                            type: string
                          type: array
                        drop:
                          items:
                            type: string
                          type: array
                    sysctls:
                      items:
                        type: object
                        properties:
                          name:
                            type: string
                          value:
                            type: string
                        required:
                          - name
                          - value
                      type: array
                resources:
                  type: object
                  properties:
                    requests:
                      type: object
                      additionalProperties:
                        type: string
                    limits:
                      type: object
                      additionalProperties:
                        type: string
                toleRations:
                  type: array
                  items:
                    type: object
                    properties:
                      effect:
                        type: string
                      key:
                        type: string
                      operator:
                        type: string
                      tolerationSeconds:
                        type: integer
                        format: int64
                      value:
                        type: string
            status:
              description: DistributedRedisClusterStatus defines the observed state of DistributedRedisCluster
              type: object
              properties:
                numberOfMaster:
                  type: integer
                  format: int32
                reason:
                  type: string
                status:
                  type: string
                maxReplicationFactor:
                  type: integer
                  format: int32
                minReplicationFactor:
                  type: integer
                  format: int32
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: redisclusterbackups.redis.kun
spec:
  group: redis.kun
  names:
    kind: RedisClusterBackup
    listKind: RedisClusterBackupList
    plural: redisclusterbackups
    singular: redisclusterbackup
    shortNames:
      - drcb
  scope: Namespaced
  versions:
    - name: v1alpha1
      # Each version can be enabled/disabled by Served flag.
      served: true
      # One and only one version must be marked as the storage version.
      storage: true
      subresources:
        status: {}
      additionalPrinterColumns:
        - jsonPath: .metadata.creationTimestamp
          name: Age
          type: date
        - jsonPath: .status.phase
          description: The phase of redis cluster backup
          name: Phase
          type: string
      schema:
        openAPIV3Schema:
          description: RedisClusterBackup is the Schema for the redisclusterbackups API
          properties:
            apiVersion:
              description: 'APIVersion defines the versioned schema of this representation
                of an object. Servers should convert recognized schemas to the latest
                internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources'
              type: string
            kind:
              description: 'Kind is a string value representing the REST resource this
                object represents. Servers may infer this from the endpoint the client
                submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds'
              type: string
            metadata:
              type: object
            spec:
              description: RedisClusterBackupSpec defines the desired state of RedisClusterBackup
              type: object
            status:
              description: RedisClusterBackupStatus defines the observed state of RedisClusterBackup
              type: object
          type: object
Let's skim the code for its overall logic, skipping some parts. The entry point is main.go: it creates a manager, registers the scheme definitions and controllers with Kubernetes, and starts the manager.
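Stripped of flags and metrics, that main.go wiring is roughly the following. This is a sketch against the controller-runtime v0.4-era API that the operator's stack traces show, with error handling abbreviated; the apis/controller package paths are the operator-sdk scaffolding names used by this repo:

package main

import (
	"os"

	"github.com/ucloud/redis-cluster-operator/pkg/apis"
	"github.com/ucloud/redis-cluster-operator/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/manager/signals"
)

func main() {
	// Load kubeconfig or in-cluster config.
	cfg, err := config.GetConfig()
	if err != nil {
		os.Exit(1)
	}
	// The manager hosts the controllers and their shared caches/clients.
	mgr, err := manager.New(cfg, manager.Options{})
	if err != nil {
		os.Exit(1)
	}
	// Register the CRD Go types with the manager's scheme.
	if err := apis.AddToScheme(mgr.GetScheme()); err != nil {
		os.Exit(1)
	}
	// Register every controller (and its watches) with the manager.
	if err := controller.AddToManager(mgr); err != nil {
		os.Exit(1)
	}
	// Start caches and controllers; blocks until the stop signal.
	if err := mgr.Start(signals.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}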
The layout follows the usual Kubernetes coding style: cmd defines the entry point, pkg holds the main program, apis holds the API definitions, controller holds the concrete watches and event handling, and the remaining utility packages make it easier to operate Kubernetes, talk to Redis, create the various resources, and so on.
Taking the cluster controller as an example: it registers a Reconciler,
and adds watch logic with handlers covering create, update, and delete events, as sketched below.
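The registration looks roughly like this (again sketched against the controller-runtime v0.4-style API; the real controller also attaches custom event predicates, omitted here):

package distributedrediscluster

import (
	redisv1alpha1 "github.com/ucloud/redis-cluster-operator/pkg/apis/redis/v1alpha1"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

func add(mgr manager.Manager, r reconcile.Reconciler) error {
	// Create a new controller bound to the Reconciler.
	c, err := controller.New("distributedrediscluster-controller", mgr,
		controller.Options{Reconciler: r})
	if err != nil {
		return err
	}
	// Watch the primary resource: create/update/delete events all enqueue
	// a reconcile.Request for the changed DistributedRedisCluster.
	return c.Watch(
		&source.Kind{Type: &redisv1alpha1.DistributedRedisCluster{}},
		&handler.EnqueueRequestForObject{},
	)
}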
The reconcile logic: fetch the cluster definition, compare the observed state against the desired state, build the cluster, and detect anomalies and recover automatically.
In a k8s operator, a change to the resource triggers the Reconcile method. In kubebuilder-style code, returning reconcile.Result{RequeueAfter: SyncBuildStatusInterval} from Reconcile sets the next scheduling interval: the operator puts the resource's change event back on the queue and takes it out again after the RequeueAfter interval to call Reconcile once more. This throttles frequent writes to etcd and keeps the k8s cluster from being affected.
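Schematically, a Reconcile implementation chooses between three return shapes (ReconcileExample is a made-up type for illustration; the operator's real method follows):

package example

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

type ReconcileExample struct{}

func (r *ReconcileExample) Reconcile(req reconcile.Request) (reconcile.Result, error) {
	done := false
	if done {
		// 1. Converged: stop, and wait for the next watch event.
		return reconcile.Result{}, nil
	}
	// 2. On error, `return reconcile.Result{}, err` requeues the request
	//    with rate-limited backoff.
	// 3. Still converging: re-check after a fixed interval instead of
	//    hammering the API server and etcd.
	return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
}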
// Reconcile reads that state of the cluster for a DistributedRedisCluster object and makes changes based on the state read
// and what is in the DistributedRedisCluster.Spec
// Note:
// The Controller will requeue the Request to be processed again if the returned error is non-nil or
// Result.Requeue is true, otherwise upon completion it will remove the work from the queue.
func (r *ReconcileDistributedRedisCluster) Reconcile(request reconcile.Request) (reconcile.Result, error) {
	reqLogger := log.WithValues("Request.Namespace", request.Namespace, "Request.Name", request.Name)
	reqLogger.Info("Reconciling DistributedRedisCluster")

	// Fetch the DistributedRedisCluster instance
	instance := &redisv1alpha1.DistributedRedisCluster{}
	err := r.client.Get(context.TODO(), request.NamespacedName, instance)
	if err != nil {
		if errors.IsNotFound(err) {
			return reconcile.Result{}, nil
		}
		return reconcile.Result{}, err
	}

	ctx := &syncContext{
		cluster:   instance,
		reqLogger: reqLogger,
	}

	err = r.ensureCluster(ctx)
	if err != nil {
		switch GetType(err) {
		case StopRetry:
			reqLogger.Info("invalid", "err", err)
			return reconcile.Result{}, nil
		}
		reqLogger.WithValues("err", err).Info("ensureCluster")
		newStatus := instance.Status.DeepCopy()
		SetClusterScaling(newStatus, err.Error())
		r.updateClusterIfNeed(instance, newStatus, reqLogger)
		return reconcile.Result{RequeueAfter: requeueAfter}, nil
	}

	matchLabels := getLabels(instance)
	redisClusterPods, err := r.statefulSetController.GetStatefulSetPodsByLabels(instance.Namespace, matchLabels)
	if err != nil {
		return reconcile.Result{}, Kubernetes.Wrap(err, "GetStatefulSetPods")
	}

	ctx.pods = clusterPods(redisClusterPods.Items)
	reqLogger.V(6).Info("debug cluster pods", "", ctx.pods)
	ctx.healer = clustermanger.NewHealer(&heal.CheckAndHeal{
		Logger:     reqLogger,
		PodControl: k8sutil.NewPodController(r.client),
		Pods:       ctx.pods,
		DryRun:     false,
	})
	err = r.waitPodReady(ctx)
	if err != nil {
		switch GetType(err) {
		case Kubernetes:
			return reconcile.Result{}, err
		}
		reqLogger.WithValues("err", err).Info("waitPodReady")
		newStatus := instance.Status.DeepCopy()
		SetClusterScaling(newStatus, err.Error())
		r.updateClusterIfNeed(instance, newStatus, reqLogger)
		return reconcile.Result{RequeueAfter: requeueAfter}, nil
	}

	password, err := statefulsets.GetClusterPassword(r.client, instance)
	if err != nil {
		return reconcile.Result{}, Kubernetes.Wrap(err, "getClusterPassword")
	}

	admin, err := newRedisAdmin(ctx.pods, password, config.RedisConf(), reqLogger)
	if err != nil {
		return reconcile.Result{}, Redis.Wrap(err, "newRedisAdmin")
	}
	defer admin.Close()

	clusterInfos, err := admin.GetClusterInfos()
	if err != nil {
		if clusterInfos.Status == redisutil.ClusterInfosPartial {
			return reconcile.Result{}, Redis.Wrap(err, "GetClusterInfos")
		}
	}

	requeue, err := ctx.healer.Heal(instance, clusterInfos, admin)
	if err != nil {
		return reconcile.Result{}, Redis.Wrap(err, "Heal")
	}
	if requeue {
		return reconcile.Result{RequeueAfter: requeueAfter}, nil
	}

	ctx.admin = admin
	ctx.clusterInfos = clusterInfos
	err = r.waitForClusterJoin(ctx)
	if err != nil {
		switch GetType(err) {
		case Requeue:
			reqLogger.WithValues("err", err).Info("requeue")
			return reconcile.Result{RequeueAfter: requeueAfter}, nil
		}
		newStatus := instance.Status.DeepCopy()
		SetClusterFailed(newStatus, err.Error())
		r.updateClusterIfNeed(instance, newStatus, reqLogger)
		return reconcile.Result{}, err
	}

	// mark .Status.Restore.Phase = RestorePhaseRestart, will
	// remove init container and restore volume that referenced in stateulset for
	// dump RDB file from backup, then the redis master node will be restart.
	if instance.IsRestoreFromBackup() && instance.IsRestoreRunning() {
		reqLogger.Info("update restore redis cluster cr")
		instance.Status.Restore.Phase = redisv1alpha1.RestorePhaseRestart
		if err := r.crController.UpdateCRStatus(instance); err != nil {
			return reconcile.Result{}, err
		}
		if err := r.ensurer.UpdateRedisStatefulsets(instance, getLabels(instance)); err != nil {
			return reconcile.Result{}, err
		}
		waiter := &waitStatefulSetUpdating{
			name:                  "waitMasterNodeRestarting",
			timeout:               60 * time.Second,
			tick:                  5 * time.Second,
			statefulSetController: r.statefulSetController,
			cluster:               instance,
		}
		if err := waiting(waiter, ctx.reqLogger); err != nil {
			return reconcile.Result{}, err
		}
		return reconcile.Result{Requeue: true}, nil
	}

	// restore succeeded, then update cr and wait for the next Reconcile loop
	if instance.IsRestoreFromBackup() && instance.IsRestoreRestarting() {
		reqLogger.Info("update restore redis cluster cr")
		instance.Status.Restore.Phase = redisv1alpha1.RestorePhaseSucceeded
		if err := r.crController.UpdateCRStatus(instance); err != nil {
			return reconcile.Result{}, err
		}
		// set ClusterReplicas = Backup.Status.ClusterReplicas,
		// next Reconcile loop the statefulSet's replicas will increase by ClusterReplicas, then start the slave node
		instance.Spec.ClusterReplicas = instance.Status.Restore.Backup.Status.ClusterReplicas
		if err := r.crController.UpdateCR(instance); err != nil {
			return reconcile.Result{}, err
		}
		return reconcile.Result{}, nil
	}

	if err := admin.SetConfigIfNeed(instance.Spec.Config); err != nil {
		return reconcile.Result{}, Redis.Wrap(err, "SetConfigIfNeed")
	}

	status := buildClusterStatus(clusterInfos, ctx.pods, instance, reqLogger)
	if is := r.isScalingDown(instance, reqLogger); is {
		SetClusterRebalancing(status, "scaling down")
	}
	reqLogger.V(4).Info("buildClusterStatus", "status", status)
	r.updateClusterIfNeed(instance, status, reqLogger)

	instance.Status = *status
	if needClusterOperation(instance, reqLogger) {
		reqLogger.Info(">>>>>> clustering")
		err = r.syncCluster(ctx)
		if err != nil {
			newStatus := instance.Status.DeepCopy()
			SetClusterFailed(newStatus, err.Error())
			r.updateClusterIfNeed(instance, newStatus, reqLogger)
			return reconcile.Result{}, err
		}
	}

	newClusterInfos, err := admin.GetClusterInfos()
	if err != nil {
		if clusterInfos.Status == redisutil.ClusterInfosPartial {
			return reconcile.Result{}, Redis.Wrap(err, "GetClusterInfos")
		}
	}
	newStatus := buildClusterStatus(newClusterInfos, ctx.pods, instance, reqLogger)
	SetClusterOK(newStatus, "OK")
	r.updateClusterIfNeed(instance, newStatus, reqLogger)
	return reconcile.Result{RequeueAfter: time.Duration(reconcileTime) * time.Second}, nil
}
Create one:
apiVersion: redis.kun/v1alpha1
kind: DistributedRedisCluster
metadata:
  name: example-distributedrediscluster
  namespace: default
spec:
  image: redis:5.0.4-alpine
  imagePullPolicy: IfNotPresent
  masterSize: 3
  clusterReplicas: 1
  serviceName: redis-svc
  # resources config
  resources:
    limits:
      cpu: 300m
      memory: 200Mi
    requests:
      cpu: 200m
      memory: 150Mi
  # pv storage
  storage:
    type: persistent-claim
    size: 1Gi
    class: disk
    deleteClaim: true
The deployed result matches the diagram on the project page. During creation, the Operator handled:
- Configuring and starting the services
- CLUSTER MEET IP PORT so the nodes discover each other
- CLUSTER ADDSLOTS to assign slots
- CLUSTER REPLICATE to establish master/replica relationships
- Creating the SVC and other resources
- Monitoring resources and status
Operator reconciliation logs:
{"level":"info","ts":1674011374.0957515,"logger":"cmd","msg":"Go Version: go1.13.3"}
{"level":"info","ts":1674011374.095774,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1674011374.095777,"logger":"cmd","msg":"Version of operator-sdk: v0.13.0"}
{"level":"info","ts":1674011374.0957801,"logger":"cmd","msg":"Version of operator: v0.2.0-62+12f703c"}
{"level":"info","ts":1674011374.0959022,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1674011374.720209,"logger":"leader","msg":"Found existing lock","LockOwner":"redis-cluster-operator-7467976bc5-87f89"}
{"level":"info","ts":1674011374.7400174,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1674011375.8324194,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1674011378.1799858,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1674011378.7850115,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1674011378.7851572,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1674011381.2693467,"logger":"metrics","msg":"Metrics Service object updated","Service.Name":"redis-cluster-operator-metrics","Service.Namespace":"default"}
{"level":"info","ts":1674011381.873133,"logger":"cmd","msg":"Could not create ServiceMonitor object","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1674011381.8731618,"logger":"cmd","msg":"Install prometheus-operator in your cluster to create ServiceMonitor objects","error":"no ServiceMonitor registered with the API"}
{"level":"info","ts":1674011381.873167,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1674011381.8733594,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1674011381.8734646,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"redisclusterbackup-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1674011381.8735168,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"distributedrediscluster-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1674011381.9737544,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"redisclusterbackup-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1674011381.9740582,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"distributedrediscluster-controller"}
{"level":"info","ts":1674011381.9740942,"logger":"controller_distributedrediscluster","msg":"Call CreateFunc","namespace":"default","name":"example-distributedrediscluster"}
{"level":"info","ts":1674011382.074047,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"redisclusterbackup-controller"}
{"level":"info","ts":1674011382.0740838,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"redisclusterbackup-controller","worker count":2}
{"level":"info","ts":1674011382.0741792,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"distributedrediscluster-controller","worker count":4}
{"level":"info","ts":1674011382.0742629,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011382.617124,"logger":"controller_distributedrediscluster","msg":">>> Sending CLUSTER MEET messages to join the cluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011382.6185405,"logger":"controller_distributedrediscluster.redis_util","msg":"node 10.43.0.198:6379 attached properly","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011383.6385891,"logger":"controller_distributedrediscluster","msg":"requeue","Request.Namespace":"default","Request.Name":"example-distributedrediscluster","err":"wait for cluster join: Cluster view is inconsistent"}
{"level":"info","ts":1674011393.6389103,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011393.6871004,"logger":"controller_distributedrediscluster","msg":">>> Sending CLUSTER MEET messages to join the cluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011393.688571,"logger":"controller_distributedrediscluster.redis_util","msg":"node 10.43.0.198:6379 attached properly","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011394.7059786,"logger":"controller_distributedrediscluster","msg":"requeue","Request.Namespace":"default","Request.Name":"example-distributedrediscluster","err":"wait for cluster join: Cluster view is inconsistent"}
{"level":"info","ts":1674011404.7062836,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011404.7589452,"logger":"controller_distributedrediscluster","msg":">>> Sending CLUSTER MEET messages to join the cluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011404.76041,"logger":"controller_distributedrediscluster.redis_util","msg":"node 10.43.0.68:6379 attached properly","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011405.7788498,"logger":"controller_distributedrediscluster","msg":"requeue","Request.Namespace":"default","Request.Name":"example-distributedrediscluster","err":"wait for cluster join: Cluster view is inconsistent"}
{"level":"info","ts":1674011415.7791476,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011415.8322582,"logger":"controller_distributedrediscluster","msg":">>> Sending CLUSTER MEET messages to join the cluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011415.8336947,"logger":"controller_distributedrediscluster.redis_util","msg":"node 10.43.0.134:6379 attached properly","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011416.8505604,"logger":"controller_distributedrediscluster","msg":"requeue","Request.Namespace":"default","Request.Name":"example-distributedrediscluster","err":"wait for cluster join: Cluster view is inconsistent"}
{"level":"info","ts":1674011426.850888,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011426.9262946,"logger":"controller_distributedrediscluster","msg":">>>>>> clustering","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"error","ts":1674011426.9263313,"logger":"controller_distributedrediscluster","msg":"fix manually","Request.Namespace":"default","Request.Name":"example-distributedrediscluster","statefulSet":"drc-example-distributedrediscluster-2","masters":"{Redis ID: feb151649216937e13721cb36a671ca8b1425ed8, role: Master, master: , link: connected, status: [], addr: 10.43.1.4:6379, slots: [5461-10922], len(migratingSlots): 0, len(importingSlots): 0},{Redis ID: e0129438837cd65796a88997848a660b8e73b9b8, role: Master, master: , link: connected, status: [], addr: 10.43.0.15:6379, slots: [10923-16383], len(migratingSlots): 0, len(importingSlots): 0}","error":"split brain","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/ucloud/redis-cluster-operator/pkg/controller/clustering.(*Ctx).DispatchMasters\n\t/src/pkg/controller/clustering/placement_v2.go:77\ngithub.com/ucloud/redis-cluster-operator/pkg/controller/distributedrediscluster.(*ReconcileDistributedRedisCluster).syncCluster\n\t/src/pkg/controller/distributedrediscluster/sync_handler.go:200\ngithub.com/ucloud/redis-cluster-operator/pkg/controller/distributedrediscluster.(*ReconcileDistributedRedisCluster).Reconcile\n\t/src/pkg/controller/distributedrediscluster/distributedrediscluster_controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1674011426.9368975,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"distributedrediscluster-controller","request":"default/example-distributedrediscluster","error":"DispatchMasters: split brain: drc-example-distributedrediscluster-2","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191004115801-a2eda9f80ab8/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1674011426.9420218,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
{"level":"info","ts":1674011426.997737,"logger":"controller_distributedrediscluster","msg":">>>>>> clustering","Request.Namespace":"default","Request.Name":"example-distributedrediscluster"}
Likewise, the Sentinel Operator works on the same principles, so I won't repeat the details:
https://github.com/ucloud/redis-operator