ceph crush: crush_do_rule

crush_do_rule uses a scratch space to carry out the item search.

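The scratch space holds 3 × result_max ints in total and is split into three parts of result_max each (a, b and c in the figure below). Part c is only used with recurse_to_leaf, which this article does not cover.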

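Parts a and b are used to build the result. They are referenced through two array pointers, o and w. After each select step, o and w swap targets: the previous o becomes this step's w, the list of items this step iterates over, and the previous w becomes this step's o, which stores this step's output items.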

The example from Sage Weil's paper illustrates the process:

take(root) ->root

select(1, row) ->row2 // figure: step 1

select(3, cabinet) ->cab21 cab23 cab24 // figure: step 2

select(1, disk) ->disk2107 disk2313 disk2437 // figure: step 3

emit // figure: step 4 (copy the selected items from the scratch array into the result array)

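The simplified core code is shown below:

1. Item ID validity checks are omitted.

2. Only the firstn case is considered (i.e., only the replica strategy).

3. The recurse_to_leaf case is ignored (it is an engineering optimization, not part of the core of Sage Weil's paper).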

```c
#include <stdbool.h>
#include "crush.h"   /* Ceph src/crush: struct crush_map, crush_rule, crush_bucket, CRUSH_RULE_* */

/* static helpers defined elsewhere in mapper.c (older 3-argument crush_bucket_choose) */
static int crush_bucket_choose(struct crush_bucket *in, int x, int r);
static int is_out(const struct crush_map *map, const __u32 *weight,
                  int weight_max, int item, int x);
static int crush_choose_firstn(const struct crush_map *map,
                               struct crush_bucket *bucket,
                               const __u32 *weight, int weight_max,
                               int x, int numrep, int type,
                               int *out, int outpos,
                               unsigned int tries, unsigned int local_retries);

int crush_do_rule(const struct crush_map *map,
                  int ruleno, int x, int *result, int result_max,
                  const __u32 *weight, int weight_max,
                  int *scratch)
{
    int osize, wsize = 0;
    int result_len = 0;
    int *w = scratch;                /* part a */
    int *o = scratch + result_max;   /* part b */
    struct crush_rule *rule = map->rules[ruleno];

    for (__u32 step = 0; step < rule->len; step++) {
        struct crush_rule_step *curstep = &rule->steps[step];

        switch (curstep->op) {
        case CRUSH_RULE_TAKE:
            w[0] = curstep->arg1;
            wsize = 1;
            break;

        /* Elar:
           1. only the firstn case is considered
           2. recurse_to_leaf is ignored, so CHOOSELEAF_FIRSTN behaves like CHOOSE_FIRSTN here */
        case CRUSH_RULE_CHOOSELEAF_FIRSTN:
        case CRUSH_RULE_CHOOSE_FIRSTN: {
            /* reset output */
            osize = 0;
            for (int i = 0; i < wsize; i++) {
                int numrep = curstep->arg1;
                int type = curstep->arg2;
                int bno = -1 - w[i];   /* Elar: bucket id -> index in map->buckets */
                struct crush_bucket *bucket = map->buckets[bno];

                /* each input bucket appends after the previous ones, starting at outpos 0;
                   choose_total_tries counts retries, hence the + 1 */
                osize += crush_choose_firstn(map, bucket, weight, weight_max,
                                             x, numrep, type, o + osize, 0,
                                             map->choose_total_tries + 1,
                                             map->choose_local_tries);
            }
            /* swap o and w arrays */
            int *tmp = o; o = w; w = tmp;
            wsize = osize;
            break;
        }

        case CRUSH_RULE_EMIT:
            for (int i = 0; i < wsize && result_len < result_max; i++) {
                result[result_len] = w[i];
                result_len++;
            }
            wsize = 0;
            break;

        default:
            break;
        }
    }
    return result_len;
}

static int crush_choose_firstn(const struct crush_map *map,
                               struct crush_bucket *bucket,
                               const __u32 *weight, int weight_max,
                               int x, int numrep, int type,
                               int *out, int outpos,
                               unsigned int tries, unsigned int local_retries)
{
    const int parent_r = 0;   /* non-zero only on the recurse_to_leaf path */
    int item = 0;

    for (int rep = 0; rep < numrep; rep++) {
        /* keep trying until we get a non-out, non-colliding item */
        unsigned int ftotal = 0;
        bool skip_rep = false;
        bool retry_descent;

        do {
            unsigned int flocal = 0;
            bool retry_bucket;
            struct crush_bucket *in = bucket;   /* every descent starts at the top bucket */
            retry_descent = false;

            do {
                retry_bucket = false;
                int r = rep + parent_r;
                /* r' = r + f_total */
                r += ftotal;

                /* bucket choose */
                item = crush_bucket_choose(in, x, r);

                int itemtype;
                if (item < 0)   /* Elar: item is a bucket, so look up its type */
                    itemtype = map->buckets[-1 - item]->type;
                else            /* Elar: item is a device, whose type is 0 */
                    itemtype = 0;

                /* Elar: not the type we want yet, so descend into this bucket
                   and keep choosing (same r) until the type matches */
                if (itemtype != type) {
                    if (item >= 0) {   /* reached a device first: give up on this replica */
                        skip_rep = true;
                        break;
                    }
                    in = map->buckets[-1 - item];
                    retry_bucket = true;
                    continue;
                }

                /* Elar: is the item already in the output array? */
                bool collide = false;
                for (int i = 0; i < outpos; i++) {
                    if (out[i] == item) {
                        collide = true;
                        break;
                    }
                }

                /* Elar: is this device marked "out"? */
                bool reject = false;
                if (!collide && itemtype == 0)
                    reject = is_out(map, weight, weight_max, item, x);

                if (reject || collide) {
                    ftotal++;
                    flocal++;
                    if (collide && flocal <= local_retries)
                        /* retry locally: same bucket, other items via the new r' */
                        retry_bucket = true;
                    else if (ftotal < tries)
                        /* retry the descent from the top bucket with the new r',
                           which may lead into a sibling subtree */
                        retry_descent = true;
                    else
                        /* else give up */
                        skip_rep = true;
                    /* (mapper.c also has an exhaustive local fallback search, omitted here) */
                }
            } while (retry_bucket);
        } while (retry_descent);

        if (skip_rep)
            continue;

        out[outpos] = item;
        outpos++;
    }
    return outpos;
}
```

For the complete code, see the Ceph repository on GitHub:

https://github.com/ceph/ceph/blob/master/src/crush/mapper.c
