Ceph CRUSH: crush_do_rule

crush_do_rule uses a scratch space to carry out the search for items.

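The scratch space is 3 × max_result in size (result_max in the code) and is divided into three parts of max_result entries each (a, b, and c in the figure; c is used only for recursive_to_leaf and is not covered in this article).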

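Parts a and b are used to generate the result. They are referenced through two array pointers, o and w. After each select step completes, o and w swap the regions they point to: the previous o becomes the current w, which is the array the current step iterates over, and the previous w becomes the current o, which stores the output items of the current step.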
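The partitioning and the pointer swap described above can be sketched in isolation. This is a minimal illustration, not Ceph code: `struct pingpong`, `pingpong_init`, and `pingpong_swap` are hypothetical names introduced here, and part c of the scratch space is simply left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of Ceph): tracks parts a and b of scratch. */
struct pingpong {
    int *w;     /* input: items the current step iterates over */
    int *o;     /* output: items the current step produces */
    int wsize;  /* number of valid items in w */
};

/* Split scratch (at least 3 * result_max ints) as described above:
 * part a = [0, result_max), part b = [result_max, 2 * result_max).
 * Part c is not used in this sketch. */
static void pingpong_init(struct pingpong *p, int *scratch, int result_max)
{
    assert(p != NULL && scratch != NULL && result_max > 0);
    p->w = scratch;
    p->o = scratch + result_max;
    p->wsize = 0;
}

/* After a select step has written osize items into o, swap the roles:
 * the old output becomes the input of the next step. */
static void pingpong_swap(struct pingpong *p, int osize)
{
    int *tmp = p->o;
    p->o = p->w;
    p->w = tmp;
    p->wsize = osize;
}
```

Because only the pointers change, no items are copied between steps; each step reads from one half of the buffer and writes into the other.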

The example from Sage Weil's paper illustrates the process:

take(root) -> root

select(1, row) -> row2 // step 1 in the figure

select(3, cabinet) -> cab21 cab23 cab24 // step 2 in the figure

select(1, disk) -> disk2107 disk2313 disk2437 // step 3 in the figure

emit // step 4 in the figure (copies the selected items from the scratch array into the result array)
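To make the example concrete, the rule can be written as a table of steps with an operation and two arguments, which is how the skeleton code reads each step (`op`, `arg1`, `arg2`). The sketch below is illustrative only: `toy_op`, `toy_step`, the type numbering (apart from devices being type 0), and the root id of -1 are assumptions, not Ceph's definitions.

```c
#include <assert.h>

/* Toy mirror of the op/arg1/arg2 fields; names are hypothetical. */
enum toy_op { TOY_TAKE, TOY_CHOOSE_FIRSTN, TOY_EMIT };

/* Assumed type numbering; only "device = 0" comes from the article. */
enum { TYPE_DISK = 0, TYPE_CABINET = 1, TYPE_ROW = 2 };
enum { ROOT_ID = -1 }; /* assumed bucket id of the root */

struct toy_step {
    enum toy_op op;
    int arg1; /* take: item id; select: number of items to choose */
    int arg2; /* select: type of item to choose */
};

static const struct toy_step example_rule[] = {
    { TOY_TAKE,          ROOT_ID, 0 },        /* take(root)         */
    { TOY_CHOOSE_FIRSTN, 1, TYPE_ROW },       /* select(1, row)     */
    { TOY_CHOOSE_FIRSTN, 3, TYPE_CABINET },   /* select(3, cabinet) */
    { TOY_CHOOSE_FIRSTN, 1, TYPE_DISK },      /* select(1, disk)    */
    { TOY_EMIT,          0, 0 },              /* emit               */
};

/* Upper bound on emitted items: every select step chooses up to arg1
 * items for each item in the current working set. */
static int max_emitted(const struct toy_step *steps, int len)
{
    int wsize = 0;
    int emitted = 0;
    assert(steps != 0 && len >= 0);
    for (int i = 0; i < len; i++) {
        switch (steps[i].op) {
        case TOY_TAKE:          wsize = 1; break;
        case TOY_CHOOSE_FIRSTN: wsize *= steps[i].arg1; break;
        case TOY_EMIT:          emitted += wsize; wsize = 0; break;
        }
    }
    return emitted;
}
```

For the rule above the working set grows as 1, 1, 3, 3, so at most three disks are emitted, matching the three disks in the example.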

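The simplified skeleton code follows, with these simplifications:

1. The validity checks on item ids are omitted.

2. Only the firstn case is considered (that is, only the replica strategy).

3. recursive_to_leaf is not considered (it is an engineering optimization, not part of the core of Sage Weil's paper).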

int crush_do_rule(const struct crush_map *map,
          int ruleno, int x, int *result, int result_max,
          const __u32 *weight, int weight_max,
          int *scratch)
{
    int result_len = 0; /* number of items copied into result so far */
    int osize, wsize = 0;
    int *w, *o;
    w = scratch;              /* part a: input of the current step */
    o = scratch + result_max; /* part b: output of the current step */
    struct crush_rule *rule = map->rules[ruleno];
    for (__u32 step = 0; step < rule->len; step++) {
        struct crush_rule_step *curstep = &rule->steps[step];
        switch (curstep->op) {
        case CRUSH_RULE_TAKE:
            w[0] = curstep->arg1;
            wsize = 1;
            break;
        //Elar:
        //1. only consider the firstn case
        //2. ignore the recurse_to_leaf case
        case CRUSH_RULE_CHOOSELEAF_FIRSTN:
        case CRUSH_RULE_CHOOSE_FIRSTN: {
            /* reset output */
            osize = 0;
            for (int i = 0; i < wsize; i++) {
                int numrep = curstep->arg1;
                int outpos = 0;
                int type = curstep->arg2;
                int bno = -1 - w[i];//Elar: bucket id -> index into map->buckets
                struct crush_bucket *bucket = map->buckets[bno];
                osize += crush_choose_firstn(map, bucket, weight, weight_max, x, numrep, type, o+osize, outpos);
            }
            /* swap o and w arrays */
            int *tmp = o; o = w; w = tmp;
            wsize = osize;
            break;
        }
        case CRUSH_RULE_EMIT:
            /* copy the selected items from scratch into result */
            for (int i = 0; i < wsize && result_len < result_max; i++) {
                result[result_len] = w[i];
                result_len++;
            }
            wsize = 0;
            break;
        default:
            break;
        }
    }
    return result_len;
}
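The skeleton relies on the convention that bucket ids are negative and device ids are non-negative, and that a bucket id maps to its slot in `map->buckets` as `-1 - id`. A small sketch of that mapping follows; the helper names `is_bucket`, `bucket_index`, and `bucket_id` are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Negative ids denote buckets; ids >= 0 denote devices. */
static int is_bucket(int item)
{
    return item < 0;
}

/* Bucket id -> index into the buckets array: -1 -> 0, -2 -> 1, ... */
static int bucket_index(int item)
{
    assert(is_bucket(item));
    return -1 - item;
}

/* Inverse mapping: index -> bucket id: 0 -> -1, 1 -> -2, ... */
static int bucket_id(int index)
{
    assert(index >= 0);
    return -1 - index;
}
```

The same expression works in both directions, which is why the code can apply `-1 - x` whether it holds an id or an index.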

static int crush_choose_firstn(const struct crush_map *map,
                   struct crush_bucket *bucket,
                   const __u32 *weight,
                   int weight_max,
                   int x,
                   int numrep,
                   int type,
                   int *out,
                   int outpos)
{
    const int parent_r = 0; /* always 0 here because recurse_to_leaf is ignored */
    for (int rep = 0; rep < numrep; rep++) {
        /* keep trying until we get a non-out, non-colliding item */
        unsigned int ftotal = 0;
        bool skip_rep = false;
        /* declared here so that the do-while conditions can see them */
        bool retry_descent, retry_bucket;
        int item = 0;
        do {
            unsigned int flocal = 0;
            retry_descent = false;
            struct crush_bucket *in = bucket;
            do {
                retry_bucket = false;
                int r = rep + parent_r;
                /* r' = r + f_total */
                r += ftotal;
                /* bucket choose */
                item = crush_bucket_choose(in, x, r);
                int itemtype;
                if (item < 0)//Elar: if item is a bucket, then get its type
                    itemtype = map->buckets[-1-item]->type;
                else//Elar: if item is a device, then its type=0
                    itemtype = 0;
                //Elar: if this item's type is not what we expected, descend into it and keep going until we get a matching one
                if (itemtype != type) {
                    in = map->buckets[-1-item];
                    retry_bucket = true;
                    continue;
                }
                // Elar: check if item is already in the output array
                bool collide = false;
                for (int i = 0; i < outpos; i++) {
                    if (out[i] == item) {
                        collide = true;
                        break;
                    }
                }
                bool reject = false;
                if (itemtype == 0) {
                    //Elar: check if this item has been marked as "out"
                    reject = is_out(map, weight, weight_max, item, x);
                }
                if (reject || collide) {
                    ftotal++;
                    flocal++;
                    /* pseudocode: the real conditions compare flocal and ftotal with the retry limits */
                    if still can try locally (within the same bucket, try other items)
                        retry_bucket = true;
                    else if still can try descent (restart from the starting bucket, which may reach the parent's or grandparent's sibling buckets)
                        /* then retry descent */
                        retry_descent = true;
                    else
                        /* else give up */
                        skip_rep = true;
                }
            } while (retry_bucket);
        } while (retry_descent);
        if (skip_rep) {
            continue;
        }
        out[outpos] = item;
        outpos++;
    }
    return outpos;
}
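The retry rule r' = r + f_total can be tried out on a flat list of items. The following is a toy sketch under stated assumptions: `toy_choose` is a deliberately weak stand-in for `crush_bucket_choose` (it is not CRUSH's hash and was picked so that collisions occur), there is a single bucket level, and `tries` is a hypothetical retry limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in chooser, NOT CRUSH's hash: consecutive r values map to the
 * same index in pairs, which forces collisions for demonstration. */
static int toy_choose(const int *items, int n, int x, int r)
{
    return items[(x + r / 2) % n];
}

/* Pick up to numrep distinct items. On a collision, bump ftotal so the
 * next attempt uses r' = rep + ftotal; give up on a replica after
 * `tries` failed attempts. Returns the number of items written to out. */
static int toy_firstn(const int *items, int n, int x, int numrep,
                      int tries, int *out)
{
    int outpos = 0;
    assert(items != 0 && out != 0 && n > 0 && x >= 0 && tries > 0);
    for (int rep = 0; rep < numrep; rep++) {
        int ftotal = 0;
        int item;
        bool collide;
        do {
            int r = rep + ftotal; /* r' = r + f_total */
            item = toy_choose(items, n, x, r);
            collide = false;
            for (int i = 0; i < outpos; i++) {
                if (out[i] == item) {
                    collide = true;
                    break;
                }
            }
            if (collide)
                ftotal++;
        } while (collide && ftotal < tries);
        if (collide)
            continue; /* gave up: this replica is skipped */
        out[outpos++] = item;
    }
    return outpos;
}
```

With four items and three replicas, the second and third replicas collide on their first attempts and succeed after retrying with a larger r. With only two items, the third replica exhausts its tries and is skipped, so firstn returns fewer items than requested.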

For the complete code, see the Ceph repository on GitHub:

https://github.com/ceph/ceph/blob/master/src/crush/mapper.c
