berkeley db replica机制 - election algorithm
repmgr_method.c, __repmgr_start_int()
初始2个elect线程.
repmgr_elect.c, __repmgr_init_election()
__repmgr_elect_thread()
__repmgr_elect_main()
lease, preferred master mode,
rep_elect.c, __repmgr_elect()
__rep_elect_init()
lockout,
if (rep->egen != egen) // then out
tiebreaker
/* Use the last commit record as the LSN in the vote
__rep_write_egen
__rep_tally // tally our own vote
__rep_cmp_vote // 把我们自己预先记录为winner
__rep_send_vote() // -send vote1, our own vote, REP_VOTE1
phase1, wait...
if (rep->sites >= rep->nvotes) { // 满足进入phase2, 不满足就退出了
rep->sites - sites heard from.
rep->nvotes - Number of votes needed.
send vote2/ 或我们自己 是winner的情况, 投自己一票
我赢了么?
rep_record.c, __rep_process_message_int()
case REP_VOTE1:
ret = __rep_vote1(env, rp, rec, eid);
break;
case REP_VOTE2:
ret = __rep_vote2(env, rec, eid);
__rep_vote1()
我们自己是master, send REP_NEWMASTER, 退出
若收到以前egen的vote, send REP_ALIVE
若收到以后egen的vote, 终止当前vote, 更新egen
* Ignore vote1's if we're in phase 2.
__rep_tally - 记录下来, 如是新的vote site, rep->sites++
__rep_cmp_vote // 比较此vote1和我们已有的winner
如果已经得到所有site的vote1, 进入phase2
- 我们是winner, claim; 否则vote2 别人
如需要(full election?, 第一次拿到site的vote1), resend our vote1 到这个site
__rep_vote2()
/*
* Record this vote. In a VOTE2, the only valid entry
* in the vote information is the election generation.
*
* There are several things which can go wrong that we
* need to account for:
* 1. If we receive a latent VOTE2 from an earlier election,
* we want to ignore it.
* 2. If we receive a VOTE2 from a site from which we never
* received a VOTE1, we want to record it, because we simply
* may be processing messages out of order or its vote1 got lost,
* but that site got all the votes it needed to send it.
* 3. If we have received a duplicate VOTE2 from this election
* from the same site we want to ignore it.
* 4. If this is from the current election and someone is
* really voting for us, then we finally get to record it.
*/
rep_tally - 若 新的site发出的 vote2, rep->votes++
#define I_HAVE_WON(rep, winner) \
((rep)->votes >= (rep)->nvotes && winner == (rep)->eid)
rep->sites - sites heard from.
rep->nvotes - Number of votes needed.
rep->votes - Number of votes for this site.
rep->nsites - Number of sites in group.
/*
* We need to check sites == nsites, not more than half
* like we do in __rep_elect and the VOTE2 code. The
* reason is that we want to process all the incoming votes
* and not short-circuit once we reach more than half. The
* real winner's vote may be in the last half.
*/
#define IS_PHASE1_DONE(rep) \
((rep)->sites >= (rep)->nsites && (rep)->winner != DB_EID_INVALID)
u_int32_t egen; /* Replication election generation. */
REP_NEWMASTER - 我是新的master
REP_MASTER_REQ - 谁是master?
rep_util.c, __rep_new_master() 与新master同步
/*
* Election gen file name
* The file contains an egen number for an election this client has NOT
* participated in. I.e. it is the number of a future election. We
* create it when we create the rep region, if it doesn't already exist
* and initialize egen to 1. If it does exist, we read it when we create
* the rep region. We write it immediately before sending our VOTE1 in
* an election. That way, if a client has ever sent a vote for any
* election, the file is already going to be updated to reflect a future
* election, should it crash.
*/
#define REP_EGENNAME "__db.rep.egen"
typedef struct {
u_int32_t egen; /* Voter's election generation. */
int eid; /* Voter's ID. */
} REP_VTALLY;
rep_elect.c, __rep_tally()
* Ignore votes from earlier elections (i.e. we've heard
* from this site in this election, but its vote from an
* earlier election got delayed and we received it now).
* However, if we happened to hear from an earlier vote
* and we recorded it and we're now hearin
__rep_cmp_vote()
/* Make ourselves the winner to start. */
rep->winner 记录下已知的winner
__rep_elect_done()
- 清elect flag, 清rep->votes,.. rep->egen++
berkeley db replica机制 - election algorithm的更多相关文章
- berkeley db replica机制 - 消息处理
repmgr_method.c, __repmgr_start_int()repmgr_method.c, __repmgr_start_msg_threads()repmgr_msg.c, __re ...
- berkeley db replica机制 - 主从同步
repmgr/repmgr_net.c, __repmgr_send(): 做send_broadcast, 然后根据policy 对DB_REP_PERMANENT的处理 __repmgr_send ...
- The Architecture of Open Source Applications: Berkeley DB
最近研究内存关系数据库的设计与实现,下面一篇为berkeley db原始两位作为的Berkeley DB设计回忆录: Conway's Law states that a design reflect ...
- Berkeley DB
最近用BDB写点东西,写了挺多个测试工程.列下表,也理清楚最近的思路 1.测试BDB程序,包括打开增加记录,查询记录,获取所有记录.将数据转存mysql 程序的不足,增加记录仅仅只有key和value ...
- Oracle Berkeley DB Java 版
Oracle Berkeley DB Java 版是一个开源的.可嵌入的事务存储引擎,是完全用 Java 编写的.它充分利用 Java 环境来简化开发和部署.Oracle Berkeley DB Ja ...
- 新浪研发中心: Berkeley DB 使用经验总结
http://blog.sina.com.cn/s/blog_502c8cc40100yqkj.html NoSQL是现在互联网Web2.0时代备受关注的技术之一,被用来存储大量的非关系型的数据.Be ...
- Berkeley DB基础教程
一.Berkeley DB的介绍 (1)Berkeley DB是一个嵌入式数据库,它适合于管理海量的.简单的数据.如Google使用其来保存账户信息,Heritrix用其来保存froniter. (2 ...
- Berkeley DB 使用经验总结
作者:陈磊 NoSQL是现在互联网Web2.0时代备受关注的技术之一,被用来存储大量的非关系型的数据.Berkeley DB作为一款优秀的Key/Value存储引擎自然也在讨论之列.最近使用BDB来发 ...
- Berkeley DB Java Edition 简介
一. 简介 Berkeley DB Java Edition (JE)是一个完全用JAVA写的,它适合于管理海量的,简单的数据. l 能够高效率的 ...
随机推荐
- archlinux 传统方法编译内核linux kernel 3.3.7
From: http://hi.baidu.com/flashgive/item/eaef6326b5eb73d3a417b662 archlinux中传统方法编译内核 1)下载内核以及补丁并解压: ...
- QueryRunner(common-dbutils.jar)
QueryRunner update方法:* int update(String sql, Object... params) --> 可执行增.删.改语句* int update(Connec ...
- 1,SFDC 管理员篇 - 基本设置
1, 公司配置 Setup | Administrator| Company Profile *Company Inforamtion:公司基础信息,License信息,重要的设置包括本地时间,币种, ...
- #error作用
指令 用途 # 空指令,无任何效果 #include 包含一个源代码文件 #define 定义宏 #undef 取消已定义的宏 #if 如果给定条件为真,则编译下面代码 #ifdef 如果宏已经定义, ...
- Linux网络常用指令
5.1 网络参数设定使用的指令 ifconfig 查询 设定网络卡与 IP 网域等相关参数: ifup, ifdown 这两个档案是 script,透过更简单的方式来启动网络接口: route 查 ...
- javascript generate a guid
function Guid() { var random = (((1 + Math.random()) * 0x10000) | 0).toString(16).substring(1); retu ...
- TortoiseSVN文件夹及文件状态图标不显示解决方法
win8 64位系统,原本svn是好用的,安装了klive金山快盘后,svn图标都不显示了.最后通过修改注册表解决: win+R调出运行框,输入regedit,打开注册表编辑器. HKEY_LOCAL ...
- C语言基础_2
scanf函数可以从键盘上读取数据并记录到变量中.为了使用这个函数也需要在文件开头使用如下的预处理指令#include <stdio.h>scanf函数使用的时候所需要的初始数据和prin ...
- CIDR-Address介绍
CIDR是一种用二进制表示法来代替十进制表示法的新方法. IP地址有“类”的概念,/8掩码是A类,/16掩码是B类,/24掩码是C类等等.但是/12,/18,/25呢?这就是无类的概念了,CIDR的作 ...
- 将C语课设传到了Github和Code上 2015-91-18
一直听说Git好使,以前捣鼓过没弄成,现在考完试了终于可以静下心来研究研究. 哎,我要是当时做课设的时候就用Git,也能省下不少事呢. 使用的Git教程,刚看个开头: 廖雪峰的Git教程 http:/ ...