ThreadCachedInt
folly/ThreadCachedInt.h
High-performance atomic increment using thread caching.
folly/ThreadCachedInt.h introduces a integer class designed for high performance increments from multiple threads simultaneously without loss of precision. It has two read modes, readFast gives a potentially stale value with one load, and readFull gives the exact value, but is much slower, as discussed below.
Performance
Increment performance is up to 10x greater than std::atomic_fetch_add in high contention environments. See folly/test/ThreadCachedIntTest.h for more comprehensive benchmarks.
readFast is as fast as a single load.
readFull, on the other hand, requires acquiring a mutex and iterating through a list to accumulate the values of all the thread local counters, so is significantly slower than readFast.
Usage
Create an instance and increment it with increment or the operator overloads. Read the value with readFast for quick, potentially stale data, or readFull for a more expensive but precise result. There are additional convenience functions as well, such as set.
ThreadCachedInt<int64_t> val;
EXPECT_EQ(, val.readFast());
++val; // increment in thread local counter only
EXPECT_EQ(, val.readFast()); // increment has not been flushed
EXPECT_EQ(, val.readFull()); // accumulates all thread local counters
val.set();
EXPECT_EQ(, val.readFast());
EXPECT_EQ(, val.readFull());
Implementation
folly::ThreadCachedInt uses folly::ThreadLocal to store thread specific objects that each have a local counter. When incrementing, the thread local instance is incremented. If the local counter passes the cache size, the value is flushed to the global counter with an atomic increment. It is this global counter that is read with readFast via a simple load, but will not count any of the updates that haven't been flushed.
In order to read the exact value, ThreadCachedInt uses the extended readAllThreads() API of folly::ThreadLocal to iterate through all the references to all the associated thread local object instances. This currently requires acquiring a global mutex and iterating through the references, accumulating the counters along with the global counter. This also means that the first use of the object from a new thread will acquire the mutex in order to insert the thread local reference into the list. By default, there is one global mutex per integer type used in ThreadCachedInt. If you plan on using a lot of ThreadCachedInts in your application, considering breaking up the global mutex by introducing additional Tag template parameters.
set simply sets the global counter value, and marks all the thread local instances as needing to be reset. When iterating with readFull, thread local counters that have been marked as reset are skipped. When incrementing, thread local counters marked for reset are set to zero and unmarked for reset.
Upon destruction, thread local counters are flushed to the parent so that counts are not lost after increments in temporary threads. This requires grabbing the global mutex to make sure the parent itself wasn't destroyed in another thread already.
Alternate Implementations
There are of course many ways to skin a cat, and you may notice there is a partial alternate implementation in folly/test/ThreadCachedIntTest.cpp that provides similar performance. ShardedAtomicInt simply uses an array ofstd::atomic<int64_t>'s and hashes threads across them to do low-contention atomic increments, and readFull just sums up all the ints.
This sounds great, but in order to get the contention low enough to get similar performance as ThreadCachedInt with 24 threads, ShardedAtomicInt needs about 2000 ints to hash across. This uses about 20x more memory, and the lock-freereadFull has to sum up all 2048 ints, which ends up being a about 50x slower than ThreadCachedInt in low contention situations, which is hopefully the common case since it's designed for high-write, low read access patterns. Performance of readFull is about the same speed as ThreadCachedInt in high contention environments.
Depending on the operating conditions, it may make more sense to use one implementation over the other. For example, a lower contention environment will probably be able to use a ShardedAtomicInt with a much smaller array without hurting performance, while improving memory consumption and perf of readFull.
ThreadCachedInt的更多相关文章
- folly学习心得(转)
原文地址: https://www.cnblogs.com/Leo_wl/archive/2012/06/27/2566346.html 阅读目录 学习代码库的一般步骤 folly库的学习心得 ...
- Folly: Facebook Open-source Library Readme.md 和 Overview.md(感觉包含的东西并不多,还是Boost更有用)
folly/ For a high level overview see the README Components Below is a list of (some) Folly component ...
随机推荐
- php while循环
<html> <body> <?php $i=; ) { echo "The number is " . $i . "<br>& ...
- Java回顾之I/O
这篇文章主要回顾Java中和I/O操作相关的内容,I/O也是编程语言的一个基础特性,Java中的I/O分为两种类型,一种是顺序读取,一种是随机读取. 我们先来看顺序读取,有两种方式可以进行顺序读取,一 ...
- 一个产生临时图片Url的地方
- myeclipse出现src作为报名一部分src.com.*
打开历史悠久的项目,所有类报错,发现src变成了报名的一部分. 右击src->Build Path->Use as Source Folder即可. WebRoot中同理.
- UVA-11903 Just Finish it up
题目大意:一个环形跑道上有n个加油站,每个加油站可加a[i]加仑油,走到下一站需要w[i]加仑油,初始油箱为空,问能否绕跑道一圈,起点任选,若有多个起点,找出编号最小的. 题目分析:如果从1号加油站开 ...
- idea配置echache.xml报错Cannot resolve file 'ehcache.xsd'
解决方法: 打开settings->languages&frameworks->schemas and dtds ,添加地址 http://ehcache.org/ehcache. ...
- SPDY以及示例
SPDY是Google开发的基于传输控制协议(TCP)的应用层协议 .Google最早是在Chromium中提出的SPDY协议[1].目前已经被用于Google Chrome浏览器中来访问Google ...
- github打开慢,页面打不开,请求老是失败问题修复总结
感谢老铁 QQ(1218624820) 提供的方法建议 原因来自于DNS污染, 到下面的目录进行修改文件 C:\Windows\System32\drivers\etc 在后面粘贴下面的信息 192. ...
- 收集前端UI框架 持续更新中....
1. elementUI 饿了么出品 基于Vue2 http://element.eleme.io/#/zh-CN 2. ZUI 开源HTML5跨屏框架 (2018年1月4日更新)一个基于 Boot ...
- 使用Junit进行Java单元测试
1.新建一个Number类,该类中包含两个函数,求和.求差 2.在eclipse上安装Junit 右键test工程,选择“Properties”→“Java Build Path”→“Librarie ...