Histogram
folly/Histogram.h
Classes
Histogram
Histogram.h defines a simple histogram class, templated on the type of data you want to store. This class is useful for tracking a large stream of data points, where you want to remember the overall distribution of the data, but do not need to remember each data point individually.
Each histogram bucket stores the number of data points that fell in the bucket, as well as the overall sum of the data points in the bucket. Note that no overflow checking is performed, so if you have a bucket with a large number of very large values, it may overflow and cause inaccurate data for this bucket. As such, the histogram class is not well suited to storing data points with very large values. However, it works very well for smaller data points such as request latencies, request or response sizes, etc.
In addition to providing access to the raw bucket data, the Histogram class also provides methods for estimating percentile values. This allows you to estimate the median value (the 50th percentile) and other values such as the 95th or 99th percentiles.
All of the buckets have the same width. The number of buckets and bucket width is fixed for the lifetime of the histogram. As such, you do need to know your expected data range ahead of time in order to have accurate statistics. The histogram does keep one bucket to store all data points that fall below the histogram minimum, and one bucket for the data points above the maximum. However, because these buckets don't have a good lower/upper bound, percentile estimates in these buckets may be inaccurate.
HistogramBuckets
The Histogram class is built on top of HistogramBuckets. HistogramBuckets provides an API very similar to Histogram, but allows a user-defined bucket class. This allows users to implement more complex histogram types that store more than just the count and sum in each bucket.
When computing percentile estimates HistogramBuckets allows user-defined functions for computing the average value and data count in each bucket. This allows you to define more complex buckets which may have multiple different ways of computing the average value and the count.
For example, one use case could be tracking timeseries data in each bucket. Each set of timeseries data can have independent data in the bucket, which can show how the data distribution is changing over time.
Example Usage
Say we have code that sends many requests to remote services, and want to generate a histogram showing how long the requests take. The following code will initialize histogram with 50 buckets, tracking values between 0 and 5000. (There are 50 buckets since the bucket width is specified as 100. If the bucket width is not an even multiple of the histogram range, the last bucket will simply be shorter than the others.)
folly::Histogram<int64_t> latencies(, , );
The addValue() method is used to add values to the histogram. Each time a request finishes we can add its latency to the histogram:
latencies.addValue(now - startTime);
You can access each of the histogram buckets to display the overall distribution. Note that bucket 0 tracks all data points that were below the specified histogram minimum, and the last bucket tracks the data points that were above the maximum.
unsigned int numBuckets = latencies.getNumBuckets();
cout << "Below min: " << latencies.getBucketByIndex().count << "\n";
for (unsigned int n = ; n < numBuckets - ; ++n) {
cout << latencies.getBucketMin(n) << "-" << latencies.getBucketMax(n)
<< ": " << latencies.getBucketByIndex(n).count << "\n";
}
cout << "Above max: "
<< latencies.getBucketByIndex(numBuckets - ).count << "\n";
You can also use the getPercentileEstimate() method to estimate the value at the Nth percentile in the distribution. For example, to estimate the median, as well as the 95th and 99th percentile values:
int64_t median = latencies.getPercentileEstimate(0.5);
int64_t p95 = latencies.getPercentileEstimate(0.95);
int64_t p99 = latencies.getPercentileEstimate(0.99);
Thread Safety
Note that Histogram and HistogramBuckets objects are not thread-safe. If you wish to access a single Histogram from multiple threads, you must perform your own locking to ensure that multiple threads do not access it at the same time.
Histogram的更多相关文章
- [LeetCode] Largest Rectangle in Histogram 直方图中最大的矩形
Given n non-negative integers representing the histogram's bar height where the width of each bar is ...
- poj 2559 Largest Rectangle in a Histogram - 单调栈
Largest Rectangle in a Histogram Time Limit: 1000MS Memory Limit: 65536K Total Submissions: 19782 ...
- LeetCode 笔记系列 17 Largest Rectangle in Histogram
题目: Largest Rectangle in Histogram Given n non-negative integers representing the histogram's bar he ...
- LeetCode: Largest Rectangle in Histogram(直方图最大面积)
http://blog.csdn.net/abcbc/article/details/8943485 具体的题目描述为: Given n non-negative integers represent ...
- DP专题训练之HDU 1506 Largest Rectangle in a Histogram
Description A histogram is a polygon composed of a sequence of rectangles aligned at a common base l ...
- Largest Rectangle in Histogram
Given n non-negative integers representing the histogram's bar height where the width of each bar is ...
- 数据结构与算法(1)支线任务3——Largest Rectangle in Histogram
题目如下:(https://leetcode.com/problems/largest-rectangle-in-histogram/) Given n non-negative integers r ...
- LeetCode之Largest Rectangle in Histogram浅析
首先上题目 Given n non-negative integers representing the histogram's bar height where the width of each ...
- Largest Rectangle in a Histogram(DP)
Largest Rectangle in a Histogram Time Limit : 2000/1000ms (Java/Other) Memory Limit : 65536/32768K ...
- Elasticsearch聚合 之 Histogram 直方图聚合
Elasticsearch支持最直方图聚合,它在数字字段自动创建桶,并会扫描全部文档,把文档放入相应的桶中.这个数字字段既可以是文档中的某个字段,也可以通过脚本创建得出的. 桶的筛选规则 举个例子,有 ...
随机推荐
- pom配置之:snapshot快照库和release发布库
在使用maven过程中,我们在开发阶段经常性的会有很多公共库处于不稳定状态,随时需要修改并发布,可能一天就要发布一次,遇到bug时,甚至一天要发布N次.我们知道,maven的依赖管理是基于版本管理的, ...
- windows命令行工具
winver 检查Windows版本 wmimgmt.msc 打开Windows管理体系结构(wmi) wupdmgr Windows更新程序 wscript Windows脚本宿主设置 write ...
- Android编程实例-获取当前进程名字
下面代码是根据进程id获取进程名字: /** * 根据Pid获取当前进程的名字,一般就是当前app的包名 * * @param context 上下文 * @param pid 进程的id * @re ...
- HP G7服务器添加新硬盘
1. 停掉 服务器(必须停了服务器),插入新硬盘.开机,出现F9和F11的时候,按下F5(这个很坑爹,没有显示F5进入阵列配置),进入阵列控制界面之后按出现红色的提示后按下F8进入阵列控制管理界面.进 ...
- [持续更新]Python 笔记
本文以 Python 2.7 为基础. lambda 函数实现递归 方法一:传递一个 self 参数 求阶乘: frac = lambda self, x: self(self, x - 1) * x ...
- LG3960 列队
题意 传送门 分析 参照博客 树状数组+离线处理即可. 利用树状数组下标本质即可\(O(\log n)\)求第k大. 代码 #include<iostream> #include<c ...
- ballerina 学习十一 Packages
ballerina 的包还是比较简单的,实际上就是对于源码文件集合的管理,同时我们可以添加别名,同时可以进行 其他包的引用 import 简单例子 代码 import ballerina/math; ...
- lapis 处理接收到的json 数据
备注: 在restful api 开发过程中,大家一般使用的都是json 格式的数据lapis 在处理json 数据上也是比较方便的 1. 使用的api 说明 local ...
- 表格头部与左侧内容随滚动条位置改变而改变(基于jQuery)
效果图如下: HTML代码: <!DOCTYPE html> <html lang="zh-CN"> <head> <meta chars ...
- php array_push 与 $arr[]=$value 性能比较
1.array_push方法 array_push 方法,将一个或多个元素压入数组的末尾. int array_push ( array &$array , mixed $var [, mix ...