Diagnose High-Latency I/O Operations Using SystemTap
Luca Canali on 28 Jul 2015
Topic: This post describes some simple tools and techniques for drilling down on high-latency I/O events using SystemTap probes.
The problem: Operations with high latency on a filesystem and/or a storage volume can sometimes be attributed to just a few disks 'misbehaving', possibly because they are suffering mechanical failures and/or are going to break completely in the near future.
High-latency I/Os on just a few disks can then appear as latency outliers when accessing volumes built on a large number of disks, and can affect the performance of the entire storage system. I write this with the example in mind of a storage system built with (SATA) JBODs using Oracle ASM as volume manager/DB filesystem. However, the main ideas and tools described in this post apply to many other volume managers and file systems, including HDFS.
The standard tools: One way to find that one or more disks are serving I/O requests with high latency is to use standard Linux tools such as iostat, sar or collectl. Typically you would use those tools to spot anomalous values of average service time, average wait time and queue size.
A structured approach to doing this is described in Brendan Gregg's USE method and in the list of tools that can be used in Linux to implement it.
SystemTap scripts: In this post we focus on a technique and a couple of simple scripts that use SystemTap probes to measure the I/O latency of block devices and so identify the disks serving I/O with high latency.
The script blockio_latency_outliers_per_device.stp provides some basic latency statistics per block device, including the number of I/Os and the minimum, average and maximum latency. The script also prints the details of every I/O whose latency is above a configurable threshold (the default threshold is 500000 microseconds, that is 500 ms).
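The measurement idea can be sketched outside SystemTap. The Python below is an illustrative simulation, not the script's code: it assumes that latency is the time between an "issue" event and the matching "complete" event for the same device and sector, and the event tuple format and function name are hypothetical.

```python
from collections import defaultdict

THRESHOLD_US = 500_000  # warn above 500 ms, the default quoted in the post


def measure_latencies(events, threshold_us=THRESHOLD_US):
    """Pair issue/complete events and collect per-device latency statistics.

    events: iterable of (timestamp_us, kind, device, sector) tuples, where
    kind is "issue" or "complete" (hypothetical format for this sketch).
    Returns (stats, warnings).
    """
    in_flight = {}                  # (device, sector) -> issue timestamp
    latencies = defaultdict(list)   # device -> list of latencies in microseconds
    warnings = []
    for ts, kind, device, sector in events:
        key = (device, sector)
        if kind == "issue":
            in_flight[key] = ts
        elif kind == "complete" and key in in_flight:
            latency = ts - in_flight.pop(key)
            latencies[device].append(latency)
            if latency > threshold_us:
                warnings.append((device, sector, latency))
    stats = {
        dev: {"num": len(v), "min": min(v), "avg": sum(v) // len(v), "max": max(v)}
        for dev, v in latencies.items()
    }
    return stats, warnings
```

Completions with no recorded issue event are ignored here, which is one simple way to handle requests that were already in flight when the measurement started.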
An example of its use (output edited for clarity) is shown below. Note the latency warning messages and the very large maximum latency measured for the /dev/sdy block device:
[root@myhost] # stap -v blockio_latency_outliers_per_device.stp 10
Measuring block I/O latency and statistics
A warning will be printed for I/Os with latency higher than 500000 microseconds
Statistics will be printed every 10 seconds. Press CTRL-C to stop
...
latency warning, >500000 microsec: device=sdy, sector=166891231, latency=714984
latency warning, >500000 microsec: device=sdy, sector=165679327, latency=582708
latency warning, >500000 microsec: device=sdy, sector=167102975, latency=1162550
....
I/O latency basic statistics per device, measurement time: 10 seconds
Latency measured in microseconds
Disk name   Num I/Os   Min latency   Avg latency   Max latency
....
sdu              219           106          6217         27117
sdz              200           123          5995         27205
sdq              211            71          6553         31120
sdh              256           103          6643         22663
sds              224           101          6610         29743
sdm              238            92          7550         35571
sde              243            90          8652         52029
sdt              200           105          5997         25180
sdk              200            94          5696         35057
sdi              206            99          7849         30636
sdg              269            74          6746         36746
sdy              197           102         98298       1167977
sdr              200            89          6559         27873
sdl              200           140          8789         31996
sdw              210            99          7009         37118
sdd              217            94          7440         56071
sdn              205            99          6628         41339
....
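With many disks it can help to pick out the candidates automatically. The following Python is a minimal sketch, not part of the published tools: it assumes the statistics rows have the five-column layout shown above, and the rule "average latency more than 5 times the median across devices" is an arbitrary illustrative choice.

```python
from statistics import median

# A few rows copied from the output above.
SAMPLE = """\
Disk name Num I/Os Min latency Avg latency Max latency
sdu 219 106 6217 27117
sdz 200 123 5995 27205
sdq 211 71 6553 31120
sdy 197 102 98298 1167977
sdr 200 89 6559 27873
"""


def parse_stats(text):
    """Parse '<device> <num> <min> <avg> <max>' rows into a dict per device."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only rows made of a device name followed by four integers.
        if len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            num, lat_min, lat_avg, lat_max = (int(f) for f in fields[1:])
            stats[fields[0]] = {"num": num, "min": lat_min,
                                "avg": lat_avg, "max": lat_max}
    return stats


def flag_outliers(stats, factor=5):
    """Return devices whose average latency exceeds factor * median average."""
    if not stats:
        return []
    med = median(s["avg"] for s in stats.values())
    return sorted(name for name, s in stats.items() if s["avg"] > factor * med)


if __name__ == "__main__":
    print(flag_outliers(parse_stats(SAMPLE)))
```

On the sample rows the only device flagged is sdy, whose average latency of 98298 microseconds is far above the median of about 6.5 ms.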
Once candidate disks with high latency have been identified, the second step is to drill down further using the script blockio_rq_issue_filter_latencyhistogram.stp. This script gathers and displays I/O latency histograms for a subset of block devices, which can be specified using filters in the script header. The default filters are:
# SystemTap variables used to define filters, edit as needed
global IO_size = 8192 # this will be used as a filter for the I/O request size
# the value 8192 targets 8KB operations for Oracle
# single-block I/O
# use the value -1 to disable this filter
global IO_operation = 0 # this will be used as a filter: only read operations
# a value of 0 considers only read operations
# (1 is for write), use -1 to disable this filter
global IO_devmaj = -1 # this will be used as a filter:
# device major number (-1 means no filter)
# ex: use the value 253 for device mapper block devices
global IO_devmin = -1 # device minor number (or -1 if no filter)
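The filter semantics can be illustrated with a few lines of Python. This is a sketch under an assumption: it treats the four filters as combined with a logical AND, with -1 disabling an individual filter as the comments above describe. The function name and argument names are hypothetical.

```python
def passes_filters(size, operation, devmaj, devmin,
                   io_size=8192, io_operation=0, io_devmaj=-1, io_devmin=-1):
    """Return True if an I/O request matches every enabled filter.

    A filter value of -1 disables that filter; the defaults mirror the
    values in the script header (8 KB reads on any device).
    """
    checks = [
        (io_size, size),
        (io_operation, operation),   # 0 = read, 1 = write
        (io_devmaj, devmaj),
        (io_devmin, devmin),
    ]
    return all(wanted == -1 or wanted == actual for wanted, actual in checks)
```

For example, with the default values an 8 KB read on any device is kept, while an 8 KB write is discarded.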
You can use blockio_rq_issue_filter_latencyhistogram.stp to drill down on the latency histogram for the disks that have shown high-latency I/O and compare them with "good" disks. In the example above the candidate "trouble disk" is /dev/sdy, with major number 65 and minor number 128 (you can find the major and minor device numbers for "sdy" simply with ls -l /dev/sdy).
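If you prefer to look up the device numbers programmatically, the following Python sketch does the same job as ls -l using only the standard library (Linux/Unix only). The helper names are hypothetical and the path passed in is whatever device you are investigating.

```python
import os
import stat


def split_rdev(rdev):
    """Split a raw device number into (major, minor)."""
    return os.major(rdev), os.minor(rdev)


def device_numbers(path):
    """Return (major, minor) for a block device node such as /dev/sdy."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    return split_rdev(st.st_rdev)
```

Calling device_numbers("/dev/sdy") on the system used in this post would be expected to return the pair used for IO_devmaj and IO_devmin in the example that follows.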
Example:
stap -v blockio_rq_issue_filter_latencyhistogram.stp 10
Block I/O latency histograms from kernel trace points
Filters:
IO_size = 8192
IO_operation = 0 (0=read, 1=write, -1=disable filter)
IO_devmaj = 65 (-1=disable filter)
IO_devmin = 128 (-1=disable filter)
Block I/O latency histogram, measurement time: 10 seconds, I/O count: 199
Value = latency bucket (microseconds), count=I/O operations in 10 seconds
value |-------------------------------------------------- count
16 | 0
32 | 0
64 |@ 2
128 |@@@@@@@ 14
256 |@@ 4
512 |@ 2
1024 |@@@ 7
2048 |@@@@@@@@@ 19
4096 |@@@@@@@@@@@@@@@@@@ 37
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 59
16384 |@@@@@@@ 14
32768 | 0
65536 |@@@@@ HIGH 10
131072 |@@@@@ LATENCY 11
262144 |@@@@@ I/O 10
524288 |@@@ OPERATIONS 6
1048576 |@@ 4
2097152 | 0
Note the presence of several I/O operations at high latency alongside I/Os with more normal latency: for example, I/Os served from the SATA disks at around 8 ms and I/Os served from the controller cache at sub-millisecond latency.
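To read these histograms it helps to see how a latency value maps to a bucket. The Python below is a sketch that assumes the usual power-of-two convention, in which the bucket labelled N counts values from N up to 2N-1; it is not the script's code and the function names are hypothetical.

```python
from collections import Counter


def log2_bucket(latency_us):
    """Return the power-of-two bucket label for a latency in microseconds."""
    if latency_us <= 0:
        return 0
    # Largest power of two that is less than or equal to the value.
    return 1 << (latency_us.bit_length() - 1)


def histogram(latencies_us):
    """Count latencies per power-of-two bucket, sorted by bucket label."""
    return dict(sorted(Counter(log2_bucket(v) for v in latencies_us).items()))
```

Under this convention a latency of 714984 microseconds, like the one in the first warning message reported earlier, falls in the 524288 bucket, and an 8 ms read falls in the 4096 or 8192 bucket depending on whether it is just below or just above 8192 microseconds.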
For comparison, below is the latency histogram measured on another disk while the same workload was running. In this case the high-latency points found in the previous example are no longer present: most of the I/O operations are at around 8 ms latency, with some operations served from cache at sub-millisecond latency. The I/Os with latency of 64 ms and above that were reported for the /dev/sdy disk do not appear here.
stap -v blockio_rq_issue_filter_latencyhistogram.stp 10
Block I/O latency histograms from kernel trace points
Filters:
IO_size = 8192
IO_operation = 0 (0=read, 1=write, -1=disable filter)
IO_devmaj = 65 (-1=disable filter)
IO_devmin = 48 (-1=disable filter)
Block I/O latency histogram, measurement time: 10 seconds, I/O count: 196
Value = latency bucket (microseconds), count=I/O operations in 10 seconds
value |-------------------------------------------------- count
32 | 0
64 | 0
128 |@@@@@@@@@ 19
256 |@@@@@@@@ 16
512 | 1
1024 |@@@@ 9
2048 |@@@@@@@@@@@@@@ 28
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 56
8192 |@@@@@@@@@@@@@@@@@@@@@@@@ 49
16384 |@@@@@@@@@ 18
32768 | 0
65536 | 0
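The difference between the two disks can be summarised with one number: the fraction of I/Os that land in the 65536-microsecond bucket or above. The Python below is a small sketch using the non-zero bucket counts copied from the two histograms; the dictionary and function names are illustrative.

```python
# Non-zero bucket counts copied from the two histograms (microseconds -> count).
SDY = {64: 2, 128: 14, 256: 4, 512: 2, 1024: 7, 2048: 19, 4096: 37,
       8192: 59, 16384: 14, 65536: 10, 131072: 11, 262144: 10,
       524288: 6, 1048576: 4}
GOOD = {128: 19, 256: 16, 512: 1, 1024: 9, 2048: 28, 4096: 56,
        8192: 49, 16384: 18}


def tail_fraction(hist, floor_us=65536):
    """Fraction of I/Os in buckets at or above floor_us."""
    total = sum(hist.values())
    tail = sum(count for bucket, count in hist.items() if bucket >= floor_us)
    return tail / total if total else 0.0


if __name__ == "__main__":
    print(f"sdy:  {tail_fraction(SDY):.1%}")
    print(f"good: {tail_fraction(GOOD):.1%}")
```

For /dev/sdy, 41 of the 199 reads (about 21%) are in the 65536 bucket or above, while the comparison disk has none.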
The tests reported here were performed on an old system running RHEL 5 (kernel 2.6.18-371) and SystemTap v. 1.8; the scripts have also been tested on OL and CentOS 7.1 (kernel 3.10.0-229) with SystemTap 2.8. If you are using SystemTap 2.6 or above you can use the scripts blockio_rq_issue_latencyhistogram_new.stp and blockio_rq_issue_filter_latencyhistogram_new.stp instead.
Conclusion: In this post we have shown a simple technique to troubleshoot I/O latency outliers, and in particular how to drill down on I/Os served at high latency because of one or more misbehaving disks in the storage volume/filesystem. The investigation used SystemTap scripts, first to discover which disks were serving some of their I/Os with high latency and then to drill down on specific devices using latency histograms. Identifying and then replacing the badly performing disks can benefit the performance of the entire storage system.
Download: The tools discussed in this post can be downloaded from this webpage and from GitHub.
Acknowledgements and additional links: Brendan Gregg has published an extensive set of articles and tools on performance tuning, including storage latency investigations.
For more information and examples on how to use SystemTap, see the SystemTap wiki.
For additional tools for Oracle ASM I/O investigations, see also Bertrand Drouvot's asm_metrics.pl.
Kevin Closson's SLOB was used as the workload generator for the examples discussed here.
Additional links on tools and techniques for Oracle I/O troubleshooting in this blog: Heat Map Visualization of I/O Latency with SystemTap and PyLatencyMap, Event Histogram Metric and Oracle 12c, and Life of an Oracle I/O: Tracing Logical and Physical I/O with Systemtap.