Monitoring database file I/O with pt-ioprofile

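When working with I/O-intensive applications such as a MySQL database, system performance usually depends on the type of workload. To tune such a system, we need a clear picture of its data access patterns and enough collected data to base tuning decisions on.

Many tools collect I/O data at the system, device, or process level, but no off-the-shelf tool answers questions such as: how many files the application has opened, what the read/write ratio of each file is, how many times sync was called, how large each operation was, how long each one took, whether the access was sequential or random, and which thread issued it.

pt-ioprofile is a tool from Percona for monitoring a process's I/O and its file reads and writes.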

pt-ioprofile does two things: 1) get lsof+strace for -s seconds, 2) aggregate the result. If you specify a FILE, then step 1) is not performed.
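The aggregation step can be pictured with a small sketch. This is not pt-ioprofile's actual implementation; it assumes a simplified strace -T style line format and a hand-written file-descriptor table standing in for the lsof output. The names aggregate_times, sample_lines and fd_map, and the file names file_a and file_b, are hypothetical.

```python
import re
from collections import defaultdict

# Assumed, simplified line shape: syscall(fd, ...) = result <seconds>
LINE_RE = re.compile(r"^(\w+)\((\d+),.*\)\s+=\s+(-?\d+)\s+<([\d.]+)>$")


def aggregate_times(strace_lines, fd_to_name):
    """Sum elapsed seconds per (filename, syscall)."""
    totals = defaultdict(lambda: defaultdict(float))
    for line in strace_lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a line shape this sketch understands
        syscall, fd, _result, elapsed = match.groups()
        filename = fd_to_name.get(int(fd))
        if filename is None:
            continue  # descriptor missing from the lsof-like table
        totals[filename][syscall] += float(elapsed)
    return {name: dict(calls) for name, calls in totals.items()}


# Hypothetical trace lines (step 1 would normally produce these).
sample_lines = [
    'read(5, "", 8192) = 8192 <0.000200>',
    'read(5, "", 8192) = 8192 <0.000300>',
    'lseek(5, 0, SEEK_END) = 16384 <0.000010>',
    'write(7, "", 512) = 512 <0.000050>',
    'read(9, "", 10) = 10 <0.000001>',  # fd 9 is unknown and gets skipped
]
# Hypothetical descriptor table (the role lsof plays).
fd_map = {5: "file_a", 7: "file_b"}

profile = aggregate_times(sample_lines, fd_map)
for name, calls in sorted(profile.items()):
    print(name, calls)
```

Keeping the descriptor table separate from the trace lines mirrors the split described above: one source says which file a descriptor refers to, the other says how long each call took.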

Risks:

WARNING: pt-ioprofile freezes the server and may crash the process, or make it perform badly after detaching, or leave it in a sleeping state! Before using this tool, please:

Read the tool’s documentation
Review the tool’s known “BUGS”
Test the tool on a non-production server
Backup your production server and verify the backups

pt-ioprofile should be considered an intrusive tool, and should not be used on production servers unless you understand and accept the risks.

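Because pt-ioprofile is intrusive, avoid using it in production wherever possible; it can cause the traced process to hang or crash.

pt-ioprofile uses strace and lsof to observe what the process does and then produces a list of the files it operated on. Run with no arguments, it monitors the MySQL server process for a default of 30 seconds. It can also be pointed at a PostgreSQL server process.

Here are a few examples:

Load is generated with pgbench, using this script: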

\set id random(,)

insert into test (id,info,crt_time) values (:id, md5(random()::text), now()) on conflict (id) do update 
set info=excluded.info, crt_time=excluded.crt_time;

Benchmark command:

pgbench -M prepared -U swrd swrd  -n -r -P  -f ./test.sql -c  -j  -T

Monitor a specific PostgreSQL process:

# pt-ioprofile -p 7269
Mon Oct  :: CST
Tracing process ID
     total       read      write       open      close      lseek filename
  0.078474   0.075984   0.000000   0.001795   0.000000   0.000695 base//2619_fsm
  0.049367   0.049318   0.000000   0.000023   0.000000   0.000026 base//
  0.004804   0.004764   0.000000   0.000018   0.000011   0.000011 base//
  0.003581   0.003302   0.000000   0.000125   0.000154   0.000000 base//pg_internal.init
  0.003091   0.000000   0.000000   0.003078   0.000000   0.000013 base//
  0.002482   0.000000   0.000000   0.000018   0.000000   0.002464 base//
  0.000829   0.000800   0.000000   0.000016   0.000013   0.000000 base//pg_internal.init
  0.000770   0.000000   0.000000   0.000042   0.000697   0.000031 base//
  0.000613   0.000000   0.000000   0.000181   0.000419   0.000013 base//
  0.000567   0.000160   0.000000   0.000301   0.000106   0.000000 pg_stat_tmp/global.stat
  0.000448   0.000153   0.000000   0.000154   0.000000   0.000141 base//
  0.000385   0.000326   0.000000   0.000035   0.000024   0.000000 global/pg_internal.init
  0.000329   0.000190   0.000000   0.000117   0.000022   0.000000 global/pg_filenode.map
  0.000253   0.000031   0.000028   0.000114   0.000036   0.000044 base//38453_vm
  0.000245   0.000100   0.000000   0.000057   0.000000   0.000088 base//2840_fsm
  0.000194   0.000111   0.000000   0.000044   0.000000   0.000039 base//
  0.000164   0.000081   0.000000   0.000050   0.000033   0.000000 pg_stat_tmp/db_0.stat
  0.000161   0.000107   0.000000   0.000031   0.000023   0.000000 pg_stat_tmp/db_16401.stat
  0.000156   0.000000   0.000000   0.000117   0.000000   0.000039 base//2619_vm
  0.000150   0.000000   0.000000   0.000064   0.000046   0.000040 base//38453_fsm
  0.000101   0.000011   0.000000   0.000071   0.000019   0.000000 base//pg_filenode.map
  0.000088   0.000037   0.000000   0.000025   0.000000   0.000026 base//2840_vm
  0.000085   0.000000   0.000000   0.000022   0.000012   0.000051 base//
  0.000075   0.000000   0.000000   0.000017   0.000011   0.000047 base//
  0.000074   0.000045   0.000000   0.000016   0.000013   0.000000 pg_stat_tmp/db_13269.stat
  0.000071   0.000000   0.000000   0.000036   0.000013   0.000022 global/
  0.000058   0.000021   0.000000   0.000025   0.000012   0.000000 base//PG_VERSION
  0.000046   0.000016   0.000000   0.000019   0.000011   0.000000 base//PG_VERSION
  0.000039   0.000012   0.000000   0.000016   0.000011   0.000000 base//pg_filenode.map
  0.000022   0.000000   0.000000   0.000022   0.000000   0.000000 base//38459_fsm
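A report like this is easier to act on once it is loaded into a program. The following is a minimal sketch, assuming the whitespace-separated layout shown above and file names without spaces; parse_report and dominant_call are hypothetical helper names, and the three sample rows are copied from the output above.

```python
REPORT = """\
     total       read      write       open      close      lseek filename
  0.078474   0.075984   0.000000   0.001795   0.000000   0.000695 base//2619_fsm
  0.000567   0.000160   0.000000   0.000301   0.000106   0.000000 pg_stat_tmp/global.stat
  0.000329   0.000190   0.000000   0.000117   0.000022   0.000000 global/pg_filenode.map
"""


def parse_report(text):
    """Turn report text into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "total":
            header = fields  # e.g. total, read, write, ..., filename
            continue
        if header is None or len(fields) != len(header):
            continue  # banner lines such as the date or "Tracing process ID"
        try:
            values = [float(value) for value in fields[:-1]]
        except ValueError:
            continue
        row = dict(zip(header[:-1], values))
        row["filename"] = fields[-1]
        rows.append(row)
    return rows


def dominant_call(row):
    """Name of the system call that accounts for the most time in a row."""
    calls = {k: v for k, v in row.items() if k not in ("total", "filename")}
    return max(calls, key=calls.get)


rows = parse_report(REPORT)
for row in rows:
    print(row["filename"], dominant_call(row), row["total"])
```

In this sample the first file spends nearly all of its time in read, while pg_stat_tmp/global.stat is dominated by open.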

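The columns are system calls. The ones pt-ioprofile may report are:

read: Reads data from a file. The file is identified by a file descriptor, and the data is read into a previously allocated buffer.

write: Writes the data in a buffer to a file.

pread: The kernel may suspend the process between separate lseek and read calls, which causes synchronization problems when the file offset is shared. pread takes the offset as an argument and behaves like an lseek followed by a read performed as a single atomic operation, without changing the file offset.

pwrite: The kernel may suspend the process between separate lseek and write calls, which causes the same synchronization problems. pwrite takes the offset as an argument and behaves like an lseek followed by a write performed as a single atomic operation, without changing the file offset.

fsync: Ensures that all modified content of the file has been synchronized to disk. The call blocks until the device reports that the I/O has completed.

open: Opens a file and returns its file descriptor.

close: Closes a file by ending the association between a file descriptor and the file. The descriptor is released and can be reused.

lseek: Sets the read/write offset of the file referred to by a file descriptor, that is, the position of the next read or write.

fcntl: Performs control operations on a (file) descriptor.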
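The difference between pread and the lseek plus read pair is easy to see from Python, whose os module wraps these system calls on Unix-like systems. This is only an illustrative sketch; the helper names read_with_lseek and read_with_pread are made up for the example.

```python
import os
import tempfile


def read_with_lseek(fd, offset, length):
    """Two system calls: move the shared file offset, then read."""
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, length)


def read_with_pread(fd, offset, length):
    """One system call: read at an explicit offset, file offset untouched."""
    return os.pread(fd, length, offset)


with tempfile.TemporaryFile() as tmp:
    tmp.write(b"0123456789")
    tmp.flush()
    fd = tmp.fileno()
    os.lseek(fd, 0, os.SEEK_SET)  # start from a known offset

    via_pread = read_with_pread(fd, 4, 3)
    offset_after_pread = os.lseek(fd, 0, os.SEEK_CUR)

    via_lseek = read_with_lseek(fd, 4, 3)
    offset_after_lseek = os.lseek(fd, 0, os.SEEK_CUR)

print(via_pread, offset_after_pread)
print(via_lseek, offset_after_lseek)
```

Both helpers return the same bytes, but only the lseek variant moves the file offset, which is why a profile of a pread-based program shows little or no time in the lseek column.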

The options work as follows:

--aggregate

The aggregation method. The default is sum; avg can be specified instead.

short form: -a; type: string; default: sum

The aggregate function, either sum or avg.

If sum, then each cell will contain the sum of the values in it. If avg, then each cell will contain the average of the values in it.

--cell

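What each cell shows. The default is times (time spent in I/O operations); count (number of I/O operations) and sizes (size of I/O operations) can be specified instead.

For example: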

--cell=times
Tracing process ID
     total       read       open      close      lseek filename
  0.000378   0.000351   0.000016   0.000011   0.000000 base//pg_internal.init
  0.000140   0.000053   0.000053   0.000034   0.000000 pg_stat_tmp/global.stat
  0.000105   0.000077   0.000016   0.000012   0.000000 global/pg_internal.init
  0.000067   0.000041   0.000015   0.000011   0.000000 pg_stat_tmp/db_13269.stat
  0.000061   0.000000   0.000016   0.000000   0.000045 base//
  0.000051   0.000013   0.000025   0.000013   0.000000 global/pg_filenode.map
  0.000044   0.000016   0.000015   0.000013   0.000000 pg_stat_tmp/db_0.stat
  0.000043   0.000016   0.000016   0.000011   0.000000 base//PG_VERSION
  0.000036   0.000011   0.000015   0.000010   0.000000 base//pg_filenode.map
  0.000030   0.000000   0.000019   0.000000   0.000011 base//
  0.000028   0.000000   0.000017   0.000000   0.000011 global/

--cell=sizes
Tracing process ID
     total       read       open      close      lseek filename
                                                       base//
                                                       base//
                                                       base//
                                                       base//
                                                       base//pg_internal.init
                                                       pg_stat_tmp/db_16401.stat
                                                       base//2840_fsm
                                                       global/pg_internal.init
                                                       global/
                                                       base//38453_vm
                                                       base//
                                                       pg_stat_tmp/global.stat
                                                       pg_stat_tmp/db_0.stat
                                                       global/pg_filenode.map
                                                       base//pg_filenode.map
                                                       base//PG_VERSION

--cell=count
Tracing process ID
     total       read       open      close      lseek filename
                                                       base//pg_internal.init
                                                       pg_stat_tmp/global.stat
                                                       global/pg_internal.init
                                                       pg_stat_tmp/db_16401.stat
                                                       base//
                                                       pg_stat_tmp/db_0.stat
                                                       global/pg_filenode.map
                                                       base//PG_VERSION
                                                       base//pg_filenode.map
                                                       global/
                                                       base//

short form: -c; type: string; default: times

The cell contents.

Valid values are:

VALUE  CELLS CONTAIN
=====  =======================
count  Count of I/O operations
sizes  Sizes of I/O operations
times  I/O operation timing
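How --cell and --aggregate combine can be shown with a toy calculation. This is a sketch under assumptions, not the tool's own code: the calls list is invented data for a single file/system-call cell, cell_value is a hypothetical helper, and treating count as independent of the aggregate function is a simplification made for this example.

```python
from statistics import mean

# Hypothetical samples for one cell: (bytes transferred, seconds) per call.
calls = [(8192, 0.0002), (8192, 0.0004), (4096, 0.0003)]


def cell_value(calls, cell="times", aggregate="sum"):
    """Compute one report cell from the individual calls behind it."""
    if cell == "count":
        return len(calls)  # assumption: count ignores the aggregate function
    if cell == "sizes":
        values = [size for size, _elapsed in calls]
    elif cell == "times":
        values = [elapsed for _size, elapsed in calls]
    else:
        raise ValueError(f"unknown cell type: {cell}")
    if aggregate == "sum":
        return sum(values)
    if aggregate == "avg":
        return mean(values)
    raise ValueError(f"unknown aggregate function: {aggregate}")


for cell in ("count", "sizes", "times"):
    for aggregate in ("sum", "avg"):
        print(cell, aggregate, cell_value(calls, cell, aggregate))
```

The same three calls produce a different picture depending on the pair chosen: sum highlights where the total cost goes, while avg highlights calls that are individually slow or large.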

--group-by

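The grouping unit. The default is filename, which reports one line per file; all summarizes every operation into a single line, and pid reports one line per process.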

short form: -g; type: string; default: filename

The group-by item.

Valid values are:

VALUE     GROUPING
=====     ======================================
all       Summarize into a single line of output
filename  One line of output per filename
pid       One line of output per process ID
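The three grouping modes amount to choosing a different key before adding things up. A minimal sketch with made-up samples follows; group_times, the process IDs, the file names, and the TOTAL label are all hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (pid, filename, seconds spent in one system call).
samples = [
    (101, "file_a", 0.002),
    (101, "file_b", 0.001),
    (102, "file_a", 0.004),
]


def group_times(samples, group_by="filename"):
    """Sum elapsed time per group, in the spirit of --group-by."""
    totals = defaultdict(float)
    for pid, filename, elapsed in samples:
        if group_by == "filename":
            key = filename
        elif group_by == "pid":
            key = pid
        elif group_by == "all":
            key = "TOTAL"  # everything collapses into a single line
        else:
            raise ValueError(f"unknown group-by item: {group_by}")
        totals[key] += elapsed
    return dict(totals)


for mode in ("filename", "pid", "all"):
    print(mode, group_times(samples, mode))
```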

--run-time

How long strace runs, in seconds (default 30). Inside the script, OPT_RUN_TIME holds the value passed to --run-time.

--save-samples

Saves the results gathered by strace and lsof to the specified file.

type: string

Filename to save samples in; these can be used for later analysis.
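Because a saved sample file can be passed back to the tool as FILE, one short capture can be re-analyzed several ways without tracing the process again. The sketch below only builds the command lines and does not run them; it uses the options covered in this post (-p, --run-time, --save-samples, --cell, --group-by, --aggregate). The helper names and the sample path are hypothetical, and the exact invocation should be checked against the documentation for the installed version.

```python
import shlex


def capture_command(pid, sample_file, run_time=30):
    """Command that traces a process once and keeps the raw samples."""
    return [
        "pt-ioprofile",
        "-p", str(pid),
        "--run-time", str(run_time),
        "--save-samples", sample_file,
    ]


def reanalyze_command(sample_file, cell="times", group_by="filename",
                      aggregate="sum"):
    """Command that aggregates a saved sample file; no tracing happens."""
    return [
        "pt-ioprofile",
        f"--cell={cell}",
        f"--group-by={group_by}",
        f"--aggregate={aggregate}",
        sample_file,
    ]


sample_path = "/tmp/ioprofile.samples"  # hypothetical location
print(shlex.join(capture_command(7269, sample_path, run_time=10)))
for cell in ("times", "sizes", "count"):
    print(shlex.join(reanalyze_command(sample_path, cell=cell)))
```

Given the risks listed earlier, capturing once and re-analyzing offline keeps the time the process spends under strace to a minimum.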

References:

http://www.cnblogs.com/ivictor/p/6013980.html

https://www.percona.com/doc/percona-toolkit/2.2/pt-ioprofile.html#options
