Since I did't see here anything about perf which is a relatively new tool for profiling the kernel and user applications on Linux I decided to add this information.

First of all - this is a tutorial about Linux profiling with perf

You can use perf if your Linux Kernel is greater than 2.6.32 or oprofile if it is older. Both programs don't require from you to instrument your program (like gprof requires). However in order to get call graph correctly in perf you need to build you program with -fno-omit-frame-pointer. For example: g++ -fno-omit-frame-pointer -O2 main.cpp.

You can see "live" analysis of your application with perf top:

sudo perf top -p `pidof a.out` -K

Or you can record performance data of a running application and analyze them after that:

1) To record performance data:

perf record -p `pidof a.out`

or to record for 10 secs:

perf record -p `pidof a.out` sleep 10

or to record with call graph ()

perf record -g -p `pidof a.out`

2) To analyze the recorded data

perf report --stdio

perf report --stdio --sort=dso -g none

perf report --stdio -g none

perf report --stdio -g

Or you can record performace data of a application and analyze them after that just by launching the application in this way and waiting for it to exit:

perf record ./a.out

This is an example of profiling a test program

The test program is in file main.cpp (I will put main.cpp at the bottom of the message):

I compile it in this way:

g++ -m64 -fno-omit-frame-pointer -g main.cpp -L.  -ltcmalloc_minimal -o my_test

I use libmalloc_minimial.so since it is compiled with -fno-omit-frame-pointer while libc malloc seems to be compiled without this option. Then I run my test program

./my_test 100000000

Then I record performance data of a running process:

perf record -g  -p `pidof my_test` -o ./my_test.perf.data sleep 30

Then I analyze load per module:

perf report --stdio -g none --sort comm,dso -i ./my_test.perf.data

# Overhead  Command                 Shared Object

# ........  .......  ............................

#

70.06%  my_test  my_test

and so on ...

Then call chains are analyzed:

perf report --stdio -g graph -i ./my_test.perf.data | c++filt

0.16%  my_test  [kernel.kallsyms]             [k] _spin_lock

and so on ...

So at this point you know where your program spends time.

And this is main.cpp for the test:

#include <stdio.h>

#include <stdlib.h>

#include <time.h>

time_t f1(time_t time_value)

{

for (int j =0; j < 10; ++j) {

++time_value;

if (j%5 == 0) {

double *p = new double;

delete p;

}

}

return time_value;

}

time_t f2(time_t time_value)

{

for (int j =0; j < 40; ++j) {

++time_value;

}

time_value=f1(time_value);

return time_value;

}

time_t process_request(time_t time_value)

{

for (int j =0; j < 10; ++j) {

int *p = new int;

delete p;

for (int m =0; m < 10; ++m) {

++time_value;

}

}

for (int i =0; i < 10; ++i) {

time_value=f1(time_value);

time_value=f2(time_value);

}

return time_value;

}

int main(int argc, char* argv2[])

{

int number_loops = argc > 1 ? atoi(argv2[1]) : 1;

time_t time_value = time(0);

printf("number loops %d\n", number_loops);

printf("time_value: %d\n", time_value );

for (int i =0; i < number_loops; ++i) {

time_value = process_request(time_value);

}

printf("time_value: %ld\n", time_value );

return 0;

}

原文

http://stackoverflow.com/questions/1777556/alternatives-to-gprof#comment3480484_1779343

how to use perf的更多相关文章

  1. 系统级性能分析工具perf的介绍与使用

    测试环境:Ubuntu16.04(在VMWare虚拟机使用perf top存在无法显示问题) Kernel:3.13.0-32 系统级性能优化通常包括两个阶段:性能剖析(performance pro ...

  2. 玩 perf

    有一个进程happy在执行,另一个进程spy发送了一个信号把happy给杀死了 我怎么能通过perf抓到spy进程? happy进程一直执行 在spy进程中调用kill(happy's pid) ,发 ...

  3. linux perf - 性能测试和优化工具

    Perf简介 Perf是Linux kernel自带的系统性能优化工具.虽然它的版本还只是0.0.2,Perf已经显现出它强大的实力,足以与目前Linux流行的OProfile相媲美了. Perf 的 ...

  4. Linux 性能优化工具 perf top

    1. perf perf 是一个调查 Linux 中各种性能问题的有力工具. NAME perf - Performance analysis tools for Linux SYNOPSIS per ...

  5. 【转】Profiling application LLC cache misses under Linux using Perf Events

    转自:http://ariasprado.name/2011/11/30/profiling-application-llc-cache-misses-under-linux-using-perf-e ...

  6. Linux/Android 性能优化工具 perf

    /***************************************************************************** * Linux/Android 性能优化工 ...

  7. Linux下的内核测试工具——perf使用简介

    Perf是Linux kernel自带的系统性能优化工具.Perf的优势在于与Linux Kernel的紧密结合,它可以最先应用到加入Kernel的new feature.pef可以用于查看热点函数, ...

  8. 系统级性能分析工具 — Perf

    从2.6.31内核开始,linux内核自带了一个性能分析工具perf,能够进行函数级与指令级的热点查找. perf Performance analysis tools for Linux. Perf ...

  9. Linux Kernel ‘perf’ Utility 本地提权漏洞

    漏洞名称: Linux Kernel ‘perf’ Utility 本地提权漏洞 CNNVD编号: CNNVD-201309-050 发布时间: 2013-09-09 更新时间: 2013-09-09 ...

  10. 使用perf生成Flame Graph(火焰图)

      具体的步骤参见这里: <flame graph:图形化perf call stack数据的小工具>   使用SystemTap脚本制作火焰图,内存较少时,分配存储采样的数组可能失败,需 ...

随机推荐

  1. 012.Zabbix的IT服务

    一 IT服务简介 IT服务能体现宏观度量和管理基础设施的总体情况的可用性,从而体现总体的趋势,发现并解决IT基础设施暴露的问题. 二 IT服务的添加 2.1 IT services--services ...

  2. 001.FTP简介及相关文件

    一 FTP简介 FTP(File Transfer Protocol)文件传输协议,用于Internet上控制文件的双向传输. 下载:远程主机拷贝文件至本地: 上传:本地主机拷贝文件至远程. 二 FT ...

  3. 【Ray Tracing in One Weekend 超详解】 光线追踪1-3

    学完了插值,我们来学习在场景里面添加一个立体彩色球(三维插值) 按照惯例,先看效果: Chapter4: Adding a sphere 我们又一次面临图形学的主要任务. 我们需要再次回顾coord1 ...

  4. 面向对象设计原则 里氏替换原则(Liskov Substitution Principle)

    里氏替换原则(Liskov Substitution Principle LSP)面向对象设计的基本原则之一. 里氏替换原则中说,任何基类可以出现的地方,子类一定可以出现. LSP是继承复用的基石,只 ...

  5. android stuidio 导入项目问题。

    避免重新下载. === === === 改成自己对应的. ===== Gradle sync failed: Could not find method android() for arguments ...

  6. Jindent——让intellij idea 像eclipse一样生成模版化的javadoc注释

    插件地址 http://plugins.jetbrains.com/plugin/2170?pr=idea 安装方法参考 http://www.cnblogs.com/nova-/p/3535636. ...

  7. BZOJ.1923.[SDOI2010]外星千足虫(高斯消元 异或方程组 bitset)

    题目链接 m个方程,n个未知量,求解异或方程组. 复杂度比较高,需要借助bitset压位. 感觉自己以前写的(异或)高斯消元是假的..而且黄学长的写法都不需要回代. //1100kb 324ms #i ...

  8. 邻接矩阵实现图的存储,DFS,BFS遍历

    图的遍历一般由两者方式:深度优先搜索(DFS),广度优先搜索(BFS),深度优先就是先访问完最深层次的数据元素,而BFS其实就是层次遍历,每一层每一层的遍历. 1.深度优先搜索(DFS) 我一贯习惯有 ...

  9. BZOJ4278 : [ONTAK2015]Tasowanie

    首先在串的末尾加上1000,然后进行归并,每次取字典序较小的那个后缀即可. 用hash+二分支持查询lcp,时间复杂度$O(n\log n)$. #include<cstdio> type ...

  10. BZOJ4254 : Aerial Tramway

    可以修建的缆车总数不超过n,于是可以先通过$O(n^2)$的枚举求出所有可以修建的缆车. 对于一个缆车,若它仅连接i和i+1,那么它不受k的限制,把这种缆车额外取出,从大到小排序. 剩下的缆车两两之间 ...