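Performance Tuning Session 1 - Computer Architecture: A Quantitative Approach
The storage system project I have been working on recently has entered its performance-tuning phase, and the system's current performance is still well short of the project's targets. I have always believed in "practice guided by theory": especially early in tuning, it pays to identify the dominant bottleneck and invest the fewest resources for the greatest return. So how do we find the dominant bottleneck and focus on fixing it?
Drawing on the classic textbook Computer Architecture: A Quantitative Approach, this article introduces the basic theory of system reliability and performance evaluation, together with Amdahl's Law and the processor performance equation, as a theoretical foundation for performance tuning and system reliability evaluation.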
Background and Introduction
Dependability
- SLA (Service Level Agreement)
  - Service accomplishment: the service is delivered as specified
  - Service interruption: the delivered service differs from the SLA
- Module reliability
  - Mean time to failure (MTTF)
  - Mean time to repair (MTTR)
  - Mean time between failures (MTBF) = MTTF + MTTR
  - Failures in time (FIT): failures per billion hours of operation
- Module availability
  - Module availability = MTTF / (MTTF + MTTR)
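The definitions above translate directly into a few lines of arithmetic. The following is a minimal sketch, not taken from the book: the function names and the sample module (1,000,000-hour MTTF, 24-hour MTTR) are assumptions chosen only for illustration.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Module availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def fit(mttf_hours: float) -> float:
    """Failures in time: expected failures per billion (1e9) hours."""
    return 1e9 / mttf_hours


# Hypothetical module used only for illustration
sample_mttf = 1_000_000  # hours
sample_mttr = 24         # hours

sample_availability = availability(sample_mttf, sample_mttr)
print(f"availability = {sample_availability:.6f}")
print(f"MTBF         = {mtbf(sample_mttf, sample_mttr):,} hours")
print(f"FIT          = {fit(sample_mttf):,.0f}")
```

Because MTTR is usually tiny compared with MTTF, availability stays very close to 1; the number of "nines" is mostly decided by how quickly a failure is repaired.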
Example1
Assume a disk subsystem with the following components and MTTF:
- 10 disks, each rated at 1,000,000-hour MTTF
- 1 ATA controller, 500,000-hour MTTF
- 1 power supply, 200,000-hour MTTF
- 1 fan, 200,000-hour MTTF
- 1 ATA cable, 1,000,000-hour MTTF
Using the simplifying assumptions that component lifetimes are exponentially distributed and that failures are independent, compute the MTTF of the system as a whole.
Answer1
The failure rate of the system is the sum of the failure rates of its components:
\[Failure\ rate_{system}=10\times\frac{1}{1,000,000}+\frac{1}{500,000}+\frac{1}{200,000}+\frac{1}{200,000}+\frac{1}{1,000,000}=\frac{10+2+5+5+1}{1,000,000}=\frac{23}{1,000,000}=\frac{23,000}{1,000,000,000}
\]
or 23,000 FIT.
The MTTF of the system is just the inverse of the failure rate:
\[MTTF_{system}=\frac{1}{Failure\ rate_{system}}=\frac{1,000,000}{23}\approx43,500 \ hours
\]
or just under 5 years.
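The calculation in Answer1 can be checked with a short script. This is a sketch under the same simplifying assumptions (independent failures, exponentially distributed lifetimes); the `components` table simply restates the numbers from Example1, and the function name is my own.

```python
# (name, count, MTTF in hours) for each component type in Example1
components = [
    ("disk", 10, 1_000_000),
    ("ATA controller", 1, 500_000),
    ("power supply", 1, 200_000),
    ("fan", 1, 200_000),
    ("ATA cable", 1, 1_000_000),
]


def system_failure_rate(parts):
    """Failures per hour: with independent, exponentially distributed
    lifetimes, the component failure rates (1 / MTTF) simply add up."""
    return sum(count / mttf for _name, count, mttf in parts)


rate = system_failure_rate(components)
system_fit = rate * 1e9   # failures per billion hours
system_mttf = 1 / rate    # hours

print(f"failure rate = {system_fit:,.0f} FIT")
print(f"system MTTF  = {system_mttf:,.0f} hours "
      f"({system_mttf / 8760:.2f} years)")
```

Listing the per-component share of the total failure rate (for example `count / mttf / rate`) is a quick way to see which part dominates: here the ten disks together contribute 10 of the 23 failures per million hours.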
Example2
Disk subsystems often have redundant power supplies to improve dependability. Using the preceding components and MTTFs, calculate the reliability of redundant power supplies. Assume that one power supply is sufficient to run the disk subsystem and that we are adding one redundant power supply.
Assumptions:
- The lifetimes of the components are exponentially distributed.
- Component failures are independent of one another.
- The MTTF of the redundant pair is the mean time until one power supply fails divided by the chance that the other fails before the first one is replaced.
Answer2
With two power supplies, the mean time until one of them fails is $ MTTF_{power\ supply} / 2 $.
A good approximation of the probability of a second failure is the MTTR divided by the mean time until the other power supply fails:
\[MTTF_{power\ supply\ pair}=\frac{MTTF_{power\ supply}/2}{\frac{MTTR_{power\ supply}}{MTTF_{power\ supply}}}=\frac{MTTF^2_{power\ supply}}{2 \times MTTR_{power\ supply}}
\]
Assuming it takes a human operator 24 hours on average to notice the failure and replace the failed supply, the MTTF of the fault-tolerant pair of power supplies is
\[MTTF_{power\ supply\ pair} = \frac{200,000^2}{2 \times 24} \approx 830,000,000 \ hours
\]
making the pair about 4150 times more reliable than a single supply.
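The redundant-pair approximation is easy to evaluate numerically. The sketch below assumes the 24-hour repair time used in the formula above; `pair_mttf` is a hypothetical helper name, not something defined in the book.

```python
def pair_mttf(mttf_hours: float, mttr_hours: float) -> float:
    """Approximate MTTF of a redundant pair:
    (MTTF / 2) / (MTTR / MTTF) = MTTF**2 / (2 * MTTR)."""
    return mttf_hours ** 2 / (2 * mttr_hours)


single_mttf = 200_000  # hours, power supply MTTF from Example1
repair_time = 24       # hours, assumed time to notice and replace a failed supply

pair = pair_mttf(single_mttf, repair_time)
improvement = pair / single_mttf

print(f"pair MTTF   = {pair:,.0f} hours")
print(f"improvement = {improvement:,.0f}x over a single supply")
```

Without intermediate rounding the ratio comes out slightly above the 4150 quoted in the text, because the text first rounds the pair MTTF down to 830,000,000 hours. Note also that the result scales with 1/MTTR: halving the repair time doubles the MTTF of the pair.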
Annual Failure Rate
Fallacy
The rated mean time to failure of disks is 1,200,000 hours, or almost 140 years, so disks practically never fail.
An MTTF of 1,200,000 hours far exceeds the lifetime of a disk, which is commonly assumed to be 5 years, or 43,800 hours.
For this large MTTF to make sense, imagine replacing the disk every 5 years, at the end of its planned lifetime: on average you would replace the disk 27 times before seeing a failure, which takes about 140 years.
A more useful measure is therefore the percentage of disks that fail per year, called the annual failure rate (AFR).
Example
Assume 1000 disks with a 1,000,000-hour MTTF, used 24 hours a day. If each failed disk is replaced with a new one having the same reliability characteristics, the number of disks that fail in a year (8760 hours) is
\[Failed\ disks = \frac{number\ of\ disks \times time\ period}{MTTF}=\frac{1000\ disks\times8760\ hours/disk}{1,000,000\ hours}\approx 9
\]
That is, 0.9% of the disks would fail per year, or 4.4% over a 5-year lifetime.
In real environments, according to research, 3%-7% of drives fail per year, which corresponds to an MTTF of about 125,000-300,000 hours.
The real-world MTTF is about 2-10 times worse than the manufacturer's MTTF.
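The conversion between MTTF and AFR used above can be sketched as follows. It relies on the same linear approximation as the example (failed disks are replaced, so the population stays constant); the function names are assumptions for illustration.

```python
HOURS_PER_YEAR = 8760


def expected_failures(n_disks: int, mttf_hours: float,
                      period_hours: float = HOURS_PER_YEAR) -> float:
    """Expected number of failures in a period, assuming each failed disk
    is replaced by one with the same reliability characteristics."""
    return n_disks * period_hours / mttf_hours


def afr_from_mttf(mttf_hours: float) -> float:
    """Annual failure rate as a fraction (0.01 == 1%) for 24x7 operation."""
    return HOURS_PER_YEAR / mttf_hours


def mttf_from_afr(afr: float) -> float:
    """Inverse conversion: the MTTF in hours implied by an observed AFR."""
    return HOURS_PER_YEAR / afr


failed = expected_failures(1000, 1_000_000)
rated_afr = afr_from_mttf(1_000_000)
print(f"expected failures per year: {failed:.2f}")
print(f"rated AFR: {rated_afr:.2%}, over 5 years: {5 * rated_afr:.1%}")

# MTTF implied by the field AFRs quoted in the text
for observed_afr in (0.03, 0.07):
    print(f"AFR {observed_afr:.0%} -> MTTF of about "
          f"{mttf_from_afr(observed_afr):,.0f} hours")
```

Running the inverse conversion on measured failure counts from your own fleet gives an effective MTTF that can be compared directly against the data-sheet value.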
Performance Measurement
- Typical performance metrics
  - Response time
  - Throughput
- Execution time
  - Wall-clock time: includes all system overheads
  - CPU time: computation time only
- Speedup of X relative to Y
  "X is n times faster than Y" means
\[n=\frac{Execution\ time_Y}{Execution\ time_X}=\frac{1/Performance_Y}{1/Performance_X}=\frac{Performance_X}{Performance_Y}
\]
- Benchmarks
  - Kernels (e.g. matrix multiply)
  - Toy programs (e.g. quicksort)
  Neither of the two kinds above reflects the real performance of application execution.
  - Synthetic benchmarks (e.g. Dhrystone)
  - Benchmark suites (e.g. SPEC06FP, TPC-C)
Quantitative Principles of Computer Design
- Take advantage of parallelism
  - e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
- Principle of locality
  - Reuse of data and instructions
  - Temporal locality and spatial locality
- Focus on the common case
  - Favor the frequent case over the infrequent case
- Amdahl's Law
- Processor performance equation
Amdahl's Law
Basics
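Amdahl's law gives us a quick way to find the speedup from an enhancement, which depends on 2 factors:
- $ Fraction_{enhanced} $: the fraction of the computation time in the original computer that can be converted to take advantage of the enhancement.
- $ Speedup_{enhanced} $: the improvement gained by the enhanced execution mode, that is, how much faster the task would run if the enhanced mode were used for the entire program.
The execution time with the enhancement is the time spent in the unenhanced part plus the time spent in the enhanced part:
\[Execution\ time_{new}=Execution\ time_{old}\times\left((1-Fraction_{enhanced})+\frac{Fraction_{enhanced}}{Speedup_{enhanced}}\right)
\]
The overall speedup is the ratio of the execution times:
\[Speedup_{overall}=\frac{Execution\ time_{old}}{Execution\ time_{new}}=\frac{1}{(1-Fraction_{enhanced})+\frac{Fraction_{enhanced}}{Speedup_{enhanced}}}
\]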
Examples
Example1
Suppose that we want to enhance the processor used for web serving. The new processor is 10 times faster on computation in the web serving application than the old processor. Assuming that the original processor is busy with computation 40% of the time and is waiting for IO 60% of the time, what is the overall speedup gained by incorporating the enhancement?
Answer1
\[Fraction_{enhanced}=0.4;\quad Speedup_{enhanced}=10;\quad Speedup_{overall}=\frac{1}{0.6+\frac{0.4}{10}} \approx 1.56
\]
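Amdahl's law fits in a one-line function, which makes it convenient to try out "what if" scenarios during tuning. The sketch below is an illustration only; the function and parameter names are my own, and the input checks are an added assumption rather than part of the law.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by a factor of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Web-serving example: 40% computation made 10x faster, 60% waiting for IO
web_speedup = amdahl_speedup(0.4, 10)
print(f"overall speedup = {web_speedup:.2f}")
```

For a storage system the same function applies with the roles reversed: if 60% of the time is IO wait, speeding up the IO path is where the larger overall gain is available.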
Example2
FSQRT (Floating-point square root)
Proposal 1: FSQRT is responsible for 20% of the execution time of a critical graphics benchmark. Enhance FSQRT hardware and speed up this operation by a factor of 10.
Proposal 2: FP instructions are responsible for half of the execution time of the application. Make all FP instructions in the graphics processor run faster by a factor of 1.6.
Compare these 2 design alternatives.
Answer2
\[Speedup_{FSQRT}=\frac{1}{(1-0.2)+\frac{0.2}{10}}\approx 1.22
\]
\[Speedup_{FP}=\frac{1}{(1-0.5)+\frac{0.5}{1.6}}\approx 1.23
\]
Improving the performance of FP operations overall is slightly better because FP operations account for a larger fraction of the execution time.
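The comparison of the two proposals can be reproduced in a few lines, and the same formula also gives the ceiling each proposal could ever reach: as the enhanced speedup grows without bound, the overall speedup approaches 1 / (1 - fraction). This is a sketch for illustration; `amdahl_limit` and the `proposals` table are names I introduce here.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup according to Amdahl's law."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced: float) -> float:
    """Upper bound on the overall speedup when the enhanced part
    becomes infinitely fast."""
    return 1.0 / (1.0 - fraction_enhanced)


# name -> (fraction of execution time affected, speedup of that part)
proposals = {
    "FSQRT x10": (0.2, 10.0),
    "all FP x1.6": (0.5, 1.6),
}

results = {name: amdahl_speedup(f, s) for name, (f, s) in proposals.items()}
for name, (f, _s) in proposals.items():
    print(f"{name}: speedup = {results[name]:.3f}, "
          f"upper bound = {amdahl_limit(f):.2f}")
```

The upper bounds make the trade-off visible: the FSQRT proposal is already close to its ceiling of 1.25, while the FP proposal still has headroom up to 2.0.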
Example3
Back to the dependability example (Example1 under Dependability):
\[Failure\ rate_{system}=\frac{10+2+5+5+1}{1,000,000}=\frac{23}{1,000,000}
\]
The power supply accounts for $ \frac{5}{23}\approx 0.22 $ of the system failure rate.
After adding a redundant power supply, the power supply pair is about 4150 times more reliable than a single supply.
The reliability improvement of the whole system would be
\[Improvement_{power\ supply\ pair}=\frac{1}{(1-0.22)+\frac{0.22}{4150}} \approx 1.28
\]
Despite an impressive 4150x improvement in the reliability of one module, from the system's perspective the change has a measurable but small benefit.
Summary
- Amdahl's law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost-performance. The goal, clearly, is to spend resources in proportion to where time is spent.
- Amdahl's law is particularly useful for comparing the overall system performance of 2 alternatives, and it applies equally to comparing 2 processor design alternatives.
Processor Performance Equation
Basics
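\[CPU\ time=CPU\ clock\ cycles\ for\ a\ program\times clock\ cycle\ time
\]
or
\[CPU\ time=\frac{CPU\ clock\ cycles\ for\ a\ program}{clock\ rate}
\]
In terms of instructions,
\[CPI=\frac{CPU\ clock\ cycles\ for\ a\ program}{IC}
\]
\[CPU\ time=IC \times CPI \times clock\ cycle\ time
\]
Term & Dependency:
- clock cycle time (1/clock rate): hardware technology and organization
- CPI, clock cycles per instruction: organization and instruction set architecture
- IC, instruction count: instruction set architecture and compiler technology
For different types of instructions,
\[CPU\ clock\ cycles=\Sigma_{i=1}^{n}{IC_i \times CPI_i}
\]
Overall CPI
\[CPI=\frac{\Sigma_{i=1}^{n}{IC_i \times CPI_i}}{IC}=\Sigma_{i=1}^{n}{\frac{IC_i}{IC} \times CPI_i}
\]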
Examples
Consider Example2 from the Amdahl's Law section again, now modified to use measurements of instruction frequencies and instruction CPI values, which in practice are obtained by simulation or by hardware instrumentation.
Example
Suppose we made the following measurements:
- Frequency of FP operations = 25%
- Average CPI of FP operations = 4.0
- Average CPI of other instructions = 1.33
- Frequency of FSQRT = 2%
- CPI of FSQRT = 20
Assume that the 2 design alternatives are to
- decrease the CPI of FSQRT to 2
- decrease the average CPI of all FP operations to 2.5.
Compare these 2 design alternatives using the processor performance equation.
Answer
Original CPI with neither enhancement:
\[CPI_{original}=\Sigma_{i=1}^{n}{\frac{IC_i}{IC} \times CPI_i}=(4.0 \times 25\%)+(1.33 \times 75\%)\approx 2.0
\]
CPI with the enhanced FSQRT, obtained by subtracting the cycles saved from the original CPI:
\[CPI_{with\ new\ FSQRT}=CPI_{original}-2\%\times(CPI_{old\ FSQRT}-CPI_{of\ new\ FSQRT})=2.0-2\% \times (20-2)=1.64
\]
CPI with the enhancement of all FP instructions, obtained the same way:
\[CPI_{new\ FP}=CPI_{original}-25\%\times(CPI_{old\ FP}-CPI_{of\ new\ FP})=2.0-25\% \times (4.0-2.5)=1.625
\]
Since the CPI of the overall FP enhancement is slightly lower, its performance will be marginally better.
\[Speedup_{new\ FP}=\frac{CPU\ time_{original}}{CPU\ time_{new\ FP}}=\frac{IC \times CPI_{original} \times clock\ cycle\ time}{IC \times CPI_{new\ FP} \times clock\ cycle\ time}=\frac{2.0}{1.625}\approx 1.23
\]
It is often possible to measure the constituent parts of the processor performance equation directly. Such isolated measurements are a key advantage of the processor performance equation over Amdahl's Law in the previous example. In particular, it may be difficult to measure things such as the fraction of execution time for which a set of instructions is responsible.
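The comparison above can also be scripted from the measured frequencies and CPI values. The following is a minimal sketch: the instruction count and clock rate are hypothetical placeholders (they cancel out in the speedup), and the function names are my own. Because it keeps the unrounded CPI of 1.9975 instead of 2.0, the intermediate values differ slightly from the rounded figures in the text.

```python
def overall_cpi(mix):
    """Overall CPI = sum over instruction classes of (IC_i / IC) * CPI_i.
    `mix` is a list of (frequency, cpi) pairs whose frequencies sum to 1."""
    return sum(freq * cpi for freq, cpi in mix)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = IC * CPI * clock cycle time = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Measurements from the example: 25% FP at CPI 4.0, 75% other at CPI 1.33
cpi_original = overall_cpi([(0.25, 4.0), (0.75, 1.33)])

# Alternative 1: FSQRT (2% of instructions) goes from CPI 20 to CPI 2
cpi_new_fsqrt = cpi_original - 0.02 * (20 - 2)

# Alternative 2: the average CPI of all FP operations drops to 2.5
cpi_new_fp = overall_cpi([(0.25, 2.5), (0.75, 1.33)])

IC = 1_000_000_000  # hypothetical instruction count
CLOCK_HZ = 2e9      # hypothetical clock rate

t_original = cpu_time(IC, cpi_original, CLOCK_HZ)
speedup_fsqrt = t_original / cpu_time(IC, cpi_new_fsqrt, CLOCK_HZ)
speedup_fp = t_original / cpu_time(IC, cpi_new_fp, CLOCK_HZ)

print(f"CPI original  = {cpi_original:.4f}")
print(f"CPI new FSQRT = {cpi_new_fsqrt:.4f}, speedup = {speedup_fsqrt:.3f}")
print(f"CPI new FP    = {cpi_new_fp:.4f}, speedup = {speedup_fp:.3f}")
```

Since IC and the clock rate are unchanged by either alternative, the speedup reduces to the ratio of the CPIs, which is why only the instruction mix and per-class CPI need to be measured.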