TVM Quantization Summary Handbook
Table of Contents
- Official References
- TVM quantization roadmap
- INT8 quantization proposal
- Quantization Story - 2019-09
- Quantization Development
- Quantization Framework supported by TVM
- TF Quantization Related
- Pytorch Quantization Related
- MXNet related
- Tensor Core Related
- Related Commit
- Speed Up
- Comparison
- Automatic Integer Quantization
- Accepting Pre-quantized Integer models
- Speed Profile Tools
- Devices Attributes
- Copartner
- Alibaba
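TVM has a great deal of material on quantization. It is valuable but extremely scattered, which makes it unfriendly to individual developers. This handbook gathers it in one place.
OFFICIAL REFERENCES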
TVM QUANTIZATION ROADMAP
INT8 QUANTIZATION PROPOSAL
- INT8 quantization proposal - 2018-07
- This document presents the high-level overview of quantization process, and presents a proposal for implementing that in TVM.
- Introduces background on quantization.
- INT8 Quantization - Code generation for backends - 2018-07
- This thread only focuses on implementation of quantized layers in TVM.
QUANTIZATION STORY - 2019-09

QUANTIZATION DEVELOPMENT
- [RFC] Search-based Automated Quantization - 2020-01-22
- I proposed a new quantization framework that brings hardware and the learning method into the loop.
- Borrowing ideas from existing quantization frameworks, I chose a three-phase annotation-calibration-realization design:
- Annotation: The annotation pass rewrites the graph and inserts simulated quantize operations according to the rewrite function of each operator. A simulated quantize operation simulates the rounding error and saturation error of quantizing from float to integer.
- Calibration: The calibration pass adjusts the thresholds of the simulated quantize operations to reduce the accuracy drop.
- Realization: The realization pass transforms the simulation graph, which actually computes in float32, into a real low-precision integer graph.
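
The three phases above can be sketched on a plain list of floats. This is a toy illustration only: the function names, the symmetric int8 scheme, the candidate thresholds and the mean-squared-error criterion are all assumptions made for the example, not the RFC's implementation or TVM's API.

```python
# Toy sketch of annotation -> calibration -> realization.
# Every name below is hypothetical; none of this is TVM code.

def simulated_quantize(x, threshold, bits=8):
    """Quantize then dequantize in float, modelling rounding and saturation error."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    scale = threshold / qmax
    q = round(x / scale)                  # rounding error enters here
    q = max(-qmax, min(qmax, q))          # saturation error enters here
    return q * scale                      # result is still a float


def calibrate(samples, candidates, bits=8):
    """Pick the candidate threshold with the lowest mean squared error."""
    def mse(threshold):
        errors = [(x - simulated_quantize(x, threshold, bits)) ** 2 for x in samples]
        return sum(errors) / len(errors)
    return min(candidates, key=mse)


def realize(x, threshold, bits=8):
    """Return the actual integer that the simulated operation was modelling."""
    qmax = 2 ** (bits - 1) - 1
    scale = threshold / qmax
    return max(-qmax, min(qmax, round(x / scale)))


# Hypothetical activation samples; 6.0 plays the role of an outlier.
samples = [-1.9, -0.7, -0.2, 0.0, 0.1, 0.4, 0.9, 1.3, 6.0]
threshold = calibrate(samples, candidates=[1.0, 2.0, 4.0, 8.0])
ints = [realize(x, threshold) for x in samples]
```

A small threshold gives fine resolution but clips the outlier, while a large one avoids clipping at the cost of coarser steps. The calibration step exists to pick between those two sources of error.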

QUANTIZATION FRAMEWORK SUPPORTED BY TVM
TF QUANTIZATION RELATED
TVM supports all pre-quantized TFLite hosted models.
- Performance was evaluated on a C5.12xlarge Cascade Lake machine, which supports Intel VNNI.
- The models have not been autotuned yet.
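
For readers new to pre-quantized models, here is a minimal sketch of the affine mapping that such models attach to each tensor, where a real value is approximated as scale * (q - zero_point). The helper names and the specific scale and zero point are assumptions chosen for illustration, not values from any hosted model.

```python
# Affine (asymmetric) quantization sketch: real ~= scale * (q - zero_point).
# Helper names and parameter values are hypothetical.

def quantize(real, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer in [qmin, qmax]."""
    q = round(real / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a stored integer back to an approximate float."""
    return scale * (q - zero_point)


# Assumed parameters for a uint8 tensor covering roughly [-1.0, 1.0].
scale, zero_point = 2.0 / 255, 128

values = [-1.0, -0.25, 0.0, 0.5, 1.0]
quantized = [quantize(v, scale, zero_point) for v in values]
restored = [dequantize(q, scale, zero_point) for q in quantized]
```

A compiler that accepts pre-quantized models has to carry these per-tensor parameters through every operator. It does not choose them itself.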

PYTORCH QUANTIZATION RELATED
- How to convert the model to a quantized one through relay?
- Explains how to set the qconfig with torch.quantization.get_default_qconfig('fbgemm').
- Quantized model accuracy benchmark: PyTorch vs TVM
- Explains how to convert a quantized PyTorch model to a TVM model.
- Compares accuracy and speed for resnet18, resnet50, mobilenet-v2, mobilenet-v3, inception_v3 and googlenet.
- Includes "Static Quantization with Eager Mode in PyTorch", PyTorch's quantization tutorial.

- gap_quantization
- Placeholder for GAP8 export and quantization module for PyTorch
- Includes the quantization file for squeezenet-v1.1.
MXNET RELATED
- Model Quantization for Production-Level Neural Network Inference
- The below CPU performance is from an AWS EC2 C5.24xlarge instance with custom 2nd generation Intel Xeon Scalable Processors (Cascade Lake).
- The model quantization delivers more stable speedup over all models, such as 3.66X for ResNet 50 v1, 3.82X for ResNet 101 v1 and 3.77X for SSD-VGG16, which is very close to the theoretical 4X speedup from INT8.

The accuracy of the Apache MXNet quantization solution is very close to that of the FP32 models, without the need to retrain the model. In Figure 8, MXNet shows only a small reduction in accuracy, less than 0.5%.

TENSOR CORE RELATED
- [RFC][Tensor Core] Optimization of CNNs on Tensor Core
- [Perf] Enhance cudnn and cublas backend and enable TensorCore
RELATED COMMIT
- [OPT] Low-bit Quantization #2116
- Benchmarking Quantization on Intel CPU
- [RFC][Quantization] Support quantized models from TensorflowLite#2351
- After initial investigation and effort, INT8 achieves a speedup of about 30% over FP32 on ARM CPU for the MobileNet V1 model.
- [TFLite] Support TFLite FP32 Relay frontend. #2365
- This is the first PR of #2351 to support importing existing quantized int8 TFLite models. The base version of TensorFlow / TFLite is 1.12.
- [Strategy] Support for Int8 schedules - CUDA/x86 #5031
- The recently introduced op strategy currently has some issues with AutoTVM task extraction. This PR fixes them for x86/CUDA.
- [Torch, QNN] Add support for quantized models via QNN #4977

SPEED UP
COMPARISON
AUTOMATIC INTEGER QUANTIZATION
- The inference time is longer after int8 quantization
- TVM-relay.quantize vs quantization of other Framework
- TVM FP32, TVM int8, TVM int8 quantization + AutoTVM, MXNet

Quantization int8 slower than int16 on skylake CPU
- The int8 is always slower than int16 before and after the auto-tuning
- Target: llvm -mcpu=skylake-avx512
- Problem is solved by creating the int8 task explicitly
- create the task topi_x86_conv2d_NCHWc_int8
- set output dtype to int32, input dtype=uint8, weight dtype=int8
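
One way to see why such a task pairs uint8 inputs and int8 weights with an int32 output is to look at the worst-case accumulated value. The sketch below is plain Python arithmetic, not TVM code. The 3x3 kernel over 64 input channels is an assumed layer shape used only for illustration.

```python
# Worst-case accumulation for uint8 data times int8 weights.
# Pure-Python arithmetic; the layer shape is a hypothetical example.

INT16_MAX = 2 ** 15 - 1
INT32_MAX = 2 ** 31 - 1


def dot_u8_s8(data, weights):
    """Dot product of uint8 data and int8 weights with an unbounded accumulator."""
    assert all(0 <= d <= 255 for d in data)
    assert all(-128 <= w <= 127 for w in weights)
    return sum(d * w for d, w in zip(data, weights))


# Reduction length of one output element: 3x3 kernel over 64 input channels.
reduction_len = 3 * 3 * 64
worst = dot_u8_s8([255] * reduction_len, [-128] * reduction_len)

fits_int16 = abs(worst) <= INT16_MAX   # a 16-bit accumulator would overflow
fits_int32 = abs(worst) <= INT32_MAX   # a 32-bit accumulator has ample headroom
```

A single product (at most 255 * 128 in magnitude) still fits in 16 bits. The sum over the reduction axis is what needs the wider type.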

- TVM FP32, TVM int8, TVM int8 quantization, MXNet, TF1.13
- Includes test code

8bit@Cuda: AutoTVM vs TensorRT vs MXNet
- In this post, we show how to use TVM to automatically optimize quantized deep learning models on CUDA.

ACCEPTING PRE-QUANTIZED INTEGER MODELS
- Is there any speed comparison of quantization on cpu
- Extensive discussion comparing speed across torch-fp32, torch-int8, tvm-fp32, tvm-int16 and tvm-int8.

SPEED PROFILE TOOLS
Node Name               Ops                                                                  Time(us)   Time(%)  Start Time       End Time         Shape                Inputs  Outputs
---------               ---                                                                  --------   -------  ----------       --------         -----                ------  -------
1_NCHW1c                fuse___layout_transform___4                                          56.52      0.02     15:24:44.177475  15:24:44.177534  (1, 1, 224, 224)     1       1
_contrib_conv2d_nchwc0  fuse__contrib_conv2d_NCHWc                                           12436.11   3.4      15:24:44.177549  15:24:44.189993  (1, 1, 224, 224, 1)  2       1
relu0_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    4375.43    1.2      15:24:44.190027  15:24:44.194410  (8, 1, 5, 5, 1, 8)   2       1
_contrib_conv2d_nchwc1  fuse__contrib_conv2d_NCHWc_1                                         213108.6   58.28    15:24:44.194440  15:24:44.407558  (1, 8, 224, 224, 8)  2       1
relu1_NCHW8c            fuse___layout_transform___broadcast_add_relu___layout_transform__    2265.57    0.62     15:24:44.407600  15:24:44.409874  (64, 1, 1)           2       1
_contrib_conv2d_nchwc2  fuse__contrib_conv2d_NCHWc_2                                         104623.15  28.61    15:24:44.409905  15:24:44.514535  (1, 8, 224, 224, 8)  2       1
relu2_NCHW2c            fuse___layout_transform___broadcast_add_relu___layout_transform___1  2004.77    0.55     15:24:44.514567  15:24:44.516582  (8, 8, 3, 3, 8, 8)   2       1
_contrib_conv2d_nchwc3  fuse__contrib_conv2d_NCHWc_3                                         25218.4    6.9      15:24:44.516628  15:24:44.541856  (1, 8, 224, 224, 8)  2       1
reshape1                fuse___layout_transform___broadcast_add_reshape_transpose_reshape    1554.25    0.43     15:24:44.541893  15:24:44.543452  (64, 1, 1)           2       1
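
A profile like the one above is easier to act on once it is parsed. The following is a small sketch that reads whitespace-separated rows in this column order and finds where the time goes. The regular expression and the `parse_profile` helper are assumptions written for this example, and the embedded text contains only the first four rows of the table.

```python
import re

# One profile row: node, op, time in us, time share, start, end, shape, inputs, outputs.
# The pattern is an assumption based on the column order shown in the table.
ROW = re.compile(
    r"^(?P<node>\S+)\s+(?P<op>\S+)\s+(?P<time_us>[\d.]+)\s+(?P<pct>[\d.]+)\s+"
    r"(?P<start>\S+)\s+(?P<end>\S+)\s+(?P<shape>\(.*?\))\s+"
    r"(?P<inputs>\d+)\s+(?P<outputs>\d+)$"
)


def parse_profile(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.strip().splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append({
                "node": match["node"],
                "op": match["op"],
                "time_us": float(match["time_us"]),
                "pct": float(match["pct"]),
            })
    return rows


profile = """
Node Name Ops Time(us) Time(%) Start Time End Time Shape Inputs Outputs
--------- --- -------- ------- ---------- -------- ----- ------ -------
1_NCHW1c fuse___layout_transform___4 56.52 0.02 15:24:44.177475 15:24:44.177534 (1, 1, 224, 224) 1 1
_contrib_conv2d_nchwc0 fuse__contrib_conv2d_NCHWc 12436.11 3.4 15:24:44.177549 15:24:44.189993 (1, 1, 224, 224, 1) 2 1
relu0_NCHW8c fuse___layout_transform___broadcast_add_relu___layout_transform__ 4375.43 1.2 15:24:44.190027 15:24:44.194410 (8, 1, 5, 5, 1, 8) 2 1
_contrib_conv2d_nchwc1 fuse__contrib_conv2d_NCHWc_1 213108.6 58.28 15:24:44.194440 15:24:44.407558 (1, 8, 224, 224, 8) 2 1
"""

rows = parse_profile(profile)
slowest = max(rows, key=lambda r: r["time_us"])                    # most expensive node
conv_share = sum(r["pct"] for r in rows if "conv2d" in r["op"])    # time share of convolutions
```

In the full table the convolution nodes account for almost all of the runtime, so they are the natural first target when checking whether an int8 schedule is actually in use.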
DEVICES ATTRIBUTES
COPARTNER
See tvmai/meetup-slides for more recent information on what other copartners have done with TVM.
ALIBABA
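- A look back at 2019
- Describes the history of Alibaba's work on TVM.
- In April of this year (2019), I came back to work with colleagues on ARM CPU quantization optimization, because there was a business need for it. We worked hard on it for a while, and I am happy to say that we are faster than QNNPACK: on MobileNet V1 we reach 1.61x TFLite and 1.27x QNNPACK, and on MobileNet V2 we reach 2x TFLite and 1.34x QNNPACK.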
- TVM@AliOS



