[Repost] TaiShan v110 - Microarchitectures - HiSilicon
https://en.wikichip.org/wiki/hisilicon/microarchitectures/taishan_v110
| TaiShan v110 µarch | |
|---|---|
| General Info | |
| Arch Type | CPU |
| Designer | HiSilicon |
| Manufacturer | TSMC |
| Introduction | 2019 |
| Process | 7 nm |
| Core Configs | 32, 48, 64 |
| Pipeline | |
| Type | Superscalar, Superpipeline |
| OoOE | Yes |
| Speculative | Yes |
| Reg Renaming | Yes |
| Decode | 4-way |
| Instructions | |
| ISA | ARMv8.2-A |
| Extensions | NEON |
| Cache | |
| L1I Cache | 64 KiB/core |
| L1D Cache | 64 KiB/core |
| L2 Cache | 512 KiB/core |
| L3 Cache | 1 MiB/core |
| Succession | |
| Predecessor | TaiShan v100 |
TaiShan v110 is a high-performance ARM server microarchitecture designed by HiSilicon for Huawei's own TaiShan servers. It is the successor to the TaiShan v100.
Brands
TaiShan-based CPUs are branded as the Kunpeng 920 series.
Release Dates
Kunpeng 920 CPUs were officially launched in early 2019.
Architecture
Key changes from TaiShan v100
- TSMC 7 nm HPC process (from 16 nm)
- 2x core count (64, up from 32)
- Custom cores (from Cortex-A72)
  - ASIMD
    - 2x single-precision (SP) vector throughput (2 instructions/cycle, up from 1)
- Memory
  - 2x memory channels (8, up from 4)
- I/O
  - PCIe Gen 4 (from Gen 3)
Memory Hierarchy
- Cache
  - L1I Cache
    - 64 KiB/core, private
    - 64-byte cache lines
  - L1D Cache
    - 64 KiB/core, private
    - 64-byte cache lines
  - L2 Cache
    - 512 KiB/core, private
  - L3 Cache
    - 1 MiB/core
    - Shared by all cores
- System DRAM
  - 1 TiB max memory/socket
  - 8 channels
  - DDR4, up to 2933 MT/s
  - 1 and 2 DIMMs per channel (1 DPC and 2 DPC) supported
  - 8 B/cycle/channel (@ memory clock)
  - ECC, SDDC, DDDC
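The per-core figures in the memory hierarchy can be rolled up into chip-level totals. The Python sketch below multiplies each cache size by the core count and estimates theoretical peak DRAM bandwidth per socket. It assumes that "8 B/cycle/channel" means 8 bytes per transfer on a 64-bit DDR4 channel; the function names are hypothetical, and the bandwidth is a theoretical ceiling, not a measured number.

```python
# Sketch: chip-level totals derived from the per-core memory hierarchy figures.
# Assumption: "8 B/cycle/channel" is read as 8 bytes per transfer on a
# 64-bit DDR4 channel, so peak = channels * MT/s * 8 bytes.

KIB = 1024
MIB = 1024 * KIB

# Per-core capacity of each cache level, in bytes.
PER_CORE_CACHE = {
    "L1I": 64 * KIB,
    "L1D": 64 * KIB,
    "L2": 512 * KIB,
    "L3": 1 * MIB,  # shared by all cores, sized at 1 MiB per core
}


def total_cache_bytes(cores: int) -> dict[str, int]:
    """Aggregate capacity of each cache level for a chip with `cores` cores."""
    return {level: size * cores for level, size in PER_CORE_CACHE.items()}


def peak_dram_bandwidth_gbs(channels: int = 8, mega_transfers: int = 2933,
                            bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth per socket in GB/s (10**9 bytes/s)."""
    return channels * mega_transfers * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    for cores in (32, 48, 64):
        totals = total_cache_bytes(cores)
        print(cores, "cores:", {k: f"{v // MIB} MiB" for k, v in totals.items()})
    print(f"peak DRAM bandwidth: {peak_dram_bandwidth_gbs():.1f} GB/s per socket")
```

For the 64-core configuration this yields 64 MiB of L3 and roughly 187.7 GB/s of theoretical peak DRAM bandwidth per socket; sustained bandwidth in practice will be lower.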
Overview
Although HiSilicon has a history of designing Arm processors, the TaiShan v110 is HiSilicon's first custom, homegrown high-performance Arm core and SoC design. The chip is a multi-chip package that incorporates multiple compute dies and an I/O die; it is fabricated on TSMC's 7-nanometer HPC process and integrates up to 64 cores and up to 64 MiB of last-level cache.
The SoC also incorporates a number of hardware accelerators. A crypto engine supports AES, DES/3DES, MD5, SHA1, SHA2, HMAC, and CMAC with throughput of up to 100 Gbit/s, and a compression engine supports GZIP, LZS, and LZ4 with compression throughput of up to 40 Gbit/s and decompression throughput of up to 100 Gbit/s.
Marketed as the Kunpeng 920, the SoC supports up to 4-way multiprocessing through HiSilicon's Hydra interface. To keep the cores fed, each socket incorporates eight DDR4 memory channels. To make the chip an easy platform for attaching accelerators, each socket also provides 40 PCIe Gen 4 lanes with CCIX support, enabling cache coherency with attached devices.
Core
Each core is a 4-way out-of-order superscalar design that implements the ARMv8.2-A ISA. Huawei stated that the core supports almost all ARMv8.4 features (with a few exceptions), including the dot product and FP16 FML extensions. It features private 64 KiB L1 instruction and data caches as well as a private 512 KiB L2 cache. Although Huawei has disclosed few details, it says that compared to Arm's Cortex cores, its core has an improved memory subsystem, more execution units, and a better branch predictor.
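On a running system, the two extensions named in this paragraph (dot product and FP16 FML) can be checked directly rather than taken on faith. The sketch below assumes Linux on arm64, where the kernel lists hardware capabilities on the `Features` line of `/proc/cpuinfo`; `asimddp` and `asimdfhm` are the kernel's flag names for those two extensions. The helpers `parse_features` and `report` are hypothetical, not part of any HiSilicon tool.

```python
# Sketch: check /proc/cpuinfo (Linux, arm64) for the dot-product and FP16 FML
# hwcap flags. parse_features/report are hypothetical helper names.
from pathlib import Path

# Linux arm64 hwcap strings mapped to the extension names used in the text.
WANTED = {"asimddp": "dot product", "asimdfhm": "FP16 FML"}


def parse_features(cpuinfo_text: str) -> set[str]:
    """Collect every flag listed on 'Features' lines of /proc/cpuinfo text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def report(flags: set[str]) -> dict[str, bool]:
    """Map each extension of interest to whether its flag is present."""
    return {name: flag in flags for flag, name in WANTED.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(report(parse_features(cpuinfo.read_text())))
    else:
        print("/proc/cpuinfo not found; this sketch targets Linux")
```

On non-arm64 Linux the flags appear under a different key (for example `flags` on x86), so the sketch simply reports both extensions as absent there.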
ASIMD
Each core features a single 128-bit NEON unit, capable of executing either one double-precision FMA vector instruction or two single-precision vector instructions per cycle. At 2 GHz, a 64-core chip therefore has a peak compute of 512 GFLOPS of double-precision floating point. Compared to the TaiShan v100, single-precision vector throughput has been doubled from 1 to 2 instructions per cycle.
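The 512 GFLOPS figure follows from the vector width. A 128-bit register holds two 64-bit doubles, and a fused multiply-add is conventionally counted as two floating-point operations, so one FMA per cycle gives 4 double-precision FLOPs per cycle per core. The hypothetical helper below spells out that multiplication.

```python
def peak_gflops(cores: int, ghz: float, vector_bits: int = 128,
                element_bits: int = 64, vector_insts_per_cycle: int = 1,
                flops_per_inst: int = 2) -> float:
    """Theoretical peak GFLOPS = cores * GHz * lanes * insts/cycle * FLOPs/inst.

    Defaults model one 128-bit double-precision FMA per cycle per core,
    counting each FMA lane as 2 FLOPs (a multiply plus an add).
    """
    lanes = vector_bits // element_bits  # 128 / 64 = 2 FP64 lanes
    flops_per_cycle = lanes * vector_insts_per_cycle * flops_per_inst
    return cores * ghz * flops_per_cycle


print(peak_gflops(cores=64, ghz=2.0))  # 512.0
```

The text does not say whether the two single-precision instructions per cycle are FMAs, so the sketch does not compute a single-precision peak.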
MCP physical design
The SoC itself comprises three dies: two Super CPU Cluster (SCCL) compute dies and one Super IO Cluster (SICL) die. Each SCCL compute die contains eight CPU Clusters (CCLs), memory controllers, and the L3 cache block. With eight CCLs on each of the two SCCL dies, the chip has a total of 64 cores. Each CCL is a quad-core TaiShan v110 cluster together with its partition of the L3 cache tags. The Super IO Cluster hosts the various I/O peripherals, such as PCIe Gen 4, SAS, the network interface controllers, and the Hydra links.
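The die hierarchy multiplies out as dies × clusters × cores. This hypothetical Python model encodes the counts from the paragraph above and derives the core count and L3 capacity. It models only the full 64-core configuration, since the text does not say how the 32- and 48-core parts are configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kunpeng920Layout:
    """Hypothetical model of the die hierarchy described above."""
    sccl_dies: int = 2        # Super CPU Cluster compute dies per package
    ccls_per_sccl: int = 8    # CPU Clusters (CCLs) per SCCL die
    cores_per_ccl: int = 4    # each CCL is a quad-core TaiShan v110 cluster
    l3_mib_per_core: int = 1  # L3 capacity provisioned per core

    @property
    def cores(self) -> int:
        return self.sccl_dies * self.ccls_per_sccl * self.cores_per_ccl

    @property
    def l3_mib(self) -> int:
        return self.cores * self.l3_mib_per_core


layout = Kunpeng920Layout()
print(layout.cores, layout.l3_mib)  # 64 64
```

Both derived values agree with the 64 cores and 64 MiB of last-level cache quoted for the full chip.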
Scalability
- See also: Hydra Interface
Each chip incorporates three Hydra interface ports. The Hydra interface facilitates cache coherency between the dies on the chip. Every link supports 240 Gb/s (30 GB/s) of peak bandwidth, for a total aggregate bandwidth of 720 Gb/s (90 GB/s) in a 2-way symmetric multiprocessing configuration.
Using all three links, 4-way SMP is also supported. In this configuration, each socket dedicates one link to each of the other three sockets, forming a fully connected (all-to-all) topology.
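The Hydra link arithmetic can be checked with a small Python sketch. It converts the per-link rate from bits to bytes, totals the three links for the 2-way case (assuming, as the 720 Gb/s aggregate implies, that all three links connect the same pair of sockets), and counts the point-to-point links a fully connected 4-socket system needs. The helper names are hypothetical.

```python
from itertools import combinations

LINK_GBPS = 240      # peak bandwidth per Hydra link, Gbit/s
LINKS_PER_CHIP = 3   # Hydra ports per chip


def to_gbytes(gbits: float) -> float:
    """Convert Gbit/s to GB/s (8 bits per byte)."""
    return gbits / 8


def all_to_all_pairs(sockets: int) -> list[tuple[int, int]]:
    """Every socket pair that needs a direct link in a fully connected system."""
    return list(combinations(range(sockets), 2))


def ports_needed(sockets: int) -> int:
    """Links each socket must provide for an all-to-all topology."""
    return sockets - 1


if __name__ == "__main__":
    # Assumed 2-way case: all three links run between the same two sockets.
    aggregate = LINKS_PER_CHIP * LINK_GBPS
    print(aggregate, "Gb/s =", to_gbytes(aggregate), "GB/s")
    pairs = all_to_all_pairs(4)
    print(len(pairs), "links for 4 sockets;", ports_needed(4), "ports per socket")
```

A fully meshed system of n sockets needs n - 1 ports per socket, which is consistent with three Hydra ports supporting a directly connected 4-socket system.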
Chipset
Along with the Hi1620 SoC, HiSilicon developed a number of integrated circuits as part of the chipset platform.
| Chip | Description |
|---|---|
| Hi1620 | CPU, Kunpeng 920 series Chip |
| Hi1503 | CPU interconnect chip, supports scaling up to 32 sockets |
| Hi1812 | SSD storage controller, for read/write I/O acceleration |
| Hi1822 | Network controller chip, DC high-speed flexible interconnect |
| Hi1710 | BMC management chip + enhanced RAS features chip |
Die
- TSMC 7 nm HPC
- 20,000,000,000 transistors
- 3-4 dies