Intel MIC
http://en.wikipedia.org/wiki/Intel_MIC
Intel MIC
| Designer | Intel |
|---|---|
| Design | manycore extended x86/x64 design |
| Registers | |
| General purpose | Intel Architecture registers |
| Floating point | 512-bit SIMD vector registers |
Intel Many Integrated Core Architecture or Intel MIC (pronounced Mike) is a multiprocessor computer architecture developed by Intel incorporating earlier work on the Larrabee many core architecture, the Teraflops Research Chipmulticore chip research project, and the Intel Single-chip Cloud Computer multicore microprocessor.
Prototype products codenamed Knights Ferry were announced and released to developers in 2010. A commercial release, codenamed Knights Corner to be built on a 22nm process was scheduled to go into production in late 2012.
In September 2011, the Texas Advanced Computing Center (TACC) announced it would use Knights Corner cards in their 10 PetaFLOPS "Stampede" supercomputer, providing 8 PetaFLOPS of computing power.
At the International Supercomputing Conference (2012, Hamburg), Intel announced the branding of the processor product family as Intel Xeon Phi.
In November 2012, Intel formally announced the first products citing claims of CPU-like versatile programmability, high performance and power efficiency.[1] The Green 500 list placed a system using these new products as the most power efficient computer in the world.[2]
In June 2013, the Tianhe-2 supercomputer at the National Supercomputing Center in Guangzhou (NSCC-GZ) was announced[3] as the world's fastest supercomputer. It utilizes Intel Ivy Bridge-EP Xeon and Xeon Phi processors to achieve 33.86 PetaFLOPS.[4]
Contents
[hide]
History[edit]
Background[edit]
The Larrabee microarchitecture (in development since 2006[5]) introduced very wide (512-bit) SIMD units to a x86 architecture based processor design, extended to a cache coherent multiprocessor system connected via a ring bus to memory; each core was capable of 4-way multi-threading. Due to the design being intended for GPU as well as general purpose computing the Larrabee chips also included specialised hardware for texture sampling.[6][7] The project to produce a GPU retail product directly from the Larrabee research project was terminated in May 2010.[8]
Another contemporary Intel research project implementing x86 architecture on a many-multicore processor was the 'Single Chip Cloud Computer', (prototype introduced 2009.[9]), a design mimicking a cloud computing computer datacentre on a single chip with multiple independent cores - the prototype design included 48 cores per chip with hardware support for selective frequency and voltage control of cores to maximize energy efficiency, and incorporated a mesh network for interchip messaging. The design lacked cache coherent cores and focused on principles that would allow the design to scale to many more cores.[10]
The Teraflops Research Chip (prototype unveiled 2007[11]) was an experimental 80 core chip with two floating point units per core implementing not x86 but a 96-bit VLIW architecture.[12] The project investigated intercore communication methods, per-chip power management, and achieved 1.01 TFLOPS at 3.16 GHz consuming 62 W of power.[13][14]
Knights Ferry[edit]
Intel's MIC prototype board, named Knights Ferry, incorporating a processor codenamed Aubrey Isle was announced 31 May 2010. The product was stated to be a derivative of the Larrabee project and other Intel research including the Single-chip Cloud Computer.[15][16]
The development product was offered as a PCIe card with 32 in-order cores at up to 1.2 GHz with 4 threads per core, 2 GB GDDR5 memory,[17] and 8 MB coherent L2 cache (256 kB per core with 32 kB L1 cache), and a power requirement of ~300 W,[17] built at a 45 nm process.[18] In the Aubrey Isle core a 1,024-bit ring bus (512-bit bi-directional) connects processors to main memory.[19] Single board performance has exceeded 750 GFLOPS.[18] The prototype boards only support single precision floating point instructions.[20]
Initial developers included CERN, Korea Institute of Science and Technology Information (KISTI) and Leibniz Supercomputing Centre. Hardware vendors for prototype boards included IBM, SGI, HP, Dell and others.[21]
Knights Corner[edit]
The Knights Corner product line is expected to be made at a 22 nm process size, using Intel's Tri-gate technology with more than 50 cores per chip, and is expected to lead to commercial products.[15][18]
In June 2011, SGI announced a partnership with Intel to utilize the MIC architecture in its high performance computing products.[22] In September 2011, it was announced that the Texas Advanced Computing Center (TACC) will use Knights Corner cards in their 10 PetaFLOPS "Stampede" supercomputer, providing 8 PetaFLOPS of the compute power.[23] According to "Stampede: A Comprehensive Petascale Computing Environment" the "second generation Intel (Knights Landing) MICs will be added when they become available, increasing Stampede's aggregate peak performance to at least 15 PetaFLOPS."[24]
On November 15, 2011, Intel showed an early silicon version of a Knights Corner processor.[25][26]
On June 5, 2012, Intel released open source software and documentation regarding Knights Corner.[27]
In June 2012, Cray announced it would be offering 22 nm 'Knight's Corner' chips (branded as 'Xeon Phi') as a co-processor in its 'Cascade' systems.[28][29]
In June 2012, ScaleMP announced it will provide its virtualization software to allows using 'Knight's Corner' chips (branded as 'Xeon Phi') as main processor transparent extension. The virtualization software will allow 'Knight's Corner' to run legacy MMX/SSE code and access unlimited amount of (host) memory without need for code changes.[30]
The Knight's Corner chip was announced as being rebranded as 'Xeon Phi' at the 2012 Hamburg International Supercomputing Conference.[31][32]
Knights Landing[edit]
Code name for the second generation MIC architecture product from Intel.[24] Intel officially first revealed details of its second generation Intel Xeon Phi products on June 17, 2013.[4] Intel said that the next generation of Intel MIC Architecture-based products will be available in two forms, as a coprocessor or a host processor (CPU), and be manufactured using Intel's 14nm process technology. Knights Landing products will include integrated on-package memory for significantly higher memory bandwidth. Knights Landing will support AVX-512.[33]
Xeon Phi[edit]
On June 18, 2012, Intel announced that Xeon Phi will be the brand name used for all products based on their Many Integrated Core architecture.[34][35][36][37][38]
On September 11, 2012, it was announced that a supercomputer called Stampede will be based on the Xeon Phi.[39] Stampede will be capable of 10 petaflops.[39]
On November 12, 2012, Intel announced two Xeon Phi coprocessor families which are the Xeon Phi 3100 and the Xeon Phi 5110P.[40][41][42] The Xeon Phi 3100 will be capable of more than 1 teraflops of double precision floating point instructions with 240 GB/sec memory bandwidth at 300 W.[40][41][42] The Xeon Phi 5110P will be capable of 1.01 teraflops of double precision floating point instructions with 320 GB/sec memory bandwidth at 225 W.[40][41][42] The Xeon Phi 7120P will be capable of 1.2 teraflops of double precision floating point instructions with 352 GB/sec memory bandwidth at 300 W.
The Xeon Phi uses the 22 nm process size.[40][41][42] The Xeon Phi 3100 will be priced at under US$2,000 while the Xeon Phi 5110P will have a price of US$2,649 and Xeon Phi 7120 at US$4129.00.[40][41][42] On June 17, 2013, the Tianhe-2 supercomputer was announced[3] by TOP500 as the world's fastest. It uses Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 PetaFLOPS.
Design[edit]
The cores of Intel MIC are based on a modified version of P54C design, used in the original Pentium.[43] The basis of the Intel MIC architecture is to leverage x86 legacy by creating a x86-compatible multiprocessor architecture that can utilize existing parallelization software tools.[18] Programming tools includeOpenMP, OpenCL,[44] Cilk/Cilk Plus and specialised versions of Intel's Fortran, C++[45] and math libraries.[46]
Design elements inherited from the Larrabee project include x86 ISA, 4-way SMT per core, 512-bit SIMD units, coherent L2 cache, and ultra-wide ring bus connecting processors and memory.
The Knights Corner instruction set documentation is available from Intel.[47][48]
Competitors[edit]
- Nvidia Tesla, direct competitor in the HPC market.[49]
See also[edit]
http://www.zdnet.com/sc13-intel-reveals-knights-landing-high-performance-cpu-7000023393/
SC13: Intel reveals Knights Landing high-performance CPU
Summary: Once a niche, high-performance computing has become a key growth area for the tech industry. Intel’s announcements at Supercomputing 13 today---including new details of a completely redesigned Many Integrated Core processor—show just how important technical computing has become.

By John Morris for Laptops & Desktops | November 19, 2013 -- 21:36 GMT (13:36 PST)
High-performance computing, once a niche area catering to academia and government, has become a key growth area for the tech industry as countries battle to develop the first exascale supercomputers and companies adopt the technology. Intel’s announcements at SC13 today---including new details of a completely redesigned Many Integrated Core processor—show just how important technical computing has become.

Intel released its first Xeon Phi in late 2012 and expanded the product line in June 2013. Known as Knights Corner and manufactured on a 22nm process, these are all co-processors, meaning they must be used with a host x86 processor (generally a Xeon server chip) connected over a PCI-Express bus much like Nvidia Tesla and AMD FirePro accelerators.
The Xeon Phi co-processor is already used in Tianhe-2, the world’s fastest supercomputer andone of 13 systems on the Top500 list that now employ Intel’s Knights Corner. Hazra said that what is “perhaps more exciting” is that in addition to some wins on the Top500, Xeon Phi is also starting to be adopted more broadly in mainstream high-performance applications.
What's Hot on ZDNet
Intel is clearly fast-tracking the development of its Many-Integrated Core architecture. The next version, code-named Knights Landing, will not only be manufactured on a more advanced 14nm process, but it will also include significant changes to the core and other parts of the chip designed to increase performance and improve efficiency.
“It’s a major transition from Knights Corner,” said Raj Hazra, Vice President of the Data Center Group and General Manager of the Technical Computing Group. “You can think of Many-Core as a tock, tock, tock cadence” referring to the so-called tick-tock cadence that Intel uses to introduce major changes to its mainstream Core architecture every other year. To translate, Intel will use the extra transistors provided by Moore’s Law to make big changes.
The biggest of these is that Knights Landing will be a standalone many-core CPU that will fit into standard rack architecture and run its own operating system without needing a separate host CPU. That means Knights Landing can be used as a homogenous, many-core processor in everything from workstations to massive supercomputer clusters without having to develop for heterogeneous systems that offload certain data to accelerators.
“It will have the performance of an accelerator but you will view it as a software developer as a CPU,” Hazra said. “It’s the best of both worlds.” Though Intel is clearly emphasizing its use as a CPU, Knights Landing will also be available in a PCI-Express card as a drop-in replacement for Knights Corner.
Near Memory
The second big change is in the memory architecture. Knights Landing will have a relatively large pool of high-bandwidth “Near Memory” in the CPU package, in addition to the standard DDR memory on the board (aka “Far Memory”). The addition of the Near Memory is meant to boost the performance on memory-bound workloads.
Hazra said this isn’t a new memory hierarchy since developers can treat it as one flat memory space and leave everything up the system software, but Intel also plans to offer developer tools to further optimize applications for the extra high-bandwidth memory. Intel did not say exactly how much extra memory will be in the chip package, but Hazra said it will have “enough capacity to hold meaningful workloads.”

Hazra also talked a bit about how Intel is “opening the door” to customer requests for more customized products. This goes beyond system-level customization to the developments of chips with different types of cores, operating frequencies or thermal envelopes designed for specific sorts of tasks.
Competitor AMD has also talked extensively about developing semi-custom SoCs, but so far neither company has provided examples of real-world products.
Software efforts
Intel has also intensified its efforts in software for high-performance computing. Intel has thousands of software engineers, and is already a big contributor to the Linux kernel and the Android ecosystem. Intel’s Boyd Davis, Vice President of the Data Center Group and General Manager of the Datacenter Software Division, said that modular hardware and open software is driving a lot of growth not only in the cloud and high-performance computing, but also “bleeding over” into the enterprise.
These open-source projects are so disruptive, Davis said, that Intel felt it had to develop its own software for the cloud and HPC. That began with the acquisition last year of Whamcloud, one of the key players behind the Lustre parallel file system used in many of the world's top supercomputers, and the release of Intel Enterprise Edition for Lustre.
At SC13 today Intel announced an HPC Distribution for Apache Hadoop (which runs on Intel Enterprise Edition for Lustre), Cloud Edition for Lustre running on Amazon Web Services Marketplace, and turnkey hardware and software solutions for Enterprise Edition for Lustre from several partners (Advanced HPC, Aeon Computing, Atipa, Boston Ltd., Colfax, E4 Computer Engineering, NOVATTE and System Fabric Works).
Earlier in the day, in the opening keynote address of SC13, Dr. Genevieve Bell, an Intel Fellow and Director of User Experience Research, gave the sort of wide-ranging talk on big data that you’d expect from anthropologist. She defined big data as the combination of data, visualization, analytics and algorithms and talked about some of the earliest examples reaching all the way back to theDomesday Book.
More powerful systems may enable us to analyze larger sets, but big data has been around a long time. “Computers didn’t invent big data. Humans did,” she said. “We are the people who build, we are the people make it, we are the people who use it.”
Bell said that big data holds extraordinary potential in areas such as climate change, energy, medicine, education, and it will be limited not by technology but only by the human imagination.
Topics: Processors, Intel
Intel MIC的更多相关文章
- Intel processor brand names-Xeon,Core,Pentium,Celeron----Xeon
http://en.wikipedia.org/wiki/Comparison_of_Intel_processors Processor Series Nomenclature Code Name ...
- linux内核更新前后配置文件的比较
说明:这里先给出一个比较的结果,作为记录,后续会给出内核配置差异的详细解释. [root@xiaolyu linux-4.7.2]# diff .config .config_bak 3c3< ...
- 第一个 MIC shared_memory 程序
设置Intel编译器的运行环境 在terminal中执行编译器的环境脚本 compilervars.sh: source <install-dir>/bin/compilervars.sh ...
- MIC性能优化策略
MIC性能优化主要包括系统级和内核级:系统级优化包括节点之间,CPU与MIC之间的负载均衡优化:MIC内存空间优化:计算与IO并行优化:IO与IO并行优化:数据传递优化:网络性能优化:硬盘性能优化等. ...
- Intel CPUs
http://en.wikipedia.org/wiki/Intel_cpus List of Intel Atom microprocessors List of Intel Xeon microp ...
- Intel主板芯片组
写这个的初衷还是由于linux内核本身就是硬件的抽象,如果你对硬件的相关发展,机制以及架构不了解,实际你也是看不懂linux内核代码以及看不懂linux很多命令输出的结果的,如果你看内核代码就会发现内 ...
- Intel Media SDK H264 encoder GOP setting
1 I帧,P帧,B帧,IDR帧,NAL单元 I frame:帧内编码帧,又称intra picture,I 帧通常是每个 GOP(MPEG 所使用的一种视频压缩技术)的第一个帧,经过适度地压缩,做为随 ...
- [Intel Edison开发板] 05、Edison开发基于MRAA实现IO控制,特别是UART通信
一.前言 下面是本系列文章的前几篇: [Intel Edison开发板] 01.Edison开发板性能简述 [Intel Edison开发板] 02.Edison开发板入门 [Intel Edison ...
- [Intel Edison开发板] 04、Edison开发基于nodejs和redis的服务器搭建
一.前言 intel-iot-examples-datastore 是Intel提供用于所有Edison开发板联网存储DEMO所需要的服务器工程.该工程是基于nodejs和redis写成的一个简单的工 ...
随机推荐
- linux查找和替换命令
http://blog.csdn.net/imyang2007/article/details/8105499 命令的东西应该多练,熟能生巧.
- Nmap手册
转自:http://drops.xmd5.com/static/drops/tips-4333.html 0x00:说明 只是一个快速查询手册,理论的东西都没有补充,欢迎大家积极在评论区补充自己常用的 ...
- appium+python自动化-adb文件导入和导出(pull push)
前言 用手机连电脑的时候,有时候需要把手机(模拟器)上的文件导出到电脑上,或者把电脑的图片导入手机里做测试用,我们可以用第三方的软件管理工具直接复制粘贴,也可以直接通过adb命令导入和导出. adb ...
- CF802D
D. Marmots (easy) time limit per test 2 seconds memory limit per test 256 megabytes input standard i ...
- 【bzoj4197】[Noi2015]寿司晚宴 分解质因数+状态压缩dp
题目描述 为了庆祝 NOI 的成功开幕,主办方为大家准备了一场寿司晚宴.小 G 和小 W 作为参加 NOI 的选手,也被邀请参加了寿司晚宴. 在晚宴上,主办方为大家提供了 n−1 种不同的寿司,编号 ...
- 【Luogu】P1594护卫队(前缀和+DP)
TM搞了半天的二维DP方程还是错的. 这是题目链接: 设f[i]表示前i辆车顺利通过的最小时间. 则对于每一个i枚举该组车的起点j,然后从所有的f[j]+Min[j][i]中选一个最小的. Min[j ...
- 暑假训练Round1——G: Hkhv的水题之二(字符串的最小表示)
Problem 1057: Hkhv的水题之二 Time Limits: 1000 MS Memory Limits: 65536 KB 64-bit interger IO format: ...
- 有大神告诉我为什么pymysql导入失败
import json import requests import pymysql url = 'https://xueqiu.com/v4/statuses/public_timeline_by_ ...
- 算法复习——数位dp(不要62HUD2089)
题目 题目描述 杭州人称那些傻乎乎粘嗒嗒的人为 62(音:laoer). 杭州交通管理局经常会扩充一些的士车牌照,新近出来一个好消息,以后上牌照,不再含有不吉利的数字了,这样一来,就可以消除个别的士司 ...
- 【基础操作】FFT / DWT / NTT / FWT 详解
1. 2. 点值表示法 假设两个多项式相乘后得到的多项式 的次数(最高次项的幂数)为 $n$.(这个很好求,两个多项式的最高次项的幂数相加就得到了) 对于每个点,要用 $O(n)$ 的时间 把 $x$ ...