Intel: Spectre & Meltdown side-channel attacks (4): cache mapping

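The previous post briefly covered the principle and method of the row hammer attack. To make this kind of low-level hardware attack easier to understand, this post looks at CPU cache mapping.

As is well known, when the CPU reads data from memory it starts with a virtual address, which the paging mechanism translates into a physical address; the data is then read from that physical address (by default DRAM, i.e. the memory modules). DRAM is roughly a hundred times slower than the CPU, which is why the L1/L2/L3 caches exist: when fetching data the CPU first looks through each cache level and goes to memory only if the data is not found there. This raises a question. The L3 cache is organized into sets, slices and lines, which divide the whole cache into 64-byte cache lines, so how does the CPU use a physical address to fetch data from the L3 cache? An 8 MB L3 cache, for example, holds 8 MB / 64 bytes = 131,072 cache lines; how does the CPU locate the target cache line precisely from the physical address?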

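1. Direct mapping (one-way associative)

Suppose the physical address is 0x654. Which storage unit in the L3 cache does it correspond to? Start with the simplest case, writing the address in binary as 11001 010 100:

Suppose there are 8 cache lines, so 3 bits are enough to enumerate them; the middle 010 (yellow in the original figure) is the index that selects a cache line.
Suppose each cache line is 8 bytes long; again 3 bits are enough to enumerate every byte, and the low 100 (blue in the original figure) is the offset within the cache line.
The remaining 11001 (green in the original figure) is the tag. The CPU also keeps a tag array: it uses the index to fetch the stored tag from the tag array and compares it with 11001. If they match, this byte is the storage unit for the physical address and the data can be read immediately, which is a cache hit; otherwise it is a cache miss.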
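The tag/index/offset split described above can be sketched in a few lines. This is only an illustration for the toy geometry in the text (8 lines of 8 bytes); the names `split_address`, `kOffsetBits` and `kIndexBits` are made up for this sketch and do not come from any library.

```cpp
#include <cstdint>

// Toy geometry from the text: 8 cache lines of 8 bytes each (assumed).
constexpr unsigned kOffsetBits = 3;  // log2(8-byte line)
constexpr unsigned kIndexBits = 3;   // log2(8 lines)

struct AddressParts {
  uint64_t tag;
  uint64_t index;
  uint64_t offset;
};

// Split a physical address into its tag, index and offset fields.
constexpr AddressParts split_address(uint64_t phys_addr) {
  return AddressParts{
      phys_addr >> (kOffsetBits + kIndexBits),
      (phys_addr >> kOffsetBits) & ((1ULL << kIndexBits) - 1),
      phys_addr & ((1ULL << kOffsetBits) - 1)};
}
```

For 0x654, `split_address(0x654)` yields tag 0x19 (binary 11001), index 2 (010) and offset 4 (100), matching the three fields above.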

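Direct mapping has a drawback: two physical addresses with the same index but different tags map to the same cache line (the offset only selects a byte within the line), so they keep evicting each other and the line has to be refilled every time. This led to an improved scheme.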
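A minimal simulation makes the conflict visible. The sketch below models only the tag array of a hypothetical 8-line, 8-byte-per-line direct-mapped cache; `DirectMappedCache` and `count_misses_alternating` are illustrative names, and no real hardware is being measured.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy direct-mapped cache: 8 lines of 8 bytes (hypothetical geometry).
class DirectMappedCache {
 public:
  // Returns true on a hit. On a miss the line is refilled with the new tag.
  bool access(uint64_t phys_addr) {
    const uint64_t index = (phys_addr >> 3) & 0x7;  // bits 3-5
    const uint64_t tag = phys_addr >> 6;            // bits 6 and up
    Line &line = lines_[index];
    if (line.valid && line.tag == tag) return true;
    line.valid = true;
    line.tag = tag;
    return false;
  }

 private:
  struct Line {
    bool valid = false;
    uint64_t tag = 0;
  };
  std::array<Line, 8> lines_{};
};

// Count misses when accessing a, b, a, b, ... for the given number of rounds.
int count_misses_alternating(uint64_t a, uint64_t b, int rounds) {
  assert(rounds >= 0);
  DirectMappedCache cache;
  int misses = 0;
  for (int i = 0; i < rounds; ++i) {
    if (!cache.access(a)) ++misses;
    if (!cache.access(b)) ++misses;
  }
  return misses;
}
```

Addresses 0x654 and 0x054 share index 2 but have different tags, so alternating between them misses on every access, whereas 0x654 and 0x65C (index 3) miss only on their first access.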

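2. Two-way set associative

Compared with the direct mapping in 1, the only change is that the tag array and the cache lines are split evenly into two groups (ways). Offset and index addressing stay the same; only the tag comparison changes. Because there are two ways, each index now has two tags, and a match between the physical address's tag and either of them counts as a cache hit. This gives one extra chance at a tag match and raises the hit rate. For example, if the physical address's tag is 0x32 and equals the left-hand entry in the tag array, the cache line in way 0 is used.

Splitting further gives more ways: 4 groups is 4-way, 8 groups is 8-way, and so on. (In my later experiment on Kali the cache reported 4 ways, so every physical address gets 4 tag comparisons, which gives a reasonably high chance of a hit.)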
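To see what the second way buys, here is a sketch of a 2-way lookup with the same index and offset bits as the toy direct-mapped example. It assumes 8 sets, 8-byte lines and true LRU replacement; real CPUs may use other replacement policies, and `TwoWayCache` and `count_misses_two_way` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy 2-way set-associative cache: 8 sets, 2 ways, 8-byte lines (assumed).
class TwoWayCache {
 public:
  // Returns true on a hit. On a miss the least recently used way is refilled.
  bool access(uint64_t phys_addr) {
    const uint64_t index = (phys_addr >> 3) & 0x7;
    const uint64_t tag = phys_addr >> 6;
    Set &set = sets_[index];
    for (int way = 0; way < 2; ++way) {
      if (set.valid[way] && set.tag[way] == tag) {
        set.lru_way = 1 - way;  // the other way becomes least recently used
        return true;
      }
    }
    const int victim = set.lru_way;
    set.valid[victim] = true;
    set.tag[victim] = tag;
    set.lru_way = 1 - victim;
    return false;
  }

 private:
  struct Set {
    bool valid[2] = {false, false};
    uint64_t tag[2] = {0, 0};
    int lru_way = 0;
  };
  std::array<Set, 8> sets_{};
};

// Count misses when accessing a, b, a, b, ... for the given number of rounds.
int count_misses_two_way(uint64_t a, uint64_t b, int rounds) {
  assert(rounds >= 0);
  TwoWayCache cache;
  int misses = 0;
  for (int i = 0; i < rounds; ++i) {
    if (!cache.access(a)) ++misses;
    if (!cache.access(b)) ++misses;
  }
  return misses;
}
```

Two addresses with the same index but different tags (for example 0x654 and 0x054) now occupy one way each, so only their first accesses miss.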

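Another example: how would a 32 KB, 4-way set-associative cache with 32-byte cache lines be divided? (Ways are not the same thing as slices or cores; slices are a separate partitioning of the L3.)

The total size is 32 KB across 4 ways, so each way is 8 KB.
Each cache line is 32 bytes, so each way holds 8 KB / 32 bytes = 256 lines, i.e. there are 256 sets and the index needs 8 bits.
Each cache line is 32 bytes, so the offset needs 5 bits.

The overall layout is therefore: bits 0-4 are the offset, bits 5-12 are the index, and the remaining high bits are the tag.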
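The same arithmetic as a small sketch, assuming sizes that are powers of two. `cache_geometry` and `log2_pow2` are names invented for this illustration.

```cpp
#include <cstdint>

struct CacheGeometry {
  uint64_t sets;         // number of cache sets
  unsigned offset_bits;  // bits selecting a byte within a line
  unsigned index_bits;   // bits selecting a set
};

// log2 of a power of two (assumption: value is a power of two).
constexpr unsigned log2_pow2(uint64_t value) {
  return value <= 1 ? 0 : 1 + log2_pow2(value >> 1);
}

// Derive sets, offset bits and index bits from total size, ways and line size.
constexpr CacheGeometry cache_geometry(uint64_t total_bytes, uint64_t ways,
                                       uint64_t line_bytes) {
  return CacheGeometry{total_bytes / ways / line_bytes,
                       log2_pow2(line_bytes),
                       log2_pow2(total_bytes / ways / line_bytes)};
}
```

`cache_geometry(32 * 1024, 4, 32)` gives 256 sets, 5 offset bits and 8 index bits, as in the worked example.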

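3. Fully associative

All cache lines belong to a single set, so the address needs no index field. The tag field of the address is compared with the tags of all cache lines (the hardware may compare in parallel or serially), and whichever tag matches identifies the cache line that hits. In a fully associative cache, data from any address can therefore be cached in any cache line, but this is expensive to build.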

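4. With the three cache mapping schemes covered, what does the CPU do when a cache miss occurs?

Suppose we have a 64-byte direct-mapped cache with 8-byte cache lines that uses write-allocate and write-back policies. When the CPU reads one byte from address 0x2a, how do the cache contents change? Assume the cache is currently in the state shown in the original figure (in the valid column next to the tag, 1 means valid and 0 means invalid; in the dirty column, 1 means dirty and 0 means the line has not been written, i.e. not dirty).

Address 0x2a splits into tag 0, index 101 and offset 010. The index selects the cache line; its valid bit is set, but the stored tag differs, so this is a cache miss. We therefore need to load 8 bytes from address 0x28 (8-byte aligned) into this cache line, since the cache line is the smallest unit the cache reads or writes. However, the line's dirty bit is set (meaning the cached data has been modified and not yet written back to memory), so the data in the line cannot simply be discarded. Because the cache is write-back, the line's current contents, 0x11223344, must first be written to address 0x128 (tag 0x04, index 101 and offset 010 concatenate to 100 101 010 = 0x12a, which 8-byte alignment rounds down to 0x128).

Once the write-back finishes, the 8 bytes starting at address 0x28 in main memory are loaded into the cache line and the dirty bit is cleared. The offset then selects the byte 0x52, which is returned to the CPU.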
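The address arithmetic in this walk-through can be checked with a short sketch. It assumes the 64-byte direct-mapped cache with 8-byte lines from the example; `plan_read_miss` and `MissActions` are hypothetical names that only compute which addresses are involved and do not model the data itself.

```cpp
#include <cstdint>

// 64-byte direct-mapped cache with 8-byte lines: 3 offset bits, 3 index bits.
constexpr unsigned kLineBits = 3;
constexpr unsigned kSetBits = 3;

struct MissActions {
  bool needs_writeback;     // victim line is valid and dirty
  uint64_t writeback_addr;  // line-aligned; meaningful only if needs_writeback
  uint64_t fill_addr;       // line-aligned address to load from memory
};

// Work out what a read miss at phys_addr requires, given the victim line's
// stored tag and state bits.
constexpr MissActions plan_read_miss(uint64_t phys_addr, uint64_t victim_tag,
                                     bool victim_valid, bool victim_dirty) {
  return MissActions{
      victim_valid && victim_dirty,
      // Victim address = victim tag, then the shared index, offset zeroed.
      (victim_tag << (kLineBits + kSetBits)) |
          (phys_addr & (((1ULL << kSetBits) - 1) << kLineBits)),
      // Fill address = requested address rounded down to the line size.
      phys_addr & ~((1ULL << kLineBits) - 1)};
}
```

With the values from the text, `plan_read_miss(0x2a, 0x04, true, true)` reports a write-back to 0x128 followed by a fill from 0x28.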

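5. Cache mapping test

Ready-made code is available at https://github.com/google/rowhammer-test/tree/master/cache_analysis and can be used directly.

Core idea: allocate virtual memory -> translate the first address to a physical address -> step one page at a time and translate each candidate -> check whether it falls in the same cache set as the first address -> keep it if so -> access the first address, then the kept addresses, then time a re-read of the first address; repeat 10 times and record each time -> take the median.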

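In my VMware Kali guest, the sysfs entry I took for the L3 cache (index2 here) reports ways_of_associativity = 4, i.e. an associativity of 4. Note that on many Linux systems index2 is the L2 cache and index3 is the L3; the level file in the same directory shows which one it is. The cache line is 64 bytes, so bits 0-5 of the physical address are the offset and the index starts at bit 6; its width is log2 of the number of sets, not of the number of ways. In the code below, uintptr_t next_addr = buf + page_size steps the virtual address by 0x1000, which leaves the low 12 bits unchanged in the physical address as well, so the offset and the index bits below bit 12 stay the same. The higher index bits and the slice depend on which physical frame backs each page, which is why the code keeps only the candidates that pass in_same_cache_set().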
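The cache parameters mentioned above can also be read programmatically. The sketch below assumes the usual Linux sysfs layout under /sys/devices/system/cpu/cpu0/cache/indexN/ with attribute files such as level, ways_of_associativity, coherency_line_size and number_of_sets; some of these may be missing inside a virtual machine. `read_cache_attr` is a helper name made up for this example.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Read one attribute file (e.g. "level" or "ways_of_associativity") from a
// cache index directory. Returns an empty string if the file cannot be read.
std::string read_cache_attr(const std::string &cache_dir, int index,
                            const std::string &attr) {
  assert(index >= 0);
  std::ifstream in(cache_dir + "/index" + std::to_string(index) + "/" + attr);
  std::string value;
  std::getline(in, value);  // leaves value empty if the file is missing
  return value;
}
```

A typical use is to loop over the index numbers, keep the one whose level reads "3", and then read ways_of_associativity and number_of_sets from that directory, e.g. `read_cache_attr("/sys/devices/system/cpu/cpu0/cache", 3, "ways_of_associativity")`.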

// Copyright 2015, Google, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#include <algorithm>

// This program attempts to pick sets of memory locations that map to
// the same L3 cache set.  It tests whether they really do map to the
// same cache set by timing accesses to them and outputting a CSV file
// of times that can be graphed.  This program assumes a 2-core Sandy
// Bridge CPU.

// Dummy variable to attempt to prevent compiler and CPU from skipping
// memory accesses.
int g_dummy;

namespace {

const int page_size = 0x1000;
int g_pagemap_fd = -1;

// Extract the physical page number from a Linux /proc/PID/pagemap entry.
// Bits 0-54 of an entry hold the page frame number.
uint64_t frame_number_from_pagemap(uint64_t value) {
  return value & ((1ULL << 55) - 1);
}

// Reading frame numbers requires root (CAP_SYS_ADMIN) on Linux 4.0 and later.
void init_pagemap() {
  g_pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
  assert(g_pagemap_fd >= 0);
}

// Translate a virtual address into a physical address.
uint64_t get_physical_addr(uintptr_t virtual_addr) {
  uint64_t value;
  // Each page has one 8-byte pagemap entry.  Example: virtual_addr = 16 << 20
  // with page_size = 4096 gives entry 4096, i.e. file offset 4096 * 8.
  off_t offset = (virtual_addr / page_size) * sizeof(value);
  int got = pread(g_pagemap_fd, &value, sizeof(value), offset);
  assert(got == 8);
  // Check the "page present" flag.
  assert(value & (1ULL << 63));

  uint64_t frame_num = frame_number_from_pagemap(value);
  return (frame_num * page_size) | (virtual_addr & (page_size - 1));
}

// Execute a CPU memory barrier.  This is an attempt to prevent memory
// accesses from being reordered, in case reordering affects what gets
// evicted from the cache.  It's also an attempt to ensure we're
// measuring the time for a single memory access.
//
// However, this appears to be unnecessary on Sandy Bridge CPUs, since
// we get the same shape graph without this.
inline void mfence() {
  asm volatile("mfence");
}

// Measure the time taken to access the given address, in nanoseconds.
int time_access(uintptr_t ptr) {
  struct timespec ts0;
  int rc = clock_gettime(CLOCK_MONOTONIC, &ts0);
  assert(rc == 0);

  g_dummy += *(volatile int *) ptr;
  mfence();

  struct timespec ts;
  rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  return (ts.tv_sec - ts0.tv_sec) * 1000000000
         + (ts.tv_nsec - ts0.tv_nsec);
}

// Given a physical memory address, this hashes the address and
// returns the number of the cache slice that the address maps to.
//
// This assumes a 2-core Sandy Bridge CPU.
//
// "bad_bit" lets us test whether this hash function is correct.  It
// inverts whether the given bit number is included in the set of
// address bits to hash.
//
// The hash differs between CPU microarchitectures.  The author targets
// Sandy Bridge; on others such as Ivy Bridge, Haswell or Coffee Lake the
// program may not run or may give wrong results.
int get_cache_slice(uint64_t phys_addr, int bad_bit) {
  // On a 4-core machine, the CPU's hash function produces a 2-bit
  // cache slice number, where the two bits are defined by "h1" and
  // "h2":
  //
  // h1 function:
  //   static const int bits[] = { 18, 19, 21, 23, 25, 27, 29, 30, 31 };
  // h2 function:
  //   static const int bits[] = { 17, 19, 20, 21, 22, 23, 24, 26, 28, 29, 31 };
  //
  // This hash function is described in the paper "Practical Timing
  // Side Channel Attacks Against Kernel Space ASLR".
  //
  // On a 2-core machine, the CPU's hash function produces a 1-bit
  // cache slice number which appears to be the XOR of h1 and h2.

  // XOR of h1 and h2 (bits present in both lists cancel out).  The listed
  // address bits are XORed together and the resulting 0 or 1 selects the
  // slice, which spreads cache lines evenly across the slices.
  static const int bits[] = { 17, 18, 20, 22, 24, 25, 26, 27, 28, 30 };

  int count = sizeof(bits) / sizeof(bits[0]);
  int hash = 0;
  // XOR together the address bits named in bits[].
  for (int i = 0; i < count; i++) {
    hash ^= (phys_addr >> bits[i]) & 1;
  }
  if (bad_bit != -1) {
    // XOR in bit bad_bit of phys_addr: if that bit is 1 the hash flips,
    // if it is 0 the hash is unchanged.
    // E.g. phys_addr = 0x1234, bad_bit = 17: (phys_addr >> 17) & 1 = 0, hash unchanged.
    // E.g. phys_addr = 0x8234, bad_bit = 15: (phys_addr >> 15) & 1 = 1, hash flipped.
    hash ^= (phys_addr >> bad_bit) & 1;
  }
  // hash starts at 0 and can only be 0 or 1: on a 2-core CPU the slice
  // number is either 0 or 1.
  return hash;
}

// Two addresses are treated as being in the same cache set when:
// 1. their low 17 bits are equal, and
// 2. their slice hashes are equal.
bool in_same_cache_set(uint64_t phys1, uint64_t phys2, int bad_bit) {
  // For Sandy Bridge, the bottom 17 bits determine the cache set
  // within the cache slice (or the location within a cache line).
  uint64_t mask = ((uint64_t) 1 << 17) - 1;  // 0x1FFFF: keep only the low 17 bits
  return ((phys1 & mask) == (phys2 & mask) &&  // low 17 bits are equal
          get_cache_slice(phys1, bad_bit) == get_cache_slice(phys2, bad_bit));
}

int timing(int addr_count, int bad_bit) {
  size_t size = 16 << 20;
  // Allocate and pre-fault the buffer.
  uintptr_t buf =
    (uintptr_t) mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
  assert((void *) buf != MAP_FAILED);

  uintptr_t addrs[addr_count];
  addrs[0] = buf;
  uintptr_t phys1 = get_physical_addr(addrs[0]);

  // Pick a set of addresses which we think belong to the same cache set.
  //
  // My CPU is an Intel Core i7-8750; CPU-Z reports Coffee Lake, L3 = 9 MB,
  // 12-way, 64-byte cache lines (bits 0-5 are the offset).  That gives
  // 9 MB / 64 bytes = 147456 cache lines and 147456 / 12 = 12288 cache sets
  // in total.  12288 is not a power of two because the L3 is split into
  // slices; assuming one slice per core on this 6-core part, each slice has
  // 2048 sets, i.e. an 11-bit set index in bits 6-16.
  //
  // Stepping the virtual address by 0x1000 keeps the low 12 bits the same.
  // Bits 12-16 and the slice hash depend on the physical frame behind each
  // page, so every candidate is checked with in_same_cache_set().
  uintptr_t next_addr = buf + page_size;
  uintptr_t end_addr = buf + size;
  int found = 1;
  while (found < addr_count) {
    assert(next_addr < end_addr);
    uintptr_t addr = next_addr;
    // Starting after buf, take one address per page, translate it, and keep
    // it if it falls in the same cache set as the first address.
    next_addr += page_size;

    uint64_t phys2 = get_physical_addr(addr);
    if (in_same_cache_set(phys1, phys2, bad_bit)) {
      addrs[found] = addr;
      found++;
    }
  }

  // Time memory accesses.
  int runs = 10;
  int times[runs];
  for (int run = 0; run < runs; run++) {
    // Ensure the first address is cached by accessing it.
    g_dummy += *(volatile int *) addrs[0];
    mfence();
    // Now pull the other addresses through the cache too.
    for (int i = 1; i < addr_count; i++) {
      g_dummy += *(volatile int *) addrs[i];
    }
    mfence();
    // See whether the first address got evicted from the cache by
    // timing accessing it.  A long access time means the first address
    // has been evicted from the cache set.
    times[run] = time_access(addrs[0]);
  }
  // Find the median time.  We use the median in order to discard
  // outliers.  We want to discard outlying slow results which are
  // likely to be the result of other activity on the machine.
  //
  // We also want to discard outliers where memory was accessed
  // unusually quickly.  These could be the result of the CPU's
  // eviction policy not using an exact LRU policy.
  std::sort(times, &times[runs]);
  int median_time = times[runs / 2];

  int rc = munmap((void *) buf, size);
  assert(rc == 0);

  return median_time;
}

int timing_mean(int addr_count, int bad_bit) {
  int runs = 10;
  int sum_time = 0;
  for (int i = 0; i < runs; i++)
    sum_time += timing(addr_count, bad_bit);
  return sum_time / runs;
}

} // namespace

int main() {
  init_pagemap();

  // Turn off stdout caching.
  setvbuf(stdout, NULL, _IONBF, 0);

  // For a 12-way cache, we want to pick 13 addresses belonging to the
  // same cache set.  Measure the effect of picking more addresses to
  // test whether in_same_cache_set() is correctly determining whether
  // addresses belong to the same cache set.
  //
  // Testing with more than 12 physical addresses evicts the first physical
  // address from the cache set, so reading it again takes noticeably longer.
  int max_addr_count = 5 * 4;

  bool test_bad_bits = true;

  printf("Address count");
  printf(",Baseline hash (no bits changed)");
  if (test_bad_bits) {
    for (int bad_bit = 17; bad_bit < 32; bad_bit++) {
      printf(",Change bit %i", bad_bit);
    }
  }
  printf("\n");

  // Start at 1: timing() always stores the base address in addrs[0].
  for (int addr_count = 1; addr_count < max_addr_count; addr_count++) {
    printf("%i", addr_count);
    printf(",%i", timing_mean(addr_count, -1));
    if (test_bad_bits) {
      for (int bad_bit = 17; bad_bit < 32; bad_bit++) {
        printf(",%i", timing_mean(addr_count, bad_bit));
      }
    }
    printf("\n");
  }
  return 0;
}
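The relationship between the h1 and h2 bit lists quoted in the comments of get_cache_slice() and the single-bit hash used on a 2-core part can be checked mechanically: XORing two parities is the parity over the symmetric difference of the two bit sets. The sketch below is an illustration only; `mask_from_bits`, `parity`, `h1_mask`, `h2_mask` and `slice_2core` are names introduced here, and the mapping applies only under the listing's own assumption of a 2-core Sandy Bridge CPU.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Build a bit mask from a list of bit positions.
uint64_t mask_from_bits(std::initializer_list<int> bits) {
  uint64_t mask = 0;
  for (int bit : bits) {
    assert(bit >= 0 && bit < 64);
    mask |= 1ULL << bit;
  }
  return mask;
}

// XOR of all bits of value (1 if an odd number of bits are set).
int parity(uint64_t value) {
  int p = 0;
  while (value) {
    p ^= 1;
    value &= value - 1;  // clear the lowest set bit
  }
  return p;
}

// Bit lists as quoted in the listing's comments.
uint64_t h1_mask() {
  return mask_from_bits({18, 19, 21, 23, 25, 27, 29, 30, 31});
}
uint64_t h2_mask() {
  return mask_from_bits({17, 19, 20, 21, 22, 23, 24, 26, 28, 29, 31});
}

// 1-bit slice number: h1 XOR h2, so bits present in both lists cancel.
int slice_2core(uint64_t phys_addr) {
  return parity(phys_addr & (h1_mask() ^ h2_mask()));
}
```

Here `h1_mask() ^ h2_mask()` selects bits 17, 18, 20, 22, 24, 25, 26, 27, 28 and 30; bit 19, for instance, appears in both lists and therefore has no effect on the 2-core slice number.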

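In the code, the range of address counts tried is set by int max_addr_count = 5 * 4; for a 4-way cache it is worth comparing results at several counts around the eviction threshold (for example 3 to 7). (The original author's CPU is 12-way; testing repeatedly with different numbers of addresses showed that once the count reaches 13 the access time rises sharply, i.e. cache misses surge.)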
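Why the jump appears at one address more than the number of ways can be illustrated with a tiny model of a single cache set. The sketch assumes true LRU replacement, which real CPUs only approximate, and `first_line_survives` is a hypothetical helper, not part of the test program.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Touch lines 0, 1, ..., addr_count - 1 once each in a single set with the
// given number of ways and true LRU replacement (an assumption). Returns
// true if line 0 is still cached afterwards.
bool first_line_survives(int ways, int addr_count) {
  assert(ways > 0);
  std::vector<int> lru;  // front = most recently used
  for (int line = 0; line < addr_count; ++line) {
    auto it = std::find(lru.begin(), lru.end(), line);
    if (it != lru.end()) lru.erase(it);
    lru.insert(lru.begin(), line);
    if (static_cast<int>(lru.size()) > ways) lru.pop_back();  // evict LRU
  }
  return std::find(lru.begin(), lru.end(), 0) != lru.end();
}
```

Under this model a 12-way set keeps line 0 with 12 addresses and evicts it with 13, and a 4-way set evicts it with 5, which is where the measured access time would be expected to rise.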

References:
http://lackingrhoticity.blogspot.com/2015/04/l3-cache-mapping-on-sandy-bridge-cpus.html L3 cache mapping on Sandy Bridge CPUs
https://zhuanlan.zhihu.com/p/102293437 Cache的基本原理 (Basic principles of caches)
Reverse Engineering Intel Last-Level Cache Complex Addressing Using Performance Counters
Mapping the Intel Last-Level Cache

Finally, I put together a mind map (an image in the original post) to help tie the key points together.
