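I have been reading a lot of papers lately (and expect to keep doing so for the next year), so I am starting this post to record my reading notes.

Many of the SysML papers here come from the 8980 course I took last semester. Papers tied to a specific field (e.g. DB, architecture, ...) go under that field's category; this post collects the ones that are hard to classify.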

Class Blog

Bookmarking an expert's blog


Virtual Address Translation via Learned Page Table Indexes

Applies a learned index to the page table.

My notes are here: Link
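To make the learned-index idea concrete for myself, here is a minimal sketch of the generic technique: a model predicts where a key sits in a sorted array, and a bounded local search corrects the prediction. This is not the paper's design. The class name `LearnedPageIndex`, the single linear model, and the toy mapping are all my own assumptions for illustration.

```python
import bisect


class LearnedPageIndex:
    """Toy learned index: a linear model predicts the slot of a virtual page number."""

    def __init__(self, mapping):
        # mapping: virtual page number -> physical frame number (assumed non-empty)
        self.vpns = sorted(mapping)
        self.frames = [mapping[v] for v in self.vpns]
        n = len(self.vpns)
        # Least-squares fit of position ~ slope * vpn + intercept.
        mean_key = sum(self.vpns) / n
        mean_pos = (n - 1) / 2
        var = sum((v - mean_key) ** 2 for v in self.vpns)
        cov = sum((v - mean_key) * (i - mean_pos) for i, v in enumerate(self.vpns))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Worst prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(self._predict(v) - i) for i, v in enumerate(self.vpns))

    def _predict(self, vpn):
        pos = round(self.slope * vpn + self.intercept)
        return min(max(pos, 0), len(self.vpns) - 1)

    def translate(self, vpn):
        guess = self._predict(vpn)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.vpns), guess + self.max_err + 1)
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.vpns, vpn, lo, hi)
        if i < hi and self.vpns[i] == vpn:
            return self.frames[i]
        return None  # unmapped page


toy_mapping = {3: 40, 7: 41, 8: 42, 20: 43, 21: 44, 22: 45, 100: 46}
page_index = LearnedPageIndex(toy_mapping)
```

Calling `page_index.translate(20)` returns the frame stored for page 20, and an unmapped page such as 5 returns `None`. The point of the design is that a good model keeps `max_err` small, so the lookup touches only a few slots.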

Representation Learning for Resource Usage Prediction

Uses an LSTM to track a process's runtime behavior (CPU/memory/disk usage) and predict its future resource usage.

Selecta: Heterogeneous Cloud Storage Configuration for Data Analytics

Recommends a VPS configuration (e.g. SSD or HDD) based on the observed workload of the applications running on the VPS.

REX: A Development Platform and Online Learning Approach for Runtime Emergent Software Systems

The idea in this paper is similar to a paper [http://www-users.cselabs.umn.edu/classes/Spring-2019/csci8980/papers/tuning.pdf] we discussed in class before. In that paper [Auto DBMS Tuning...], the authors use an ML model to automatically tune database knobs, while in this paper the authors use an ML model to optimize the combination of components in a complex software system to achieve better performance.

The whole framework can be divided into 3 parts:

  1. Dana: A lightweight programming language for implementing small components. These components can be assembled to build a large-scale software system. Some components provide the same function but do the job in different ways. The main contribution of Dana is speed: components can be dynamically rewired at very low cost.
  2. A perception, assembly and learning framework (PAL): It contains 2 modules. 1) The Assembly Module assembles different components to implement a feature. 2) The Perception Module monitors the performance of the software system as well as the runtime environment.
  3. The online learning part. It uses a reinforcement learning model, the "multi-armed bandit", to optimize the selection of components. Also, as in the [Auto DBMS Tuning...] paper, some component selections are related, so they can share information. The authors therefore use a regression model to reduce the search space.

Application of ML methods:

The configuration is updated continuously. First, the software runs with a random combination of components. After several seconds, PAL collects perception data, and the online learning part updates its estimates and its model. The software then runs under the new configuration for several seconds, and the model is updated again. In this way the software system is gradually shaped to fit its environment.
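The run, observe, update loop above can be sketched with a very simple bandit. Note that this is an epsilon-greedy bandit that I swapped in for brevity, not the paper's learner, and it has no regression model for sharing information between related configurations. The configuration names, latencies, and noise level are made up.

```python
import random


class EpsilonGreedyBandit:
    """Keeps a running mean reward per arm; each arm is one component combination."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.means = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Try every configuration once before trusting any estimate.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)      # explore
        return max(self.arms, key=self.means.get)  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental update of the mean reward for this arm.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run_loop(rounds=500, seed=0):
    # Hypothetical configurations with made-up mean response times (ms).
    true_latency_ms = {
        "cache+gzip": 12.0,
        "cache+plain": 9.0,
        "nocache+gzip": 20.0,
        "nocache+plain": 17.0,
    }
    env = random.Random(seed + 1)  # stands in for the perception data
    bandit = EpsilonGreedyBandit(true_latency_ms, epsilon=0.1, seed=seed)
    for _ in range(rounds):
        config = bandit.select()                           # assemble
        observed = env.gauss(true_latency_ms[config], 0.5)  # perceive
        bandit.update(config, -observed)                   # learn: lower latency, higher reward
    return bandit


bandit = run_loop()
best = max(bandit.means, key=bandit.means.get)
```

Each loop iteration plays the role of one "run for several seconds" window. Because every configuration is a separate arm here, the number of arms grows quickly with the number of components, which is the problem the regression model in the paper is meant to reduce.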

Positive Points:

  1. The idea is interesting. It uses an automatic ML model to replace the manually written rules of previous work. The same idea could apply to other computer system problems where configuration and performance are linked.
  2. The regression model reduces the search space, which helps balance exploration and exploitation.
  3. Compared to the [Auto DBMS Tuning...] paper, this paper considers how to adapt to changes in the deployment environment.

Negative Points:

  1. The performance of the model relies heavily on the components being implemented in Dana. First, this means the model cannot be applied to existing software systems written in more common languages such as C++ and Java. Second, it requires that the whole software system can be decomposed into many small, independent, dynamically swappable components, which cannot always be guaranteed.
  2. For a specific task (such as a web server), the authors could have built a configuration dataset, so that on first run the application starts with a suboptimal but reasonable combination of components. This could be better than starting from a random choice.

Related Links:

https://blog.acolyer.org/2016/12/05/rex-a-development-platform-and-online-learning-approach-for-runtime-emergent-software-systems/

https://www.usenix.org/conference/osdi16/technical-sessions/presentation/porter

StormDroid: A Streaminglized Machine Learning-Based System for Detecting Android Malware

Introduction

This paper focuses on detecting malicious Android applications with machine learning. The authors define 4 types of features to extract from applications, in two groups:

  • 1. well-received features
    • 1.1 Permission: Some malicious applications need specific types of permissions.
    • 1.2 Sensitive API Calls: They extracted sensitive API calls from smali files and found the types of sensitive API calls that best distinguish malicious from benign applications.
  • 2. newly-defined features
    • 2.1 Sequence: Malicious apps tend to make a drastically different number of sensitive API calls. They defined 3 metrics to quantify the number of sensitive API calls.
    • 2.2 Dynamic Behavior: Monitor the activities triggered by each application, based on its log file.

In both feature groups, they removed the features shared by malicious and benign apps and kept the most distinguishing ones.

Application of ML methods

They compared several ML methods: SVM, decision tree, Multi-Layer Perceptron, Naïve Bayes, KNN, and Bagging predictor. They also compared two feature sets (well-received features only vs. well-received plus newly-defined features). They found that the KNN classifier with both well-received and newly-defined features achieved the best performance.
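For reference, a KNN classifier over binary app features is only a few lines. This is a minimal sketch under my own assumptions: binary feature vectors, Hamming distance, and k = 3. The feature columns and the six toy apps are invented, not taken from the paper.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical columns: [SEND_SMS permission, READ_CONTACTS permission,
#                        sensitive SMS API call, suspicious network behavior in logs]
toy_train = [
    ((1, 1, 1, 1), "malicious"),
    ((1, 0, 1, 1), "malicious"),
    ((1, 1, 1, 0), "malicious"),
    ((0, 0, 0, 0), "benign"),
    ((0, 1, 0, 0), "benign"),
    ((0, 0, 0, 1), "benign"),
]

suspicious_app = knn_predict(toy_train, (1, 0, 1, 0))
harmless_app = knn_predict(toy_train, (0, 1, 0, 1))
```

The first two feature columns correspond to the well-received group and the last column to the newly-defined dynamic behavior, so dropping columns from the vectors is an easy way to mimic the paper's feature-set comparison.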

Pros

  1. They defined new features that capture the dynamic behavior of malicious applications, which achieved better performance than static analysis alone.

Cons

  1. When analyzing features, they used the same app dataset that was later used to test the ML classifiers. In other words, they removed the common features and selected the most distinguishing ones based on that analysis before testing the classifiers. I think this introduces a kind of "over-fitting". They should use different app datasets for feature analysis and for testing.
  2. Decompiling and analyzing applications on an Android device may use too many system resources, since mobile CPUs and memory are comparatively weak. I actually think this method is better suited to detecting malicious applications on a PC.
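The fix I have in mind for the first point is to split the apps first and run the feature analysis on the training split only. A minimal sketch, assuming each app is an (id, feature set, label) tuple and that a feature is "distinguishing" when its frequency differs between the two classes by at least `min_gap`. The threshold, function names, and toy dataset are my own.

```python
import random


def split_apps(apps, n_test, seed=0):
    """Shuffle once, then hold out n_test apps that feature analysis never sees."""
    rng = random.Random(seed)
    shuffled = list(apps)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def distinguishing_features(train_apps, min_gap=0.5):
    """Keep features whose per-class frequency differs by at least min_gap (training apps only)."""
    by_label = {"malicious": [], "benign": []}
    for _, features, label in train_apps:
        by_label[label].append(features)
    all_features = set().union(*(features for _, features, _ in train_apps))
    selected = set()
    for feature in all_features:
        freq = {
            label: sum(feature in f for f in group) / max(1, len(group))
            for label, group in by_label.items()
        }
        if abs(freq["malicious"] - freq["benign"]) >= min_gap:
            selected.add(feature)
    return selected


# Toy dataset: every app requests INTERNET, only malicious ones request SEND_SMS.
apps = [(f"mal{i}", {"INTERNET", "SEND_SMS"}, "malicious") for i in range(10)]
apps += [(f"ben{i}", {"INTERNET"}, "benign") for i in range(10)]

train_apps, test_apps = split_apps(apps, n_test=6)
selected = distinguishing_features(train_apps)
# Test apps are only projected onto features chosen from the training split.
test_vectors = [(app_id, features & selected, label) for app_id, features, label in test_apps]
```

With this ordering, the held-out apps influence neither which features are removed nor which are kept, so the test accuracy is not inflated by the selection step.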
