网络流量预测 国内外研究现状【见评论】——传统的ARIMA、HMM模型,目前LSTM、GRU、CNN应用较多,貌似小波平滑预处理步骤非常关键
Time Series Anomaly Detection in Network Traffic: A Use Case for Deep Neural Networks
Introduction
As the waves of the big data revolution cascade across industries, more and more forms of sensor data become valuable inputs to predictive analytics. This sensor data has an intrinsic temporal component to it – and this temporality lets us use a family of techniques for predictive analytics called Time Series Models [1]. In this blog post we explore the underlying nature of time series modeling in the context of enterprise IT analytics particularly for cyber security use-cases.
Time series can exist in many different industries and problem spaces, but at its essence it is simply a data set that has values indexed by time. In research literature we usually refer to a univariate time series as a data set that has timestamps and single values associated to each timestamp. Examples of univariate time series include the number of packets sent over time by a single host in a network, or the amount of voltage used by a smart meter for a single home over the year. Multivariate time series are an extension of the original concept to the case where each time stamp has a vector or array of values associated with it. Examples of multivariate time series are the (P/E, price, volume) for each time tick of a single stock or the tuple of information for each netflow between a single session (e.g. source and destination ip and port, packets and bytes sent and received, etc.)
Time Series Models For Network Security
Time series data is particularly prevalent in any modeling scenario dependent on input from a modern IT infrastructure. Almost every single component of the hardware and software used in enterprise networks have some sub-system that generates time series data. For cybersecurity models univariate/multivariate time series form one of the cornerstone data structures, particular for studying evolving patterns of behavior.
There are multitudes of different use cases relevant for modeling problems in cybersecurity. To illustrate some of the common phenomena associated with this class of problems we enumerate a couple of the most common scenarios below.
Use Case 1: Detecting DDOS Attacks
With the growing prevalence of pay for play attack infrastructure, Distributed Denial of Service (DDOS) attack volume has hit all time records, including the latest attack on Krebs last year using the Merai botnet [1,2]. Denial of service attacks come in a couple of different varieties inducing ‘Layer-4’ attacks and ‘Layer-7’ attacks, referencing the OSI 7-layer network model. Typically the detection of the application layer attacks (Layer-7) is more difficult than the lower layer attacks because it involves exploiting some property of an API. For either case though, we can use the data related to overall flow, size/volume, and app layer traffic stats generated by our routers and perimeter infrastructure over time, to build time series models for layer 4 and layer 7 inbound traffic patterns. A standard time series model is then overlayed on this data to detect change points in the normal traffic baseline of the key choke points and DMZ assets exposed to inbound network traffic. The goal of this model is to identify spikes in traffic patterns that are extreme deviations from the observed baseline like in the figure below.
Use Case 2: Detecting Failed Login Spikes
Another common attack pattern usually following a large leak of user names or PII data onto the darkweb is called Account Takeover (ATO). For instance, after the leak of a large number of user names for a financial institution, attacks can follow by targeting the login infrastructure for the banking applications. Typically attackers will script an automated test of usernames /passwords against the list of stolen data; there will be a pattern of logins on the application that is rapidly changing the number of attempted logins per username. There is potential for major financial gain to be had, even in the case of a single successful login, so attackers are incentivized to target weak infrastructure in combination with the darkwebs economy of stolen PII. This type of attack manifests as a time series problem, particularly in the application logs of the web service being targeted. A changepoint, in total number of failed logins related to a particular external subnet or other group information, is a one primary indicator an ATO attack is taking place. Typical patterns we look for in this case can be seen as intermittent spikes of activity spread out over time (see below figure).
Use Case 3: Data Exfiltration
Finally the last common use case that is most common with regards to time series models is exfiltration of data. There are many sub-problems and behaviors to take under consideration here depending on the particular security scenario. For instance, an enterprise may be dealing with a disgruntled insider who is actively dumping data from repos onto a physical usb disk or sending it to attachments through google drive. Different paths of exfiltration require careful analysis of the protocol and methods involved. One rich area that is nice to model, using multivariate time series, is time series behaviors involving DNS data*. In the example below we see that if we build the appropriate multivariate vector on each individual endpoint, DNS requests we can predict multiple attack patterns with a single model. *See the JASK blog post here for more details on some of insights into searching for key patterns related to DNS exfiltration [10].
Time Series Prediction Using Neural Nets
Neural networks have a long and interesting history as pattern recognition engines used in machine learning [4]. Over the last decade the advent of next generation hardware for specific learning tasks (e.g. tensor processing units) along with breakthroughs in neural-net training has led us to the era of Deep Learning [6,7]. State-of-the-art libraries like TensorFlow and PyTorch provide high level abstractions for making some of most important techniques from Deep Learning available to solve business problems.
One of the most important aspects of leveraging time series output in security operations is building detections tuned to highest priority outcomes. With most of the toolsets and solutions designed for security operations center (SOC) workflows, the operator has to specify a manual threshold in order to detect time series outliers. Neural networks provide a nice solution, from an engineering standpoint, for cybersecurity models with temporal data because they provide a more dynamic learning aspect that helps drive data-driven detections past static thresholds.
In 1997 Hochreiter and Schmidhuber wrote their original paper that introduced the concept of long-short term memory (LSTM) cell in neural net architectures [5]. Since then LSTMs have become one of the most flexible and best-in-breed solutions for a variety of classification problems in deep learning.
Traditional statistical/mathematical approaches for analyzing time series are run over a specified time window frame. The length of this window needs to be pre-determined and the results of these approaches are heavily influenced by the length of this window. Traditional machine learning algorithms require extensive feature engineering to train the classifier on. However, with any change in the input data, the dynamics of the features change as well, forcing a re-design of feature vectors to maintain performance. During the feature extraction phase, if the features are not appropriately chosen, then there are high chances of losing important information from the time series. LSTM, on the other hand, showcases the ability to learn long-term sequential patterns without the need for feature engineering: part of the magic here is the concept of three memory gates specific to this particular implementation of deep learning. Recurrent Neural Networks suffer from the problem of vanishing gradient descent, which prevents the model from converging properly due to insufficient error correction, and which is overcome by LSTM. On account of these advantages, we turn to LSTM for modeling our time series.
TensorFlow LSTM Model Layer-By-Layer
Using TensorFlow [13] we can build a template for processing with arbitrary types of time series data. For a good introductory overview into TensorFlow and LSTM check out some of the great books and blogs that have been published recently on the topic [9,11,12].
In our prototype example we build a simple architecture description of a neural network specifying the number of layers and some of related properties. We define our LSTM model to contain a visible layer with 3 neurons, followed by a hidden “dense” (densely connected) layer with two-dimensional output and finally an activation layer. The mean squared error regression problem is the objective that the model tries to optimize. The final output is a single prediction.

The input to the LSTM is higher-dimensional than traditional machine learning modeling inputs. A diagrammatic representation of our data is as shown:
Algorithmic Scalability Notes
For univariate time series data LSTM training scales linearly for single time series (O(N) scaling with N number of time steps). The training time using LSTM networks is one of the drawbacks but because time series models are often embarrassingly parallel these problems are suitable to running on large GPU/TPU clusters.
To test if our model overfit we plotted a training size versus the RMSE plot and saw that the error reduced with the increase in the training data (RMSE is a quick metric that is easy to use but proper overit analysis requires a more detailed testing paradigm). This is the expected trend since the model should be able to predict better with the increase in the training data. The tests below are run on synthetic time series data and are on regular CPU cores.
Conclusion
Part of the appeal of neural network methods for time series problems is they let us move past traditional threshold-based detections as well as automate some key use cases. There is a lot depth to this topic and related engineering design. We have found Python and TensorFlow are great tools for prototyping ideas for building operationalized solutions with low initial complexity. In the realm of cybersecurity we can move a lot of the generic queries that end up being driven by fixed thresholds to a more dynamic learning paradigm driven by deep learning models. The benefit we see for choosing LSTM in these cases is that we can get better data driven detections while moving away from simple rule based time series alerts.
References
- Jan G. De Gooijer, Rob J. Hyndman, 25 years of time series forecasting, In International Journal of Forecasting, Volume 22, Issue 3, 2006, Pages 443-473, ISSN 0169-2070, https://doi.org/10.1016/j.ijforecast.2006.01.001
- https://www.theguardian.com/technology/2016/oct/26/ddos-attack-dyn-mirai-botnet
- https://www.abusix.com/blog/5-biggest-ddos-attacks-of-the-past-decade
- Christopher M. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA.
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (November 1997), 1735-1780. DOI=http://dx.doi.org/10.1162/neco.1997.9.8.1735
- Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. Neural Comput. 18, 7 (July 2006), 1527-1554. DOI=http://dx.doi.org/10.1162/neco.2006.18.7.1527
- http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/
- Greff K, Srivastava R, Koutnik J, Steunebrink B, Schmidhuber J, LSTM: A Search Space Odyssey, IEEE Transactions on Neural Networks and Learning Systems (2016) Published by Institute of Electrical and Electronics Engineers Inc.
- Hands-On Machine Learning with Scikit-Learn and TensorFlow Concepts, Tools, and Techniques to Build Intelligent Systems By Aurélien Géron
- http://www.jask.com/cyber-security/threat-hunting-part-3-going-hunting-with-machine-learning/
- http://papers.nips.cc/paper/822-bounds-on-the-complexity-of-recurrent-neural-network-implementations-of-finite-state-machines.pdf
- https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
- https://www.tensorflow.org/
网络流量预测 国内外研究现状【见评论】——传统的ARIMA、HMM模型,目前LSTM、GRU、CNN应用较多,貌似小波平滑预处理步骤非常关键的更多相关文章
- 网络流量预测入门(一)之RNN 介绍
目录 网络流量预测入门(一)之RNN 介绍 RNN简介 RNN 结构 RNN原理 结构原理 损失函数$E$ 反向传播 总结 参考 网络流量预测入门(一)之RNN 介绍 了解RNN之前,神经网络的知识是 ...
- 网络流量预测入门(三)之LSTM预测网络流量
目录 网络流量预测入门(三)之LSTM预测网络流量 数据集介绍 预测流程 数据集准备 SVR预测 LSTM 预测 优化点 网络流量预测入门(三)之LSTM预测网络流量 在上篇博客LSTM机器学习生成音 ...
- 网络流量预测入门(二)之LSTM介绍
目录 网络流量预测入门(二)之LSTM介绍 LSTM简介 Simple RNN的弊端 LSTM的结构 细胞状态(Cell State) 门(Gate) 遗忘门(Forget Gate) 输入门(Inp ...
- ARIMA模型实例讲解——网络流量预测可以使用啊
ARIMA模型实例讲解:时间序列预测需要多少历史数据? from:https://www.leiphone.com/news/201704/6zgOPEjmlvMpfvaB.html 雷锋网按:本 ...
- Kaggle比赛冠军经验分享:如何用 RNN 预测维基百科网络流量
Kaggle比赛冠军经验分享:如何用 RNN 预测维基百科网络流量 from:https://www.leiphone.com/news/201712/zbX22Ye5wD6CiwCJ.html 导语 ...
- 利用神经网络进行网络流量识别——特征提取的方法是(1)直接原始报文提取前24字节,24个报文组成596像素图像CNN识别;或者直接去掉header后payload的前1024字节(2)传输报文的大小分布特征;也有加入时序结合LSTM后的CNN综合模型
国外的文献汇总: <Network Traffic Classification via Neural Networks>使用的是全连接网络,传统机器学习特征工程的技术.top10特征如下 ...
- 见过NTP服务,没见过网络流量到200M左右的NTP服务
XXX,看来可能是NTP.CONF的文件配置错误所致了. 附上一段查看网络流量的SHELL.(好像只针对ETH0,如果要看其它的,还需要修改) #!/bin/bash typeset in_old d ...
- 网络流量分析——NPMD关注IT运维、识别宕机和运行不佳进行性能优化。智能化分析是关键-主动发现业务运行异常。科来做APT相关的安全分析
科来 做流量分析,同时也做了一些安全分析(偏APT)——参考其官网:http://www.colasoft.com.cn/cases-and-application/network-security- ...
- VR的国内研究现状及发展趋势
转载请声明转载地址:http://www.cnblogs.com/Rodolfo/,违者必究. 一.国内研究现状 我国虚拟现实技术研究起步较晚,与发达国家还有一定的差距. 随着计算机图形学.计算机系统 ...
随机推荐
- CF593C Beautiful Function 构造
正解:构造 解题报告: 传送门! 我知道我咕了好几篇博客似乎,,,但我不听!我就是要发新博客QAQ!(理不直气也壮 这题,想明白了还是比较简单的QwQ实现起来似乎也没有很复杂QAQ 首先思考一下,显然 ...
- Mybatis三剑客之mybatis-plugin
搜索mybatis plugin并安装. 如果没有的话,就按照如下: 1. 简介 mybatis plugin作为一款优秀的mybatis跳转插件,比起free mybatis plugin插 ...
- 数据展现-百度js绘图
echarts:酷炫的绘图效果 http://echarts.baidu.com/examples/#chart-type-calendar
- ubuntu update-alternatives
update-alternatives是ubuntu系统中专门维护系统命令链接符的工具,通过它可以很方便的设置系统默认使用哪个命令.哪个软件版本,比如,我们在系统中同时安装了open jdk和sun ...
- Django组件补充(缓存,信号,序列化)
Django组件补充(缓存,信号,序列化) Django的缓存机制 1.1 缓存介绍 1.缓存的简介 在动态网站中,用户所有的请求,服务器都会去数据库中进行相应的增,删,查,改,渲染模板,执行业务逻辑 ...
- D题:数学题(贪心+二分)
原题大意:原题链接 题解链接 给定两个集合元素,求出两集合间任意两元素相除后得到的新集合中的第k大值 #include<cstdio> #include<algorithm> ...
- CCPC-Wannafly Winter Camp Day4 (Div2, onsite)
Replay Dup4: 两轮怎么退火啊? 简单树形dp都不会了,送了那么多罚时 简单题都不想清楚就乱写了,喵喵喵? X: 欧拉怎么回路啊, 不会啊. 还是有没有手误?未思考清楚或者未检查就提交, 导 ...
- ng-深度学习-课程笔记-3: Python和向量化(Week2)
1 向量化( Vectorization ) 在逻辑回归中,以计算z为例,$ z = w^{T}+b $,你可以用for循环来实现. 但是在python中z可以调用numpy的方法,直接一句$z = ...
- java集合框架体系
Collection接口: 1.单列集合类的根接口. 2.定义了可用于操作List.Set的方法——增删改查: 3.继承自Iterable<E>接口,该接口中提供了iterator() 方 ...
- springcloud18---springCloudConfig
package com.itmuch.cloud; import org.springframework.beans.factory.annotation.Value; import org.spri ...



