Time Series Anomaly Detection in Network Traffic: A Use Case for Deep Neural Networks

From: https://jask.com/time-series-anomaly-detection-in-network-traffic-a-use-case-for-deep-neural-networks/

Introduction

As the waves of the big data revolution cascade across industries, more and more forms of sensor data become valuable inputs to predictive analytics. This sensor data has an intrinsic temporal component, and this temporality lets us use a family of predictive techniques called Time Series Models [1]. In this blog post we explore the underlying nature of time series modeling in the context of enterprise IT analytics, particularly for cybersecurity use cases.

Time series appear in many different industries and problem spaces, but in essence a time series is simply a data set whose values are indexed by time. In the research literature, a univariate time series usually refers to a data set with timestamps and a single value associated with each timestamp. Examples of univariate time series include the number of packets sent over time by a single host in a network, or the electricity consumption recorded by a smart meter for a single home over a year. Multivariate time series extend the original concept to the case where each timestamp has a vector or array of values associated with it. Examples of multivariate time series are the (P/E, price, volume) tuple for each time tick of a single stock, or the tuple of information for each netflow record within a single session (e.g. source and destination IP and port, packets and bytes sent and received, etc.).
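To make the distinction concrete, here is a minimal sketch of both shapes of data in plain Python. The start time, the sample values, and the choice of netflow fields (packets sent, bytes sent, bytes received) are made up for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2017, 1, 1, 0, 0, 0)  # arbitrary start time for the example

# Univariate: one value per timestamp (e.g. packets sent by one host per minute).
packets_per_minute = [120, 135, 128, 410, 131]
univariate = [(start + timedelta(minutes=i), value)
              for i, value in enumerate(packets_per_minute)]

# Multivariate: a vector of values per timestamp.
# Hypothetical netflow fields: (packets sent, bytes sent, bytes received).
flow_vectors = [(12, 9000, 400), (15, 11000, 520), (11, 8700, 390)]
multivariate = [(start + timedelta(minutes=i), vector)
                for i, vector in enumerate(flow_vectors)]

for timestamp, value in univariate:
    print(timestamp.isoformat(), value)
```

In both cases the timestamp is the index; the only difference is whether the payload at each tick is a scalar or a vector.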

Time Series Models For Network Security

Time series data is particularly prevalent in any modeling scenario that depends on input from a modern IT infrastructure. Almost every component of the hardware and software used in enterprise networks has some sub-system that generates time series data. For cybersecurity models, univariate and multivariate time series form one of the cornerstone data structures, particularly for studying evolving patterns of behavior.

There are many use cases relevant to modeling problems in cybersecurity. To illustrate some of the common phenomena associated with this class of problems, we describe three of the most common scenarios below.

Use Case 1: Detecting DDOS Attacks

With the growing prevalence of pay-for-play attack infrastructure, Distributed Denial of Service (DDOS) attack volume has hit all-time records, including the attack on Krebs last year using the Mirai botnet [2,3]. Denial of service attacks come in a couple of different varieties, including ‘Layer-4’ attacks and ‘Layer-7’ attacks, named after the layers of the OSI 7-layer network model. Typically, application layer (Layer-7) attacks are more difficult to detect than lower layer attacks because they exploit some property of an API. In either case, though, we can use the data on overall flow, size/volume, and app layer traffic stats generated by our routers and perimeter infrastructure over time to build time series models for Layer-4 and Layer-7 inbound traffic patterns. A standard time series model is then overlaid on this data to detect change points in the normal traffic baseline of the key choke points and DMZ assets exposed to inbound network traffic. The goal of this model is to identify spikes in traffic patterns that are extreme deviations from the observed baseline, as in the figure below.
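As a rough illustration of baseline-deviation detection, the sketch below flags points that sit far above a trailing-window baseline using a z-score. The function name `detect_spikes`, the window length, the threshold, and the sample traffic counts are all assumptions chosen for the example; this is a simple statistical stand-in, not the model used in the original work.

```python
from statistics import mean, stdev

def detect_spikes(series, window=10, z_threshold=4.0):
    """Return indices whose value is far above the trailing-window baseline."""
    spikes = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = mean(baseline)
        # Guard against a perfectly flat baseline (standard deviation of zero).
        sigma = max(stdev(baseline), 1e-9)
        if (series[i] - mu) / sigma > z_threshold:
            spikes.append(i)
    return spikes

# Hypothetical inbound request counts per minute, with one burst at index 12.
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99, 950, 100, 102]
print(detect_spikes(traffic))  # prints [12]
```

Note that once the burst enters the trailing window it inflates the baseline, so a production detector would usually exclude flagged points from later baselines.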

Use Case 2: Detecting Failed Login Spikes

Another common attack pattern, usually following a large leak of user names or PII data onto the dark web, is called Account Takeover (ATO). For instance, after the leak of a large number of user names for a financial institution, attacks can follow that target the login infrastructure of the banking applications. Typically attackers script an automated test of usernames and passwords from the list of stolen data, which produces a rapidly changing number of attempted logins per username on the application. There is potential for major financial gain even from a single successful login, so attackers are incentivized to target weak infrastructure in combination with the dark web's economy of stolen PII. This type of attack manifests as a time series problem, particularly in the application logs of the web service being targeted. A change point in the total number of failed logins related to a particular external subnet, or other grouping, is one primary indicator that an ATO attack is taking place. The typical pattern we look for in this case is intermittent spikes of activity spread out over time (see the figure below).
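A minimal sketch of how application logs could be turned into the failed-login time series described above. The event layout `(epoch_seconds, source_ip, success)`, the /24 grouping, the function names, and the fixed limit are all assumptions for illustration; in practice the per-bucket counts would feed a change point model rather than a hard-coded limit.

```python
from collections import Counter

def failed_logins_per_bucket(events, bucket_seconds=60):
    """Count failed logins per (subnet, time bucket).

    `events` is an iterable of (epoch_seconds, source_ip, success) tuples;
    this log layout is an assumption made for the example.
    """
    counts = Counter()
    for ts, ip, success in events:
        if success:
            continue
        # Crude IPv4 /24 grouping: keep the first three octets.
        subnet = ".".join(ip.split(".")[:3]) + ".0/24"
        counts[(subnet, ts // bucket_seconds)] += 1
    return counts

def flag_bursts(counts, max_failures=5):
    """Return the (subnet, bucket) keys whose failure count exceeds the limit."""
    return sorted(key for key, n in counts.items() if n > max_failures)

# Eight failures from one subnet within a minute, plus unrelated background logins.
events = [(i * 5, f"203.0.113.{i + 1}", False) for i in range(8)]
events += [(10, "198.51.100.7", False), (70, "198.51.100.7", True),
           (75, "198.51.100.7", False)]
counts = failed_logins_per_bucket(events)
print(flag_bursts(counts))  # prints [('203.0.113.0/24', 0)]
```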

Use Case 3: Data Exfiltration

The last common use case for time series models is exfiltration of data. There are many sub-problems and behaviors to take into consideration here, depending on the particular security scenario. For instance, an enterprise may be dealing with a disgruntled insider who is actively dumping data from repos onto a physical USB disk or sending it out as attachments through Google Drive. Different paths of exfiltration require careful analysis of the protocols and methods involved. One rich area that lends itself to multivariate time series modeling is behavior involving DNS data*. In the example below we see that if we build the appropriate multivariate vector from each individual endpoint's DNS requests, we can predict multiple attack patterns with a single model.

*See the JASK blog post [10] for more details on searching for key patterns related to DNS exfiltration.
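The article does not list which DNS features it uses, so the following is only a sketch of what a per-endpoint multivariate vector might look like. The three features (query count, mean query length, mean entropy of the left-most label), the function names, and the sample domain names are hypothetical choices for illustration.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def dns_feature_vector(queries):
    """Summarise one endpoint's DNS queries within one time window.

    Returns (query_count, mean_query_length, mean_label_entropy).
    The feature choice is hypothetical, not taken from the article.
    """
    if not queries:
        return (0, 0.0, 0.0)
    lengths = [len(q) for q in queries]
    # Treat the left-most label as the "subdomain" part; a simplification.
    entropies = [shannon_entropy(q.split(".")[0]) for q in queries]
    return (len(queries),
            sum(lengths) / len(queries),
            sum(entropies) / len(queries))

normal = dns_feature_vector(["www.example.com", "mail.example.com"])
suspicious = dns_feature_vector(["a9f3k2z8q1x7.example.com",
                                 "p0o9i8u7y6t5.example.com"])
print(normal)
print(suspicious)
```

Computing one such vector per endpoint per time window yields a multivariate time series; long, high-entropy labels are one pattern often associated with data encoded into DNS queries.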

Time Series Prediction Using Neural Nets

Neural networks have a long and interesting history as pattern recognition engines used in machine learning [4]. Over the last decade, the advent of next-generation hardware for specific learning tasks (e.g. tensor processing units), along with breakthroughs in neural-net training, has led us to the era of Deep Learning [6,7]. State-of-the-art libraries like TensorFlow and PyTorch provide high-level abstractions that make some of the most important techniques from Deep Learning available for solving business problems.

One of the most important aspects of leveraging time series output in security operations is building detections tuned to the highest-priority outcomes. With most of the toolsets and solutions designed for security operations center (SOC) workflows, the operator has to specify a manual threshold in order to detect time series outliers. From an engineering standpoint, neural networks are a good fit for cybersecurity models with temporal data because they learn the expected behavior from the data itself, which moves detections beyond static thresholds.
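The difference between a static threshold and a prediction-driven detection can be sketched as follows. A seasonal-naive forecaster (predict the value seen one period ago) stands in here for the neural network; the function names, the period, the multiplier `k`, the error floor of 1.0, and the sample series are assumptions for illustration only.

```python
def seasonal_naive(history, period):
    """Stand-in forecaster: predict the value observed one period ago."""
    return history[-period]

def residual_alerts(series, period, k=5.0):
    """Flag points whose forecast error is far larger than the typical error so far."""
    alerts, errors = [], []
    for t in range(period, len(series)):
        error = abs(series[t] - seasonal_naive(series[:t], period))
        if errors:
            typical = sum(errors) / len(errors)
            # The floor of 1.0 avoids a zero threshold on very regular data.
            if error > k * max(typical, 1.0):
                alerts.append(t)
                continue  # keep the anomaly out of the error baseline
        errors.append(error)
    return alerts

# Periodic traffic (period 4) with an off-peak anomaly at index 12.
series = [10, 50, 100, 50, 11, 49, 101, 51, 10, 50, 99, 50, 60, 50, 100, 50]

static_alerts = [i for i, v in enumerate(series) if v > 90]
print(static_alerts)                      # prints [2, 6, 10, 14]
print(residual_alerts(series, period=4))  # prints [12]
```

The static threshold fires on every normal peak and misses the off-peak anomaly, while the residual-based rule flags only the point that departs from the expected pattern.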

In 1997 Hochreiter and Schmidhuber published the original paper that introduced the long short-term memory (LSTM) cell in neural net architectures [5]. Since then LSTMs have become one of the most flexible, best-of-breed solutions for a variety of classification problems in deep learning.

Traditional statistical and mathematical approaches for analyzing time series are run over a specified time window. The length of this window needs to be pre-determined, and the results of these approaches are heavily influenced by it. Traditional machine learning algorithms require extensive feature engineering before a classifier can be trained. However, with any change in the input data, the dynamics of the features change as well, forcing a re-design of the feature vectors to maintain performance. If the features are not chosen appropriately during the feature extraction phase, there is a high chance of losing important information from the time series.

LSTM, on the other hand, can learn long-term sequential patterns without the need for feature engineering. Part of the magic here is the three gates specific to this architecture (the input, forget, and output gates), which control what is written to, kept in, and read from the cell's memory. Plain recurrent neural networks suffer from the vanishing gradient problem, in which the error signal shrinks as it is propagated back through time and the model fails to converge properly; LSTM was designed to overcome this. On account of these advantages, we turn to LSTM for modeling our time series.
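To show how the three gates interact, here is a minimal pure-Python sketch of one forward step of a single-unit LSTM cell, using the commonly cited formulation with a forget gate. The function names, the `params` layout, and the weight values are assumptions for illustration; real implementations work on vectors and learn the weights during training.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a single-unit LSTM cell (scalar input and state).

    `params` maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old memory to keep
    i = sigmoid(pre_activation("input"))        # how much new content to write
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # proposed new content
    c = f * c_prev + i * g                      # updated cell state
    h = o * math.tanh(c)                        # updated hidden state
    return h, c

# Illustrative weights: forget gate held open, input gate held shut.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (0.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x in [0.3] * 50:
    h, c = lstm_step(x, h, c, params)
print(c)  # stays close to the initial cell state of 1.0 after 50 steps
```

Because the cell state is updated additively and scaled by the forget gate rather than squashed at every step, information (and gradient) can survive across many time steps when the forget gate stays near 1.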

TensorFlow LSTM Model Layer-By-Layer

Using TensorFlow [13] we can build a template for processing arbitrary types of time series data. For a good introductory overview of TensorFlow and LSTM, check out some of the great books and blogs that have been published recently on the topic [9,11,12].

In our prototype example we build a simple architecture description of a neural network, specifying the number of layers and some of their properties. We define our LSTM model to contain a visible layer with 3 neurons, followed by a hidden “dense” (densely connected) layer with two-dimensional output, and finally an activation layer. The model is trained as a regression problem, with mean squared error as the objective to minimize. The final output is a single prediction.

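The input to the LSTM is higher-dimensional than traditional machine learning modeling inputs: rather than a two-dimensional table of samples by features, each sample is itself a short sequence, so the data is arranged as samples by time steps by features. A diagrammatic representation of our data is shown in the figure below.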
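A minimal sketch of that reshaping step, using nested lists so that it runs without any libraries. The function name `make_windows` and the window length of 3 are assumptions for the example. Keras LSTM layers, for instance, expect input shaped as (samples, time steps, features), which is the layout produced here.

```python
def make_windows(series, time_steps):
    """Turn a univariate series into LSTM-style inputs and next-step targets.

    Returns X laid out as [samples][time_steps][features] with one feature,
    and y holding the value that follows each window.
    """
    X, y = [], []
    for start in range(len(series) - time_steps):
        window = series[start:start + time_steps]
        X.append([[value] for value in window])  # wrap each value: one feature
        y.append(series[start + time_steps])
    return X, y

X, y = make_windows([10, 20, 30, 40, 50, 60], time_steps=3)
print(X[0], y[0])                      # prints [[10], [20], [30]] 40
print(len(X), len(X[0]), len(X[0][0])) # prints 3 3 1
```

For a multivariate series, each inner list would hold the full feature vector for that time step instead of a single value.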

Algorithmic Scalability Notes

For univariate time series data, LSTM training scales linearly with the length of a single series (O(N) for N time steps). Training time is one of the drawbacks of LSTM networks, but because separate time series can often be modeled independently, the workload is embarrassingly parallel and well suited to large GPU/TPU clusters.

To test whether our model overfit, we plotted RMSE against training set size and saw that the error decreased as the training data grew (RMSE is a quick, easy metric, but a proper overfitting analysis requires a more detailed testing paradigm). This is the expected trend, since the model should predict better with more training data. The tests below were run on synthetic time series data on regular CPU cores.
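The shape of such a test can be sketched without a neural network. Below, a one-parameter autoregressive model stands in for the LSTM, and held-out RMSE is computed for growing training prefixes of a synthetic series. The function names, the AR(1) generator with coefficient 0.8, the seed, and the training sizes are all assumptions for illustration and do not reproduce the original experiment.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] (stand-in model, not an LSTM)."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def learning_curve(train, test, sizes):
    """Held-out one-step RMSE for models fitted on growing prefixes of `train`."""
    curve = []
    for n in sizes:
        phi = fit_ar1(train[:n])
        predictions = [phi * test[t - 1] for t in range(1, len(test))]
        curve.append((n, rmse(test[1:], predictions)))
    return curve

# Synthetic AR(1) series with a fixed seed so the run is repeatable.
rng = random.Random(0)
series = [1.0]
for _ in range(1199):
    series.append(0.8 * series[-1] + rng.gauss(0, 1))
train, test = series[:1000], series[1000:]

curve = learning_curve(train, test, sizes=[10, 50, 200, 1000])
for n, err in curve:
    print(n, round(err, 3))
```

Comparing this held-out curve with the error on the training data itself gives a first indication of overfitting: a model that keeps improving on training data while the held-out error stalls or rises is memorizing rather than generalizing.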

Conclusion

Part of the appeal of neural network methods for time series problems is that they let us move past traditional threshold-based detections and automate some key use cases. There is a lot of depth to this topic and the related engineering design. We have found that Python and TensorFlow are great tools for prototyping ideas and building operationalized solutions with low initial complexity. In the realm of cybersecurity, we can move many of the generic queries that end up being driven by fixed thresholds to a more dynamic paradigm driven by deep learning models. The benefit we see in choosing LSTM in these cases is that we get better data-driven detections while moving away from simple rule-based time series alerts.

References

[1] Jan G. De Gooijer, Rob J. Hyndman, 25 years of time series forecasting, In International Journal of Forecasting, Volume 22, Issue 3, 2006, Pages 443-473, ISSN 0169-2070, https://doi.org/10.1016/j.ijforecast.2006.01.001
[2] https://www.theguardian.com/technology/2016/oct/26/ddos-attack-dyn-mirai-botnet
[3] https://www.abusix.com/blog/5-biggest-ddos-attacks-of-the-past-decade
[4] Christopher M. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA.
[5] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (November 1997), 1735-1780. DOI=http://dx.doi.org/10.1162/neco.1997.9.8.1735
[6] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. Neural Comput. 18, 7 (July 2006), 1527-1554. DOI=http://dx.doi.org/10.1162/neco.2006.18.7.1527
[7] http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/
[8] Greff K, Srivastava R, Koutnik J, Steunebrink B, Schmidhuber J, LSTM: A Search Space Odyssey, IEEE Transactions on Neural Networks and Learning Systems (2016), Institute of Electrical and Electronics Engineers Inc.
[9] Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.
[10] http://www.jask.com/cyber-security/threat-hunting-part-3-going-hunting-with-machine-learning/
[11] http://papers.nips.cc/paper/822-bounds-on-the-complexity-of-recurrent-neural-network-implementations-of-finite-state-machines.pdf
[12] https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
[13] https://www.tensorflow.org/
