Time Series Anomaly Detection in Network Traffic: A Use Case for Deep Neural Networks

from:https://jask.com/time-series-anomaly-detection-in-network-traffic-a-use-case-for-deep-neural-networks/

Introduction

As the waves of the big data revolution cascade across industries, more and more forms of sensor data become valuable inputs to predictive analytics. This sensor data has an intrinsic temporal component, and that temporality lets us apply a family of predictive techniques called time series models [1]. In this blog post we explore time series modeling in the context of enterprise IT analytics, particularly for cybersecurity use cases.

Time series appear in many industries and problem spaces, but in essence a time series is simply a data set whose values are indexed by time. In the research literature, a univariate time series is a data set with a single value associated with each timestamp. Examples include the number of packets sent over time by a single host in a network, or the amount of electricity consumed over a year as recorded by a smart meter for a single home. A multivariate time series extends the concept to the case where each timestamp has a vector or array of values associated with it. Examples are the (P/E, price, volume) tuple for each time tick of a single stock, or the tuple of information for each netflow record in a single session (e.g. source and destination IP and port, packets and bytes sent and received, etc.).
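To make the distinction concrete, here is a minimal sketch of both shapes using only the Python standard library. The values, the start date, and the field layout of the multivariate vector are made up for illustration and do not come from any real capture.

```python
from datetime import datetime, timedelta

start = datetime(2017, 1, 1, 0, 0)  # arbitrary start time (assumption)
step = timedelta(minutes=1)

# Univariate: one value per timestamp, e.g. packets sent by a single host.
packet_counts = [120, 135, 128, 990, 131]
univariate = [(start + i * step, count) for i, count in enumerate(packet_counts)]

# Multivariate: a vector of values per timestamp.
# Hypothetical layout: (packets_sent, bytes_sent, bytes_received).
flow_vectors = [
    (120, 48_000, 96_000),
    (135, 51_000, 99_500),
    (128, 49_200, 97_100),
    (990, 410_000, 12_000),
    (131, 50_100, 98_300),
]
multivariate = [(start + i * step, vec) for i, vec in enumerate(flow_vectors)]

for ts, value in univariate[:2]:
    print(ts.isoformat(), value)
for ts, vec in multivariate[:2]:
    print(ts.isoformat(), vec)
```

In both cases the timestamp is the index; the only difference is whether each index points at a scalar or at a vector.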

Time Series Models For Network Security

Time series data is particularly prevalent in any modeling scenario that depends on input from a modern IT infrastructure. Almost every component of the hardware and software used in enterprise networks has some subsystem that generates time series data. For cybersecurity models, univariate and multivariate time series form one of the cornerstone data structures, particularly for studying evolving patterns of behavior.

Many different use cases are relevant to modeling problems in cybersecurity. To illustrate the phenomena associated with this class of problems, we describe three of the most common scenarios below.

Use Case 1: Detecting DDoS Attacks

With the growing prevalence of pay-for-play attack infrastructure, Distributed Denial of Service (DDoS) attack volume has hit all-time records, including the attack on Krebs last year using the Mirai botnet [2,3]. Denial of service attacks come in a couple of varieties, including ‘Layer-4’ attacks and ‘Layer-7’ attacks, named for the layers of the OSI 7-layer network model. Application layer (Layer-7) attacks are typically harder to detect than lower-layer attacks because they involve exploiting some property of an API. In either case, we can use the data on overall flow, size/volume, and application layer traffic statistics generated over time by our routers and perimeter infrastructure to build time series models of Layer-4 and Layer-7 inbound traffic patterns. A standard time series model is then overlaid on this data to detect change points in the normal traffic baseline of the key choke points and DMZ assets exposed to inbound network traffic. The goal of this model is to identify spikes in traffic that deviate sharply from the observed baseline, as in the figure below.
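As a rough illustration of "deviation from the observed baseline", the sketch below flags points that sit far above a trailing-window mean, measured in standard deviations. This is a simple rolling z-score baseline rather than the model used in this post, and the window length, threshold, and traffic numbers are all assumptions.

```python
from statistics import mean, stdev

def detect_spikes(series, window=5, z_threshold=3.0):
    """Return indices whose value is far above the trailing-window baseline."""
    spikes = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = mean(baseline)
        # Floor the deviation so a perfectly flat baseline does not divide by zero.
        sigma = max(stdev(baseline), 1e-9)
        z_score = (series[i] - mu) / sigma
        if z_score > z_threshold:
            spikes.append(i)
    return spikes

# Synthetic inbound request counts per minute (made up for illustration).
traffic = [100, 104, 98, 101, 99, 103, 97, 950, 102, 100]
print(detect_spikes(traffic))  # [7]
```

Note that the window and threshold here are exactly the kind of hand-tuned parameters that the later sections try to move away from.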

Use Case 2: Detecting Failed Login Spikes

Another common attack pattern, which usually follows a large leak of usernames or PII onto the dark web, is Account Takeover (ATO). For instance, after a large number of usernames for a financial institution are leaked, attackers may target the login infrastructure of its banking applications. Typically attackers script an automated test of username/password pairs from the stolen data, which shows up on the application as a rapidly changing number of attempted logins per username. Even a single successful login can bring major financial gain, so attackers are incentivized to target weak infrastructure in combination with the dark web's economy of stolen PII. This type of attack manifests as a time series problem, particularly in the application logs of the targeted web service. A changepoint in the total number of failed logins from a particular external subnet, or other grouping, is a primary indicator that an ATO attack is taking place. The typical pattern in this case is intermittent spikes of activity spread out over time (see the figure below).
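The failed-login series described above has to be built from raw log events first. Here is a minimal sketch of that aggregation step, assuming each event is a hypothetical `(epoch_seconds, source_ip, success)` tuple; the /24 grouping and the 60-second bucket are illustrative choices, not values from our deployment.

```python
from collections import Counter
from ipaddress import ip_network

def failed_logins_per_subnet(events, bucket_seconds=60, prefix=24):
    """Count failed logins per (source subnet, time bucket)."""
    counts = Counter()
    for ts, src_ip, success in events:
        if success:
            continue
        # strict=False lets us pass a host address and get its enclosing network.
        subnet = ip_network(f"{src_ip}/{prefix}", strict=False)
        bucket = int(ts // bucket_seconds) * bucket_seconds
        counts[(str(subnet), bucket)] += 1
    return counts

# Synthetic events using documentation-only IP ranges.
events = [
    (0, "203.0.113.7", False),
    (5, "203.0.113.9", False),
    (12, "203.0.113.200", False),
    (30, "198.51.100.4", True),
    (61, "203.0.113.7", False),
    (70, "198.51.100.4", False),
]
counts = failed_logins_per_subnet(events)
for key in sorted(counts):
    print(key, counts[key])
```

Each subnet's sequence of bucket counts is then a univariate time series that a changepoint or spike detector can run over.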

Use Case 3: Data Exfiltration

The last common use case for time series models is data exfiltration. There are many sub-problems and behaviors to consider here, depending on the particular security scenario. For instance, an enterprise may be dealing with a disgruntled insider who is actively dumping data from repos onto a physical USB disk or sending it out as attachments through Google Drive. Different paths of exfiltration require careful analysis of the protocols and methods involved. One rich area that lends itself to multivariate time series modeling is the behavior of DNS data*. In the example below we see that if we build the appropriate multivariate vector from each individual endpoint's DNS requests, we can predict multiple attack patterns with a single model.

*See the JASK blog post for more details on searching for key patterns related to DNS exfiltration [10].
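What the "appropriate multivariate vector" looks like depends on the environment. As a sketch only, the snippet below builds one vector per endpoint for a single time window from hypothetical `(endpoint, query_name)` pairs. The four features (query count, mean name length, distinct names, mean character entropy) are our assumptions for illustration and are not taken from the referenced post.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(text):
    """Character-level Shannon entropy in bits."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def dns_feature_vectors(queries):
    """Map each endpoint to (count, mean length, distinct names, mean entropy)."""
    by_endpoint = defaultdict(list)
    for endpoint, name in queries:
        by_endpoint[endpoint].append(name)
    vectors = {}
    for endpoint, names in by_endpoint.items():
        vectors[endpoint] = (
            len(names),
            sum(len(n) for n in names) / len(names),
            len(set(names)),
            sum(shannon_entropy(n) for n in names) / len(names),
        )
    return vectors

# Synthetic queries for one time window (made up for illustration).
queries = [
    ("10.0.0.5", "www.example.com"),
    ("10.0.0.5", "mail.example.com"),
    ("10.0.0.9", "a9f3k2q8z7x1c5v0.example.net"),
]
vectors = dns_feature_vectors(queries)
for endpoint, vec in sorted(vectors.items()):
    print(endpoint, vec)
```

Computing one such vector per endpoint per window yields a multivariate time series for each endpoint.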

Time Series Prediction Using Neural Nets

Neural networks have a long and interesting history as pattern recognition engines used in machine learning [4]. Over the last decade, the advent of next-generation hardware for specific learning tasks (e.g. tensor processing units), along with breakthroughs in neural net training, has led us to the era of deep learning [6,7]. State-of-the-art libraries like TensorFlow and PyTorch provide high-level abstractions that make some of the most important deep learning techniques available for solving business problems.

One of the most important aspects of leveraging time series output in security operations is building detections tuned to the highest-priority outcomes. With most of the toolsets and solutions designed for security operations center (SOC) workflows, the operator has to specify a manual threshold in order to detect time series outliers. Neural networks are a good engineering fit for cybersecurity models with temporal data because they learn from the data dynamically, which helps move detections beyond static thresholds.

In 1997 Hochreiter and Schmidhuber published the paper that introduced the long short-term memory (LSTM) cell for neural net architectures [5]. Since then LSTMs have become one of the most flexible and best-of-breed solutions for a variety of classification problems in deep learning.

Traditional statistical and mathematical approaches to analyzing time series run over a fixed time window. The length of this window must be chosen in advance, and the results are heavily influenced by that choice. Traditional machine learning algorithms require extensive feature engineering before a classifier can be trained. Any change in the input data changes the dynamics of the features as well, forcing a redesign of the feature vectors to maintain performance. If the features are not chosen appropriately during feature extraction, there is a high chance of losing important information from the time series.

LSTM, on the other hand, can learn long-term sequential patterns without feature engineering; part of the magic is the set of three gates (input, forget, and output) that control what the memory cell stores, discards, and emits. Plain recurrent neural networks suffer from the vanishing gradient problem: the error signal shrinks as it is propagated back through many time steps, so the model fails to converge properly due to insufficient error correction. LSTM was designed to overcome this. On account of these advantages, we turn to LSTM for modeling our time series.
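To show what the gates do, here is a minimal sketch of a single LSTM time step in its commonly used form, with scalar input, hidden state, and cell state so it runs in plain Python. The weights in `params` are arbitrary illustrative numbers, not trained values, and real implementations use weight matrices rather than scalars.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step with scalar input, hidden state and cell state."""
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the new candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate value to write

    c = f * c_prev + i * g             # additive cell-state update
    h = o * math.tanh(c)
    return h, c

# Arbitrary illustrative weights: (input weight, recurrent weight, bias).
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, 0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1, 0.9]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

The additive update `c = f * c_prev + i * g` is the key design choice: when the forget gate stays near 1, the cell state (and the error signal flowing back through it) passes across many time steps largely unchanged.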

TensorFlow LSTM Model Layer-By-Layer

Using TensorFlow [13] we can build a template for processing arbitrary types of time series data. For a good introductory overview of TensorFlow and LSTM, check out some of the great books and blogs that have been published recently on the topic [9,11,12].

In our prototype example we build a simple architecture description of a neural network, specifying the number of layers and some related properties. We define our LSTM model to contain a visible layer with 3 neurons, followed by a hidden “dense” (densely connected) layer with two-dimensional output, and finally an activation layer. The model is trained as a regression problem, with mean squared error as the objective it optimizes. The final output is a single prediction.

The input to the LSTM is higher-dimensional than traditional machine learning modeling inputs. A diagrammatic representation of our data is as shown:
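In Keras-style APIs, LSTM layers expect three-dimensional input shaped (samples, time steps, features), which is the sense in which the input is higher-dimensional. The sketch below builds that layout from a univariate series with a sliding window, using nested lists so it needs no dependencies. The `look_back` length of 3 is an arbitrary assumption.

```python
def make_windows(series, look_back):
    """Turn a 1-D series into (samples, time steps, features) windows plus targets."""
    samples, targets = [], []
    for start in range(len(series) - look_back):
        window = series[start:start + look_back]
        # One feature per time step, so each step becomes a length-1 list.
        samples.append([[value] for value in window])
        # The value right after the window is what the model should predict.
        targets.append(series[start + look_back])
    return samples, targets

series = [10, 20, 30, 40, 50, 60]
X, y = make_windows(series, look_back=3)
print(len(X), len(X[0]), len(X[0][0]))  # 3 3 1
print(y)  # [40, 50, 60]
```

For a multivariate series, each time step would hold the full feature vector instead of a single value, and only the last dimension changes.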

Algorithmic Scalability Notes

For univariate time series data, LSTM training scales linearly for a single time series (O(N), where N is the number of time steps). Training time is one of the drawbacks of LSTM networks, but because time series models are often embarrassingly parallel, these problems are well suited to running on large GPU/TPU clusters.

To test whether our model overfit, we plotted RMSE against training set size and saw that the error decreased as the training data increased (RMSE is a quick metric that is easy to use, but proper overfit analysis requires a more detailed testing paradigm). This is the expected trend, since the model should predict better with more training data. The tests below were run on synthetic time series data on regular CPU cores.
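The RMSE-versus-training-size check can be sketched as below. The `mean_forecaster` is a deliberately trivial stand-in for the LSTM so the example stays dependency-free, and the numbers are toy data chosen for illustration; they are not the results of our tests.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def learning_curve(train, test, fit_predict, sizes):
    """RMSE on a fixed held-out set for growing training prefixes."""
    curve = []
    for n in sizes:
        predictions = fit_predict(train[:n], len(test))
        curve.append((n, rmse(test, predictions)))
    return curve

def mean_forecaster(history, horizon):
    # Stand-in model (an assumption): always predict the historical mean.
    return [sum(history) / len(history)] * horizon

train = [14, 12, 9, 11, 10, 10, 9, 9]  # toy training series
test = [10, 10]                        # toy held-out values
curve = learning_curve(train, test, mean_forecaster, sizes=[2, 4, 8])
print(curve)  # [(2, 3.0), (4, 1.5), (8, 0.5)]
```

Swapping `mean_forecaster` for a function that trains the real model on `history` and forecasts `horizon` steps gives the same curve for the LSTM. Comparing it against RMSE measured on the training data itself is one way to start the more detailed overfit analysis mentioned above.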

Conclusion

Part of the appeal of neural network methods for time series problems is that they let us move past traditional threshold-based detections and automate some key use cases. There is a lot of depth to this topic and the related engineering design. We have found Python and TensorFlow to be great tools for prototyping ideas and building operationalized solutions with low initial complexity. In the realm of cybersecurity, we can move many of the generic queries that end up being driven by fixed thresholds to a more dynamic paradigm driven by deep learning models. The benefit of choosing LSTM in these cases is better data-driven detection while moving away from simple rule-based time series alerts.

References

  1. Jan G. De Gooijer, Rob J. Hyndman, 25 years of time series forecasting, In International Journal of Forecasting, Volume 22, Issue 3, 2006, Pages 443-473, ISSN 0169-2070, https://doi.org/10.1016/j.ijforecast.2006.01.001
  2. https://www.theguardian.com/technology/2016/oct/26/ddos-attack-dyn-mirai-botnet
  3. https://www.abusix.com/blog/5-biggest-ddos-attacks-of-the-past-decade
  4. Christopher M. Bishop. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Inc., New York, NY, USA.
  5. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (November 1997), 1735-1780. DOI=http://dx.doi.org/10.1162/neco.1997.9.8.1735
  6. Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. 2006. A fast learning algorithm for deep belief nets. Neural Comput. 18, 7 (July 2006), 1527-1554. DOI=http://dx.doi.org/10.1162/neco.2006.18.7.1527
  7. http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning-part-4/
  8. Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber. 2016. LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems. Institute of Electrical and Electronics Engineers Inc.
  9. Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.
  10. http://www.jask.com/cyber-security/threat-hunting-part-3-going-hunting-with-machine-learning/
  11. http://papers.nips.cc/paper/822-bounds-on-the-complexity-of-recurrent-neural-network-implementations-of-finite-state-machines.pdf
  12. https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
  13. https://www.tensorflow.org/
