Anomaly Detection for Time Series Data with Deep Learning——本质分类正常和异常的行为,对于检测异常行为,采用预测正常行为方式来做
A sample network anomaly detection project
Suppose we wanted to detect network anomalies with the understanding that an anomaly might point to hardware failure, application failure, or an intrusion.
What our model will show us
The RNN will train on a numeric representation of network activity logs, feature vectors that translate the raw mix of text and numerical data in logs.
By feeding a large volume of network activity logs, with each log line a time step, to the RNN, the neural net will learn what normal expected network activity looks like. When this trained network is fed new activity from the network, it will be able to classify the activity as normal and expected, or anomalous.
Training a neural net to recognize expected behavior has an advantage, because it is rare to have a large volume of abnormal data, or certainly not enough to accurately classify all abnormal behavior. We train our network on the normal data we have, so that it alerts us to non-normal activity in the future. We train for the opposite where we have enough data about attacks.
As an aside, the trained network does not necessarily note that certain activities happen at certain times (it does not know that a particular day is Sunday), but it does notice those more obvious temporal patterns we would be aware of, along with other connections between events that might not be apparent.
We’ll outline how to approach this problem using Deeplearning4j, a widely used open-source library for deep learning on the JVM. Deeplearning4j comes with a variety of tools that are useful throughout the model development process: DataVec is a collection of tools to assist with the extract-transform-load (ETL) tasks used to prepare data for model training. Just as Sqoop helps load data into Hadoop, DataVec helps load data into neural nets by cleaning, preprocessing, normalizing and standardizing data. It’s similar to Trifacta’s Wrangler but focused a bit more on binary data.
Getting started
The first stage includes typical big data tasks and ETL: We need to gather, move, store, prepare, normalize, and vectorize the logs. The size of the time steps must be decided. Data transformation may require significant effort, since JSON logs, text logs, and logs with inconsistent labeling patterns will have to be read and converted into a numeric array. DataVec can help transform and normalize that data. As is the norm when developing machine learning models, the data must be split into a training set and a test (or evaluation) set.
Training the network
The net’s initial training will run on the training split of the input data.
For the first training runs, you may need to adjust some hyperparameters (“hyperparameters” are parameters that control the “configuration” of the model and how it trains) so that the model actually learns from the data, and does so in a reasonable amount of time. We discuss a few hyperparameters below. As the model trains, you should look for a steady decrease in error.
There is a risk that a neural network model will "overfit" on the data. A model that has been trained to the point of overfitting the dataset will get good scores on the training data, but will not make accurate decisions about data it has never seen before. It doesn’t “generalize” -- in machine-learning parlance. Deeplearning4J provides regularization tools and “early stopping” that help prevent overfitting while training.
Training the neural net is the step that will take the most time and hardware. Running training on GPUs will lead to a significant decrease in training time, especially for image recognition, but additional hardware comes with additional cost, so it’s important that your deep-learning framework use hardware as efficiently as possible. Cloud services such as Azure and Amazon provide access to GPU-based instances, and neural nets can be trained on heterogenous clusters with scalable commodity servers as well as purpose-built machines.
Productionizing the model
Deeplearning4J provides a ModelSerializer class to save a trained model. A trained model can be saved and either be used (i.e., deployed to production) or updated later with further training.
When performing network anomaly detection in production, log files need to be serialized into the same format that the model trained on, and based on the output of the neural network, you would get reports on whether the current activity was in the range of normal expected network behavior.
Sample code
The configuration of a recurrent neural network might look something like this:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(123)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
.weightInit(WeightInit.XAVIER)
.updater(Updater.NESTEROVS).momentum(0.9)
.learningRate(0.005)
.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
.gradientNormalizationThreshold(0.5)
.list()
.layer(0, new GravesLSTM.Builder().activation("tanh").nIn(1).nOut(10).build())
.layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
.activation("softmax").nIn(10).nOut(numLabelClasses).build())
.pretrain(false).backprop(true).build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
Let’s describe a few important lines of this code:
.seed(123)
sets a random seed to initialize the neural net’s weights, in order to obtain reproducible results. Typically, coefficients are initialized randomly, and so to obtain consistent results while adjusting other hyperparameters, we need to set a seed, so we can use the same random weights over and over as we tune and test.
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
determines which optimization algorithm to use (in this case, stochastic gradient descent) to determine how to modify the weights to improve the error score. You probably won’t have to modify this.
.learningRate(0.005)
When using stochastic gradient descent, the error gradient (that is, the relation of a change in coefficients to a change in the net’s error) is calculated and the weights are moved along this gradient in an attempt to move the error towards a minimum. SGD gives us the direction of less error, and the learning rate determines how big of a step is taken in that direction. If the learning rate is too high, you may overshoot the error minimum; if it is too low, your training will take forever. This is a hyperparameter you may need to adjust.
Getting Help
There is an active community of Deeplearning4J users who can be found on several support channels on Gitter.
About the author
Tom Hanlon is currently at Skymind.IO where he is developing a Training Program for Deeplearning4J. The consistent thread in Tom’s career has been data, from MySQL to Hadoop and now neural networks.
摘自:https://www.infoq.com/articles/deep-learning-time-series-anomaly-detection
Anomaly Detection for Time Series Data with Deep Learning——本质分类正常和异常的行为,对于检测异常行为,采用预测正常行为方式来做的更多相关文章
- Open Data for Deep Learning
Open Data for Deep Learning Here you’ll find an organized list of interesting, high-quality datasets ...
- 学习Data Science/Deep Learning的一些材料
原文发布于我的微信公众号: GeekArtT. 从CFA到如今的Data Science/Deep Learning的学习已经有一年的时间了.期间经历了自我的兴趣.擅长事务的探索和试验,有放弃了的项目 ...
- 吴恩达机器学习笔记54-开发与评价一个异常检测系统及其与监督学习的对比(Developing and Evaluating an Anomaly Detection System and the Comparison to Supervised Learning)
一.开发与评价一个异常检测系统 异常检测算法是一个非监督学习算法,意味着我们无法根据结果变量
- (转)分布式深度学习系统构建 简介 Distributed Deep Learning
HOME ABOUT CONTACT SUBSCRIBE VIA RSS DEEP LEARNING FOR ENTERPRISE Distributed Deep Learning, Part ...
- (转) Deep Learning Resources
转自:http://www.jeremydjacksonphd.com/category/deep-learning/ Deep Learning Resources Posted on May 13 ...
- What are some good books/papers for learning deep learning?
What's the most effective way to get started with deep learning? 29 Answers Yoshua Bengio, ...
- Deep Learning Papers Reading Roadmap
Deep Learning Papers Reading Roadmap https://github.com/songrotek/Deep-Learning-Papers-Reading-Roadm ...
- Why deep learning?
1. 深度学习中网络越深越好么? 理论上说是这样的,因为网络越深,参数也越多,拟合能力也越强(但实际情况是,网络很深的时候,不容易训练,使得表现能力可能并不好). 2. 那么,不同什么深度的网络,在参 ...
- Does Deep Learning Come from the Devil?
Does Deep Learning Come from the Devil? Deep learning has revolutionized computer vision and natural ...
随机推荐
- vue-cil 和 webpack 中本地静态图片的路径问题解决方案
1.小于8K的图片将直接以base64的形式内联在代码中,可以减少一次http请求. 2.大于8k的呢?则直接file-loader打包, 这里并没有写明file-loader.但是确实是需要安装,否 ...
- 处理字符串的一些C函数
注意:以下函数都包含在ctype.h头文件中 1.isalpha函数 用来判断得到的参数是不是字母 #include<stdio.h> #include<ctype.h> in ...
- 基于togglepoolmember.pl编写F5设备控制模块
为了方便利用python对F5设备进行操作,本文将togglepoolmember.pl对F5设备的控制写成了python模块,源代码例如以下: #!/usr/bin/python # -*- cod ...
- ArcMap中使用ArcPy实现Geometry与WKT的相互转换
在Web GIS迅猛发展的今天,使用浏览器来进行交互以其方便性.快捷性被广大用户所推崇,那么在传输格式方面,都已比較简单的JSON或者WKT来解决网络带宽带来的数据压力. 在ArcGIS10.2版本号 ...
- Python,Pycharm,Anaconda等的关系与安装过程~为初学者跳过各种坑
1.致欢迎词 我将详讲讲述在学Python初期的各种手忙脚乱的问题的解决,通过这些步骤的操作,让你的注意力集中在Python的语法上以及后面利用Python所解决的项目问题上.而我自己作为小白,很不幸 ...
- python读写数据篇
一.读写数据1.读数据 #使用open打开文件后一定要记得调用文件对象的close()方法.比如可以用try/finally语句来确保最后能关闭文件.file_object = open('thefi ...
- mongodb启动不了:提示错误信息为 child process failed, exited with error number 100
[启动mongo 副本集错误提示]: [原因分析说明]: 查询很多资料得知由于上次使用了暴力关闭系统或者DB,导致数据文件锁住. [解决办法]: 1. 在 mongo.conf 文件添加一下属性值 ...
- 微信小程序页面布局之弹性布局-Flex介绍
布局的传统解决方案,基于盒状模型,依赖 display 属性 + position属性 + float属性.它对于那些特殊布局非常不方便,比如,垂直居中就不容易实现. 2009年,W3C 提出了一种新 ...
- python 基础 7.2 时间格式的相互转换
#/usr/bin/python #coding=utf-8 #@Time :2017/11/9 8:55 #@Auther :liuzhenchuan #@File :时间格式的相互转换.p ...
- 3597: [Scoi2014]方伯伯运椰子[分数规划]
3597: [Scoi2014]方伯伯运椰子 Time Limit: 30 Sec Memory Limit: 64 MB Submit: 404 Solved: 249 [Submit][Sta ...