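Unsupervised Anomaly Detection with a Convolutional AE and a Convolutional VAE
I tried using a convolutional autoencoder (AE) and a convolutional variational autoencoder (VAE) for unsupervised anomaly detection. The approach is:
1. Train the AE or VAE on normal samples only.
2. Feed the test set into the AE or VAE to obtain reconstructions of the test data.
3. Compute the error between each reconstruction and its original. If the error exceeds a threshold, the test sample is flagged as anomalous.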
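Steps 2 and 3 can be sketched in NumPy as follows. This is a minimal illustration, not the author's code: the post does not say which error measure or threshold it uses, so the per-sample mean squared error and the threshold rule (a high percentile of the errors on held-out normal samples) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def reconstruction_errors(x, x_rec):
    """Per-sample mean squared error between inputs and their reconstructions.

    Both arrays have shape (n_samples, ...), e.g. (n, 7, 7, 1).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    x_rec = np.asarray(x_rec, dtype=float).reshape(len(x_rec), -1)
    return np.mean((x - x_rec) ** 2, axis=1)


def pick_threshold(normal_errors, percentile=99.0):
    """Assumed rule: a high percentile of errors measured on normal data only."""
    return float(np.percentile(normal_errors, percentile))


def flag_anomalies(errors, threshold):
    """True where the reconstruction error exceeds the threshold (step 3)."""
    return np.asarray(errors) > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_valid = rng.random((100, 7, 7, 1))                    # stand-in normal data
    rec_valid = x_valid + rng.normal(0, 0.01, x_valid.shape)
    thr = pick_threshold(reconstruction_errors(x_valid, rec_valid))
    x_test = rng.random((3, 7, 7, 1))
    rec_test = x_test.copy()
    rec_test[2] += 0.5                                      # one badly reconstructed sample
    print(flag_anomalies(reconstruction_errors(x_test, rec_test), thr))
```

Fitting the threshold on normal validation data keeps the whole procedure unsupervised; the percentile then roughly sets the false-alarm rate on normal samples.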
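Dataset description:
The dataset contains 10,100 samples, each a 1×48 row vector. To turn each sample into a matrix, I appended a 0 at the end and reshaped it into a 7×7 matrix. The first 8,000 samples are normal. Of the remaining 2,100, the first 300 are normal and the next 1,800 cover 6 types of anomalous time series, with 300 samples per type.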
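The padding and the sample layout described above can be written down as follows. This is a sketch: the post does not show its padding code, and `holdout_labels` assumes the rows are stored in the order described (300 normal, then six anomaly types in blocks of 300), which may not hold if the file was shuffled. Both function names are hypothetical.

```python
import numpy as np

SIDE = 7  # 48 features + 1 zero of padding = 49 = 7 * 7


def pad_and_reshape(rows):
    """Append one zero to each 48-column row and reshape to (n, 7, 7, 1)."""
    rows = np.asarray(rows, dtype=float)
    assert rows.shape[1] == SIDE * SIDE - 1, "expected 48 columns"
    padded = np.hstack([rows, np.zeros((rows.shape[0], 1))])
    return padded.reshape(-1, SIDE, SIDE, 1)


def holdout_labels():
    """Assumed ground truth for the last 2,100 samples (rows 8000..10099).

    0 = normal (first 300 rows), k = anomaly type k (k = 1..6, 300 rows each).
    """
    labels = np.zeros(2100, dtype=int)
    for k in range(1, 7):
        start = 300 * k
        labels[start:start + 300] = k
    return labels
```

The appended zero lands in the bottom-right cell of each 7×7 matrix, since index 48 maps to row 6, column 6.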
The VAE code is as follows:
#https://blog.csdn.net/wyx100/article/details/80647379
'''This script demonstrates how to build a variational autoencoder
with Keras and deconvolution layers.
# Reference
- Auto-Encoding Variational Bayes
https://arxiv.org/abs/1312.6114
'''
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from pandas import read_csv
from keras.layers import Input, Dense, Lambda, Flatten, Reshape
from keras.layers import Conv2D, Conv2DTranspose
from keras.models import Model
from keras import backend as K
from keras import metrics
from matplotlib import pyplot
import numpy
# input image dimensions
img_rows, img_cols, img_chns = 7, 7, 1
dimension_image = 7
# number of convolutional filters to use
filters = 64
# convolution kernel size
num_conv = 3
batch_size = 50
if K.image_data_format() == 'channels_first':
    original_img_size = (img_chns, img_rows, img_cols)
else:
    original_img_size = (img_rows, img_cols, img_chns)
latent_dim = 2
intermediate_dim = 128
epsilon_std = 1.0
epochs = 100

x = Input(shape=original_img_size)
conv_1 = Conv2D(img_chns,
                kernel_size=(2, 2),
                padding='same', activation='relu')(x)
# 7x7 -> 4x4 (stride 2, 'same' padding)
conv_2 = Conv2D(filters,
                kernel_size=(2, 2),
                padding='same', activation='relu',
                strides=(2, 2))(conv_1)
conv_3 = Conv2D(filters,
                kernel_size=num_conv,
                padding='same', activation='relu',
                strides=1)(conv_2)
conv_4 = Conv2D(filters,
                kernel_size=num_conv,
                padding='same', activation='relu',
                strides=1)(conv_3)
flat = Flatten()(conv_4)
hidden = Dense(intermediate_dim, activation='relu')(flat)
z_mean = Dense(latent_dim)(hidden)
z_log_var = Dense(latent_dim)(hidden)


def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim),
                              mean=0., stddev=epsilon_std)
    # z_log_var is log(sigma^2) (as in kl_loss below), so sigma = exp(z_log_var / 2)
    return z_mean + K.exp(z_log_var / 2) * epsilon


# note that "output_shape" isn't necessary with the TensorFlow backend
# so you could write `Lambda(sampling)([z_mean, z_log_var])`
z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

# we instantiate these layers separately so as to reuse them later
number = 4  # spatial size of the decoder's first feature map (matches the 4x4 encoder output)
decoder_hid = Dense(intermediate_dim, activation='relu')
decoder_upsample = Dense(filters * number * number, activation='relu')

if K.image_data_format() == 'channels_first':
    output_shape = (batch_size, filters, number, number)
else:
    output_shape = (batch_size, number, number, filters)

decoder_reshape = Reshape(output_shape[1:])
decoder_deconv_1 = Conv2DTranspose(filters,
                                   kernel_size=num_conv,
                                   padding='same',
                                   strides=1,
                                   activation='relu')
decoder_deconv_2 = Conv2DTranspose(filters,
                                   kernel_size=num_conv,
                                   padding='same',
                                   strides=1,
                                   activation='relu')
# (this output_shape is not used below)
if K.image_data_format() == 'channels_first':
    output_shape = (batch_size, filters, 13, 13)
else:
    output_shape = (batch_size, 13, 13, filters)
# 4x4 -> 9x9: (4 - 1) * 2 + 3 = 9 with a 3x3 kernel, stride 2, 'valid' padding
decoder_deconv_3_upsamp = Conv2DTranspose(filters,
                                          kernel_size=(3, 3),
                                          strides=(2, 2),
                                          padding='valid',
                                          activation='relu')
# 9x9 -> 7x7: a 3x3 'valid' convolution trims one pixel from each border
decoder_mean_squash = Conv2D(img_chns,
                             kernel_size=3,
                             padding='valid',
                             activation='sigmoid')

hid_decoded = decoder_hid(z)
up_decoded = decoder_upsample(hid_decoded)
reshape_decoded = decoder_reshape(up_decoded)
deconv_1_decoded = decoder_deconv_1(reshape_decoded)
deconv_2_decoded = decoder_deconv_2(deconv_1_decoded)
x_decoded_relu = decoder_deconv_3_upsamp(deconv_2_decoded)
x_decoded_mean_squash = decoder_mean_squash(x_decoded_relu)

# instantiate VAE model
vae = Model(x, x_decoded_mean_squash)
# Compute VAE loss
# (binary cross-entropy assumes the inputs are scaled to [0, 1])
xent_loss = img_rows * img_cols * metrics.binary_crossentropy(
    K.flatten(x),
    K.flatten(x_decoded_mean_squash))
kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(xent_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer='Adam')
vae.summary()

dataset = read_csv('randperm_zerone_Dataset.csv')
values = dataset.values
XY = values
n_train_hours1 = 7000
n_train_hours3 = 8000
# rows [0, 7000): training, [7000, 8000): validation, [8000, end): test
x_train = XY[:n_train_hours1, :]
x_valid = XY[n_train_hours1:n_train_hours3, :]
x_test = XY[n_train_hours3:, :]
x_train = x_train.reshape(-1, dimension_image, dimension_image, 1)
x_valid = x_valid.reshape(-1, dimension_image, dimension_image, 1)
x_test = x_test.reshape(-1, dimension_image, dimension_image, 1)

history = vae.fit(x_train,
                  shuffle=True,
                  epochs=epochs,
                  batch_size=batch_size,
                  validation_data=(x_valid, None))
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='valid')
pyplot.legend()
pyplot.show()

# build a model that maps inputs to their latent means
encoder = Model(x, z_mean)
# scatter-plot the test samples in the 2D latent space
x_test_encoded = encoder.predict(x_test, batch_size=batch_size)
plt.figure(figsize=(6, 6))
plt.scatter(x_test_encoded[:, 0], x_test_encoded[:, 1])
plt.show()

Reconstructed_train = vae.predict(x_train)
Reconstructed_valid = vae.predict(x_valid)
Reconstructed_test = vae.predict(x_test)
# stack train/valid/test reconstructions back into the original row order
ReconstructedData1 = np.vstack((Reconstructed_train, Reconstructed_valid))
ReconstructedData2 = np.vstack((ReconstructedData1, Reconstructed_test))
# flatten each 7x7 reconstruction back to a 49-column row and save
ReconstructedData3 = ReconstructedData2.reshape((ReconstructedData2.shape[0], -1))
numpy.savetxt("ReconstructedData.csv", ReconstructedData3, delimiter=',')
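The two VAE-specific pieces of the listing above, the sampling layer and `kl_loss`, can be checked in plain NumPy. Because the KL term treats `z_log_var` as log σ², the standard deviation used when sampling should be exp(z_log_var / 2). The sketch below is an illustration with hypothetical function names, not part of the original script.

```python
import numpy as np


def sample_z(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, sigma = exp(log(sigma^2) / 2)."""
    eps = rng.standard_normal(np.shape(z_mean))
    return z_mean + np.exp(0.5 * z_log_var) * eps


def kl_to_standard_normal(z_mean, z_log_var):
    """KL(N(mu, sigma^2) || N(0, I)) per sample, summed over the latent dims.

    Same expression as `kl_loss` in the Keras code.
    """
    return -0.5 * np.sum(1 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)
```

With `z_mean = 0` and `z_log_var = 0` the KL term is 0 (the approximate posterior equals the prior); with unit variance, each latent dimension contributes μ²/2.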
The AE code is as follows:
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras import backend as K
import numpy as np
from pandas import read_csv
from matplotlib import pyplot
import numpy

dimension_image = 7
input_img = Input(shape=(dimension_image, dimension_image, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (1, 1, 8), i.e. 8-dimensional (7 -> 4 -> 2 -> 1)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# 1 -> 2 -> 4 -> 8, then a 2x2 'valid' convolution trims 8x8 back to 7x7
decoded = Conv2D(1, (2, 2), activation='sigmoid')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.summary()

dataset = read_csv('randperm_zerone_Dataset.csv')
values = dataset.values
XY = values
n_train_hours1 = 7000
n_train_hours3 = 8000
x_train = XY[:n_train_hours1, :]
x_valid = XY[n_train_hours1:n_train_hours3, :]
x_test = XY[n_train_hours3:, :]
x_train = x_train.reshape(-1, dimension_image, dimension_image, 1)
x_valid = x_valid.reshape(-1, dimension_image, dimension_image, 1)
x_test = x_test.reshape(-1, dimension_image, dimension_image, 1)

history = autoencoder.fit(x_train, x_train,
                          epochs=200,
                          batch_size=32,
                          shuffle=True,
                          validation_data=(x_valid, x_valid))
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='valid')
pyplot.legend()
pyplot.show()

Reconstructed_train = autoencoder.predict(x_train)
Reconstructed_valid = autoencoder.predict(x_valid)
Reconstructed_test = autoencoder.predict(x_test)
ReconstructedData1 = np.vstack((Reconstructed_train, Reconstructed_valid))
ReconstructedData2 = np.vstack((ReconstructedData1, Reconstructed_test))
ReconstructedData3 = ReconstructedData2.reshape((ReconstructedData2.shape[0], -1))
numpy.savetxt("ReconstructedData.csv", ReconstructedData3, delimiter=',')
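Both listings only work if the spatial size returns to exactly 7×7. The arithmetic can be traced with the standard output-size formulas for 'same' and 'valid' padding. This sketch only re-derives the sizes of the layers in the two listings (stride-1 'same' layers keep the size and are skipped), and the helper names are hypothetical.

```python
import math


def same_out(n, stride):
    """Conv/pooling with padding='same': ceil(n / stride)."""
    return math.ceil(n / stride)


def valid_out(n, kernel, stride=1):
    """Conv with padding='valid': floor((n - kernel) / stride) + 1."""
    return (n - kernel) // stride + 1


def transpose_valid_out(n, kernel, stride):
    """Transposed conv with padding='valid', no output padding: (n - 1) * stride + kernel."""
    return (n - 1) * stride + kernel


def ae_sizes(n=7):
    sizes = [n]
    for _ in range(3):                      # three MaxPooling2D((2, 2), padding='same')
        sizes.append(same_out(sizes[-1], 2))
    for _ in range(3):                      # three UpSampling2D((2, 2))
        sizes.append(sizes[-1] * 2)
    sizes.append(valid_out(sizes[-1], 2))   # final Conv2D(1, (2, 2)), default 'valid'
    return sizes


def vae_sizes(n=7, number=4):
    enc = same_out(n, 2)                        # conv_2: strides=(2, 2), 'same'
    up = transpose_valid_out(number, 3, 2)      # decoder_deconv_3_upsamp
    out = valid_out(up, 3)                      # decoder_mean_squash, 'valid'
    return [n, enc, number, up, out]


print(ae_sizes())   # [7, 4, 2, 1, 2, 4, 8, 7]
print(vae_sizes())  # [7, 4, 4, 9, 7]
```

So the AE bottleneck is a 1×1×8 map, and `number = 4` in the VAE mirrors the 4×4 feature map its encoder produces.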
As for the dataset, I'm uploading it to Baidu Wenku and will update this post later.