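NLP (18): IMDB Sentiment Analysis with a One-Dimensional Convolutional Network

Original link: http://www.one2know.cn/nlp18/

Preparation

The Keras IMDB dataset contains movie reviews encoded as sequences of word indices, each paired with a sentiment label (0 = negative, 1 = positive).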

import pandas as pd
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalAveragePooling1D
from keras.datasets import imdb
from sklearn.metrics import accuracy_score, classification_report

# Parameters: keep the 6000 most frequent words; cap each review at 400 words
max_features = 6000
max_length = 400

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train observations')
print(len(x_test), 'test observations')

wind = imdb.get_word_index()  # word -> integer index (words ranked by frequency)
# Invert to index -> word. load_data() shifts every index by 3
# (0 = padding, 1 = start marker, 2 = out-of-vocabulary), so undo that offset.
revind = dict((v + 3, k) for k, v in wind.items())

print(x_train[0])
print(y_train[0])

def decode(sent_list):  # decode with the reverse dictionary: ids -> words
    new_words = []
    for i in sent_list:
        new_words.append(revind.get(i, '?'))  # '?' for padding/start/OOV ids
    comb_words = " ".join(new_words)
    return comb_words

print(decode(x_train[0]))

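Output:

25000 train observations
25000 test observations
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, ...]
1
tsukino 'royale rumbustious canet thrace bellow headbanger ...

(This decoded line is gibberish. It was produced by building revind as dict((k, v) for k, v in enumerate(wind)), which pairs each word with a running counter instead of inverting the word-to-index mapping. Inverting the dictionary and undoing load_data()'s index offset of 3 gives readable text.)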
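By default, `imdb.load_data()` shifts every word index by `index_from=3`. Index 0 is reserved for padding, 1 (`start_char`) for the start marker, and 2 (`oov_char`) for out-of-vocabulary words. Decoding therefore has to invert `get_word_index()` and undo that shift. The minimal sketch below shows this without downloading the dataset. It uses a hypothetical four-word `word_index` in place of the real one. The `<PAD>`/`<START>`/`<UNK>` labels are illustrative names only and are not part of the Keras API.

```python
# Hypothetical toy word index standing in for imdb.get_word_index(),
# which maps word -> frequency rank (1 = most frequent word).
word_index = {"the": 1, "and": 2, "movie": 3, "great": 4}

INDEX_FROM = 3  # load_data() default: 0 = padding, 1 = start, 2 = OOV, 3 = unused

# Invert the mapping (encoded id -> word), shifting each rank by the offset.
reverse_index = {rank + INDEX_FROM: word for word, rank in word_index.items()}
reverse_index.update({0: "<PAD>", 1: "<START>", 2: "<UNK>"})

def decode(encoded):
    """Turn a list of IMDB ids back into text; ids with no entry become '?'."""
    return " ".join(reverse_index.get(i, "?") for i in encoded)

sample = [1, 4, 6, 7, 2]  # start marker, "the", "movie", "great", an OOV word
print(decode(sample))
```

Using `dict.get` with a default keeps decoding from crashing on ids that have no dictionary entry.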

How it works

1. Preprocessing: pad or truncate every review to a fixed length

2. Build and validate the 1D CNN model

3. Evaluate the model
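Step 1 relies on Keras's `pad_sequences`, whose defaults are `padding='pre'` and `truncating='pre'`. Short reviews get zeros added at the front, and long reviews keep only their last `maxlen` ids. The hypothetical helper `pad_pre` below is a plain-Python sketch of that default behaviour. It is not the library's implementation, and it returns nested lists rather than a NumPy array.

```python
def pad_pre(sequences, maxlen, value=0):
    """Pad/truncate each sequence to maxlen, mimicking the default
    padding='pre' and truncating='pre' of Keras's pad_sequences."""
    out = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # 'pre' truncation drops the oldest ids, keeping the end
        out.append([value] * (maxlen - len(seq)) + seq)  # 'pre' padding fills the front
    return out

print(pad_pre([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4))
```

Keeping the end of a long review means the model sees its closing sentences, which often carry the reviewer's verdict.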

Code

import pandas as pd
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalAveragePooling1D
from keras.datasets import imdb
from sklearn.metrics import accuracy_score, classification_report

# Parameters: keep the 6000 most frequent words; cap each review at 400 words
max_features = 6000
max_length = 400

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
# print(x_train)  # a list of reviews, each a list of word ids
# print(y_train)  # labels, each 0 (negative) or 1 (positive)
# print(len(x_train), 'train observations')
# print(len(x_test), 'test observations')

wind = imdb.get_word_index()  # word -> integer index
# Invert to index -> word, undoing load_data()'s offset of 3
# (0 = padding, 1 = start marker, 2 = out-of-vocabulary)
revind = dict((v + 3, k) for k, v in wind.items())
# print(x_train[0])
# print(y_train[0])

def decode(sent_list):  # decode with the reverse dictionary: ids -> words
    return " ".join(revind.get(i, '?') for i in sent_list)

# print(decode(x_train[0]))

# Pad (or truncate) every review to max_length = 400 so all inputs have the same length
x_train = sequence.pad_sequences(x_train, maxlen=max_length)
x_test = sequence.pad_sequences(x_test, maxlen=max_length)
print('x_train.shape:', x_train.shape)
print('x_test.shape:', x_test.shape)

## 1D CNN model with the Keras deep learning framework

# Hyperparameters
batch_size = 32
embedding_dims = 60   # length of each word vector
num_kernels = 260     # number of convolution filters
kernel_size = 3       # each filter spans 3 consecutive words
hidden_dims = 300     # units in the fully connected hidden layer
epochs = 3

# Build the model
model = Sequential()
model.add(Embedding(max_features, embedding_dims, input_length=max_length))
model.add(Dropout(0.2))
model.add(Conv1D(num_kernels, kernel_size, padding='valid', activation='relu', strides=1))
model.add(GlobalAveragePooling1D())  # average each filter's response over the sequence
model.add(Dense(hidden_dims))
model.add(Dropout(0.5))
model.add(Activation('relu'))
model.add(Dense(1))
model.add(Activation('sigmoid'))  # probability that the review is positive
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# summary() prints the table itself and returns None, hence the "None" line in the output
print(model.summary())
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2)

# Model prediction: threshold the sigmoid output at 0.5
# (same result as predict_classes, which newer Keras versions removed)
y_train_predclass = (model.predict(x_train, batch_size=batch_size) > 0.5).astype('int32').ravel()
y_test_predclass = (model.predict(x_test, batch_size=batch_size) > 0.5).astype('int32').ravel()

print('\n\nCNN 1D - Train accuracy:', round(accuracy_score(y_train, y_train_predclass), 3))
print('\nCNN 1D of Training data\n', classification_report(y_train, y_train_predclass))
print('\nCNN 1D - Train Confusion Matrix\n\n', pd.crosstab(y_train, y_train_predclass,
      rownames=['Actual'], colnames=['Predicted']))
print('\nCNN 1D - Test accuracy:', round(accuracy_score(y_test, y_test_predclass), 3))
print('\nCNN 1D of Test data\n', classification_report(y_test, y_test_predclass))
print('\nCNN 1D - Test Confusion Matrix\n\n', pd.crosstab(y_test, y_test_predclass,
      rownames=['Actual'], colnames=['Predicted']))
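The script above turns each padded review into a (400, 60) matrix of word vectors with an `Embedding` layer. It then applies a `Conv1D` with 260 filters of width 3, followed by `GlobalAveragePooling1D`. The NumPy sketch below traces that forward pass for a single review, using random weights (nothing here is trained). It shows where the output shapes and parameter counts reported by `model.summary()` come from. The helper `conv1d_valid_relu` is hypothetical. It assumes Keras's kernel layout of `(kernel_size, input_dim, filters)` and, like Keras, computes a cross-correlation without flipping the kernel.

```python
import numpy as np

# Hyperparameters copied from the script above.
max_features, max_length = 6000, 400
embedding_dims, num_kernels, kernel_size, hidden_dims = 60, 260, 3, 300

def conv1d_valid_relu(x, kernels, bias):
    """Stride-1 'valid' 1D convolution followed by ReLU.
    x: (steps, in_dim); kernels: (kernel_size, in_dim, filters); bias: (filters,)."""
    k = kernels.shape[0]
    steps_out = x.shape[0] - k + 1          # no padding, so the sequence shrinks by k - 1
    out = np.empty((steps_out, kernels.shape[2]))
    for t in range(steps_out):
        window = x[t:t + k]                 # (k, in_dim): k consecutive word vectors
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)             # ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(max_length, embedding_dims))   # one embedded review (random stand-in)
w = rng.normal(size=(kernel_size, embedding_dims, num_kernels)) * 0.1
b = np.zeros(num_kernels)

h = conv1d_valid_relu(x, w, b)   # feature map: one row per 3-word window
pooled = h.mean(axis=0)          # GlobalAveragePooling1D: average over the time axis
print(h.shape, pooled.shape)

# Trainable parameters per layer (weights + biases).
params = {
    "embedding": max_features * embedding_dims,
    "conv1d": kernel_size * embedding_dims * num_kernels + num_kernels,
    "dense_1": num_kernels * hidden_dims + hidden_dims,
    "dense_2": hidden_dims * 1 + 1,
}
print(params, sum(params.values()))
```

With `padding='valid'` and stride 1, the sequence shrinks from 400 to 400 - 3 + 1 = 398 steps. Global average pooling then collapses the time axis, so the dense layer always receives a 260-dimensional vector.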

Output:

Using TensorFlow backend.
x_train.shape: (25000, 400)
x_test.shape: (25000, 400)

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 400, 60)           360000
_________________________________________________________________
dropout_1 (Dropout)          (None, 400, 60)           0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 398, 260)          47060
_________________________________________________________________
global_average_pooling1d_1 ( (None, 260)               0
_________________________________________________________________
dense_1 (Dense)              (None, 300)               78300
_________________________________________________________________
dropout_2 (Dropout)          (None, 300)               0
_________________________________________________________________
activation_1 (Activation)    (None, 300)               0
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 301
_________________________________________________________________
activation_2 (Activation)    (None, 1)                 0
=================================================================
Total params: 485,661
Trainable params: 485,661
Non-trainable params: 0
_________________________________________________________________
None

Train on 20000 samples, validate on 5000 samples
Epoch 1/3

   32/20000 [..............................] - ETA: 7:03 - loss: 0.6929 - acc: 0.5000
   64/20000 [..............................] - ETA: 4:13 - loss: 0.6927 - acc: 0.5156
   96/20000 [..............................] - ETA: 3:19 - loss: 0.6933 - acc: 0.5000
  128/20000 [..............................] - ETA: 2:50 - loss: 0.6935 - acc: 0.4844
  160/20000 [..............................] - ETA: 2:32 - loss: 0.6931 - acc: 0.4813
  (remaining training progress for the three epochs omitted)

CNN 1D - Train accuracy: 0.949

CNN 1D of Training data
               precision    recall  f1-score   support

           0       0.94      0.96      0.95     12500
           1       0.95      0.94      0.95     12500

    accuracy                           0.95     25000
   macro avg       0.95      0.95      0.95     25000
weighted avg       0.95      0.95      0.95     25000

CNN 1D - Train Confusion Matrix

Predicted      0      1
Actual
0          11938    562
1            715  11785

CNN 1D - Test accuracy: 0.876

CNN 1D of Test data
               precision    recall  f1-score   support

           0       0.86      0.89      0.88     12500
           1       0.89      0.86      0.87     12500

    accuracy                           0.88     25000
   macro avg       0.88      0.88      0.88     25000
weighted avg       0.88      0.88      0.88     25000

CNN 1D - Test Confusion Matrix

Predicted      0      1
Actual
0          11144   1356
1           1744  10756
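The headline numbers in these reports can be recomputed by hand from the two confusion matrices. As in the crosstab, rows are actual labels and columns are predicted labels. The `report` helper below is a hypothetical sketch that reproduces accuracy and per-class precision and recall. It is not scikit-learn's implementation.

```python
def report(tn, fp, fn, tp):
    """Accuracy plus per-class precision/recall from a 2x2 confusion matrix
    with rows = actual, columns = predicted (0 = negative, 1 = positive)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision_0": tn / (tn + fn),  # of reviews predicted negative, share truly negative
        "recall_0": tn / (tn + fp),     # of truly negative reviews, share predicted negative
        "precision_1": tp / (tp + fp),  # of reviews predicted positive, share truly positive
        "recall_1": tp / (tp + fn),     # of truly positive reviews, share predicted positive
    }

# Counts copied from the train and test confusion matrices above.
train = report(tn=11938, fp=562, fn=715, tp=11785)
test = report(tn=11144, fp=1356, fn=1744, tp=10756)
for name, r in (("train", train), ("test", test)):
    print(name, {k: round(v, 3) for k, v in r.items()})
```

The gap between train accuracy (0.949) and test accuracy (0.876) suggests the model fits the training reviews more closely than it generalizes to unseen ones.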
