原文链接:http://www.one2know.cn/nlp19/

  • 使用IMDB情绪数据来比较CNN和RNN两种方法,预处理与上节相同
from __future__ import print_function
import numpy as np
import pandas as pd
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense,Dropout,Embedding,LSTM,Bidirectional
from keras.datasets import imdb
from sklearn.metrics import accuracy_score,classification_report # 限制最大的特征数
max_features = 15000
max_len = 300
batch_size = 64 # 加载数据
(x_train,y_train),(x_test,y_test) = imdb.load_data(num_words=max_features)
print(len(x_train),'train observations')
print(len(x_test),'test observations')

输出:

Using TensorFlow backend.
25000 train observations
25000 test observations
  • 如何实现

    1.预处理

    2.LSTM模型的构建和验证

    3.模型评估
  • 代码
from __future__ import print_function
import numpy as np
import pandas as pd
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense,Dropout,Embedding,LSTM,Bidirectional
from keras.datasets import imdb
from sklearn.metrics import accuracy_score,classification_report # 限制最大的特征数
max_features = 15000
max_len = 300
batch_size = 64 # 加载数据
(x_train,y_train),(x_test,y_test) = imdb.load_data(num_words=max_features)
# print(len(x_train),'train observations')
# print(len(x_test),'test observations') # 通过序列填充将所有的数据整合为一个固定维度,提高运行效率
x_train_2 = sequence.pad_sequences(x_train,maxlen=max_len)
x_test_2 = sequence.pad_sequences(x_test,maxlen=max_len)
print('x_train_2.shape:',x_train_2.shape)
print('x_test_2.shape:',x_test_2.shape)
y_train = np.array(y_train)
y_test = np.array(y_test) # keras框架 => 双向LSTM模型
# 双向LSTM网络有前向和后向连接,使句子中的单词可以同时与左右词汇产生连接
model = Sequential()
model.add(Embedding(max_features,128,input_length=max_len)) # 嵌入层将维数降到128
model.add(Bidirectional(LSTM(64))) # 双向LSTM层
model.add(Dropout(0.5)) # 随机失活
model.add(Dense(1,activation='sigmoid')) # 稠密层 将情感分类0或1
model.compile('adam','binary_crossentropy',metrics=['accuracy']) # 二元交叉熵
print(model.summary()) model.fit(x_train_2,y_train,batch_size=batch_size,epochs=4,validation_split=0.2) # 预测及结果
y_train_predclass = model.predict_classes(x_train_2,batch_size=1000)
y_test_predclass = model.predict_classes(x_test_2,batch_size=1000)
y_train_predclass.shape = y_train.shape
y_test_predclass.shape = y_test.shape
print('\n\nLSTM Bidirectional Sentiment Classification - Train accuracy:',
round(accuracy_score(y_train,y_train_predclass),3))
print('\nLSTM Bidirectional Sentiment Classification of Training data\n',
classification_report(y_train,y_train_predclass))
print('\nLSTM Bidirectional Sentiment Classification - Train Confusion Matrix\n\n',
pd.crosstab(y_train,y_train_predclass,rownames=['Actuall'],colnames=['Predicted']))
print('\nLSTM Bidirectional Sentiment Classification - Test accuracy:',
round(accuracy_score(y_test,y_test_predclass),3))
print('\nLSTM Bidirectional Sentiment Classification of Test data\n',
classification_report(y_test,y_test_predclass))
print('\nLSTM Bidirectional Sentiment Classification - Test Confusion Matrix\n\n',
pd.crosstab(y_test,y_test_predclass,rownames=['Actuall'],colnames=['Predicted']))

输出:

Using TensorFlow backend.
x_train_2.shape: (25000, 300)
x_test_2.shape: (25000, 300)
WARNING:tensorflow:From D:\Python37\Lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From D:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 300, 128) 1920000
_________________________________________________________________
bidirectional_1 (Bidirection (None, 128) 98816
_________________________________________________________________
dropout_1 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 129
=================================================================
Total params: 2,018,945
Trainable params: 2,018,945
Non-trainable params: 0
_________________________________________________________________
None
WARNING:tensorflow:From D:\Python37\Lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 20000 samples, validate on 5000 samples
Epoch 1/4
2019-07-07 20:03:45.649853: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 64/20000 [..............................] - ETA: 18:21 - loss: 0.6915 - acc: 0.5781
128/20000 [..............................] - ETA: 13:04 - loss: 0.6918 - acc: 0.5938
192/20000 [..............................] - ETA: 11:14 - loss: 0.6915 - acc: 0.5729
256/20000 [..............................] - ETA: 10:19 - loss: 0.6917 - acc: 0.5469
320/20000 [..............................] - ETA: 9:45 - loss: 0.6915 - acc: 0.5469
此处省略一堆epoch的一堆操作 LSTM Bidirectional Sentiment Classification - Train accuracy: 0.955 LSTM Bidirectional Sentiment Classification of Training data
precision recall f1-score support 0 0.96 0.95 0.95 12500
1 0.95 0.96 0.95 12500 accuracy 0.95 25000
macro avg 0.95 0.95 0.95 25000
weighted avg 0.95 0.95 0.95 25000 LSTM Bidirectional Sentiment Classification - Train Confusion Matrix Predicted 0 1
Actuall
0 11928 572
1 561 11939 LSTM Bidirectional Sentiment Classification - Test accuracy: 0.859 LSTM Bidirectional Sentiment Classification of Test data
precision recall f1-score support 0 0.86 0.86 0.86 12500
1 0.86 0.85 0.86 12500 accuracy 0.86 25000
macro avg 0.86 0.86 0.86 25000
weighted avg 0.86 0.86 0.86 25000 LSTM Bidirectional Sentiment Classification - Test Confusion Matrix Predicted 0 1
Actuall
0 10809 1691
1 1829 10671
time============== 2080.618681907654

NLP(十九) 双向LSTM情感分类模型的更多相关文章

  1. NLP学习(2)----文本分类模型

    实战:https://github.com/jiangxinyang227/NLP-Project 一.简介: 1.传统的文本分类方法:[人工特征工程+浅层分类模型] (1)文本预处理: ①(中文) ...

  2. pytorch LSTM情感分类全部代码

    先运行main.py进行文本序列化,再train.py模型训练 dataset.py from torch.utils.data import DataLoader,Dataset import to ...

  3. tensorflow学习笔记(三十九):双向rnn

    tensorflow 双向 rnn 如何在tensorflow中实现双向rnn 单层双向rnn 单层双向rnn (cs224d) tensorflow中已经提供了双向rnn的接口,它就是tf.nn.b ...

  4. Python之路【第二十九篇】:django ORM模型层

    ORM简介 MVC或者MVC框架中包括一个重要的部分,就是ORM,它实现了数据模型与数据库的解耦,即数据模型的设计不需要依赖于特定的数据库,通过简单的配置就可以轻松更换数据库,这极大的减轻了开发人员的 ...

  5. PaddlePaddle︱开发文档中学习情感分类(CNN、LSTM、双向LSTM)、语义角色标注

    PaddlePaddle出教程啦,教程一部分写的很详细,值得学习. 一期涉及新手入门.识别数字.图像分类.词向量.情感分析.语义角色标注.机器翻译.个性化推荐. 二期会有更多的图像内容. 随便,帮国产 ...

  6. [DeeplearningAI笔记]序列模型2.9情感分类

    5.2自然语言处理 觉得有用的话,欢迎一起讨论相互学习~Follow Me 2.9 Sentiment classification 情感分类 情感分类任务简单来说是看一段文本,然后分辨这个人是否喜欢 ...

  7. kaggle——Bag of Words Meets Bags of Popcorn(IMDB电影评论情感分类实践)

    kaggle链接:https://www.kaggle.com/c/word2vec-nlp-tutorial/overview 简介:给出 50,000 IMDB movie reviews,进行0 ...

  8. 基于双向LSTM和迁移学习的seq2seq核心实体识别

    http://spaces.ac.cn/archives/3942/ 暑假期间做了一下百度和西安交大联合举办的核心实体识别竞赛,最终的结果还不错,遂记录一下.模型的效果不是最好的,但是胜在“端到端”, ...

  9. NLP文本情感分类传统模型+深度学习(demo)

    文本情感分类: 文本情感分类(一):传统模型 摘自:http://spaces.ac.cn/index.php/archives/3360/ 测试句子:工信处女干事每月经过下属科室都要亲口交代24口交 ...

随机推荐

  1. rem的基准字体大小的设置

    1.移动端 UI 给的设计稿通常是640px.720px.750px的宽度,但是我们要做适配,兼容不同的终端,rem布局是比较常用的一种方式,比较关键的是确定根节点的字体大小. 这里以640px为例, ...

  2. Jmeter+ant+Jenkins实现接口自动化平台及报告发送

    项目中实现了比较方便的自动化体系,一直没时间总结一下,现抽空整理一番,废话不多说  内容如下: 一.环境准备  jmeter : 编写接口脚本,实现接口测试 ant  :静默执行jmeter脚本,并生 ...

  3. Python flask构建微信小程序订餐系统

    第1章 <Python Flask构建微信小程序订餐系统>课程简介 本章内容会带领大家通览整体架构,功能模块,及学习建议.让大家在一个清晰的开发思路下,进行后续的学习.同时领着大家登陆ht ...

  4. Gridea+GitHub搭建个人博客

    某日闲余时间看到一篇介绍Gridea博客平台的文章,大概看了一下觉得此平台还不错,随即自己进入Gridea官网瞅了瞅.哇,这搭建过程也太简单了吧,比Hexo博客搭建要容易很多,而且还有后台管理客户端, ...

  5. 干货来了!python学习之重难点整理合辑1

    关于装饰器.lambda.鸭子类型.魔法函数的理解仍存有困惑之处,趁周末有时间温故,赶紧去自学了解下相关知识. 1.装饰器是什么: 很多初学者在接触装饰器的时候只做到了肤浅的了解它的概念.组成形态.实 ...

  6. GridView 使用详解

    极力推荐文章:欢迎收藏 Android 干货分享 阅读五分钟,每日十点,和您一起终身学习,这里是程序员Android 本篇文章主要介绍 Android 开发中的部分知识点,通过阅读本篇文章,您将收获以 ...

  7. C# Winform 自定义控件——竖着的Navbar

    效果: 描述: 这是一个可折叠的菜单导航,主要是由panel.picturebox.label完成,界面的颜色用来区分一下各个组合控件,便于调试. 首先,首先是ImageButton: 这个是由Pic ...

  8. .Net异步编程详解入门

    前言 今天周五,早上起床晚了.赶着挤公交上班.但是目前眼前有这么几件事情.刷牙洗脸.泡牛奶.煎蛋.在同步编程眼中.先刷牙洗脸,然后烧水泡牛奶.再煎蛋,最后喝牛奶吃蛋.毫无疑问,在时间紧促的当下.它完了 ...

  9. 关于Java虚拟机运行时数据区域的总结

    Java虚拟机运行时数据区域 程序计数器(Program Counter) 程序计数器作为一个概念模型,这个是用来指示下一条需要执行的字节码指令在哪. Java的多线程实际上是通过线程轮转做到的,如果 ...

  10. Java中使用RestFul接口上传图片到阿里云OSS服务器

    1.接口方法 import java.io.IOException; import javax.servlet.http.HttpServletRequest; import org.springfr ...