Sentiment analysis in NLP
The goal of this program is to decide whether an article headline is sarcastic or not. I use TensorFlow 2.5 to solve this problem.
Dataset download URL: https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection/home
A sample of the dataset:
{
"article_link": "https://www.huffingtonpost.com/entry/versace-black-code_us_5861fbefe4b0de3a08f600d5",
"headline": "former versace store clerk sues over secret 'black code' for minority shoppers",
"is_sarcastic": 0
}
We want to predict is_sarcastic from the headline. 1 means True (sarcastic), 0 means False (not sarcastic).
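The dataset file stores one JSON object per line, so each line can be parsed on its own. A minimal sketch using only the standard library and the sample record above (the variable names are illustrative and not part of the original program):

```python
import json

# The sample record shown above.
sample = {
    "article_link": "https://www.huffingtonpost.com/entry/versace-black-code_us_5861fbefe4b0de3a08f600d5",
    "headline": "former versace store clerk sues over secret 'black code' for minority shoppers",
    "is_sarcastic": 0,
}

# One line of the file is one serialized record.
line = json.dumps(sample)

record = json.loads(line)
headline = record["headline"]                # model input
is_sarcastic = bool(record["is_sarcastic"])  # model target: 1 -> True, 0 -> False
print(headline, is_sarcastic)
```

Only headline and is_sarcastic are used for training; article_link is kept for reference.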
Preprocessing
Use pandas to read the JSON file.
import pandas as pd
# lines=True means every line of the file is handled as one JSON object
df = pd.read_json("Sarcasm_Headlines_Dataset_v2.json", lines=True)
df
'''
is_sarcastic headline article_link
0 1 thirtysomething sci... https://www.theonion.co...
1 0 dem rep. totally ... https://www.huffingtonpos..
'''
Build a list for each column
labels = []
sentences = []
urls = []
# a tip for converting a Series to a list
'''
type(df['is_sarcastic'])
# Series
type(df['is_sarcastic'].values)
# ndarray
type(df['is_sarcastic'].values.tolist())
# list
'''
labels = df['is_sarcastic'].values.tolist()
sentences = df['headline'].values.tolist()
urls = df['article_link'].values.tolist()
len(labels) # 28619
len(sentences) # 28619
Split the dataset into train set and test set
# the train set is 2/3 of the whole dataset
train_size = int(len(labels) / 3 * 2)
train_sentences = sentences[0: train_size]
test_sentences = sentences[train_size:]
train_y = labels[0:train_size]
test_y = labels[train_size:]
Init some parameters
# some parameters
# vocabulary size (input dimension of the embedding layer)
vocab_size = 10000
# dimension of each word embedding vector
embedding_dim = 16
# length of each input sentence after padding
max_length = 100
# truncate and pad at the end of the sequence
trunc_type='post'
padding_type='post'
# token that replaces out-of-vocabulary words
oov_tok = "<OOV>"
Preprocessing on train set and test set
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
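# keep only the vocab_size most frequent words so every index fits the embedding layer
tokenizer = Tokenizer(num_words = vocab_size, oov_token = oov_tok)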
tokenizer.fit_on_texts(train_sentences)
train_X = tokenizer.texts_to_sequences(train_sentences)
# pad or truncate every sequence to max_length
train_X = pad_sequences(train_X,
                        maxlen = max_length,
                        truncating = trunc_type,
                        padding = padding_type)
train_X[:2]
# convert the list to a NumPy array
train_y = np.array(train_y)
# apply the same steps to the test set
test_X = tokenizer.texts_to_sequences(test_sentences)
test_X = pad_sequences(test_X,
                       maxlen = max_length,
                       truncating = trunc_type,
                       padding = padding_type)
test_y = np.array(test_y)
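To make the tokenizing and padding steps concrete, here is a simplified pure-Python sketch of the same idea. It is not the Keras implementation: the helper names build_word_index, to_sequence and pad_post are hypothetical, and punctuation filtering is left out. It only assumes what the code above relies on: index 0 is used for padding, index 1 for the OOV token, and padding and truncation happen at the end ('post').

```python
from collections import Counter

def build_word_index(sentences, oov_token="<OOV>"):
    """Map words to integers: 1 is the OOV token, frequent words get small indexes."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    word_index = {oov_token: 1}
    for i, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = i
    return word_index

def to_sequence(sentence, word_index, oov_token="<OOV>"):
    """Replace every word by its index; unknown words get the OOV index."""
    oov = word_index[oov_token]
    return [word_index.get(w, oov) for w in sentence.lower().split()]

def pad_post(seq, maxlen):
    """Cut off the tail if the sequence is too long, then fill the tail with 0."""
    seq = seq[:maxlen]
    return seq + [0] * (maxlen - len(seq))

train = ["you are so cute", "you are late"]
word_index = build_word_index(train)

# "stupid" was never seen in the training sentences, so it maps to <OOV>.
seq = to_sequence("you are so stupid", word_index)
padded = pad_post(seq, maxlen=6)
print(word_index)
print(padded)
```

The word index is built from the training sentences only, which is why the test set reuses the same tokenizer instead of fitting a new one.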
Build the model
Some important functions and arguments:
tf.keras.layers.Dense # a fully connected NN layer; implements the operation output = activation(dot(input, kernel) + bias)
  activation # Activation function to use. If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
  use_bias # Boolean, whether the layer uses a bias vector.
tf.keras.Sequential # groups a linear stack of layers into a tf.keras.Model.
tf.keras.Model # used to train and predict
  Configure the model with a loss and metrics using model.compile(args)
    optimizer # some options: Adam, RMSprop, SGD, Adagrad
    loss # The loss function to minimize. If the model has multiple outputs, the value minimized is the sum of all individual losses.
    metrics # List of metrics to be evaluated by the model during training and testing.
  Train the model with model.fit(x=None, y=None)
    batch_size # Number of samples per gradient update. If unspecified, batch_size will default to 32.
    epochs # Number of epochs to train the model.
    verbose # Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. verbose=2 is recommended when not running interactively.
    validation_data # a tuple (valid_X, valid_y)
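The Dense operation output = activation(dot(input, kernel) + bias) can be written out by hand. The following is a NumPy sketch with toy numbers, not the Keras layer itself; the function names dense and relu and all values are chosen only for illustration.

```python
import numpy as np

def dense(inputs, kernel, bias, activation=None):
    """output = activation(dot(inputs, kernel) + bias); no activation means a(x) = x."""
    z = np.dot(inputs, kernel) + bias
    return z if activation is None else activation(z)

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([[1.0, -2.0]])             # a batch of 1 sample with 2 features
kernel = np.array([[1.0, 0.5, -1.0],
                   [1.0, 0.0, 1.0]])    # shape (2, 3): 2 inputs -> 3 units
bias = np.array([0.0, 1.0, 0.0])

linear = dense(x, kernel, bias)             # no activation
activated = dense(x, kernel, bias, relu)    # negative values are clipped to 0
print(linear)
print(activated)
```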
tf.keras.layers.Embedding # Turns positive integers (indexes) into dense vectors of fixed size.

The purpose of the embedding is to replace each 1-dim integer index with a multi-dim vector that can be added and averaged. Training these vectors lets the model find hidden features of the words and connect them to the labels. In this program, the vector of every word, which captures its emotional direction, is updated many times during training.
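Conceptually, an embedding layer is a lookup table with one row per word index. A NumPy sketch with toy sizes (the random table stands in for the weights that the real layer would learn; the names are illustrative):

```python
import numpy as np

vocab_size, embedding_dim = 10, 4   # toy sizes, much smaller than in the real model
rng = np.random.default_rng(0)
# One row per word index; in the real layer these values are trained.
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

# A batch of 2 padded sequences of word indexes, each of length 5.
batch = np.array([[2, 3, 4, 1, 0],
                  [2, 3, 6, 0, 0]])

# Fancy indexing looks up the row for every index in the batch.
vectors = embedding_table[batch]
print(vectors.shape)
```

Every index must be smaller than vocab_size, otherwise there is no row to look up.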
tf.keras.layers.GlobalAveragePooling1D # averages the multi-dim vectors over the sequence dimension. If the output of the previous layer has shape (32, 10, 64), after the pooling the shape becomes (32, 64).
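The shape change from (32, 10, 64) to (32, 64) can be checked with a NumPy sketch that averages over the sequence axis. This mimics the pooling for unmasked input and is not the Keras layer itself.

```python
import numpy as np

# 32 sentences, 10 words per sentence, 64 values per word vector.
x = np.arange(32 * 10 * 64, dtype=float).reshape(32, 10, 64)

# Average the 10 word vectors of every sentence into a single vector.
pooled = x.mean(axis=1)
print(pooled.shape)
```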
The code is simpler than the theory.
# build the model
import tensorflow as tf

model = tf.keras.Sequential(
    [
        # turn each word index into a 16-dim vector (embedding_dim)
        tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length = max_length),
        # average the word vectors of each sentence
        tf.keras.layers.GlobalAveragePooling1D(),
        # fully connected layers
        tf.keras.layers.Dense(24, activation = 'relu'),
        tf.keras.layers.Dense(1, activation = 'sigmoid')
    ]
)
model.compile(loss = 'binary_crossentropy', optimizer = 'adam' , metrics = ['accuracy'])
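As a sanity check on this architecture, the number of trainable parameters can be worked out by hand from the layer sizes. The arithmetic below uses only the values from the code above, and the total should agree with what model.summary() reports.

```python
vocab_size, embedding_dim = 10000, 16

embedding_params = vocab_size * embedding_dim   # one 16-dim vector per word index
pooling_params = 0                              # averaging has no weights
dense1_params = embedding_dim * 24 + 24         # kernel (16 x 24) plus 24 biases
dense2_params = 24 * 1 + 1                      # kernel (24 x 1) plus 1 bias

total_params = embedding_params + pooling_params + dense1_params + dense2_params
print(total_params)
```

Almost all of the parameters sit in the embedding table, so vocab_size and embedding_dim dominate the model size.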
Train the model
num_epochs = 30
history = model.fit(train_X, train_y, epochs = num_epochs,
                    validation_data = (test_X, test_y),
                    verbose = 2)
After 30 epochs:
Epoch 30/30
597/597 - 8s - loss: 1.8816e-04 - accuracy: 1.0000 - val_loss: 1.2858 - val_accuracy: 0.8216
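The 597/597 in the log is the number of batches per epoch. It follows from the split above, assuming the default batch_size of 32 since the call to model.fit does not set it:

```python
import math

n_samples = 28619                           # len(labels) reported above
train_size = int(n_samples / 3 * 2)         # same formula as in the split
test_size = n_samples - train_size
batch_size = 32                             # Keras default when not specified

steps_per_epoch = math.ceil(train_size / batch_size)
print(train_size, test_size, steps_per_epoch)
```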
Predict our own sentences
mytest_sentence = ["you are so cute", "you are so cute but looks like stupid"]
mytest_X = tokenizer.texts_to_sequences(mytest_sentence)
mytest_X = pad_sequences(mytest_X,
                         maxlen = max_length,
                         truncating = trunc_type,
                         padding = padding_type)
mytest_y = model.predict(mytest_X)
# if the result is greater than 0.5, the headline is predicted as sarcastic
print(mytest_y > 0.5)
'''
[[False]
[ True]]
'''
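The last layer uses a sigmoid, so every prediction is a number between 0 and 1 that is then compared with 0.5. A small sketch of that step; the input values are hypothetical and do not come from the trained model.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activation values of the output unit.
raw_outputs = [-2.0, 0.0, 3.0]

probabilities = [sigmoid(z) for z in raw_outputs]
predictions = [p > 0.5 for p in probabilities]   # True means "sarcastic"
print(probabilities)
print(predictions)
```

A value of exactly 0.5 is not greater than the threshold, so it counts as not sarcastic.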
References:
TensorFlow API: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential
Colab: bit.ly/tfw-sarcembed
