Text Classification
Text Classification
For purpose of word embedding extrinsic evaluation, especially downstream task.
Some concepts are informed from 复旦大学NLP组
Statistical-Based Method
Logistic Regression
Statistics perspective based text classification described as follow[Li Y 2015].
We use Tencent news titles as our text classification dataset. A total of 8,826 titles of four categories (society, entertainment, healthcare, and military) are extracted. The lengths of titles range from 10 to 20 words. We train ℓ2-regularized logistic regression classifiers using the LIBLINEAR package (Fan et al, 2008) with the learned embeddings.
Also described as follow[kiros 2015].
On all datasets, we simply extract skip-thought vectors and train a logistic regression classifier on top.
[Yan Song 2018] also applied this kind of method.
This document classification experiment is performed in a conventional way as that in previous studies [Kiela et al., 2015; Kiros et al., 2015]. For all the documents in training and test datasets, we first construct document level representations by averaging the embeddings from all words in a given document. A logistic regression classifier is then trained on top of the resulted document level representations on the training set and evaluated on the test set.
Linear SVM
It described as follow[Kiela 2015]
we first construct document-level representations by summing the vector representations for all words in a given document. After setting aside a small development set for tuning the hyperparameters of the supervised algorithm, we train a support vector machine (SVM) classifier with a linear kernel and evaluate document topic classification accuracy using ten-fold cross-validation.
Bibliography
复旦大学NLP组. NLP-Beginner. https://github.com/FudanNLP/nlp-beginner
[Li Y. 2015] Li Y, Li W, Sun F, et al. Component-Enhanced Chinese Character Embeddings[J]. empirical methods in natural language processing, 2015: 829-834.
[Kiros 2015] Kiros, Ryan, et al. "Skip-Thought Vectors." Advances in Neural Information Processing Systems 28(2015).
[Yan Song 2018] Song, Yan et al. “Joint Learning Embeddings for Chinese Words and their Components via Ladder Structured Networks.” IJCAI (2018).
[Kiela 2015] Kiela, Douwe et al. “Specializing Word Embeddings for Similarity or Relatedness.” EMNLP (2015).
Text Classification的更多相关文章
- [转] Implementing a CNN for Text Classification in TensorFlow
Github上的一个开源项目,文档讲得极清晰 Github - https://github.com/dennybritz/cnn-text-classification-tf 原文- http:// ...
- [Tensorflow] RNN - 04. Work with CNN for Text Classification
Ref: Combining CNN and RNN for spoken language identification Ref: Convolutional Methods for Text [1 ...
- Implementing a CNN for Text Classification in TensorFlow
参考: 1.Understanding Convolutional Neural Networks for NLP 2.Implementing a CNN for Text Classificati ...
- 论文列表——text classification
https://blog.csdn.net/BitCs_zt/article/details/82938086 列出自己阅读的text classification论文的列表,以后有时间再整理相应的笔 ...
- CNN tensorflow text classification CNN文本分类的例子
from:http://deeplearning.lipingyang.org/tensorflow-examples-text/ TensorFlow examples (text-based) T ...
- 将迁移学习用于文本分类 《 Universal Language Model Fine-tuning for Text Classification》
将迁移学习用于文本分类 < Universal Language Model Fine-tuning for Text Classification> 2018-07-27 20:07:4 ...
- [Bayes] Maximum Likelihood estimates for text classification
Naïve Bayes Classifier. We will use, specifically, the Bernoulli-Dirichlet model for text classifica ...
- #论文阅读# Universial language model fine-tuing for text classification
论文链接:https://aclweb.org/anthology/P18-1031 对文章内容的总结 文章研究了一些在general corous上pretrain LM,然后把得到的model t ...
- 论文阅读:《Bag of Tricks for Efficient Text Classification》
论文阅读:<Bag of Tricks for Efficient Text Classification> 2018-04-25 11:22:29 卓寿杰_SoulJoy 阅读数 954 ...
随机推荐
- luogu P5342 [TJOI2019]甲苯先生的线段树
传送门 你个好好的省选怎么可以出CF原题啊,你们这个题害人不浅啊,这样子出题像极了cxk,说到cxk,我又想起了他是NBA形象大使,跟我是西游文化大使一样一样的,今年下半年... 别说了,jinsai ...
- tp5+layui实现分页
layui和thinkphp5自己在百度上下载 html代码 <!DOCTYPE html> <html> <head> <meta charset=&quo ...
- vue elementui table组件内容换行
解决方案 tableData = [ { "name": "domain111", "metric": [ "平均耗时" ...
- Flask开发系列之Flask+redis实现IP代理池
Flask开发系列之Flask+redis实现IP代理池 代理池的要求 多站抓取,异步检测:多站抓取:指的是我们需要从各大免费的ip代理网站,把他们公开的一些免费代理抓取下来:一步检测指的是:把这些代 ...
- MathType 7.4.2.480
目录 1. 相关推荐 2. 按 3. 软件介绍 4. 安装步骤 5. 使用说明 6. 下载地址 1. 相关推荐 推荐使用:AxMath(AxMath可以与LaTeX进行交互):https://blog ...
- linux基础命令--lsof
lsof(list open files)作用: 是一个列出当前系统打开文件的工具. 注: 在终端下输入lsof即可显示系统打开的文件,因为 lsof 需要访问核心内存和各种文件,所以必须以 root ...
- Mysql定时器定时删除表数据
由于测试环境有张日志表没定时2分钟程序就狂插数据,导致不到1一个月时间,这张日志表就占用了6.7G的空间,但是日志刷新较快,有些日志就没什么作用,就写了个定时器,定期删除这张表的数据 首先先查看mys ...
- java课堂作业4
第一题 字符串加密问题 1.程序设计思想 读入字符串,然后获取其长度,利用charAt()获取每个位置字符并且对字符加3实现加密处理,并存入新字符串中.如果遇到xyz则减26存入. 2.程序流程图 3 ...
- HDU-6669-Game(模拟,贪心)
链接: https://vjudge.net/problem/HDU-6669 题意: 度度熊在玩一个好玩的游戏. 游戏的主人公站在一根数轴上,他可以在数轴上任意移动,对于每次移动,他可以选择往左或往 ...
- oracle基本语句(第七章、数据库逻辑对象管理)
索引.实体化视图.簇.散列簇.序列.同义词 1.创建表 CREATE TABLE <表名>(<列名1> <数据类型>,……); CREATE GLOBAL TEMP ...