Text Classification
Text Classification
For purpose of word embedding extrinsic evaluation, especially downstream task.
Some concepts are informed from 复旦大学NLP组
Statistical-Based Method
Logistic Regression
Statistics perspective based text classification described as follow[Li Y 2015].
We use Tencent news titles as our text classification dataset. A total of 8,826 titles of four categories (society, entertainment, healthcare, and military) are extracted. The lengths of titles range from 10 to 20 words. We train ℓ2-regularized logistic regression classifiers using the LIBLINEAR package (Fan et al, 2008) with the learned embeddings.
Also described as follow[kiros 2015].
On all datasets, we simply extract skip-thought vectors and train a logistic regression classifier on top.
[Yan Song 2018] also applied this kind of method.
This document classification experiment is performed in a conventional way as that in previous studies [Kiela et al., 2015; Kiros et al., 2015]. For all the documents in training and test datasets, we first construct document level representations by averaging the embeddings from all words in a given document. A logistic regression classifier is then trained on top of the resulted document level representations on the training set and evaluated on the test set.
Linear SVM
It described as follow[Kiela 2015]
we first construct document-level representations by summing the vector representations for all words in a given document. After setting aside a small development set for tuning the hyperparameters of the supervised algorithm, we train a support vector machine (SVM) classifier with a linear kernel and evaluate document topic classification accuracy using ten-fold cross-validation.
Bibliography
复旦大学NLP组. NLP-Beginner. https://github.com/FudanNLP/nlp-beginner
[Li Y. 2015] Li Y, Li W, Sun F, et al. Component-Enhanced Chinese Character Embeddings[J]. empirical methods in natural language processing, 2015: 829-834.
[Kiros 2015] Kiros, Ryan, et al. "Skip-Thought Vectors." Advances in Neural Information Processing Systems 28(2015).
[Yan Song 2018] Song, Yan et al. “Joint Learning Embeddings for Chinese Words and their Components via Ladder Structured Networks.” IJCAI (2018).
[Kiela 2015] Kiela, Douwe et al. “Specializing Word Embeddings for Similarity or Relatedness.” EMNLP (2015).
Text Classification的更多相关文章
- [转] Implementing a CNN for Text Classification in TensorFlow
Github上的一个开源项目,文档讲得极清晰 Github - https://github.com/dennybritz/cnn-text-classification-tf 原文- http:// ...
- [Tensorflow] RNN - 04. Work with CNN for Text Classification
Ref: Combining CNN and RNN for spoken language identification Ref: Convolutional Methods for Text [1 ...
- Implementing a CNN for Text Classification in TensorFlow
参考: 1.Understanding Convolutional Neural Networks for NLP 2.Implementing a CNN for Text Classificati ...
- 论文列表——text classification
https://blog.csdn.net/BitCs_zt/article/details/82938086 列出自己阅读的text classification论文的列表,以后有时间再整理相应的笔 ...
- CNN tensorflow text classification CNN文本分类的例子
from:http://deeplearning.lipingyang.org/tensorflow-examples-text/ TensorFlow examples (text-based) T ...
- 将迁移学习用于文本分类 《 Universal Language Model Fine-tuning for Text Classification》
将迁移学习用于文本分类 < Universal Language Model Fine-tuning for Text Classification> 2018-07-27 20:07:4 ...
- [Bayes] Maximum Likelihood estimates for text classification
Naïve Bayes Classifier. We will use, specifically, the Bernoulli-Dirichlet model for text classifica ...
- #论文阅读# Universial language model fine-tuing for text classification
论文链接:https://aclweb.org/anthology/P18-1031 对文章内容的总结 文章研究了一些在general corous上pretrain LM,然后把得到的model t ...
- 论文阅读:《Bag of Tricks for Efficient Text Classification》
论文阅读:<Bag of Tricks for Efficient Text Classification> 2018-04-25 11:22:29 卓寿杰_SoulJoy 阅读数 954 ...
随机推荐
- 吴恩达深度学习:2.15python中的广播
1.Broadcasting example (1)下面矩阵描述了来自四种不同的100克碳水化合物,蛋白质和脂肪的卡路里数量 比如说100g苹果所含的热量有56克来自碳水化合物,相比之下来自蛋白质和脂 ...
- python-装饰器2
python-装饰器2 1.函数既“变量 def bar(): print("in the bar") def foo(): print("in the foo" ...
- AIX 逻辑卷简介
1.基本概念 LVM的组成:物理卷PV.卷组VG.逻辑卷LV.物理分区PP.逻辑分区LP.文件系统等 物理卷:物理卷表示AIX可以识别的物理磁盘(hdisk*),一个物理卷指一块硬盘.可以是内部的 ...
- Big Data(七)MapReduce计算框架
二.计算向数据移动如何实现? Hadoop1.x(已经淘汰): hdfs暴露数据的位置 1)资源管理 2)任务调度 角色:JobTracker&TaskTracker JobTracker: ...
- jquery 保留两位小数
jquery 通过 toFixed 保留两位小数 <script> var num = 12.21654; console.log(num.toFixed(2)) </script& ...
- [uboot] (第一章)uboot流程——概述(转)
版权声明:本文为博主原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明.本文链接:https://blog.csdn.net/ooonebook/article/det ...
- uwsgi 的启动、停止、重启
## 一.概念释义### WSGI WSGI 是一个Web服务器(如nginx)与应用服务器(如uWSGI)通信的一种规范(协议).官方定义是,the Python Web Server Gatewa ...
- 【leetcode】1191. K-Concatenation Maximum Sum
题目如下: Given an integer array arr and an integer k, modify the array by repeating it k times. For exa ...
- Haproxy-4层和7层代理负载实战
目录 HAProxy是什么 HAProxy的核心能力和关键特性 HAProxy的核心功能 HAProxy的关键特性 HAProxy的安装和运行 安装 运行 添加日志 使用HAProxy搭建L7负载均衡 ...
- html abbr标签 语法
html abbr标签 语法 作用:标记一个缩写 大理石平台 说明:<abbr> 标签指示简称或缩写,比如 "WWW" 或 "NATO".通过对缩写 ...