Text Classification
Text Classification
For purpose of word embedding extrinsic evaluation, especially downstream task.
Some concepts are informed from 复旦大学NLP组
Statistical-Based Method
Logistic Regression
Statistics perspective based text classification described as follow[Li Y 2015].
We use Tencent news titles as our text classification dataset. A total of 8,826 titles of four categories (society, entertainment, healthcare, and military) are extracted. The lengths of titles range from 10 to 20 words. We train ℓ2-regularized logistic regression classifiers using the LIBLINEAR package (Fan et al, 2008) with the learned embeddings.
Also described as follow[kiros 2015].
On all datasets, we simply extract skip-thought vectors and train a logistic regression classifier on top.
[Yan Song 2018] also applied this kind of method.
This document classification experiment is performed in a conventional way as that in previous studies [Kiela et al., 2015; Kiros et al., 2015]. For all the documents in training and test datasets, we first construct document level representations by averaging the embeddings from all words in a given document. A logistic regression classifier is then trained on top of the resulted document level representations on the training set and evaluated on the test set.
Linear SVM
It described as follow[Kiela 2015]
we first construct document-level representations by summing the vector representations for all words in a given document. After setting aside a small development set for tuning the hyperparameters of the supervised algorithm, we train a support vector machine (SVM) classifier with a linear kernel and evaluate document topic classification accuracy using ten-fold cross-validation.
Bibliography
复旦大学NLP组. NLP-Beginner. https://github.com/FudanNLP/nlp-beginner
[Li Y. 2015] Li Y, Li W, Sun F, et al. Component-Enhanced Chinese Character Embeddings[J]. empirical methods in natural language processing, 2015: 829-834.
[Kiros 2015] Kiros, Ryan, et al. "Skip-Thought Vectors." Advances in Neural Information Processing Systems 28(2015).
[Yan Song 2018] Song, Yan et al. “Joint Learning Embeddings for Chinese Words and their Components via Ladder Structured Networks.” IJCAI (2018).
[Kiela 2015] Kiela, Douwe et al. “Specializing Word Embeddings for Similarity or Relatedness.” EMNLP (2015).
Text Classification的更多相关文章
- [转] Implementing a CNN for Text Classification in TensorFlow
Github上的一个开源项目,文档讲得极清晰 Github - https://github.com/dennybritz/cnn-text-classification-tf 原文- http:// ...
- [Tensorflow] RNN - 04. Work with CNN for Text Classification
Ref: Combining CNN and RNN for spoken language identification Ref: Convolutional Methods for Text [1 ...
- Implementing a CNN for Text Classification in TensorFlow
参考: 1.Understanding Convolutional Neural Networks for NLP 2.Implementing a CNN for Text Classificati ...
- 论文列表——text classification
https://blog.csdn.net/BitCs_zt/article/details/82938086 列出自己阅读的text classification论文的列表,以后有时间再整理相应的笔 ...
- CNN tensorflow text classification CNN文本分类的例子
from:http://deeplearning.lipingyang.org/tensorflow-examples-text/ TensorFlow examples (text-based) T ...
- 将迁移学习用于文本分类 《 Universal Language Model Fine-tuning for Text Classification》
将迁移学习用于文本分类 < Universal Language Model Fine-tuning for Text Classification> 2018-07-27 20:07:4 ...
- [Bayes] Maximum Likelihood estimates for text classification
Naïve Bayes Classifier. We will use, specifically, the Bernoulli-Dirichlet model for text classifica ...
- #论文阅读# Universial language model fine-tuing for text classification
论文链接:https://aclweb.org/anthology/P18-1031 对文章内容的总结 文章研究了一些在general corous上pretrain LM,然后把得到的model t ...
- 论文阅读:《Bag of Tricks for Efficient Text Classification》
论文阅读:<Bag of Tricks for Efficient Text Classification> 2018-04-25 11:22:29 卓寿杰_SoulJoy 阅读数 954 ...
随机推荐
- springmvc中的视图模型的返回方式
way1:略过; way2:(神似way1)通过在方法的参数中添加一个Model类型的参数,,该参数由spring自动生成传入, 然后在方法内部使用addAttribute()方式添加模型数据, 最后 ...
- ES6拷贝方法
ES6 中对象拷贝方法: 方法一: Object.assign() // 对象浅拷贝, 复制所有可枚举属性 const obj1 = {a: 1}; const obj2 = {b: 2}; // c ...
- 利用ARouter实现组件间通信,解决子模块调用主模块问题
如果你还没使用过ARouter请你按照这篇下面博客尝试使用下然后再往下看组件通信的内容(不然的话可能会懵逼)Android Studio接入ARouter以及简单使用 如果你使用过ARouter请继续 ...
- Spring基础20——AOP基础
1.什么是AOP AOP(Aspect-Oriented Programming)即面向切面编程,是一种新的方法论,是对那个传统OOP面向对象编程的补充.AOP的主要编程对象是切面(aspect),而 ...
- linux命令详解——tee
tee 重定向输出到多个文件 在执行Linux命令时,我们既想把输出保存到文件中,又想在屏幕上看到输出内容,就可以使用tee命令 要注意的是:在使用管道线时,前一个命令的标准错误输出不会被tee读取. ...
- Linux下关闭Tomcat
正常关闭操作 进入tomcat bin目录,执行 ./shutdown.sh 但是有时会失败 此时通过kill命令关闭 首先输入 ps -ef|grep tomcat 在列出的tomcat中,找到该t ...
- KVM虚拟化简介及安装
kvm是基于图形化的linux操作的 安装图形化界面的知识点: 磁盘空间有两个词: 精简置备:我先在我系统里面去声明我要一个50G的空间,但是呢,我不会把50G都分给你,你用多少,我分给你多少,但是做 ...
- C语言——枚举类型用法
1.枚举的定义 enum 枚举名{ 枚举元 素1,枚举元素2,枚举元素3...}: 2.使用枚举类型的好处 增加程序的可读性,我们都知道在计算机中所有信息都是用二进制来表示的,如果你用二进制来表示某件 ...
- C++ 一周刷完C++基础课程(同C程序进行比较)
**参考bilibili视频av29504365** ### 一段简单的程序Hello World```#include <iostream>using namespace std;int ...
- 微信获取token -1000
最终翻看微信开发api找到需要去配置IP白名单.只需要配置访问来源IP即可.