[Scikit] A code template for multi-label text classification

Refer to: https://stackoverflow.com/a/10527953

Code:

# -*- coding: utf-8 -*-
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Training documents
X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])

# One list of labels per training document; a document may carry several labels
y_train_text = [["new york"],["new york"],["new york"],["new york"],["new york"],
                ["new york"],["london"],["london"],["london"],["london"],
                ["london"],["london"],["new york","london"],["new york","london"]]

# Unseen documents to classify. "britian" is kept as written; it does not
# match "britain" in the training vocabulary.
X_test = np.array(['nice day in nyc',
                   'welcome to london',
                   'london is rainy',
                   'it is raining in britian',
                   'it is raining in britian and the big apple',
                   'it is raining in britian and nyc',
                   'hello welcome to new york. enjoy it here and london too'])

target_names = ['New York', 'London']  # display names; not used below

# Encode the label lists as a binary indicator matrix (one column per class)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_train_text)

# Bag-of-words counts -> TF-IDF weights -> one linear SVM per label
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)

# Map the predicted indicator rows back to tuples of label names
all_labels = mlb.inverse_transform(predicted)

for item, labels in zip(X_test, all_labels):
    print('{0} => {1}'.format(item, ', '.join(labels)))

Output:

nice day in nyc => new york
welcome to london => london
london is rainy => london
it is raining in britian => london
it is raining in britian and the big apple => new york
it is raining in britian and nyc => london, new york
hello welcome to new york. enjoy it here and london too => london, new york
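
When a document gets both labels, the output lists "london" before "new york", even though the training labels were written as ["new york", "london"]. This comes from how MultiLabelBinarizer encodes labels: it sorts the class names and assigns one indicator column per class. The following is a minimal sketch of that mapping. The three-row `y_text` list is a shortened, hypothetical stand-in for `y_train_text`, used only for illustration:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Shortened, illustrative label lists (an assumption; not the full training set).
y_text = [["new york"], ["london"], ["new york", "london"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_text)

# Class names are sorted, so column 0 is "london" and column 1 is "new york".
print(list(mlb.classes_))

# Each row is a 0/1 indicator vector over those columns.
print(Y.tolist())

# inverse_transform turns indicator rows back into tuples of label names,
# listed in column (sorted) order rather than in the order they were given.
labels = mlb.inverse_transform(Y)
print(labels)
```

The same column order applies to the matrix returned by `classifier.predict`, which is why `mlb.inverse_transform(predicted)` yields "london, new york" for documents that match both classes.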
