[Scikit] A code template for multi-label text classification

Refer to: https://stackoverflow.com/a/10527953

Code:

# -*- coding: utf-8 -*-

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Training documents.
X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])

# One list of labels per training document; a document may carry several labels.
y_train_text = [["new york"], ["new york"], ["new york"], ["new york"], ["new york"],
                ["new york"], ["london"], ["london"], ["london"], ["london"],
                ["london"], ["london"], ["new york", "london"], ["new york", "london"]]

# Unseen documents to classify.
X_test = np.array(['nice day in nyc',
                   'welcome to london',
                   'london is rainy',
                   'it is raining in britian',
                   'it is raining in britian and the big apple',
                   'it is raining in britian and nyc',
                   'hello welcome to new york. enjoy it here and london too'])

# Display names for the two classes (not used below; predictions are decoded with mlb).
target_names = ['New York', 'London']

# Encode the label lists as a binary indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_train_text)

# Bag-of-words counts -> TF-IDF weights -> one binary LinearSVC per label.
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)

# Convert the predicted indicator rows back to tuples of label names.
all_labels = mlb.inverse_transform(predicted)

for item, labels in zip(X_test, all_labels):
    print('{0} => {1}'.format(item, ', '.join(labels)))

Output:

nice day in nyc => new york
welcome to london => london
london is rainy => london
it is raining in britian => london
it is raining in britian and the big apple => new york
it is raining in britian and nyc => london, new york
hello welcome to new york. enjoy it here and london too => london, new york
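
The output lists "london" before "new york" whenever both labels are predicted. A minimal sketch of why, using a hypothetical three-row label set (not part of the original post) in the same shape as y_train_text: MultiLabelBinarizer sorts the labels it sees, so the indicator columns, and the tuples returned by inverse_transform, follow that sorted order.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical miniature label set, mirroring the shape of y_train_text.
labels = [["new york"], ["london"], ["new york", "london"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# classes_ is sorted, so column 0 is "london" and column 1 is "new york".
print(mlb.classes_.tolist())   # ['london', 'new york']

# Each row marks which labels apply to that document.
print(Y.tolist())              # [[0, 1], [1, 0], [1, 1]]

# inverse_transform maps indicator rows back to tuples of label names,
# always in column (sorted) order.
decoded = mlb.inverse_transform(Y)
print(decoded)                 # [('new york',), ('london',), ('london', 'new york')]
```

This indicator matrix is what OneVsRestClassifier consumes in the template above: it fits one binary LinearSVC per column.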
