[Scikit] A code template for multi-label text classification
Refer to: https://stackoverflow.com/a/10527953
Code:
# -*- coding: utf-8 -*-
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
# One list of labels per training document; the last two documents carry both labels
y_train_text = [["new york"], ["new york"], ["new york"], ["new york"], ["new york"],
                ["new york"], ["london"], ["london"], ["london"], ["london"],
                ["london"], ["london"], ["new york", "london"], ["new york", "london"]]

# Unseen documents to classify
X_test = np.array(['nice day in nyc',
                   'welcome to london',
                   'london is rainy',
                   'it is raining in britian',
                   'it is raining in britian and the big apple',
                   'it is raining in britian and nyc',
                   'hello welcome to new york. enjoy it here and london too'])
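target_names = ['New York', 'London']  # display names; not used below

# Turn the label lists into a binary indicator matrix, one column per label
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_train_text)

# Word counts -> TF-IDF weights -> one linear SVM per label
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)

# Map the predicted indicator rows back to label names
all_labels = mlb.inverse_transform(predicted)

for item, labels in zip(X_test, all_labels):
    print('{0} => {1}'.format(item, ', '.join(labels)))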
Output:
nice day in nyc => new york
welcome to london => london
london is rainy => london
it is raining in britian => london
it is raining in britian and the big apple => new york
it is raining in britian and nyc => london, new york
hello welcome to new york. enjoy it here and london too => london, new york
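
The label order in the output above (london before new york) comes from MultiLabelBinarizer, which stores its classes in sorted order and uses that order for the indicator columns. A minimal sketch on a three-row toy label list (the toy data is an assumption for illustration, not part of the original template):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Toy label sets (illustrative only), same shape as y_train_text
labels = [["new york"], ["london"], ["new york", "london"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# Classes are kept in sorted order, so "london" is column 0
print(mlb.classes_)
# ['london' 'new york']

# One row per sample, one column per class; 1 means the label applies
print(Y)
# [[0 1]
#  [1 0]
#  [1 1]]

# inverse_transform maps indicator rows back to label tuples,
# listing labels in column order
print(mlb.inverse_transform(Y))
# [('new york',), ('london',), ('london', 'new york')]
```

This is why a document tagged with both labels prints as "london, new york" even though the training data lists "new york" first.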