Refer to: https://stackoverflow.com/a/10527953

code:

# -*- coding: utf-8 -*-
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer X_train = np.array(["new york is a hell of a town",
"new york was originally dutch",
"the big apple is great",
"new york is also called the big apple",
"nyc is nice",
"people abbreviate new york city as nyc",
"the capital of great britain is london",
"london is in the uk",
"london is in england",
"london is in great britain",
"it rains a lot in london",
"london hosts the british museum",
"new york is great and so is london",
"i like london better than new york"])
y_train_text = [["new york"],["new york"],["new york"],["new york"],["new york"],
["new york"],["london"],["london"],["london"],["london"],
["london"],["london"],["new york","london"],["new york","london"]] X_test = np.array(['nice day in nyc',
'welcome to london',
'london is rainy',
'it is raining in britian',
'it is raining in britian and the big apple',
'it is raining in britian and nyc',
'hello welcome to new york. enjoy it here and london too'])
target_names = ['New York', 'London'] mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_train_text) classifier = Pipeline([
('vectorizer', CountVectorizer()),
('tfidf', TfidfTransformer()),
('clf', OneVsRestClassifier(LinearSVC()))]) classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)
all_labels = mlb.inverse_transform(predicted) for item, labels in zip(X_test, all_labels):
print('{0} => {1}'.format(item, ', '.join(labels)))

Output:

nice day in nyc => new york
welcome to london => london
london is rainy => london
it is raining in britian => london
it is raining in britian and the big apple => new york
it is raining in britian and nyc => london, new york
hello welcome to new york. enjoy it here and london too => london, new york

【Scikit】实现Multi-label text classification代码模板的更多相关文章

  1. [Bayes] Maximum Likelihood estimates for text classification

    Naïve Bayes Classifier. We will use, specifically, the Bernoulli-Dirichlet model for text classifica ...

  2. 论文阅读:《Bag of Tricks for Efficient Text Classification》

    论文阅读:<Bag of Tricks for Efficient Text Classification> 2018-04-25 11:22:29 卓寿杰_SoulJoy 阅读数 954 ...

  3. 论文翻译——Character-level Convolutional Networks for Text Classification

    论文地址 Abstract Open-text semantic parsers are designed to interpret any statement in natural language ...

  4. 论文解读(XR-Transformer)Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

    Paper Information Title:Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text C ...

  5. 给label text 上色 && 给textfiled placeholder 上色

    1.给label text 上色: NSInteger stringLength = ; stringLength = model.ToUserNickName.length; NSMutableAt ...

  6. [转] Implementing a CNN for Text Classification in TensorFlow

    Github上的一个开源项目,文档讲得极清晰 Github - https://github.com/dennybritz/cnn-text-classification-tf 原文- http:// ...

  7. [Tensorflow] RNN - 04. Work with CNN for Text Classification

    Ref: Combining CNN and RNN for spoken language identification Ref: Convolutional Methods for Text [1 ...

  8. Implementing a CNN for Text Classification in TensorFlow

    参考: 1.Understanding Convolutional Neural Networks for NLP 2.Implementing a CNN for Text Classificati ...

  9. 论文列表——text classification

    https://blog.csdn.net/BitCs_zt/article/details/82938086 列出自己阅读的text classification论文的列表,以后有时间再整理相应的笔 ...

随机推荐

  1. 使用uploadify上传图片时返回“Cannot read property 'queueData' of undefined”

    在使用uploadify插件上传图片时,遇到一个比较坑的错误:上传时提示“Cannot read property 'queueData' of undefined”. 遇到这个问题有点无语,因为这个 ...

  2. Ubuntu GNOME 13.04将关闭窗口的按钮放在最右边

    转载请注明:转自http://blog.csdn.net/u010811449/article/details/9426187 先上图: 首先打开dconf系统配置编译器. 找到 rog -> ...

  3. C#编程(八十)---------- 异常类

    异常类 在C#里,异常处理就是C#为处理错误情况提供的一种机制.它为每种错误情况提供了定制的处理方式,并且把标志错误的代码预处理错误的代码分离开来. 对.net类来说,一般的异常类System.Exc ...

  4. 你现在还在手动生成set,get方法吗?使用lombok

    JAVA面向对象编程中的封闭性和安全性.封闭性即对类中的域变量进行封闭操作,即用private来修饰他们,如此一来其他类则不能对该变量访问.这样我们就将这些变量封闭在了类内部,这样就提高了数据的安全性 ...

  5. [Mockito] Spring Unit Testing with Mockito

    It is recommened to write unit testing with Mockito in Spring framework, because it is much faster w ...

  6. hadoop from rookie to ninja - 1. Basic Architecture(基础架构)

    1. Daemons(守护进程) 新老架构 老的: Apache Hadoop 1.x (MRv1)   新的: Apache Hadoop 2.x (YARN)-Yet Another Resour ...

  7. numpy 中的 broadcasting 理解

    broadcast 是 numpy 中 array 的一个重要操作. 首先,broadcast 只适用于加减. 然后,broadcast 执行的时候,如果两个 array 的 shape 不一样,会先 ...

  8. 源码版本管理工具 :TFS GIT

    至于svn  ..忽略不计了... 集中式代码管理 CVCS 模式:TFS 分布式代码管理 DVCS 模式:git 两者比较大的差别:tfs 只有一个中央仓储,其他副本都要与中央仓储进行更新.git  ...

  9. Sort_Buffer_Size 设置对服务器性能的影响

    基础知识: 1. Sort_Buffer_Size 是一个connection级参数,在每个connection第一次需要使用这个buffer的时候,一次性分配设置的内存.2. Sort_Buffer ...

  10. java.lang.IllegalStateException——好头疼

    在我东,下下来一个项目总会出现启动不了的问题,这些问题往往在编译的时候发现不了,当你的服务器启动的时候,就是一片片的报错,有些问题可以通过异常的提示信息,判断出来哪里配置错了,但是也有些情况下,从异常 ...