Evaluating Classifier Performance: Accuracy, Precision, Recall, F1, and F2

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Load the SMS dataset; the code expects 'message' and 'label' columns
df = pd.read_csv('./sms.csv')
X_train_raw, X_test_raw, y_train, y_test = train_test_split(df['message'], df['label'], random_state=11)

# Fit TF-IDF on the training text only, then reuse its vocabulary for the test text
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_raw)
X_test = vectorizer.transform(X_test_raw)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

# 5-fold cross-validation; the default scoring for a classifier is accuracy
scores = cross_val_score(classifier, X_train, y_train, cv=5)
print('Accuracies: %s' % scores)
print('Mean accuracy: %s' % np.mean(scores))

Accuracies: [ 0.95221027  0.95454545  0.96172249  0.96052632  0.95209581]

Mean accuracy: 0.956220068309

# Same 5-fold cross-validation, scored by precision, recall, and F1.
# These scorers assume binary labels with the positive class encoded as 1.
precisions = cross_val_score(classifier, X_train, y_train, cv=5, scoring='precision')
print('Precision: %s' % np.mean(precisions))

recalls = cross_val_score(classifier, X_train, y_train, cv=5, scoring='recall')
print('Recall: %s' % np.mean(recalls))

f1s = cross_val_score(classifier, X_train, y_train, cv=5, scoring='f1')
print('F1 score: %s' % np.mean(f1s))

Precision: 0.992542742398

Recall: 0.683605030275

F1 score: 0.809067846627
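
F1 is the harmonic mean of precision and recall, so it is high only when both are high: with a precision of 1 and a recall of 0, F1 is 0. Two related measures, F0.5 and F2, weight precision and recall more heavily, respectively. In some settings recall matters more than precision, and F2 is the better fit there.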
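The F-measures above are all cases of one formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. The following is a minimal sketch of it in plain Python; the helper name `f_beta` is made up for illustration, and the inputs are the mean precision and recall from the run above, rounded to two decimals.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 favors precision, beta > 1 favors recall, beta == 1 gives F1.
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # avoid dividing by zero; the score is taken to be 0
    return (1 + b2) * precision * recall / denom


# Rounded mean precision and recall from the cross-validation run above
precision, recall = 0.99, 0.68

for beta in (0.5, 1.0, 2.0):
    print('F%g: %.3f' % (beta, f_beta(precision, recall, beta)))
```

Because recall is the weaker of the two numbers here, the score drops as beta grows: roughly 0.91 for F0.5, 0.81 for F1, and 0.73 for F2. The F1 value is close to, but not the same as, the mean F1 reported above, since that figure averages per-fold scores.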

Comparing Common Classifiers

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic binary classification problem: 5000 samples, 100 features, 20 of them informative
X, y = make_classification(
    n_samples=5000, n_features=100, n_informative=20, n_clusters_per_class=2, random_state=11)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=11)

# Each model is fit on the training split and scored on the held-out test split
print('Decision tree')
clf = DecisionTreeClassifier(random_state=11)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(classification_report(y_test, predictions))

print('Random forest')
clf = RandomForestClassifier(n_estimators=10, random_state=11)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(classification_report(y_test, predictions))

print('Logistic regression')
clf = LogisticRegression()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(classification_report(y_test, predictions))

print('AdaBoost')
clf = AdaBoostClassifier(n_estimators=50, random_state=11)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(classification_report(y_test, predictions))

print('KNN (k-nearest neighbors)')
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print(classification_report(y_test, predictions))

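print('SVM (support vector machine)')
# Fit on the training split only. Fitting on the full X, y leaks the test
# samples into training and inflates the scores; the 1.00 figures listed
# in the results came from such a full-data fit.
clf = SVC(kernel='rbf', C=100, gamma=0.1).fit(X_train, y_train)
predictions = clf.predict(X_test)
print(classification_report(y_test, predictions))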

Results

Decision tree
              precision    recall  f1-score   support

           0       0.80      0.76      0.78       634
           1       0.76      0.80      0.78       616

    accuracy                           0.78      1250
   macro avg       0.78      0.78      0.78      1250
weighted avg       0.78      0.78      0.78      1250

Random forest
              precision    recall  f1-score   support

           0       0.79      0.86      0.82       634
           1       0.84      0.76      0.80       616

    accuracy                           0.81      1250
   macro avg       0.82      0.81      0.81      1250
weighted avg       0.82      0.81      0.81      1250

Logistic regression
              precision    recall  f1-score   support

           0       0.82      0.85      0.84       634
           1       0.84      0.81      0.83       616

    accuracy                           0.83      1250
   macro avg       0.83      0.83      0.83      1250
weighted avg       0.83      0.83      0.83      1250

AdaBoost
              precision    recall  f1-score   support

           0       0.83      0.85      0.84       634
           1       0.84      0.82      0.83       616

    accuracy                           0.83      1250
   macro avg       0.83      0.83      0.83      1250
weighted avg       0.83      0.83      0.83      1250

KNN (k-nearest neighbors)
              precision    recall  f1-score   support

           0       0.93      0.93      0.93       634
           1       0.93      0.93      0.93       616

    accuracy                           0.93      1250
   macro avg       0.93      0.93      0.93      1250
weighted avg       0.93      0.93      0.93      1250

SVM (support vector machine)
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       634
           1       1.00      1.00      1.00       616

    accuracy                           1.00      1250
   macro avg       1.00      1.00      1.00      1250
weighted avg       1.00      1.00      1.00      1250

The SVM scores above came from a model fit on the full dataset (X, y), test samples included, so they reflect data leakage rather than held-out performance and are not comparable with the other five models.
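
To help read the reports above: each class row is computed from that class's counts of true positives, false positives, and false negatives, and the two average rows combine the per-class values. Below is a minimal sketch in plain Python, assuming made-up labels rather than the data from the runs above; the helper name `per_class_metrics` is hypothetical.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, F1 and support for one class, from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1, tp + fn  # support = true members of the class


# Made-up labels for illustration only (not the data from the runs above)
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

labels = (0, 1)
rows = [per_class_metrics(y_true, y_pred, label) for label in labels]
for label, (p, r, f1, support) in zip(labels, rows):
    print('%d  precision=%.2f  recall=%.2f  f1=%.2f  support=%d' % (label, p, r, f1, support))

# "macro avg" weights every class equally; "weighted avg" weights by support
macro_f1 = sum(row[2] for row in rows) / len(rows)
weighted_f1 = sum(row[2] * row[3] for row in rows) / sum(row[3] for row in rows)
print('macro f1=%.2f  weighted f1=%.2f' % (macro_f1, weighted_f1))
```

In the reports above the two classes have nearly equal support (634 and 616), which is why the macro and weighted averages agree to two decimals.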
