原文:Learning from Imbalanced Classes

数据不平衡是一个非常经典的问题,数据挖掘、计算广告、NLP等工作经常遇到。该文总结了可能有效的方法,值得参考:

  • Do nothing. Sometimes you get lucky and nothing needs to be done. You can train on the so-called natural (or stratified) distribution and sometimes it works without need for modification.
  • Balance the training set in some way:
    • Oversample the minority class.
    • Undersample the majority class.
    • Synthesize new minority classes.
  • Throw away minority examples and switch to an anomaly detection framework.
  • At the algorithm level, or after it:
    • Adjust the class weight (misclassification costs).
    • Adjust the decision threshold.
    • Modify an existing algorithm to be more sensitive to rare classes.
  • Construct an entirely new algorithm to perform well on imbalanced data.

[导读]Learning from Imbalanced Classes的更多相关文章

  1. (转) Learning from Imbalanced Classes

    Learning from Imbalanced Classes AUGUST 25TH, 2016 If you’re fresh from a machine learning course, c ...

  2. Learning from Imbalanced Classes

    https://www.svds.com/learning-imbalanced-classes/ 下采样即 从大类负类中随机取一部分,跟正类(小类)个数相同,优点就是降低了内存大小,速度快! htt ...

  3. (转)8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset

    8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset by Jason Brownlee on August ...

  4. 不平衡学习 Learning from Imbalanced Data

    问题: ICC警情数据分类不均,30+分类,最多的分类数据数量1w+条,只有10个类别数量超过1k,大部分分类数量少于100条. 解决办法: 下采样:通过非监督学习,找出每个分类中的异常点,减少数据. ...

  5. learning scala generic classes

    package com.aura.scala.day01 object genericClasses { def main(args: Array[String]): Unit = { val sta ...

  6. How to handle Imbalanced Classification Problems in machine learning?

    How to handle Imbalanced Classification Problems in machine learning? from:https://www.analyticsvidh ...

  7. 【深度学习Deep Learning】资料大全

    最近在学深度学习相关的东西,在网上搜集到了一些不错的资料,现在汇总一下: Free Online Books  by Yoshua Bengio, Ian Goodfellow and Aaron C ...

  8. 机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)

    ##机器学习(Machine Learning)&深度学习(Deep Learning)资料(Chapter 2)---#####注:机器学习资料[篇目一](https://github.co ...

  9. 机器学习中如何处理不平衡数据(imbalanced data)?

    推荐一篇英文的博客: 8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset 1.不平衡数据集带来的影响 一个不 ...

随机推荐

  1. Missing artifact com.microsoft.sqlserver:sqljdbc4:jar:4.0

    maven构建项目的时候遇到这个错误: 一.直接原因 制定路径下确实没有sqljdbc4.jar文件. 二.根本原因 微软不允许以maven的方式直接下载该文件. 三.解决办法 3.1 手动下载相关库 ...

  2. DP4J -- mnist

    标签(空格分隔): DeepLearning mnist mnist是一个数据集,其中包含很多手写数字的图片,每张图片都已经打上了label: Deep Learning 传统的机器学习神经网络由一层 ...

  3. Java Web之JavaBean

    一.什么是javaBean javaBean是一个遵循特定写法的java类,通常具有如下的特点: 这个java类必须具有一个无参的构造函数. 属性必须私有化. 私有化的属性必须通过public类型的方 ...

  4. (译)关于async与await的FAQ

    传送门:异步编程系列目录…… 环境:VS2012(尽管System.Threading.Tasks在.net4.0就引入,在.net4.5中为其增加了更丰富的API及性能提升,另外关键字”async” ...

  5. iis设置asp站点

    在 IIS 6.0 中,默认设置是特别严格和安全的,这样可以最大限度地减少因以前太宽松的超时和限制而造成的攻击.譬如说默认配置数据库属性实施的最大 ASP 张贴大小为 204,800 个字节,并将各个 ...

  6. sublime-text3 3059基本配置

    1.下载安装官方版注册机语言包 参考安装: http://www.xiumu.org/note/sublime-text-3.shtml 2.插件 Package ControlConvertToUT ...

  7. Codeforces Round #267 (Div. 2)

    A #include <iostream> #include<cstdio> #include<cstring> #include<algorithm> ...

  8. Linux 内核编译

    注:该文章部分内容摘录自以下链接. http://www.cnblogs.com/zhunian/archive/2012/04/04/2431883.html 创建内核的第一步是创建一个 .conf ...

  9. JS原型的问题Object和Function到底是什么关系

    var F = function(){}; Objcert.prototype.a = function(){}; Function.prototype.b = function(){}; F 既能访 ...

  10. 如何透过HTC Vive拍摄Mixed Reality (混合现实)影片

    https://www.vive.com/cn/forum/1706?extra=page%3D1 也许你是一位开发者,想为自己的HTC Vive游戏制作酷炫的宣传片:或者你是游戏主播,想为观众带来高 ...