This is an introduction to pandas categorical data type, including a short comparison with R’s factor.
Categoricals are a pandas data type, which correspond to categorical variables in statistics: a variable, which can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood types, country affiliations, observation time or ratings via Likert scales.

In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical operations (additions, divisions, ...) are not possible.

All values of categorical data are either in categories or np.nan. Order is defined by the order of categories, not lexical order of the values. Internally, the data structure consists of a categories array and an integer array of codes which point to the real value in the categories array.

The categorical data type is useful in the following cases:

  • A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory, see here.
  • The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order, see here.
  • As a signal to other python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).

概括:Categorical Data数据类型就类似“性别”、“血型”、“班级”等,只能是一些固定的“值“。Categorical Data可以有不同级别,但是不能用于数值计算。

Categorical Data的更多相关文章

  1. Pandas的Categorical Data

    http://liao.cpython.org/pandas15/ Docs » Pandas的Categorical Data类型 15. Pandas的Categorical Data panda ...

  2. Pandas的Categorical Data类型

    pandas从0.15版开始提供分类数据类型,用于表示统计学里有限且唯一性数据集,例如描述个人信息的性别一般就男和女两个数据常用'm'和'f'来描述,有时也能对应编码映射为0和1.血型A.B.O和AB ...

  3. [论文]A Link-Based Cluster Ensemble Approach for Categorical Data Clustering

    http://www.cnblogs.com/Azhu/p/4137131.html 这篇论文建议先看了上面这一遍,两篇作者是一样的,方法也一样,这一片论文与上面的不同点在于,使用的数据集是目录数据, ...

  4. Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization

    factoextra is an R package making easy to extract and visualize the output of exploratory multivaria ...

  5. pandas入门10分钟——serries其实就是data frame的一列数据

    10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...

  6. MAT 4378 – MAT 5317, Analysis of categorical

    MAT 4378 – MAT 5317, Analysis of categorical data, Assignment 3 1MAT 4378 – MAT 5317, Analysis of ca ...

  7. Data Visualization and D3.js 笔记(1)

    课程地址: https://classroom.udacity.com/courses/ud507 什么是数据可视化? 高效传达一个故事/概念,探索数据的pattern 通过颜色.尺寸.形式在视觉上表 ...

  8. 第七章 人工智能,7.6 DNN在搜索场景中的应用(作者:仁重)

    7.6 DNN在搜索场景中的应用 1. 背景 搜索排序的特征分大量的使用了LR,GBDT,SVM等模型及其变种.我们主要在特征工程,建模的场景,目标采样等方面做了很细致的工作.但这些模型的瓶颈也非常的 ...

  9. 10分钟学习pandas

    10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...

随机推荐

  1. ThinkPHP框架实现rewrite路由配置

    rewrite路由形式:   //网址/分组/控制器/方法 配置实现rewrite路由的配置: 1. 修改apache的配置 先修改httpd.conf配置文件中的AllowOverrideAll,全 ...

  2. list转datatable,SqlBulkCopy将DataTable中的数据批量插入数据库

    /// <summary> /// 将泛类型集合List类转换成DataTable /// </summary> /// <param name="list&q ...

  3. java实现多种加密模式的AES算法-总有一种你用的着

    https://blog.csdn.net/u013871100/article/details/80100992 AES-128位-无向量-ECB/PKCS7Padding package com. ...

  4. web.xml 通过contextConfigLocation配置spring 的方式

    部署到tomcat后,src目录下的配置文件会和class文件一样,自动copy到应用的 classes目录下 spring的 配置文件在启动时,加载的是web-info目录下的application ...

  5. 微信企业号 发送信息 shell

    微信企业号发送信息shell #可作为shell函数模块调用,用于微信通知.jenkins发版微信通知等等 # 微信API官方文档 https://work.weixin.qq.com/api/doc ...

  6. Map m=new HashMap()

    Map<String,String> m=new HashMap<String,String>() 等于 HashMap<String,String> hashMa ...

  7. 【ABAP系列】SAP ABAP ALV中设置CHECKBOX同时选中事件

    公众号:SAP Technical 本文作者:matinal 原文出处:http://www.cnblogs.com/SAPmatinal/ 原文链接:[MM系列]SAP ABAP ALV中设置CHE ...

  8. java中java.util.Date和java.sql.Date之间的关系和使用选择

    关系: java.util.Date是java.sql.Date的父类 区别:(java.sql.Date包含年月日信息,java.util.Date包含年月日时分秒) 1:“规范化”的java.sq ...

  9. YOLOV3中Darknet中cfg文件说明和理解

    今天将要说明的是Darknet中的cfg文件,废话少说,直接干!(以cfg/yolov3.cfg为例,其它类似) [net]                        ★ [xxx]开始的行表示网 ...

  10. Linux系统中tomcat的安装及优化

    Linux系统中Tomcat 8 安装 Tomcat 8 安装 官网:http://tomcat.apache.org/ Tomcat 8 官网下载:http://tomcat.apache.org/ ...