Categorical Data
This is an introduction to pandas categorical data type, including a short comparison with R’s factor.
Categoricals are a pandas data type, which correspond to categorical variables in statistics: a variable, which can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood types, country affiliations, observation time or ratings via Likert scales.
In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical operations (additions, divisions, ...) are not possible.
All values of categorical data are either in categories or np.nan. Order is defined by the order of categories, not lexical order of the values. Internally, the data structure consists of a categories array and an integer array of codes which point to the real value in the categories array.
The categorical data type is useful in the following cases:
- A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory, see here.
- The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order, see here.
- As a signal to other python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).
概括:Categorical Data数据类型就类似“性别”、“血型”、“班级”等,只能是一些固定的“值“。Categorical Data可以有不同级别,但是不能用于数值计算。
Categorical Data的更多相关文章
- Pandas的Categorical Data
http://liao.cpython.org/pandas15/ Docs » Pandas的Categorical Data类型 15. Pandas的Categorical Data panda ...
- Pandas的Categorical Data类型
pandas从0.15版开始提供分类数据类型,用于表示统计学里有限且唯一性数据集,例如描述个人信息的性别一般就男和女两个数据常用'm'和'f'来描述,有时也能对应编码映射为0和1.血型A.B.O和AB ...
- [论文]A Link-Based Cluster Ensemble Approach for Categorical Data Clustering
http://www.cnblogs.com/Azhu/p/4137131.html 这篇论文建议先看了上面这一遍,两篇作者是一样的,方法也一样,这一片论文与上面的不同点在于,使用的数据集是目录数据, ...
- Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization
factoextra is an R package making easy to extract and visualize the output of exploratory multivaria ...
- pandas入门10分钟——serries其实就是data frame的一列数据
10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...
- MAT 4378 – MAT 5317, Analysis of categorical
MAT 4378 – MAT 5317, Analysis of categorical data, Assignment 3 1MAT 4378 – MAT 5317, Analysis of ca ...
- Data Visualization and D3.js 笔记(1)
课程地址: https://classroom.udacity.com/courses/ud507 什么是数据可视化? 高效传达一个故事/概念,探索数据的pattern 通过颜色.尺寸.形式在视觉上表 ...
- 第七章 人工智能,7.6 DNN在搜索场景中的应用(作者:仁重)
7.6 DNN在搜索场景中的应用 1. 背景 搜索排序的特征分大量的使用了LR,GBDT,SVM等模型及其变种.我们主要在特征工程,建模的场景,目标采样等方面做了很细致的工作.但这些模型的瓶颈也非常的 ...
- 10分钟学习pandas
10 Minutes to pandas This is a short introduction to pandas, geared mainly for new users. You can se ...
随机推荐
- MySQL主从复制之半同步模式
MySQL主从复制之半同步模式 MySQL半同步介绍: 一般情况下MySQL默认复制模式为异步,何为异步?简单的说就是主服务器上的I/O threads 将binlog写入二进制日志中就返回给客户端一 ...
- CTEX WinEdt 改变默认 pdf viewer
CTEX 2.9.2, WinEdt 7.0 "Options" -> "Excution Modes..." -> "PDF viewe ...
- TreeSet 源码分析
TreeSet 1)底层由 TreeMap 支持的 Set 接口实现,Set 中的元素按照自然顺序或指定的比较器排序. 创建实例 /** * 支持此 Set 的底层的 TreeMap 对象 */ pr ...
- AtomicStampedReference 源码分析
AtomicStampedReference AtomicStampedReference 能解决什么问题?什么时候使用 AtomicStampedReference? 1)AtomicStamped ...
- 树莓派3b折腾指南
最近入手了树梅派3b,搭建了宿舍共享的热点和NAS,搭建透明代理科学上网的计划还没实现. 先报个价,一套折腾下来花了500大洋,树梅派3加外壳200,电源加内存卡100,显示器淘宝二手150,有线键鼠 ...
- bash shell:获取当前脚本的绝对路径(pwd/readlink)
有时候,我们需要知道当前执行的输出shell脚本的所在绝对路径,可以用dirname实现. 我们知道 dirname 可以获取一个文件所在的路径,dirname的用处是: 输出已经去除了尾部的”/”字 ...
- 利用python+tkinter做一个简单的智能电视遥控器
要通过python实现遥控器功能分两步: 第一步:开发图形化界面,以暴风TV的遥控器按钮为例 第二步:使PC端给电视发送相应指令(此步骤需要打开电视的adb开关) 现在就开始第一步操作实现遥控器功能, ...
- 【公众号系列】SAP 主要模块及简介
公众号:SAP Technical 本文作者:matinal 原文出处:http://www.cnblogs.com/SAPmatinal/ 原文链接:[公众号系列]SAP 主要模块及简介 前言部 ...
- failed to create process ,pip报错问题
- mooc-IDEA live template--006
十二.IntelliJ IDEA -live template 以定时器为例: 1.创建一个Template Group... 2.在创建的Template Group下面,创建一个Live Temp ...