Categorical Data
This is an introduction to the pandas categorical data type, including a short comparison with R’s factor.
Categoricals are a pandas data type corresponding to categorical variables in statistics: a variable that can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood type, country affiliation, observation time, or ratings on Likert scales.
In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs. ‘agree’, or ‘first observation’ vs. ‘second observation’), but numerical operations (addition, division, ...) are not possible.
Every value of categorical data is either one of the categories or np.nan. Order is defined by the order of the categories, not by the lexical order of the values. Internally, the data structure consists of a categories array and an integer array of codes, where each code points to the real value in the categories array.
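The internal layout described above can be inspected through the `.cat` accessor. The following is a minimal sketch, not part of the original text; the variable names (`s`, `categories`, `codes`) and the sample values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: three labels and one missing value.
s = pd.Series(["b", "a", np.nan, "b"], dtype="category")

# The distinct values are stored once, in the categories array.
categories = s.cat.categories.tolist()

# Each element is stored as an integer code pointing into that array;
# a missing value gets the code -1 and is not a category itself.
codes = s.cat.codes.tolist()

print(categories)  # ['a', 'b']
print(codes)       # [1, 0, -1, 1]
```

Because no categories were given explicitly, pandas infers them from the data, which is why `'a'` receives code 0 even though `'b'` appears first.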
The categorical data type is useful in the following cases:
- A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory.
- The lexical order of a variable is not the same as its logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order.
- As a signal to other Python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).
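The first two cases can be illustrated with a short sketch. It assumes a made-up column of repeated labels; the names `labels`, `number_type`, and the byte-count variables are hypothetical, and the exact memory figures depend on the pandas version, so only their relative size is compared.

```python
import pandas as pd

# Illustrative column: many rows, only three distinct strings.
labels = pd.Series(["one", "two", "three"] * 1000)

# Case 1: converting to a categorical stores each string once plus small integer codes.
as_category = labels.astype("category")
string_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(category_bytes < string_bytes)  # True

# Case 2: an ordered categorical uses the order of the categories, not lexical order.
number_type = pd.CategoricalDtype(categories=["one", "two", "three"], ordered=True)
ordered = labels.astype(number_type)

lexical_max = labels.max()    # plain strings compare alphabetically
logical_max = ordered.max()   # ordered categories compare by position
print(lexical_max)  # two
print(logical_max)  # three
```

Sorting behaves the same way: `ordered.sort_values()` places all "one" rows first, then "two", then "three".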
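Summary: the Categorical data type is for things like "gender", "blood type", or "class", which can only take one of a fixed set of values. Categorical data can have different levels, but it cannot be used in numerical calculations.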