Here I list some useful functions in Python to get familiar with your data. As an example, we load a dataset named housing which is a DataFrame object. Usually, the first thing to do is get top five rows the dataset by head() function:

housing = load_housing_data()
housing.head()

The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.

housing.info()

The describe() function will return statistics including count, mean, median, std, min, max and quantiles of each feature.

housing.describe()

For categorical varibles, we usually hope to see the labels and the count for each label. value_counts() function works here:

housing["ocean_proximity"].value_counts()

That’s it. I’ll update more functions if I meet in further study.

[Machine Learning with Python] Familiar with Your Data的更多相关文章

  1. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  2. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  3. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  4. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  5. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

  6. [Machine Learning with Python] My First Data Preprocessing Pipeline with Titanic Dataset

    The Dataset was acquired from https://www.kaggle.com/c/titanic For data preprocessing, I firstly def ...

  7. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  8. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  9. [Machine Learning with Python] How to get your data?

    Using Pandas Library The simplest way is to read data from .csv files and store it as a data frame o ...

随机推荐

  1. angular4使用代理

    1. 在angular-cli项目根目录下创建proxy.config.json { "/api/v1": { "target": "http://1 ...

  2. Linux 用户管理(二)

    一.groupadd --create a new group 创建新用户 -g  --gid GID 二.groupdel --delete a group 三.passwd --update us ...

  3. PHP数组函数 array_multisort() ----对多个数组或多维数组进行排序

    PHP中array_multisort可以用来一次对多个数组进行排序,或者根据某一维或多维对多维数组进行排序. 关联(string)键名保持不变,但数字键名会被重新索引. 输入数组被当成一个表的列并以 ...

  4. jvm架构以及Tomcat优化

      JVM栈 JVM栈是线程私有的,每个线程创建的同时都会创建JVM栈,JVM栈中存放的为当前线程中局部基本类型的变量(java中定义的八种基本类型:boolean.char.byte.short.i ...

  5. Java底层基础题

    一.Java底层基础题 1.SpringMVC的原理以及返回数据如何渲染到jsp/html上? 答:Spring MVC的核心就是DispatcherServlet , 一个请求经过Dispatche ...

  6. (转)webView清除缓存

    NSURLCache * cache = [NSURLCache sharedURLCache]; [cache removeAllCachedResponses]; [cache setDiskCa ...

  7. 开源OA系统启动:基础数据,工作流设计

    原文:http://www.cnblogs.com/kwklover/archive/2007/01/13/bpoweroa_03_baseandworkflowdesign.html自从开源OA系统 ...

  8. HDU 4565 So Easy! 矩阵快速幂

    题意: 求\(S_n=\left \lceil (a+\sqrt{b})^n \right \rceil mod \, m\)的值. 分析: 设\((a+\sqrt{b})^n=A_n+B_n \sq ...

  9. JavaScript: 2015 年回顾与展望

    链接:http://www.sitepoint.com/javascript-2015-review/ JavaScript经历了一个不平凡的一年.尽管到5月份已经20年了,关于JS的新闻.项目和兴趣 ...

  10. mysql update连表

    UPDATE price_air_item t1 LEFT JOIN order_item t2 ON t1.ORDER_ITEM_ID = t2.ORDER_ITEM_ID SET t1.BUYER ...