Here I list some useful functions in Python to get familiar with your data. As an example, we load a dataset named housing which is a DataFrame object. Usually, the first thing to do is get top five rows the dataset by head() function:

housing = load_housing_data()
housing.head()

The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.

housing.info()

The describe() function will return statistics including count, mean, median, std, min, max and quantiles of each feature.

housing.describe()

For categorical varibles, we usually hope to see the labels and the count for each label. value_counts() function works here:

housing["ocean_proximity"].value_counts()

That’s it. I’ll update more functions if I meet in further study.

[Machine Learning with Python] Familiar with Your Data的更多相关文章

  1. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  2. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  3. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  4. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  5. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

  6. [Machine Learning with Python] My First Data Preprocessing Pipeline with Titanic Dataset

    The Dataset was acquired from https://www.kaggle.com/c/titanic For data preprocessing, I firstly def ...

  7. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  8. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  9. [Machine Learning with Python] How to get your data?

    Using Pandas Library The simplest way is to read data from .csv files and store it as a data frame o ...

随机推荐

  1. H5bulider中的微信支付配置注意事项

    一.云打包安卓自定义证书的生成: 签名算法名称: SHA1withRSA主体公共密钥算法:1024 位 RSA 密钥密钥库类型:JKS 1.下载JDK1.6安装,切换到bin目录,打开命令行: 2.生 ...

  2. 持续化集成Jenkins的系统配置

    最近在研究selenium2自动化测试,用到持续化集成jenkins.由于之前仅限于使用,而没有真正动手配置过,所以现在学习从零开始,搭建持续化集成,故而有了这篇博客. 先介绍一下项目持续集成测试,这 ...

  3. CSS(非布局样式)

    CSS(非布局样式) 问题1.CSS样式(选择器)的优先级 1.计算权重 2.!important 3.内联样式比外嵌样式高 4.后写的优先级高 问题2.雪碧图的作用 1.减少 HTTP 请求数,提高 ...

  4. 大数据学习——scala函数与方法

    package com /** * Created by Administrator on 2019/4/8. */ object TestMap { def ttt(f: Int => Int ...

  5. Nginx从入门到放弃-第4章 深度学习篇

    4-1 Nginx动静分离_动静分离场景演示 4-2 Nginx动静分离_动静分离场景演示1 4-3 Nginx的动静分离_动静分离场景演示2 4-4 Rewrite规则_rewrite规则的作用 4 ...

  6. day04_08 while循环02

    练习题: 1.输出九九乘法表 2.使用#号输出一个长方形,用户可以指定宽和高,如果长为3,高为4,则输出一个 横着有3个#号,竖着有4个#号 的长方形. 3.如何输出一个如下的直角三角形,用户指定输出 ...

  7. python学习-- for和if结合使用

    for和if结合使用: <h1> {% for i in contents %} {{ i }}{# 注意i也要用两个大括号 #} {% endfor %} </h1> < ...

  8. Leetcode 462.最少移动次数使数组元素相等

    最少移动次数使数组元素相等 给定一个非空整数数组,找到使所有数组元素相等所需的最小移动数,其中每次移动可将选定的一个元素加1或减1. 您可以假设数组的长度最多为10000. 例如: 输入: [1,2, ...

  9. 【mysql 优化 1】优化概述

    原文地址:Optimization Overview 数据库性能取决于几个数据库层面的因素,比如:表设计,查询语句,配置. 这些软件结构导致你必须在CPU和I/O 操作的硬件层面做到尽可能的最小化和高 ...

  10. 【java基础 17】集合中各实现类的性能分析

    大致的再回顾一下java集合框架的基本情况 一.各Set实现类的性能分析 1.1,HashSet用于添加.查询 HashSet和TreeSet是Set的两个典型实现,HashSet的性能总是比Tree ...