Here I list some useful functions in Python to get familiar with your data. As an example, we load a dataset named housing which is a DataFrame object. Usually, the first thing to do is get top five rows the dataset by head() function:

housing = load_housing_data()
housing.head()

The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.

housing.info()

The describe() function will return statistics including count, mean, median, std, min, max and quantiles of each feature.

housing.describe()

For categorical varibles, we usually hope to see the labels and the count for each label. value_counts() function works here:

housing["ocean_proximity"].value_counts()

That’s it. I’ll update more functions if I meet in further study.

[Machine Learning with Python] Familiar with Your Data的更多相关文章

  1. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  2. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  3. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  4. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  5. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

  6. [Machine Learning with Python] My First Data Preprocessing Pipeline with Titanic Dataset

    The Dataset was acquired from https://www.kaggle.com/c/titanic For data preprocessing, I firstly def ...

  7. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  8. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  9. [Machine Learning with Python] How to get your data?

    Using Pandas Library The simplest way is to read data from .csv files and store it as a data frame o ...

随机推荐

  1. vue $emit子组件传出多个参数,如何在父组件中在接收所有参数的同时添加自定义参数

    Vue.js 父子组件通信的十种方式 前言 很多时候用$emit携带参数传出事件,并且又需要在父组件中使用自定义参数时,这时我们就无法接受到子组件传出的参数了.找到了两种方法可以同时添加自定义参数的方 ...

  2. 学习路由器vue-router

    vue-router:vue官方路由管理器. 功能:嵌套的路由/视图表模块化的.基于组件的路由配置路由参数.查询.通配符基于 Vue.js 过渡系统的视图过渡效果细粒度的导航控制带有自动激活的 CSS ...

  3. matplotlib学习记录 一

    from matplotlib import pyplot as plt # 先实例一个图片,传入图片参数,10宽,5高,分辨率为80 image = plt.figure(figsize=(10,5 ...

  4. UVa - 12096 集合栈计算机(STL)

    [题意] 有一个专门为了集合运算而设计的“集合栈”计算机.该机器有一个初始为空的栈,并且支持以下操作:PUSH:空集“{}”入栈DUP:把当前栈顶元素复制一份后再入栈UNION:出栈两个集合,然后把两 ...

  5. Linux学习-函式库管理

    动态与静态函式库 首先我们要知道的是,函式库的类型有哪些?依据函式库被使用的类型而分为两大类,分别是静态 (Static) 与动态 (Dynamic) 函式库两类. 静态函式库的特色: 扩展名:(扩展 ...

  6. Java构造器(construtor)与垃圾收集器(GB)

    在Java中,程序员会在乎内存中的两块空间. 堆(heap)和栈(stack). 当java虚拟机启动时, 它会从底层的操作系统取得一块内存, 并且以此块内存来执行java程序. 在Java中, 实例 ...

  7. JBuilder生成Exe

    首先保证工程可以通过绿箭头执行 然后在File菜单中选择New,先建立Archive下的Application 接下来的界面中大部分可以直接选择“Next”,除了下面的第3步,会询问是否需要将工程引用 ...

  8. [python学习篇][书籍学习][python standrad library][内建类型]迭代器类型

    我们已经知道,可以直接作用于for循环的数据类型有以下几种:一类是集合数据类型,如list.tuple.dict.set.str等:一类是generator,包括生成器和带yield的generato ...

  9. [adb 学习篇] python将adb命令集合到一个工具上

    https://testerhome.com/topics/6938 qzhi的更全面,不过意思是一样的,另外补充一个开源的https://github.com/264768502/adb_wrapp ...

  10. 【Istio】error initializing configuration '/etc/istio/proxy/envoy-rev0.json': malformed IP address: istio-statsd-prom-bridge

    今天遇到一个问题,istio的组件一直在重启,查看log大概是这个样子 --03T07::.935580Z info Epoch starting --03T07::.936317Z info Env ...