[Machine Learning with Python] Getting Familiar with Your Data
Here I list some useful pandas methods for getting familiar with your data. As an example, we load a dataset named housing, which is a DataFrame object. Usually, the first thing to do is to look at the top five rows of the dataset with the head() method:
housing = load_housing_data()
housing.head()
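The post does not show load_housing_data(), so the following is only a minimal sketch with a hypothetical stand-in that builds a tiny DataFrame in memory. The column names other than ocean_proximity and all of the values are made up for illustration.

```python
import pandas as pd


def load_housing_data():
    """Hypothetical stand-in: returns a tiny made-up frame, not the real dataset."""
    return pd.DataFrame({
        "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 3.7],
        "total_bedrooms": [129.0, 1106.0, None, 235.0, 280.0, 213.0],
        "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND",
                            "INLAND", "INLAND", "<1H OCEAN"],
    })


housing = load_housing_data()
top5 = housing.head()    # head() returns the first 5 rows by default
top3 = housing.head(3)   # pass n to ask for a different number of rows
print(top5)
```

head() returns a new DataFrame, so the result can be stored or inspected further instead of only being displayed.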
The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.
housing.info()
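Note that info() prints its report and returns None. If you want the same facts as values you can work with, something like the sketch below may help. The small DataFrame here is made-up illustration data, not the real housing dataset.

```python
import pandas as pd

# Made-up sample data; only the ocean_proximity column name comes from the post.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 3.7],
    "total_bedrooms": [129.0, 1106.0, None, 235.0, 280.0, 213.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND",
                        "INLAND", "INLAND", "<1H OCEAN"],
})

housing.info()                     # prints row count, dtypes and non-null counts

n_rows = len(housing)              # total number of rows
non_null = housing.count()         # non-null values per column
missing = housing.isna().sum()     # null values per column
print(missing)
```

Here total_bedrooms has one missing value, so its non-null count is one less than the number of rows.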
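The describe() method returns summary statistics for each numerical feature: count, mean, std, min, max and the 25%, 50% and 75% quantiles. There is no row labeled median; the 50% quantile is the median. Null values are ignored, so count can be smaller than the number of rows.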
housing.describe()
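As a quick check of how to read the describe() output, the sketch below uses a small made-up DataFrame (not the real housing data) to show that the 50% row matches the median, that nulls are left out of count, and that non-numerical columns are skipped by default.

```python
import pandas as pd

# Made-up sample data; only the ocean_proximity column name comes from the post.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 3.7],
    "total_bedrooms": [129.0, 1106.0, None, 235.0, 280.0, 213.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND",
                        "INLAND", "INLAND", "<1H OCEAN"],
})

stats = housing.describe()         # numerical columns only, by default
print(stats)

median_from_describe = stats.loc["50%", "median_income"]
median_direct = housing["median_income"].median()
bedroom_count = stats.loc["count", "total_bedrooms"]  # the null row is not counted
```

Passing include="all" to describe() also summarizes the non-numerical columns.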
For categorical variables, we usually want to see the labels and the count for each label. The value_counts() method works here:
housing["ocean_proximity"].value_counts()
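A minimal sketch of value_counts() on made-up data (the labels and their frequencies here are illustrative only): counts are sorted from most to least frequent, and normalize=True gives proportions instead of counts.

```python
import pandas as pd

# Made-up sample data, not the real housing dataset.
housing = pd.DataFrame({
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND",
                        "INLAND", "INLAND", "<1H OCEAN"],
})

counts = housing["ocean_proximity"].value_counts()                # rows per label
shares = housing["ocean_proximity"].value_counts(normalize=True)  # fraction per label
print(counts)
print(shares)
```

By default value_counts() leaves out null values; pass dropna=False to count them as well.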
That’s it. I’ll add more functions as I come across them in further study.