[Machine Learning with Python] Familiar with Your Data
Here I list some useful functions in Python to get familiar with your data. As an example, we load a dataset named housing which is a DataFrame object. Usually, the first thing to do is get top five rows the dataset by head() function:
housing = load_housing_data()
housing.head()
The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.
housing.info()
The describe() function will return statistics including count, mean, median, std, min, max and quantiles of each feature.
housing.describe()
For categorical varibles, we usually hope to see the labels and the count for each label. value_counts() function works here:
housing["ocean_proximity"].value_counts()
That’s it. I’ll update more functions if I meet in further study.
[Machine Learning with Python] Familiar with Your Data的更多相关文章
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
- [Machine Learning with Python] My First Data Preprocessing Pipeline with Titanic Dataset
The Dataset was acquired from https://www.kaggle.com/c/titanic For data preprocessing, I firstly def ...
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] How to get your data?
Using Pandas Library The simplest way is to read data from .csv files and store it as a data frame o ...
随机推荐
- ipvsadm启动报错解决方法
Centos7 yum -y install ipvadm 安装后,启动ipvsadm却报错. Redirecting to /bin/systemctl start ipvsadm.service ...
- LeetCode(260) Single Number III
题目 Given an array of numbers nums, in which exactly two elements appear only once and all the other ...
- 字符串-POJ3974-Palindrome
Palindrome Time Limit: 15000MS Memory Limit: 65536K Description Andy the smart computer science stud ...
- NPM包的安装及卸载
NPM全名:node package manager,是node包管理工具,负责安装.卸载.更新等.新版的NodeJS已经集成了npm.所以装好NodeJS的同时,npm也已经装好了! 可以用cmd命 ...
- selenium 自动化测试 Chrome 大于 63 版本 不能重定向问题解决办法
Chrome 一些信息: Chrome 63 以后,浏览器默认屏蔽了重定向 Chrome 63 版本,设置了禁止更新,有些情况还是会更新到最新版本 解决过程: 在博客上查到 selenium 里 给 ...
- SVR回归
1.python支持向量机回归svr预测 https://blog.csdn.net/u012581541/article/details/51181041 https://www.cnblogs.c ...
- python - work - 2
#-*- coding:utf-8 -*-# author:jiaxy# datetime:2018/11/3 11:48# software: PyCharm Community Edition d ...
- python 学习分享-基础篇
1.python起手式 写下第一个代码,打印‘hello world’ print('hello world') 2.变量 变量是为了存储信息,在程序中被调用,标识数据名称或类型. 变量定义的规则: ...
- 浅析 Node.js 的 vm 模块以及运行不信任代码
在一些系统中,我们希望给用户提供插入自定义逻辑的能力,除了 RPC 和 REST 之外,运行客户提供的代码也是比较常用的方法,好处是可以极大地减少在网络上的耗时.JavaScript 是一种非常流行而 ...
- 基于UDP的交互的实例
1.实现简单的客户端.服务端聊天交互 问题是:客户端不能单独一直发消息回复.. 服务端: import socket server=socket.socket(socket.AF_INET,socke ...