[Machine Learning with Python] Familiar with Your Data
Here I list some useful functions in Python to get familiar with your data. As an example, we load a dataset named housing which is a DataFrame object. Usually, the first thing to do is get top five rows the dataset by head() function:
housing = load_housing_data()
housing.head()
The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values.
housing.info()
The describe() function will return statistics including count, mean, median, std, min, max and quantiles of each feature.
housing.describe()
For categorical varibles, we usually hope to see the labels and the count for each label. value_counts() function works here:
housing["ocean_proximity"].value_counts()
That’s it. I’ll update more functions if I meet in further study.
[Machine Learning with Python] Familiar with Your Data的更多相关文章
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
- [Machine Learning with Python] My First Data Preprocessing Pipeline with Titanic Dataset
The Dataset was acquired from https://www.kaggle.com/c/titanic For data preprocessing, I firstly def ...
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] How to get your data?
Using Pandas Library The simplest way is to read data from .csv files and store it as a data frame o ...
随机推荐
- mysql的字符串连接符
以前用SQL Server 连接字符串是用“+”,现在数据库用mysql,写个累加两个字段值SQL语句居然不支持"+",郁闷了半天在网上查下,才知道mysql里的+是数字相加的操作 ...
- Python之路--序列化
序列化的目的 1.以某种存储形式使自定义对象持久化 2.将对象从一个地方传递到另一个地方 3.使程序更具有维护性 json json多语言通用 四个功能:dumps.dump.loads.load # ...
- LeetCode(5)Longest Palindromic Substring
题目 Given a string S, find the longest palindromic substring in S. You may assume that the maximum le ...
- STM32HAL学习博客
https://www.cnblogs.com/wt88/category/1297945.html
- 《鸟哥的Linux私房菜》学习笔记(3)——根文件系统
一.Linux目录结构 rootfs:根文件系统,根是"/". 1./boot 系统启动相关的文件,如内核.intrd.以及grub(bootloader) root@hao:~# ...
- IAR调试时出现IAR one or more breakpoints could not be set and have been disabled的解决办法
问题:在IAR调试时,单步执行的时候绿色箭头一直指向汇编界面,不指向C语言界面,并且不能在C语言界面设置断点,以及在代码编辑界面,设置断点,点调试时总提示IAR one or more breakpo ...
- javaweb通过接口来实现多个文件压缩和下载(包括单文件下载,多文件批量下载)
原博客地址:https://blog.csdn.net/weixin_37766296/article/details/80044000 将多个文件压缩并下载下来:(绿色为修改原博客的位置) 注意:需 ...
- hdu 1011 Starship Troopers(树形背包)
Starship Troopers Time Limit: 10000/5000 MS (Java/Others) Memory Limit: 65536/32768 K (Java/Other ...
- JS实现——Base64编码解码,带16进制显示
在网上找了个JS实现的Base64编码转换,所以就想自己研究下,界面如下: 将代码以BASE64方式加密.解密 请输入要进行编码或解码的字符: 编码结果以ASCII码16进制显示 解码结果以ASCII ...
- JavaScript日期时间格式化函数
这篇文章主要介绍了JavaScript日期时间格式化函数分享,需要的朋友可以参考下 这个函数经常用到,分享给大家. 函数代码: //格式化参数说明: //y:年,M:月,d:日,h:时,m分,s:秒, ...