[Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.
Scatter Plot
housing.plot(kind="scatter", x="longitude", y="latitude")
You can set the parameter alpha to study the density of points:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
The plot can convey more information by setting different colors, sizes, shapes, etc. Here we will use a predefined color map (option cmap) called jet. As an example, we plot the house prices in different locations and let the radius of each circle represents the district’s population (option s), and the color represents the price (option c).
%matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")
Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
Scatter Matrix
from pandas.plotting import scatter_matrix attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")
Histogram
Histogram is a useful method to study the distribution of numeric attributes.
%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
For single attribute, you can use the following statement:
housing["median_income"].hist()
Correlation Plot
We can calculate the correlation coefficients between each pair of attributes using corr() method and look at the value by sort_values():
corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)
Also, we can use scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.
from pandas.tools.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
[Machine Learning with Python] Data Visualization by Matplotlib Library的更多相关文章
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- webpack4搭建Vue开发环境笔记~~持续更新
项目git地址 一.node知识 __dirname: 获取当前文件所在路径,等同于path.dirname(__filename) console.log(__dirname); // Prints ...
- vue渲染函数&JSX
Vue推荐在绝大多数情况下使用template来创建你的HTML.然而在一些场景中,你真的需要JavaScript的完全编程能力,这时你可以使用render函数,它比template跟接近编译器. 虚 ...
- noip2017行记
前奏 写了所有的变量名在linux下测,结果发现并没什么用...听说将所有的变量加上下划线,加上自己的名字作为前缀.. lgj,“感觉今年有网络流,今年要立个flag”,zjr“你咋不上天儿” 在车上 ...
- UVa 11651 Krypton Number System DP + 矩阵快速幂
题意: 有一个\(base(2 \leq base \leq 6)\)进制系统,这里面的数都是整数,不含前导0,相邻两个数字不相同. 而且每个数字有一个得分\(score(1 \leq score \ ...
- Python之多线程与多进程(二)
多进程 上一章:Python多线程与多进程(一) 由于GIL的存在,Python的多线程并没有实现真正的并行.因此,一些问题使用threading模块并不能解决 不过Python为并行提供了一个替代方 ...
- python网络编程相关
什么是网络套接字socket?简述基于tcp协议的套接字的通信流程. 为了区别不同的应用程序进程和连接,许多计算机操作系统为应用程序与TCP/IP协议交互提供了称为套接字 (Socket)的接口,区分 ...
- django 缓存 实现
由于Django构建得是动态网站,每次客户端请求都要严重依赖数据库,当程序访问量大时,耗时必然会更加明显,最简单解决方式是使用:缓存,缓存将一个某个views的返回值保存至内存或者memcache中, ...
- JAVA-基础(六) Java.serialization 序列化
序 列 化 序列化(serialization)是把一个对象的状态写入一个字节流的过程. Serializable接口 只有一个实现Serializable接口的对象可以被序列化工具存储和恢复.Ser ...
- Install Oracle 11G Release 2 (11.2) on Oracle Linux 7 (OEL7)
Install Oracle 11G Release 2 (11.2) on Oracle Linux 7 (OEL7) This article presents how to install Or ...
- API生命周期第二阶段——设计:如何设计API(基于swagger进行说明)
题外话 在新的项目中,推行了swagger用于对API的设计.以期待解决我们上篇博客中说到了一些现象,提升工作效率.那么,结合到当时和全项目组成员做培训,以及后续的给主要应用者做培训,以及当初自己接触 ...