[Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline
. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.
Scatter Plot
housing.plot(kind="scatter", x="longitude", y="latitude")
You can set the parameter alpha
to study the density of points:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
The plot can convey more information by setting different colors, sizes, shapes, etc. Here we will use a predefined color map (option cmap
) called jet
. As an example, we plot the house prices in different locations and let the radius of each circle represents the district’s population (option s
), and the color represents the price (option c
).
%matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")
Note that the argument sharex=False
fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
Scatter Matrix
from pandas.plotting import scatter_matrix attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")
Histogram
Histogram is a useful method to study the distribution of numeric attributes.
%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
For single attribute, you can use the following statement:
housing["median_income"].hist()
Correlation Plot
We can calculate the correlation coefficients between each pair of attributes using corr()
method and look at the value by sort_values()
:
corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)
Also, we can use scatter_matrix
function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.
from pandas.tools.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
[Machine Learning with Python] Data Visualization by Matplotlib Library的更多相关文章
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- paper:synthesizable finite state machine design techniques using the new systemverilog 3.0 enhancements 之 FSM Coding Goals
1.the fsm coding style should be easily modifiable to change state encoding and FSM styles. FSM 的的 状 ...
- 五一4天就背这些Python面试题了,Python面试题No12
第1题: Python 中的 os 模块常见方法? os 属于 python内置模块,所以细节在官网有详细的说明,本道面试题考察的是基础能力了,所以把你知道的都告诉面试官吧 官网地址 https:// ...
- 天问之Linux内核中的不明白的地方
1. Linux 0.11\linux\kernel\exit.c 文件中, 无论是send_sig()函数还是kill_session()函数中,凡是涉及到发送信号的地方,都是直接 (*p)- ...
- emacs写cnblog博客
emacs的版本 org-mode版本 参考链接: 用Emacs管理博客园博客 用emacs org-mode写cnblogs博客 用emacs org-mode写博客 & 发布到博客 ...
- jenkins匿名用户登录 - 安全设置
最近自己安装配置jenkins,但是跑任务时,发现是匿名账户登录,且提示: 后来发现搭建好jenkins之后,默认就是匿名用户登录的,可以在安装设置菜单里进行账户管理. 1.登录Jenkins服务器, ...
- dubbo rpc filter实现剖析(一)
2.6.3版本,之前读的是2.4.9版本 本篇主要阐述dubbo rpc的filter的实现,包括作用,用法,原理,与Spring Cloud在这些能力的对比. 共提供了多少个?是哪些?发布时默认装配 ...
- verilog写的LCD1602 显示
在读本文之前,请先阅读 LCD1602 的 datasheet(百度到处都是) ,熟悉有关的11条指令集. LCD1602的11个指令集链接 http://www.cnblogs.com/aslmer ...
- ICM Technex 2018 and Codeforces Round #463 (Div. 1 + Div. 2, combined)
靠这把上了蓝 A. Palindromic Supersequence time limit per test 2 seconds memory limit per test 256 megabyte ...
- Model View Controller(MVC) in PHP
The model view controller pattern is the most used pattern for today’s world web applications. It ha ...
- hdu1599 find the mincost route floyd求出最小权值的环
find the mincost route Time Limit: 1000/2000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/O ...