Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.

Scatter Plot

housing.plot(kind="scatter", x="longitude", y="latitude")

You can set the parameter alpha to study the density of points:

housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)

The plot can convey more information by setting different colors, sizes, shapes, etc. Here we will use a predefined color map (option cmap) called jet. As an example, we plot the house prices in different locations and let the radius of each circle represents the district’s population (option s), and the color represents the price (option c).

 %matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")

Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).

Scatter Matrix

 from pandas.plotting import scatter_matrix

 attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")

Histogram

Histogram is a useful method to study the distribution of numeric attributes.

 %matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()

For single attribute, you can use the following statement:

housing["median_income"].hist()

Correlation Plot

We can calculate the correlation coefficients between each pair of attributes using corr() method and look at the value by sort_values():

corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)

Also, we can use scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.

from pandas.tools.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))

[Machine Learning with Python] Data Visualization by Matplotlib Library的更多相关文章

  1. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  2. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  3. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  4. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  5. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  6. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. paper:synthesizable finite state machine design techniques using the new systemverilog 3.0 enhancements 之 FSM Coding Goals

    1.the fsm coding style should be easily modifiable to change state encoding and FSM styles. FSM 的的 状 ...

  2. 五一4天就背这些Python面试题了,Python面试题No12

    第1题: Python 中的 os 模块常见方法? os 属于 python内置模块,所以细节在官网有详细的说明,本道面试题考察的是基础能力了,所以把你知道的都告诉面试官吧 官网地址 https:// ...

  3. 天问之Linux内核中的不明白的地方

    1. Linux 0.11\linux\kernel\exit.c 文件中, 无论是send_sig()函数还是kill_session()函数中,凡是涉及到发送信号的地方,都是直接    (*p)- ...

  4. emacs写cnblog博客

    emacs的版本 org-mode版本   参考链接: 用Emacs管理博客园博客   用emacs org-mode写cnblogs博客 用emacs org-mode写博客 & 发布到博客 ...

  5. jenkins匿名用户登录 - 安全设置

    最近自己安装配置jenkins,但是跑任务时,发现是匿名账户登录,且提示: 后来发现搭建好jenkins之后,默认就是匿名用户登录的,可以在安装设置菜单里进行账户管理. 1.登录Jenkins服务器, ...

  6. dubbo rpc filter实现剖析(一)

    2.6.3版本,之前读的是2.4.9版本 本篇主要阐述dubbo rpc的filter的实现,包括作用,用法,原理,与Spring Cloud在这些能力的对比. 共提供了多少个?是哪些?发布时默认装配 ...

  7. verilog写的LCD1602 显示

    在读本文之前,请先阅读 LCD1602 的 datasheet(百度到处都是) ,熟悉有关的11条指令集. LCD1602的11个指令集链接 http://www.cnblogs.com/aslmer ...

  8. ICM Technex 2018 and Codeforces Round #463 (Div. 1 + Div. 2, combined)

    靠这把上了蓝 A. Palindromic Supersequence time limit per test 2 seconds memory limit per test 256 megabyte ...

  9. Model View Controller(MVC) in PHP

    The model view controller pattern is the most used pattern for today’s world web applications. It ha ...

  10. hdu1599 find the mincost route floyd求出最小权值的环

    find the mincost route Time Limit: 1000/2000 MS (Java/Others)    Memory Limit: 32768/32768 K (Java/O ...