Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.

Scatter Plot

housing.plot(kind="scatter", x="longitude", y="latitude")

You can set the parameter alpha to study the density of points:

housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)

The plot can convey more information by setting different colors, sizes, shapes, etc. Here we will use a predefined color map (option cmap) called jet. As an example, we plot the house prices in different locations and let the radius of each circle represents the district’s population (option s), and the color represents the price (option c).

 %matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")

Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).

Scatter Matrix

 from pandas.plotting import scatter_matrix

 attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")

Histogram

Histogram is a useful method to study the distribution of numeric attributes.

 %matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()

For single attribute, you can use the following statement:

housing["median_income"].hist()

Correlation Plot

We can calculate the correlation coefficients between each pair of attributes using corr() method and look at the value by sort_values():

corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)

Also, we can use scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.

from pandas.tools.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))

[Machine Learning with Python] Data Visualization by Matplotlib Library的更多相关文章

  1. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  2. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  3. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  4. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  5. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  6. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. 点击tr实现选择checkbox功能,点击checkobx的时候阻止冒泡事件, jquery给checkbox添加checked属性或去掉checked属性不能使checkobx改变状态

    给tr添加点击事件,使用find方法查找tr下的所有层级的元素,children只查找下一层级的元素,所以使用find.find的返回值为jquery对象,在这个项目中不知道为什么使用jquery给c ...

  2. 【markdown】图片的处理

    1st: ![tip](link) 2ed: ![tip][id] [id]:base64string 本地图片 先把本地图片文件转换成base64位编码 然后把 link 替换成生成的base64编 ...

  3. LeetCode(173) Binary Search Tree Iterator

    题目 Implement an iterator over a binary search tree (BST). Your iterator will be initialized with the ...

  4. Nordic Collegiate Programming Contest 2015​ B. Bell Ringing

    Method ringing is used to ring bells in churches, particularly in England. Suppose there are 6 bells ...

  5. Linux的档案权限与目录配置练习题

    1.请说明/bin与/usr/bin目录所防止的执行文件有何不同之处:/bin主要放置在开机时,以及进入单人维护模式后还能够被使用的指令,至于/usr/bin则是大部分软件提供的指令放置处 2.请说明 ...

  6. STL学习笔记2--list

    List --- 双向列表 List是线性列表结构,数据查找需要一个接一个,不能直接得到元素地址,检索时间与目标元素的位置成正比.但是插入数据比较快,可以在任何位置插入数据或者删除数据.list特点是 ...

  7. 记一次WMS的系统改造(1)-分析问题

    海外落地中的困境 目前面临主要的问题是"人",仓储系统主要辅助仓储人员进行生产,所以人变了其实一切就都已经变了,系统在海外面临最大的问题就是人变了. 这套软件是在国内的运营体系 ...

  8. dubbo rpc filter实现剖析(一)

    2.6.3版本,之前读的是2.4.9版本 本篇主要阐述dubbo rpc的filter的实现,包括作用,用法,原理,与Spring Cloud在这些能力的对比. 共提供了多少个?是哪些?发布时默认装配 ...

  9. Python_Virtualenv及Pycharm配置

    Virtualenv存在的意义 在Python使用过程中,你是否有遇到过同时需要开发多个应用的情况? 假设A应用需要使用DJango1.X版本,而B应用需要使用DJango2.X的版本,而你全局开发环 ...

  10. Flask_单例模式

    在flask实现单例模式的方法有多种: 这里我们列举五种,行吗? 第一种: 国际惯例:基于文件导入 第二种: 基于类的单例模式: 它又分两种: 一种加锁,一种不加锁. 不加锁的话,可以并发,但是我们的 ...