Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.

Scatter Plot

housing.plot(kind="scatter", x="longitude", y="latitude")

You can set the parameter alpha to study the density of points:

housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)

The plot can convey more information by setting different colors, sizes, shapes, etc. Here we will use a predefined color map (option cmap) called jet. As an example, we plot the house prices in different locations and let the radius of each circle represents the district’s population (option s), and the color represents the price (option c).

 %matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")

Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).

Scatter Matrix

 from pandas.plotting import scatter_matrix

 attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")

Histogram

Histogram is a useful method to study the distribution of numeric attributes.

 %matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()

For single attribute, you can use the following statement:

housing["median_income"].hist()

Correlation Plot

We can calculate the correlation coefficients between each pair of attributes using corr() method and look at the value by sort_values():

corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)

Also, we can use scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.

from pandas.tools.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))

[Machine Learning with Python] Data Visualization by Matplotlib Library的更多相关文章

  1. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  2. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  3. Getting started with machine learning in Python

    Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...

  4. Python (1) - 7 Steps to Mastering Machine Learning With Python

    Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...

  5. 《Learning scikit-learn Machine Learning in Python》chapter1

    前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...

  6. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  7. 【Machine Learning】Python开发工具:Anaconda+Sublime

    Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...

  8. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  9. Machine Learning的Python环境设置

    Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...

随机推荐

  1. 多线程辅助类之CountDownLatch(三)

    CountDownLatch信号灯是一个同步辅助类,在完成一组正在其他线程中执行的操作之前,它允许一个或多个线程一直等待.它可以实现多线程的同步互斥功能,和wait和notify方法实现功能类似,具体 ...

  2. java/jsp执行sql语句的方式

    首先给出sql驱动包 引入sql包 import java.sql.*;//java <%@ page import="java.sql.*"%>//jsp 连接mys ...

  3. python3爬取豆瓣top250电影

    需求:爬取豆瓣电影top250的排名.电影名称.评分.评论人数和一句话影评 环境:python3.6.5 准备工作: 豆瓣电影top250(第1页)网址:https://movie.douban.co ...

  4. 通过SWD J-Link使用J-Link RTT Viewer来查看打印日志

    详细的说明可以参考:https://www.cnblogs.com/iini/p/9279618.html sdk版本: 15.2.0 例程目录:\nRF5_SDK_15.2.0_9412b96\ex ...

  5. huu 1251

    #include <iostream> #include <cstdio> #include <cstring> #include <string> # ...

  6. py文件转exe时包含paramiko模块出错解决方法

    问题描述:python代码中包含paramiko模块的远程登录ssh,在用pyInstaller转为exe时报错, 报错提示为“No handlers could be found for logge ...

  7. dwr介绍及配置

    DWR 编辑 DWR(Direct Web Remoting)是一个用于改善web页面与Java类交互的远程服务器端Ajax开源框架,可以帮助开发人员开发包含AJAX技术的网站.它可以允许在浏览器里的 ...

  8. java内存模型学习

    根据 JVM 规范,JVM 内存共分为虚拟机栈.堆.方法区.程序计数器.本地方法栈五个部分. 虚拟机的内存模型分为两部分:一部分是线程共享的,包括 Java 堆和方法区:另一部分是线程私有的,包括虚拟 ...

  9. SVM用于线性回归

    SVM用于线性回归 方法分析 在样本数据集()中,不是简单的离散值,而是连续值.如在线性回归中,预测房价.与线性回归类型,目标函数是正则平方误差函数: 在SVM回归算法中,目的是训练出超平面,采用作为 ...

  10. The 2018 ACM-ICPC Asia Qingdao Regional Contest, Online

    A Live Love DreamGrid is playing the music game Live Love. He has just finished a song consisting of ...