[Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.
Scatter Plot
housing.plot(kind="scatter", x="longitude", y="latitude")
You can set the parameter alpha to study the density of points:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
The plot can convey more information by setting different colors, sizes, shapes, etc. Here we will use a predefined color map (option cmap) called jet. As an example, we plot the house prices in different locations and let the radius of each circle represents the district’s population (option s), and the color represents the price (option c).
%matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
s=housing["population"]/100, label="population", figsize=(10,7),
c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")
Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
Scatter Matrix
from pandas.plotting import scatter_matrix attributes = ["median_house_value", "median_income", "total_rooms",
"housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")
Histogram
Histogram is a useful method to study the distribution of numeric attributes.
%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
For single attribute, you can use the following statement:
housing["median_income"].hist()
Correlation Plot
We can calculate the correlation coefficients between each pair of attributes using corr() method and look at the value by sort_values():
corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)
Also, we can use scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays the histogram of each attribute.
from pandas.tools.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
[Machine Learning with Python] Data Visualization by Matplotlib Library的更多相关文章
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- Getting started with machine learning in Python
Getting started with machine learning in Python Machine learning is a field that uses algorithms to ...
- Python (1) - 7 Steps to Mastering Machine Learning With Python
Step 1: Basic Python Skills install Anacondaincluding numpy, scikit-learn, and matplotlib Step 2: Fo ...
- 《Learning scikit-learn Machine Learning in Python》chapter1
前言 由于实验原因,准备入坑 python 机器学习,而 python 机器学习常用的包就是 scikit-learn ,准备先了解一下这个工具.在这里搜了有 scikit-learn 关键字的书,找 ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- In machine learning, is more data always better than better algorithms?
In machine learning, is more data always better than better algorithms? No. There are times when mor ...
- Machine Learning的Python环境设置
Machine Learning目前经常使用的语言有Python.R和MATLAB.如果采用Python,需要安装大量的数学相关和Machine Learning的包.一般安装Anaconda,可以把 ...
随机推荐
- GoF23种设计模式之行为型模式之备忘录模式
一.概述 在不破坏封装性的前提下,捕获一个对象的内部状态,并在该对象的外部保存这个状态.以便以后可以将该对象恢复到原先保存的状态. 二.适用性 1.当需要保存一个对象在某个时刻的状态( ...
- STM32HAL学习博客
https://www.cnblogs.com/wt88/category/1297945.html
- PTA 银行排队问题之单队列多窗口加VIP服务 队列+模拟
假设银行有K个窗口提供服务,窗口前设一条黄线,所有顾客按到达时间在黄线后排成一条长龙.当有窗口空闲时,下一位顾客即去该窗口处理事务.当有多个窗口可选择时,假设顾客总是选择编号最小的窗口. 有些银行会给 ...
- 搜索引擎elasticsearch常用指令演示
目录 交互方式 常用操作示例 添加文档 删除文档 修改文档 查询 简单查询 高级多条件查询 交互方式 操作ES有3种方式: kibana控制台(Dev Tools) Http + json api接口 ...
- POJ:1328-Radar Installation
Radar Installation Time Limit: 1000MS Memory Limit: 10000K Description Assume the coasting is an inf ...
- 【报错】invalid or unexpected token
结果发现是把英文的,写成了中文字符,系统没法识别.
- 1507: [NOI2003]Editor(块状链表)
1507: [NOI2003]Editor Time Limit: 5 Sec Memory Limit: 162 MBSubmit: 4157 Solved: 1677[Submit][Stat ...
- CI 分页类的使用
分页本身很简单,无非就是一个 [limit $offset, $length] 的过程. $length 是每页显示的数据量,这个是固定的.要确定的就只有 $offset了. 在CI中的分页类同样要依 ...
- hashlib.md5加密
对一个字符串拼接时间,然后对其进行md5加密: import hashlib,time a='123absg' b=time.time() c='%s|%s'%(a,b) print(c) m=has ...
- 用Navicat Premium快速查看mysql数据库版本信息
在出现的界面输入命令 select version();