[Machine Learning with Python] Data Visualization by Matplotlib Library
Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.
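Outside Jupyter, for example in a plain Python script, the magic command is not available. The following is a minimal sketch, assuming a script that only writes figures to files and therefore selects the non-interactive Agg backend that ships with Matplotlib:

```python
import matplotlib

# Select the backend before pyplot creates any figure.
# "Agg" renders to image files and needs no display.
matplotlib.use("Agg")

import matplotlib.pyplot as plt  # imported after the backend is chosen

backend = matplotlib.get_backend()
print(backend)  # name of the active backend
```

In a notebook you would normally keep %matplotlib inline instead; the explicit call is mainly useful for scripts and headless servers.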
Scatter Plot
housing.plot(kind="scatter", x="longitude", y="latitude")
You can set the alpha parameter (point transparency) so that dense areas, where many points overlap, stand out:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
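To see the effect of alpha without the housing data, here is a small sketch on synthetic points. The DataFrame demo and its values are made up for illustration; only the column names mirror the article.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run as a script without a display
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data: 5,000 points clustered
# around one hypothetical location.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "longitude": rng.normal(-120.0, 1.0, size=5000),
    "latitude": rng.normal(36.0, 1.0, size=5000),
})

# With alpha=0.1 a single point is faint; overlapping points add up
# to darker patches that mark the dense regions.
ax = demo.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)

# The transparency is stored on the scatter collection of the axes.
alpha_used = ax.collections[0].get_alpha()
```

Lower values of alpha suit larger datasets, since more points must overlap before a region looks dark.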
The plot can convey more information if you vary the colors, sizes, and shapes of the markers. Here we use a predefined color map (option cmap) called jet. As an example, we plot the house prices by location: the radius of each circle represents the district’s population (option s), and the color represents the price (option c).
%matplotlib inline
import matplotlib.pyplot as plt
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population", figsize=(10,7),
             c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
             sharex=False)
plt.legend()
save_fig("housing_prices_scatterplot")
Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
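The snippets in this article call save_fig, which is not a Matplotlib function and is not defined here. A minimal sketch of such a helper might look like the following. The folder IMAGES_PATH, the default parameters, and the returned path are all assumptions made for illustration, not the author's actual implementation.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt

# Hypothetical output folder; the article does not say where figures go.
IMAGES_PATH = os.path.join(tempfile.gettempdir(), "images")

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=100):
    """Save the current Matplotlib figure as IMAGES_PATH/<fig_id>.<ext>."""
    os.makedirs(IMAGES_PATH, exist_ok=True)
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    if tight_layout:
        plt.tight_layout()  # reduce clipped labels and overlapping subplots
    plt.savefig(path, format=fig_extension, dpi=resolution)
    return path

# Usage: draw something, then save the current figure.
plt.plot([0, 1, 2], [0, 1, 4])
saved_path = save_fig("demo_plot")
```

Call the helper before plt.show(), because some backends clear the figure once it has been displayed.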
Scatter Matrix
from pandas.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms",
              "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
save_fig("scatter_matrix_plot")
Histogram
A histogram is a useful way to study the distribution of numeric attributes.
%matplotlib inline
import matplotlib.pyplot as plt
housing.hist(bins=50, figsize=(20,15))
save_fig("attribute_histogram_plots")
plt.show()
For a single attribute, you can use the following statement:
housing["median_income"].hist()
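The bins option works for a single attribute as well. The sketch below uses a synthetic Series in place of housing["median_income"] (the values are invented) and also computes the same bin counts numerically with numpy.histogram:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run as a script without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic, right-skewed values standing in for an income column.
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=1.2, sigma=0.5, size=1000),
                   name="median_income")

# Draw on a fresh axes so nothing from an earlier plot is mixed in.
fig, ax = plt.subplots()
income.hist(bins=20, ax=ax)
n_bars = len(ax.patches)  # one rectangle per bin

# The same binning without plotting: counts per bin and the bin edges.
counts, edges = np.histogram(income, bins=20)
```

More bins show finer detail but make the plot noisier when each bin holds only a few samples.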
Correlation Plot
We can calculate the correlation coefficient between each pair of attributes using the corr() method, then sort the correlations with median_house_value using sort_values():
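# numeric_only=True (pandas 1.5+) restricts the calculation to numeric columns;
# pandas 2.0+ raises an error if text columns are included
corr_matrix = housing.corr(numeric_only=True)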
corr_matrix["median_house_value"].sort_values(ascending=False)
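As an illustration of what the sorted output looks like, here is a sketch on synthetic data in which one column is built to depend on another. The DataFrame demo, its values, and the region column are invented for the example. The text column is dropped with select_dtypes, which works regardless of the pandas version.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
income = rng.uniform(1.0, 10.0, size=n)

demo = pd.DataFrame({
    "median_income": income,
    # House value depends linearly on income, plus noise.
    "median_house_value": 40000.0 * income + rng.normal(0.0, 20000.0, size=n),
    # Independent of the other columns, so its correlation is near zero.
    "total_rooms": rng.uniform(100.0, 5000.0, size=n),
    # A text column, which cannot take part in the calculation.
    "region": ["north", "south"] * (n // 2),
})

# Keep numeric columns only, then compute Pearson correlations.
numeric = demo.select_dtypes(include="number")
corr_matrix = numeric.corr()

# Rank every attribute by its correlation with the house value.
ranking = corr_matrix["median_house_value"].sort_values(ascending=False)
print(ranking)
```

The first entry is always the attribute itself with a coefficient of 1. Note that the coefficient only measures linear relationships, so a value near zero does not rule out a nonlinear dependence.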
Also, we can use the scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays a histogram of each attribute.
from pandas.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
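The layout described above can be checked on a small synthetic example. In this sketch the DataFrame demo and its column names a, b, c are made up; scatter_matrix returns a grid of axes with one row and one column per attribute.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run as a script without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Three synthetic numeric attributes, 200 samples each.
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])

# diagonal="hist" draws a histogram of each attribute on the diagonal.
axes = scatter_matrix(demo, figsize=(6, 6), diagonal="hist")

n_rows, n_cols = axes.shape               # one row/column per attribute
diag_bars = len(axes[0, 0].patches)       # histogram bars on the diagonal
offdiag_scatters = len(axes[0, 1].collections)  # scatter plot off the diagonal
```

Because the number of panels grows with the square of the number of attributes, it helps to pass only a handful of promising columns, as the article does with the attributes list.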