Your data vis “Spidey-sense” & the need for a robust “utility belt”
@theboysmithy did a great piece on coming up with an alternate view for a timeline for an FT piece.
Here’s an excerpt (read the whole piece, though, it’s worth it):
Here is an example from a story recently featured in the FT: emerging- market populations are expected to age more rapidly than those in developed countries. The figures alone are compelling: France is expected to take 157 years (from 1865 to 2022) to triple the proportion of its population aged over 65, from 7 per cent to 21 per cent; for China, the equivalent period is likely to be just 34 years (from 2001 to 2035).
You may think that visualising this story is as simple as creating a bar chart of the durations ordered by length. In fact, we came across just such a chart from a research agency.
But, to me, this approach generates “the feeling” — and further scrutiny reveals specific problems. A reader must work hard to memorise the date information next to the country labels to work out if there is a relationship between the start date and the length of time taken for the population to age. The chart is clearly not ideal, but how do we improve it?
Alan went on to talk about the process of improving the vis, eventually turning to Joseph Priestly for inspiration. Here’s their makeover:
Alan used D3 to make this, which had me head scratching for a bit. Bostock is genius & I :heart: D3 immensely, but I never really thought of it as a “canvas” for doing general data visualization creation for something like a print publication (it’s geared towards making incredibly data-rich interactive visualizations). It’s 100% cool to do so, though. It has fine-grained control over every aspect of a visualization and you can easily turn SVGs into PDFs or use them in programs like Illustrator to make the final enhancements. However, D3 is not the only tool that can make a chart like this.
I made the following in R (of course):

The annotations in Alan’s image were (99% most likely) made with something like Illustrator. I stopped short of fully reproducing the image (life is super-crazy, still), but could have done so (the entire image is one ggplot2 object).
This isn’t an “R > D3” post, though, since I use both. It’s about (a) reinforcing Alan’s posits that we should absolutely take inspiration from historical vis pioneers (so read more!) + need a diverse visualization “utility belt” (ref: Batman) to ensure you have the necessary tools to make a given visualization; (b) trusting your “Spidey-sense” when it comes to evaluating your creations/decisions; and, (c) showing that R is a great alternative to D3 for something like this :-)
Spider-man (you expected headier references from a dude with a shield avatar?) has this ability to sense danger right before it happens and if you’re making an effort to develop and share great visualizations, you definitely have this same sense in your DNA (though I would not recommend tossing pie charts at super-villains to stop them). When you’ve made something and it just doesn’t “feel right”, look to other sources of inspiration or reach out to your colleagues or the community for ideas or guidance. You can and do make awesome things, and you do have a “Spidey-sense”. You just need to listen to it more, add depth and breadth to your “utility belt” and keep improving with each creation you release into the wild.
R code for the ggplot vis reproduction is below, and it + the CSV file referenced are in this gist.
library(ggplot2)
library(dplyr)
ft <- read.csv("ftpop.csv", stringsAsFactors=FALSE)
arrange(ft, start_year) %>%
mutate(country=factor(country, levels=c(" ", rev(country), " "))) -> ft
ft_labs <- data_frame(
x=c(1900, 1950, 2000, 2050, 1900, 1950, 2000, 2050),
y=c(rep(" ", 4), rep(" ", 4)),
hj=c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
vj=c(1, 1, 1, 1, 0, 0, 0, 0)
)
ft_lines <- data_frame(x=c(1900, 1950, 2000, 2050))
ft_ticks <- data_frame(x=seq(1860, 2050, 10))
gg <- ggplot()
# tick marks & gridlines
gg <- gg + geom_segment(data=ft_lines, aes(x=x, xend=x, y=2, yend=16),
linetype="dotted", size=0.15)
gg <- gg + geom_segment(data=ft_ticks, aes(x=x, xend=x, y=16.9, yend=16.6),
linetype="dotted", size=0.15)
gg <- gg + geom_segment(data=ft_ticks, aes(x=x, xend=x, y=1.1, yend=1.4),
linetype="dotted", size=0.15)
# double & triple bars
gg <- gg + geom_segment(data=ft, size=5, color="#b0657b",
aes(x=start_year, xend=start_year+double, y=country, yend=country))
gg <- gg + geom_segment(data=ft, size=5, color="#eb9c9d",
aes(x=start_year+double, xend=start_year+double+triple, y=country, yend=country))
# tick labels
gg <- gg + geom_text(data=ft_labs, aes(x, y, label=x, hjust=hj, vjust=vj), size=3)
# annotations
gg <- gg + geom_label(data=data.frame(), hjust=0, label.size=0, size=3,
aes(x=1911, y=7.5, label="France is set to take\n157 years to triple the\nproportion ot its\npopulation aged 65+,\nChina only 34 years"))
gg <- gg + geom_curve(data=data.frame(), aes(x=1911, xend=1865, y=9, yend=15.5),
curvature=-0.5, arrow=arrow(length=unit(0.03, "npc")))
gg <- gg + geom_curve(data=data.frame(), aes(x=1915, xend=2000, y=5.65, yend=5),
curvature=0.25, arrow=arrow(length=unit(0.03, "npc")))
# pretty standard stuff here
gg <- gg + scale_x_continuous(expand=c(0,0), limits=c(1860, 2060))
gg <- gg + scale_y_discrete(drop=FALSE)
gg <- gg + labs(x=NULL, y=NULL, title="Emerging markets are ageing at a rapid rate",
subtitle="Time taken for population aged 65 and over to double and triple in proportion (from 7% of total population)",
caption="Source: http://on.ft.com/1Ys1W2H")
gg <- gg + theme_minimal()
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(plot.margin=margin(10,10,10,10))
gg <- gg + theme(plot.title=element_text(face="bold"))
gg <- gg + theme(plot.subtitle=element_text(size=9.5, margin=margin(b=10)))
gg <- gg + theme(plot.caption=element_text(size=7, margin=margin(t=-10)))
gg
Your data vis “Spidey-sense” & the need for a robust “utility belt”的更多相关文章
- Fitting Bayesian Linear Mixed Models for continuous and binary data using Stan: A quick tutorial
I want to give a quick tutorial on fitting Linear Mixed Models (hierarchical models) with a full var ...
- Machine Learning and Data Mining(机器学习与数据挖掘)
Problems[show] Classification Clustering Regression Anomaly detection Association rules Reinforcemen ...
- JavaScript资源大全中文版(Awesome最新版)
Awesome系列的JavaScript资源整理.awesome-javascript是sorrycc发起维护的 JS 资源列表,内容包括:包管理器.加载器.测试框架.运行器.QA.MVC框架和库.模 ...
- PCI Express(四) - The transaction layer
原文出处:http://www.fpga4fun.com/PCI-Express4.html 感觉没什么好翻译的,都比较简单,主要讲了TLP的帧结构 In the transaction layer, ...
- Task schedule 分类: 比赛 HDU 查找 2015-08-08 16:00 2人阅读 评论(0) 收藏
Task schedule Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others) Total ...
- Doubles 分类: POJ 2015-06-12 18:24 11人阅读 评论(0) 收藏
Doubles Time Limit: 1000MS Memory Limit: 10000K Total Submissions: 19954 Accepted: 11536 Descrip ...
- codevs 3732 解方程
神题不可言会. f(x+p)=f(x)(mod p) #include<iostream> #include<cstdio> #include<cstring> # ...
- notes: the architecture of GDB
1. gdb structure at the largest scale,GDB can be said to have two sides to it:1. The "symbol si ...
- poj 2531 Network Saboteur(经典dfs)
题目大意:有n个点,把这些点分别放到两个集合里,在两个集合的每个点之间都会有权值,求可能形成的最大权值. 思路:1.把这两个集合标记为0和1,先默认所有点都在集合0里. 2 ...
随机推荐
- Spark入门实战
星星之火,可以燎原 Spark简介 Spark是一个开源的计算框架平台,使用该平台,数据分析程序可自动分发到集群中的不同机器中,以解决大规模数据快速计算的问题,同时它还向上提供一个优雅的编程范式,使得 ...
- Caffe学习系列(四)之--训练自己的模型
前言: 本文章记录了我将自己的数据集处理并训练的流程,帮助一些刚入门的学习者,也记录自己的成长,万事起于忽微,量变引起质变. 正文: 一.流程 1)准备数据集 2)数据转换为lmdb格式 3)计算 ...
- jQuery的工作原理
jQuery是为了改变javascript的编码方式而设计的. jQuery本身并不是UI组件库或其他的一般AJAX类库. 那么它是如何实现它的声明的呢? 先看一段简短的使用流程: (1).查找(创建 ...
- Javascript的内容
JS简介和变量 {JS的三种方式} 1 HTML中内嵌JS(不提倡使用) <button onclick="javascript:alert ...
- CF #Manthan, Codefest 16 C. Spy Syndrome 2 Trie
题目链接:http://codeforces.com/problemset/problem/633/C 大意就是给个字典和一个字符串,求一个用字典中的单词恰好构成字符串的匹配. 比赛的时候是用AC自动 ...
- Golang 微信机器人包
一. 最近用在学习golang,写了个小工具练练手.通过golang模拟微信网页端,接收微信服务器的消息并定制.可接入图灵机器人的api实现一个微信机器人的小玩具,当然了,可以有更多更好玩的玩法. 二 ...
- mui开发app之webview是什么
WebView(网络视图)能加载显示网页,可以将其视为一个浏览器,webview被封装在html5+,plus对象中,底层由java,OC实现. 先来谈谈我对webview的理解: 使用mui开发的a ...
- Java并发包分析——BlockingQueue
之前因为找实习的缘故,博客1个多月没有写了.找实习的经历总算告一段落,现在重新更新博客,这次的内容是分析Java并发包中的阻塞队列 关于阻塞队列,我之前是一直充满好奇,很好奇这个阻塞是怎么实现.现在我 ...
- Windows 10 碎片整理程序使用
1.打开 此电脑 2.右击任意一个磁盘,这里以C盘为例,点击 属性 3.点击 工具 选项卡--->优化 4.选中其中的一个盘 优化 5.然后静等结束,就好啦. 虽然不知道有什么用,...
- CSS完美实现iframe高度自适应(支持跨域)
Iframe的强大功能偶就不多说了,它不但被开发人员经常运用,而且黑客们也常常使用它,总之用过的人知道它的强大之处,但是Iframe有个致命的"BUG"就是iframe的高度无法自 ...