I had started a “52 Vis” initiative back in 2016 to encourage folks to get practice making visualizations, since practice is the only way to get better at virtually anything. Life got crazy, 52 Vis fell by the wayside, and now there are more visible alternatives such as Makeover Monday and Workout Wednesday. They’re geared towards the “T” crowd (I’m not giving a closed source and locked-in-data product any more marketing than two links) but that doesn’t mean R, Python or other open-tool/open-data communities can’t join in for the ride and learning experience.

This week’s workout is a challenge to reproduce or improve upon a chart by Matt Stiles. You should go to both Andy Kriebel’s Workout Wednesday post and Matt’s original (give them the clicks and eyeballs they both deserve since they did great work). They both chose a line chart, but the whole point of these exercises is to try out new things to help you learn how to communicate better. I chose to use geom_segment() to make mini-column charts since that:

  • eliminates the giant rose-coloured rectangles that end up everywhere
  • helps show the differences a bit better (IMO), and
  • also helps highlight some of the states that have had more difficulties than others
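
To make the mini-column idea concrete, here is a minimal sketch on made-up data. The two "states", the dates and the values are all hypothetical and only there for illustration; they are not the BLS numbers. Each segment runs straight up or down from a zero baseline, so it reads as a very thin column:

```r
library(ggplot2)

# Made-up data purely for illustration: two hypothetical "states",
# each with a yearly difference from a national rate.
toy <- data.frame(
  state = rep(c("State A", "State B"), each = 5),
  year  = rep(as.Date(sprintf("%d-01-01", 2012:2016)), times = 2),
  diff  = c(-0.010, -0.005, 0.002, 0.008, 0.004,
             0.015,  0.010, 0.003, -0.002, -0.006)
)
toy$direction <- ifelse(toy$diff < 0, "Below national", "Above national")

# xend = year keeps each segment vertical; yend = 0 anchors it to the baseline.
p <- ggplot(toy, aes(x = year, y = diff)) +
  geom_segment(aes(xend = year, yend = 0, color = direction)) +
  facet_wrap(~state, ncol = 2) +
  labs(x = NULL, y = NULL)

print(p)
```

Because the colour is mapped from the sign of the value, the better/worse split falls out of the data rather than from shaded rectangles.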

I kept the same dimensions that Andy did but unlike Matt’s creation this is a plain ol’ PNG as I didn’t want to deal with web fonts (I’m on a Museo Sans Condensed kick at the moment but don’t have it in my TypeKit config yet). I went with official annual unemployment numbers as they may be calculated/adjusted differently (I didn’t check, but I knew that data source existed, so I used it).

One reason I’m doing this is a quote on the Workout Wednesday post:

“This will be a very tedious exercise. To provide some context, this took me 2-3 hours to create. Don’t get discouraged and don’t feel like you have to do it all in one sitting. Basically, try to make yours look identical to mine.”

This took me 10 minutes to create in R:

#' ---
#' output:
#'   html_document:
#'     keep_md: true
#' ---
#+ message=FALSE
library(ggplot2)
library(hrbrmisc)
library(readxl)
library(tidyverse)

# Use official BLS annual unemployment data vs manually calculating the average
# Source: https://data.bls.gov/timeseries/LNU04000000?years_option=all_years&periods_option=specific_periods&periods=Annual+Data
read_excel("~/Data/annual.xlsx", skip=10) %>%
  mutate(Year=as.character(as.integer(Year)), Annual=Annual/100) -> annual_rate

# The data source Andy Kriebel curated for you/us: https://1drv.ms/x/s!AhZVJtXF2-tD1UVEK7gYn2vN5Hxn #ty Andy!
read_excel("~/Data/staadata.xlsx") %>%
  left_join(annual_rate) %>%
  filter(State != "District of Columbia") %>%
  mutate(
    year = as.Date(sprintf("%s-01-01", Year)),
    pct = (Unemployed / `Civilian Labor Force Population`),
    us_diff = -(Annual-pct),
    col = ifelse(us_diff<0,
                 "Better than U.S. National Average",
                 "Worse than U.S. National Average")
  ) -> df

credits <- "Notes: Excludes the District of Columbia. 2016 figure represents October rate.\nData: U.S. Bureau of Labor Statistics <https://www.bls.gov/lau/staadata.txt>\nCredit: Matt Stiles/The Daily Viz <thedailyviz.com>"

#+ state_of_us, fig.height=21.5, fig.width=8.75, fig.retina=2
ggplot(df, aes(year, us_diff, group=State)) +
  geom_segment(aes(xend=year, yend=0, color=col), size=0.5) +
  scale_x_date(expand=c(0,0), date_labels="'%y") +
  scale_y_continuous(expand=c(0,0), label=scales::percent, limit=c(-0.09, 0.09)) +
  scale_color_manual(name=NULL, expand=c(0,0),
                     values=c(`Better than U.S. National Average`="#4575b4",
                              `Worse than U.S. National Average`="#d73027")) +
  facet_wrap(~State, ncol=5, scales="free_x") +
  labs(x=NULL, y=NULL, title="The State of U.S. Jobs: 1976-2016",
       subtitle="Percentage points below or above the national unemployment rate, by state. Negative values represent unemployment rates\nthat were lower — or better, from a jobs perspective — than the national rate.",
       caption=credits) +
  theme_hrbrmstr_msc(grid="Y", strip_text_size=9) +
  theme(panel.background=element_rect(color="#00000000", fill="#f0f0f055")) +
  theme(panel.spacing=unit(0.5, "lines")) +
  theme(plot.subtitle=element_text(family="MuseoSansCond-300")) +
  theme(legend.position="top")

Swap out ~/Data for where you stored the files.
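
If the sign convention in the mutate() step is not obvious, this small base R sketch walks through it. The numbers are hypothetical and not taken from the BLS files; `national` stands in for the Annual column after it has been divided by 100:

```r
# Hypothetical inputs, not real BLS figures.
national    <- 0.05                 # stands in for Annual (a proportion)
unemployed  <- c(30000, 70000)      # two made-up states
labor_force <- c(1000000, 1000000)

pct     <- unemployed / labor_force # state rate as a proportion
us_diff <- -(national - pct)        # equivalent to pct - national
col     <- ifelse(us_diff < 0,
                  "Better than U.S. National Average",
                  "Worse than U.S. National Average")

print(data.frame(pct, us_diff, col))
```

A state below the national rate gets a negative us_diff and the "Better" label, which is why those columns point downward in the chart.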

The “weird”-looking comments (the #' and #+ lines) let me spin the script with knitr; they are pretty much the inverse of the markup in knitr R Markdown documents. As the comments say, you should really thank Andy for curating the BLS data for you/us.
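
For anyone who has not used spin before, here is a minimal sketch of what those comment prefixes do. The chunk label `toy_chunk` and the script lines are made up for illustration, and I am assuming knitr is installed. Passing `knit = FALSE` only converts the script to R Markdown without running it:

```r
library(knitr)

# A tiny spin-style script held in a character vector.
# "#'" lines become prose (or YAML front matter); "#+" lines become chunk options.
script_lines <- c(
  "#' A line of prose that ends up as regular text.",
  "#+ toy_chunk, echo=TRUE",
  "x <- 1 + 1",
  "x"
)

# With text= supplied and knit = FALSE, spin() returns the converted
# R Markdown source instead of writing and rendering a file.
rmd <- spin(text = script_lines, knit = FALSE)
cat(rmd, sep = "\n")
```

Running spin() on the full script file with the default `knit = TRUE` is what renders the report itself.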

If I really didn’t pine over aesthetics it would have taken me 5 minutes (most of that was waiting for re-rendering). Formatting the blog post took much longer. Plus, I can update the data source and re-run this in the future without clicking anything. This re-emphasizes a caution I tell my students: beware of dragon droppings (“drag-and-drop data science/visualization tools”).

Hopefully you presently follow or will start following Workout Wednesday and Makeover Monday and dedicate some time to hone your skills with those visualization katas.

Reposted from: https://rud.is/b/2017/01/18/workout-wednesday-redux-2017-week-3/
