Introduction to Data Visualization – Theory, R & ggplot2

Data visualization is a very popular topic in the data science community. According to Mordor Intelligence, the market for visualization products is valued at $4 billion and is projected to reach $7 billion by the end of 2022. While we have seen amazing advances in the technology used to display information, our understanding of how, why, and when to use visualization techniques has not kept up. Unfortunately, people are often taught how to make a chart before they ever think about whether it’s appropriate.

In short, does your visualization add value to your work, or are you simply adding a chart to make the work seem less boring? Let’s take a look at some examples before going through the Stoltzmaniac Data Visualization Philosophy.


I have to give credit to Junk Charts – it inspired a lot of this post.

One author at Vox wanted to show the causes of death in all of Shakespeare’s plays.

Is this not insane!?!?!

Using a legend instead of data callouts is the only thing that could have made this worse. The author could easily have used a number of other tools to get the point across. While a wordle is not ideal for any work requiring exact proportions, it does make for a great visual in this article. Source: the Junk Charts article.

To be clear, I’m not close to being perfect when it comes to visualizations in my blog. The sizes, shapes, font colors, etc. tend to get out of control and I don’t take the time in R to tinker with all of the details. However, when it comes to displaying things professionally, it has to be spot on! So, I’ll walk through my theory and not worry too much about aesthetics (save that for a time when you’re getting paid).


The Good, The Bad, The Ugly

“The Good” visualizations:

  • Clearly illustrate a point
  • Are tailored to the appropriate audience
    • Analysts may want detail
    • Executives may want a high-level view
  • Are tailored to the presentation medium
    • A piece in an academic journal can be analyzed slowly and carefully
    • A slide in front of 5,000 people at a conference will be glanced at quickly
  • Are memorable to those who care about the material
  • Make an impact that increases understanding of the subject matter

“The Bad” visualizations:

  • Are difficult to interpret
  • Are unintentionally misleading
  • Contain redundant and boring information

“The Ugly” visualizations:

  • Are almost impossible to interpret
  • Are filled with completely worthless information
  • Are intentionally created to mislead the audience
  • Are inaccurate

Coming soon:

  • Introduction to ggplot2 in R and how it works
  • Determining whether or not you need a visualization
  • Choosing the type of plot to use depending on the use case
  • Visualization beyond the standard charts and graphs

As always, the code used in this post is on my GitHub.

Reposted from: https://www.stoltzmaniac.com/data-visualization-part-1/
