In preparation for a R Workgroup meeting, I started thinking about what would be my "Top 5 R Functions". I ruled out the functions for basic mechanics - save, load, mean, etc. - they're obviously critical, but every programming language has them, so there's nothing especially "R" about them. I also ruled out the fancy statistical analysis functions like (g)lmer -- most people (including me) start using R because they want to run those analyses so it seemed a little redundant. I started using R because I wanted to do growth curve analysis, so it seems like a weak endorsement to say that I like R because it can do growth curve analysis. No, I like R because it makes (many) somewhat complex data operations really, really easy. Understanding how take advantage of these R functions is what transformed my view of R from purely functional (I need to do analysis X and R has functions for doing analysis X) to an all-purpose tool that allows me to do data processing, management, analysis, and visualization extremely quickly and easily. So, here are the 5 functions that did that for me:

  1. subset() for making subsets of data (natch)
  2. merge() for combining data sets in a smart and easy way
  3. melt() for converting from wide to long data formats
  4. dcast() for converting from long to wide data formats, and for making summary tables
  5. ddply() for doing split-apply-combine operations, which covers a huge swath of the most tricky data operations
For anyone interested, I posted my R Workgroup notes on how to use these functions on RPubs. Side note: after a little configuration, I found it super easy to write these using knitr, "knit" them into a webpage, and post that page on RPubs.
 
Conspicuously missing from the above list is ggplot, which I think deserves a special lifetime achievement award for how it has transformed how I think about data exploration and data visualization. I'm planning that for the next R Workgroup meeting.

My "Top 5 R Functions"(转)的更多相关文章

  1. Non-standard evaluation, how tidy eval builds on base R

    As with many aspects of the tidyverse, its non-standard evaluation (NSE) implementation is not somet ...

  2. 在top命令下kill和renice进程

    For common process management tasks, top is so great because it gives an overview of the most active ...

  3. 使用r.js来打包模块化的javascript文件

    前面的话 r.js(下载)是requireJS的优化(Optimizer)工具,可以实现前端文件的压缩与合并,在requireJS异步按需加载的基础上进一步提供前端优化,减小前端文件大小.减少对服务器 ...

  4. How-to: Do Statistical Analysis with Impala and R

    sklearn实战-乳腺癌细胞数据挖掘(博客主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&a ...

  5. 基于R语言的时间序列指数模型

    时间序列: (或称动态数列)是指将同一统计指标的数值按其发生的时间先后顺序排列而成的数列.时间序列分析的主要目的是根据已有的历史数据对未来进行预测.(百度百科) 主要考虑的因素: 1.长期趋势(Lon ...

  6. 基于R语言的ARIMA模型

    A IMA模型是一种著名的时间序列预测方法,主要是指将非平稳时间序列转化为平稳时间序列,然后将因变量仅对它的滞后值以及随机误差项的现值和滞后值进行回归所建立的模型.ARIMA模型根据原序列是否平稳以及 ...

  7. Create and format Word documents using R software and Reporters package

    http://www.sthda.com/english/wiki/create-and-format-word-documents-using-r-software-and-reporters-pa ...

  8. keep or remove data frame columns in R

    You should use either indexing or the subset function. For example : R> df <- data.frame(x=1:5 ...

  9. a note of R software write Function

    Functionals “To become significantly more reliable, code must become more transparent. In particular ...

随机推荐

  1. java基础之类与对象1

    从学习java到现在估计都有一年了,然而在这一年里基本处于三天打鱼五天晒网,感觉自己再不做点改变就是个废人了..T - T. 趁着重新复习java的时间,也顺便用博客来记录学习的过程.好了,废话不说了 ...

  2. package(1):tm

    tm包是R语言中为文本挖掘提供综合性处理的package,进行操作前载入tm包,vignette命令可以让你得到相关的文档说明.使用默认安装的R平台是不带tm  package的,在安装的过程中,它会 ...

  3. PAT 1057

    Stack is one of the most fundamental data structures, which is based on the principle of Last In Fir ...

  4. Linux轻松使用vim

    VIM命令---Vi IMproved, a programmers text editor文本编辑 1>gedit   图形文本编辑工具 2>vim      字符界面的编辑工具 写脚本 ...

  5. IO流输入 输出流 字符字节流

    一.流 1.流的概念 流是一组有顺序的,有起点和终点的字节集合,是对数据传输的总称或抽象.即数据在两设备间的传输称为流,流的本质是数据传输,根据数据传输特性将流抽象为各种类,方便更直观的进行数据操作. ...

  6. 浅谈聚类算法(K-means)

    聚类算法(K-means)目的是将n个对象根据它们各自属性分成k个不同的簇,使得簇内各个对象的相似度尽可能高,而各簇之间的相似度尽量小. 而如何评测相似度呢,采用的准则函数是误差平方和(因此也叫K-均 ...

  7. ashMap源码阅读与解析

    目录结构 导入语 HashMap构造方法 put()方法解析 addEntry()方法解析 get()方法解析 remove()解析 HashMap如何进行遍历 导入语 HashMap是我们最常见也是 ...

  8. 针对Mac的DuckHunter攻击演示

    0x00 HID 攻击 HID是Human Interface Device的缩写,也就是人机交互设备,说通俗一点,HID设备一般指的是键盘.鼠标等等这一类用于为计算机提供数据输入的设备. DuckH ...

  9. css优先级之特殊性

    在前端开发的时候,css构建样式规则,这个时候我们会遇到一个问题:当我们对同一个元素做多个样式规则,其中发生了冲突的时候,css是如何选择最终呈现的样式 如下: div{ color:red; } d ...

  10. CAS单点登录(SSO)服务端的部署和配置---连接MySQL进行身份认证

    一.修改系统host,加入 127.0.0.1 server.test.com127.0.0.1 client1.test.com127.0.0.1 client2.test.com 二.安装grad ...