In preparation for a R Workgroup meeting, I started thinking about what would be my "Top 5 R Functions". I ruled out the functions for basic mechanics - save, load, mean, etc. - they're obviously critical, but every programming language has them, so there's nothing especially "R" about them. I also ruled out the fancy statistical analysis functions like (g)lmer -- most people (including me) start using R because they want to run those analyses so it seemed a little redundant. I started using R because I wanted to do growth curve analysis, so it seems like a weak endorsement to say that I like R because it can do growth curve analysis. No, I like R because it makes (many) somewhat complex data operations really, really easy. Understanding how take advantage of these R functions is what transformed my view of R from purely functional (I need to do analysis X and R has functions for doing analysis X) to an all-purpose tool that allows me to do data processing, management, analysis, and visualization extremely quickly and easily. So, here are the 5 functions that did that for me:

  1. subset() for making subsets of data (natch)
  2. merge() for combining data sets in a smart and easy way
  3. melt() for converting from wide to long data formats
  4. dcast() for converting from long to wide data formats, and for making summary tables
  5. ddply() for doing split-apply-combine operations, which covers a huge swath of the most tricky data operations
For anyone interested, I posted my R Workgroup notes on how to use these functions on RPubs. Side note: after a little configuration, I found it super easy to write these using knitr, "knit" them into a webpage, and post that page on RPubs.
 
Conspicuously missing from the above list is ggplot, which I think deserves a special lifetime achievement award for how it has transformed how I think about data exploration and data visualization. I'm planning that for the next R Workgroup meeting.

My "Top 5 R Functions"(转)的更多相关文章

  1. Non-standard evaluation, how tidy eval builds on base R

    As with many aspects of the tidyverse, its non-standard evaluation (NSE) implementation is not somet ...

  2. 在top命令下kill和renice进程

    For common process management tasks, top is so great because it gives an overview of the most active ...

  3. 使用r.js来打包模块化的javascript文件

    前面的话 r.js(下载)是requireJS的优化(Optimizer)工具,可以实现前端文件的压缩与合并,在requireJS异步按需加载的基础上进一步提供前端优化,减小前端文件大小.减少对服务器 ...

  4. How-to: Do Statistical Analysis with Impala and R

    sklearn实战-乳腺癌细胞数据挖掘(博客主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&a ...

  5. 基于R语言的时间序列指数模型

    时间序列: (或称动态数列)是指将同一统计指标的数值按其发生的时间先后顺序排列而成的数列.时间序列分析的主要目的是根据已有的历史数据对未来进行预测.(百度百科) 主要考虑的因素: 1.长期趋势(Lon ...

  6. 基于R语言的ARIMA模型

    A IMA模型是一种著名的时间序列预测方法,主要是指将非平稳时间序列转化为平稳时间序列,然后将因变量仅对它的滞后值以及随机误差项的现值和滞后值进行回归所建立的模型.ARIMA模型根据原序列是否平稳以及 ...

  7. Create and format Word documents using R software and Reporters package

    http://www.sthda.com/english/wiki/create-and-format-word-documents-using-r-software-and-reporters-pa ...

  8. keep or remove data frame columns in R

    You should use either indexing or the subset function. For example : R> df <- data.frame(x=1:5 ...

  9. a note of R software write Function

    Functionals “To become significantly more reliable, code must become more transparent. In particular ...

随机推荐

  1. RabbitMQ-从基础到实战(6)— 与Spring集成

    0.目录 RabbitMQ-从基础到实战(1)- Hello RabbitMQ RabbitMQ-从基础到实战(2)- 防止消息丢失 RabbitMQ-从基础到实战(3)- 消息的交换(上) Rabb ...

  2. 关于在网页拼接时出现:提示Uncaught SyntaxError: missing ) after argument list;错误的原因分析

    1:网页拼接不完善,可能哪里漏了:),},</XX>...等 2:如果有动态数据写入的话,请注意转义动态数据,如图(是转义后的内容,不会报错): 在写方法时:onclick中,注意单双引号 ...

  3. KEIL中逻辑分析仪的使用

    本学期开了门嵌入式的课程,在实验课上用到了一款基于ARM Cortex-M3处理器的LPC1768的实验板.本来这种课程我觉得应该可以学到很多东西,可是我发现实验课上老师基本只是讲了xx实验课的要求, ...

  4. IOS 私有变量 私有属性的书写方法

    一.早期只能定义在.h文件中.用@private 关键字来定义私有变量. @interface ViewController{ @private Bool _isBool; } @end 二.允许在. ...

  5. MAC Mysql 重置密码

    使用mac电脑,当mysql登录密码忘记时,需要重置密码.步骤如下: 1. 关闭当前正在运行的mysql进程. A.进入"偏好设置",选择mysql, 再选"stop m ...

  6. es6面试问题——Promise

    话说刚换工作一个月有余,在上家公司干的实在是不开心,然后就出来抱着试试的心态出来面了几家公司,大多数公司问的前端问题也就那么多,其中有个面试问题让我记忆犹新,只因为没有答上来,哈哈! 当时面试官问我怎 ...

  7. Atom手动安装插件和模块的解决方案

    最近开始使用Atom编辑器写作.为了预览带LaTeX公式的markdown文档,尝试安装插件markdown-preview-plus,但是总是失败.经过仔细查看错误输出和网上相关问答,发现尽管报错为 ...

  8. uoj#228 基础数据结构练习题

    题面:http://uoj.ac/problem/228 正解:线段树. 我们可以发现,开根号时一个区间中的数总是趋近相等.判断一个区间的数是否相等,只要判断最大值和最小值是否相等就行了.如果这个区间 ...

  9. GBDT与LR融合提升广告点击率预估模型

    1GBDT和LR融合      LR模型是线性的,处理能力有限,所以要想处理大规模问题,需要大量人力进行特征工程,组合相似的特征,例如user和Ad维度的特征进行组合.      GDBT天然适合做特 ...

  10. HTTP协议详解以及URL具体访问过程

    1.简介 1.1.HTTP协议是什么? 即超文本传输协议(HTTP,HyperText Transfer Protocol)是互联网上应用最为广泛的一种网络协议,所有的WWW文件都必须遵守这个标准.从 ...