In preparation for a R Workgroup meeting, I started thinking about what would be my "Top 5 R Functions". I ruled out the functions for basic mechanics - save, load, mean, etc. - they're obviously critical, but every programming language has them, so there's nothing especially "R" about them. I also ruled out the fancy statistical analysis functions like (g)lmer -- most people (including me) start using R because they want to run those analyses so it seemed a little redundant. I started using R because I wanted to do growth curve analysis, so it seems like a weak endorsement to say that I like R because it can do growth curve analysis. No, I like R because it makes (many) somewhat complex data operations really, really easy. Understanding how take advantage of these R functions is what transformed my view of R from purely functional (I need to do analysis X and R has functions for doing analysis X) to an all-purpose tool that allows me to do data processing, management, analysis, and visualization extremely quickly and easily. So, here are the 5 functions that did that for me:

  1. subset() for making subsets of data (natch)
  2. merge() for combining data sets in a smart and easy way
  3. melt() for converting from wide to long data formats
  4. dcast() for converting from long to wide data formats, and for making summary tables
  5. ddply() for doing split-apply-combine operations, which covers a huge swath of the most tricky data operations
For anyone interested, I posted my R Workgroup notes on how to use these functions on RPubs. Side note: after a little configuration, I found it super easy to write these using knitr, "knit" them into a webpage, and post that page on RPubs.
 
Conspicuously missing from the above list is ggplot, which I think deserves a special lifetime achievement award for how it has transformed how I think about data exploration and data visualization. I'm planning that for the next R Workgroup meeting.

My "Top 5 R Functions"(转)的更多相关文章

  1. Non-standard evaluation, how tidy eval builds on base R

    As with many aspects of the tidyverse, its non-standard evaluation (NSE) implementation is not somet ...

  2. 在top命令下kill和renice进程

    For common process management tasks, top is so great because it gives an overview of the most active ...

  3. 使用r.js来打包模块化的javascript文件

    前面的话 r.js(下载)是requireJS的优化(Optimizer)工具,可以实现前端文件的压缩与合并,在requireJS异步按需加载的基础上进一步提供前端优化,减小前端文件大小.减少对服务器 ...

  4. How-to: Do Statistical Analysis with Impala and R

    sklearn实战-乳腺癌细胞数据挖掘(博客主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&a ...

  5. 基于R语言的时间序列指数模型

    时间序列: (或称动态数列)是指将同一统计指标的数值按其发生的时间先后顺序排列而成的数列.时间序列分析的主要目的是根据已有的历史数据对未来进行预测.(百度百科) 主要考虑的因素: 1.长期趋势(Lon ...

  6. 基于R语言的ARIMA模型

    A IMA模型是一种著名的时间序列预测方法,主要是指将非平稳时间序列转化为平稳时间序列,然后将因变量仅对它的滞后值以及随机误差项的现值和滞后值进行回归所建立的模型.ARIMA模型根据原序列是否平稳以及 ...

  7. Create and format Word documents using R software and Reporters package

    http://www.sthda.com/english/wiki/create-and-format-word-documents-using-r-software-and-reporters-pa ...

  8. keep or remove data frame columns in R

    You should use either indexing or the subset function. For example : R> df <- data.frame(x=1:5 ...

  9. a note of R software write Function

    Functionals “To become significantly more reliable, code must become more transparent. In particular ...

随机推荐

  1. 美团点评DBProxy读写分离使用说明

    目的 因为业务架构上需要实现读写分离,刚好前段时间美团点评开源了在360Atlas基础上开发的读写分离中间件DBProxy,关于其介绍在官方文档已经有很详细的说明了,其特性主要有:读写分离.负载均衡. ...

  2. 卷积神经网络CNN公式推导走读

      0有全连接网络,为什么还需要RNN 图像处理领域的特殊性,      全连接网络缺点:                              RNN解决办法:      1参数太多       ...

  3. 转账示例(四):service层面实现(线程管理Connection,AOP思想,动态代理)(本例采用QueryRunner来执行sql语句,数据源为C3P0)

    用了AOP(面向切面编程),实现动态代理,service层面隐藏了开启事务.1.自行创建C3P0Uti,account数据库,导入Jar包 2.Dao层面 接口: package com.learni ...

  4. Spring+SpringMVC+MyBatis+easyUI整合优化篇(九)数据层优化-jdbc连接池简述、druid简介

    日常啰嗦 终于回到既定轨道上了,这一篇讲讲数据库连接池的相关知识,线程池以后有机会再结合项目单独写篇文章(自己给自己挖坑,不知道什么时候能填上),从这一篇文章开始到本阶段结束的文章都会围绕数据库和da ...

  5. 【Tomcat源码学习】-5.请求处理

    前四章节,主要对Tomcat启动过程中,容器加载.应用加载.连接器初始化进行了相关的原理和代码流程进行了学习.接下来开始进行接受网络请求后的相关处理学习.   一.整体流程      基于上一节图示进 ...

  6. GitHub上最受欢迎的iOS开源项目TOP20

    AFNetworking 在众多iOS开源项目中,AFNetworking可以称得上是最受开发者欢迎的库项目.AFNetworking是一个轻量级的iOS.Mac OS X网络通信类库,现在是GitH ...

  7. JS中的几种函数

    函数可以说是js中最具特色的地方,在这里我将分享一下有关函数的相关知识: 包装函数:        (function foo(){...})作为函数表达式意味着foo只能在...所代表的位置中被访问 ...

  8. 内存管理之slab分配器

    基本思想 与传统的内存管理模式相比, slab 缓存分配器提供了很多优点.首先,内核通常依赖于对小对象的分配,它们会在系统生命周期内进行无数次分配.slab 缓存分配器通过对类似大小的对象进行缓存而提 ...

  9. object detection技术演进:RCNN、Fast RCNN、Faster RCNN

    object detection我的理解,就是在给定的图片中精确找到物体所在位置,并标注出物体的类别.object detection要解决的问题就是物体在哪里,是什么这整个流程的问题.然而,这个问题 ...

  10. 运用三角不等式加速Kmeans聚类算法

    运用三角不等式加速Kmeans聚类算法 引言:最近在刷<数据挖掘导论>,第九章, 9.5.1小节有提到,可以用三角不等式,减少不必要的距离计算,从而达到加速聚类算法的目的.这在超大数据量的 ...