In preparation for a R Workgroup meeting, I started thinking about what would be my "Top 5 R Functions". I ruled out the functions for basic mechanics - save, load, mean, etc. - they're obviously critical, but every programming language has them, so there's nothing especially "R" about them. I also ruled out the fancy statistical analysis functions like (g)lmer -- most people (including me) start using R because they want to run those analyses so it seemed a little redundant. I started using R because I wanted to do growth curve analysis, so it seems like a weak endorsement to say that I like R because it can do growth curve analysis. No, I like R because it makes (many) somewhat complex data operations really, really easy. Understanding how take advantage of these R functions is what transformed my view of R from purely functional (I need to do analysis X and R has functions for doing analysis X) to an all-purpose tool that allows me to do data processing, management, analysis, and visualization extremely quickly and easily. So, here are the 5 functions that did that for me:

  1. subset() for making subsets of data (natch)
  2. merge() for combining data sets in a smart and easy way
  3. melt() for converting from wide to long data formats
  4. dcast() for converting from long to wide data formats, and for making summary tables
  5. ddply() for doing split-apply-combine operations, which covers a huge swath of the most tricky data operations
For anyone interested, I posted my R Workgroup notes on how to use these functions on RPubs. Side note: after a little configuration, I found it super easy to write these using knitr, "knit" them into a webpage, and post that page on RPubs.
 
Conspicuously missing from the above list is ggplot, which I think deserves a special lifetime achievement award for how it has transformed how I think about data exploration and data visualization. I'm planning that for the next R Workgroup meeting.

My "Top 5 R Functions"(转)的更多相关文章

  1. Non-standard evaluation, how tidy eval builds on base R

    As with many aspects of the tidyverse, its non-standard evaluation (NSE) implementation is not somet ...

  2. 在top命令下kill和renice进程

    For common process management tasks, top is so great because it gives an overview of the most active ...

  3. 使用r.js来打包模块化的javascript文件

    前面的话 r.js(下载)是requireJS的优化(Optimizer)工具,可以实现前端文件的压缩与合并,在requireJS异步按需加载的基础上进一步提供前端优化,减小前端文件大小.减少对服务器 ...

  4. How-to: Do Statistical Analysis with Impala and R

    sklearn实战-乳腺癌细胞数据挖掘(博客主亲自录制视频教程) https://study.163.com/course/introduction.htm?courseId=1005269003&a ...

  5. 基于R语言的时间序列指数模型

    时间序列: (或称动态数列)是指将同一统计指标的数值按其发生的时间先后顺序排列而成的数列.时间序列分析的主要目的是根据已有的历史数据对未来进行预测.(百度百科) 主要考虑的因素: 1.长期趋势(Lon ...

  6. 基于R语言的ARIMA模型

    A IMA模型是一种著名的时间序列预测方法,主要是指将非平稳时间序列转化为平稳时间序列,然后将因变量仅对它的滞后值以及随机误差项的现值和滞后值进行回归所建立的模型.ARIMA模型根据原序列是否平稳以及 ...

  7. Create and format Word documents using R software and Reporters package

    http://www.sthda.com/english/wiki/create-and-format-word-documents-using-r-software-and-reporters-pa ...

  8. keep or remove data frame columns in R

    You should use either indexing or the subset function. For example : R> df <- data.frame(x=1:5 ...

  9. a note of R software write Function

    Functionals “To become significantly more reliable, code must become more transparent. In particular ...

随机推荐

  1. XML文档结构

    <?xml version="1.0" encoding="UTF-8"?> <books> <bool id="bk1 ...

  2. Windows下JIRA6.3.6安装、汉化、破解

    一.MySQL建库和建账号 1. mysql中创建数据库jiradb create database jiradb character set 'UTF8'; 2.创建数据库用户并赋于权限 creat ...

  3. Caffe学习系列(四)之--训练自己的模型

    前言: 本文章记录了我将自己的数据集处理并训练的流程,帮助一些刚入门的学习者,也记录自己的成长,万事起于忽微,量变引起质变. 正文: 一.流程 1)准备数据集  2)数据转换为lmdb格式  3)计算 ...

  4. ABAP 7.4 新语法-内嵌生命和内表操作

    1.内嵌声明 2.内表操作 3.opensql ************************************************************************ 1. ...

  5. Nodejs进阶:MD5入门介绍及crypto模块的应用

    本文摘录自<Nodejs学习笔记>,更多章节及更新,请访问 github主页地址.欢迎加群交流,群号 197339705. 简介 MD5(Message-Digest Algorithm) ...

  6. 从网络通信角度谈web性能优化

    衡量一个网站的性能有多个指标,DNS解析时间,TCP链接时间,HTTP重定向时间,等待服务器响应时间等等,从用户角度来看,就可以归结为该网站访问速度的快慢.也就是说性能等于网站的访问速度. 早些年Am ...

  7. .net应用程序中添加chm帮助文档打开显示此程序无法显示网页问题

    在做.net大作业时添加了chm帮助文档结果在打开时显示“此程序无法显示网页问题”,但是把帮助文档拷到别的路径下却显示正常, 经过从网上查找,终于找到了答案: (1).chm文件的路径中不能含有“#” ...

  8. 基于JS的问卷调查

    主要工作 因为代码不好展示,也不好截长图,可以去看我的GitHub地址:https://github.com/14glwu/MyBlog/blob/master/questionnaire.html ...

  9. 思考题:用Use Case获取需求的方法是否有什么缺陷,还有什么地方需要改进?(提示:是否对所有的应用领域都适用?使用的方便性?.......)

    思考题: 用Use Case获取需求的方法是否有什么缺陷,还有什么地方需要改进?(提示:是否对所有的应用领域都适用?使用的方便性?.......) 简答: 一.用例解释: 在软件工程中,用例是一种在开 ...

  10. python循环

    #!/usr/bin/python #coding:utf-8 lcf_age = "19" count = 0 while count < 3: c_lcf_age = i ...