Prerequisite

Hadoop 2.2 has been installed (and the below installation steps should be applied on each of Hadoop node)

 

Step 1. Install R (by yum)

[hadoop@c0046220 yum.repos.d]$ sudo yum update

 

[hadoop@c0046220 yum.repos.d]$ yum search r-project

 

[hadoop@c0046220 yum.repos.d]$ sudo yum install R

...

Installed:

R.x86_64 0:3.0.2-1.el6

 

Dependency Installed:

R-core.x86_64 0:3.0.2-1.el6 R-core-devel.x86_64 0:3.0.2-1.el6 R-devel.x86_64 0:3.0.2-1.el6 R-java.x86_64 0:3.0.2-1.el6

R-java-devel.x86_64 0:3.0.2-1.el6 bzip2-devel.x86_64 0:1.0.5-7.el6_0 fontconfig-devel.x86_64 0:2.8.0-3.el6 freetype-devel.x86_64 0:2.3.11-14.el6_3.1

java-1.6.0-openjdk-devel.x86_64 1:1.6.0.0-1.62.1.11.11.90.el6_4 kpathsea.x86_64 0:2007-57.el6_2 libRmath.x86_64 0:3.0.2-1.el6 libRmath-devel.x86_64 0:3.0.2-1.el6

libXft-devel.x86_64 0:2.3.1-2.el6 libXmu.x86_64 0:1.1.1-2.el6 libXrender-devel.x86_64 0:0.9.7-2.el6 libicu.x86_64 0:4.2.1-9.1.el6_2

netpbm.x86_64 0:10.47.05-11.el6 netpbm-progs.x86_64 0:10.47.05-11.el6 pcre-devel.x86_64 0:7.8-6.el6 psutils.x86_64 0:1.17-34.el6

tcl.x86_64 1:8.5.7-6.el6 tcl-devel.x86_64 1:8.5.7-6.el6 tex-preview.noarch 0:11.85-10.el6 texinfo.x86_64 0:4.13a-8.el6

texinfo-tex.x86_64 0:4.13a-8.el6 texlive.x86_64 0:2007-57.el6_2 texlive-dvips.x86_64 0:2007-57.el6_2 texlive-latex.x86_64 0:2007-57.el6_2

texlive-texmf.noarch 0:2007-38.el6 texlive-texmf-dvips.noarch 0:2007-38.el6 texlive-texmf-errata.noarch 0:2007-7.1.el6 texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6

texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6 texlive-texmf-errata-latex.noarch 0:2007-7.1.el6 texlive-texmf-fonts.noarch 0:2007-38.el6 texlive-texmf-latex.noarch 0:2007-38.el6

texlive-utils.x86_64 0:2007-57.el6_2 tk.x86_64 1:8.5.7-5.el6 tk-devel.x86_64 1:8.5.7-5.el6 zlib-devel.x86_64 0:1.2.3-29.el6

 

Complete!

 

Validation:

[hadoop@c0046220 yum.repos.d]$ R

 

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"

Copyright (C) 2013 The R Foundation for Statistical Computing

Platform: x86_64-redhat-linux-gnu (64-bit)

 

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

 

Natural language support but running in an English locale

 

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

 

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

 

>

 

 

Step 2. Install RHadoop

2.1 Getting RHadoop Packages

Download packages rhdfs, rhbase and rmr2 from https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads and then run the R code below.

[hadoop@c0046220 RHadoop]$ cd /tmp

[hadoop@c0046220 tmp]$ mkdir RHadoop

[hadoop@c0046220 tmp]$ cd RHadoop

[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhdfs/master/build/rhdfs_1.0.8.tar.gz

[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rmr2/3.1.0/build/rmr2_3.1.0.tar.gz

 

[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhbase/master/build/rhbase_1.2.0.tar.gz

 

2.2 Install R packages that RHadoop depends on.

[hadoop@c0046220 java]$ echo $JAVA_HOME

/usr/java/jdk1.8.0_05

 

[hadoop@c0046220 java]$ sudo -i

[root@c0046220 ~]# export JAVA_HOME=/usr/java/jdk1.8.0_05

[root@c0046220 ~]# R CMD javareconf

[root@c0046220 ~]# R

...

> .libPaths();

[1] "/usr/lib64/R/library" "/usr/share/R/library"

 

> install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))

> #install.packages("caTools") #needed for rmr2

 

2.3 Install RHadoop

Set environment variables

[hadoop@c0046220 ~]$ vi ~/.bashrc

# set HADOOP locations for RHADOOP

export HADOOP_CMD=$HADOOP_HOME/bin/hadoop

export HADOOP_STREAMING=/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar

[hadoop@c0046220 ~]$ source .bashrc

 

[hadoop@c0040084 R]$ sudo -i

[root@c0040084 ~]# R

...

> Sys.setenv(HADOOP_HOME="/opt/hadoop/hadoop-2.2.0");

> Sys.setenv(HADOOP_CMD="/opt/hadoop/hadoop-2.2.0/bin/hadoop");

> Sys.setenv(HADOOP_STREAMING="/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar");

> install.packages(pkgs="/tmp/RHadoop/rhdfs_1.0.8.tar.gz",repos=NULL);

> install.packages(pkgs="/tmp/RHadoop/rmr2_3.1.0.tar.gz",repos=NULL);

 

Step 3. Validation

Load and initialize the rhdfs package, and execute some simple commands as below:

library(rhdfs)

hdfs.init()

hdfs.ls("/")

[hadoop@c0046220 ~]$ R

...

> library(rhdfs)

Loading required package: rJava

...

Be sure to run hdfs.init()

> hdfs.init()

14/05/15 10:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

> hdfs.ls("/")

permission owner group size modtime file

1 drwxr-xr-x hadoop supergroup 0 2014-05-14 03:05 /apps

2 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:40 /data

3 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:45 /output

4 drwxrwx--- hadoop supergroup 0 2014-05-15 10:02 /tmp

5 drwxr-xr-x hadoop supergroup 0 2014-05-14 05:48 /user

6 drwxr-xr-x hadoop supergroup 0 2014-05-13 06:43 /usr

 

Load and initialize the rmr2 package, and execute some simple commands as below:

library(rmr2)

from.dfs(to.dfs(1:100))

from.dfs(mapreduce(to.dfs(1:100)))

[hadoop@c0046220 ~]$ R

...

> library(rmr2)

Loading required package: Rcpp

Loading required package: RJSONIO

Loading required package: bitops

Loading required package: digest

Loading required package: functional

Loading required package: reshape2

Loading required package: stringr

Loading required package: plyr

Loading required package: caTools

 

> from.dfs(to.dfs(1:100))

...

$key

NULL

 

$val

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90

[91] 91 92 93 94 95 96 97 98 99 100

 

> from.dfs(mapreduce(to.dfs(1:100)))

...

$key

NULL

 

$val

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54

[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90

[91] 91 92 93 94 95 96 97 98 99 100

 

 

library(rmr2)

input<- '/user/hadoop/tmp.txt'

wordcount = function(input, output = NULL, pattern = " "){

wc.map = function(., lines) {

keyval(unlist( strsplit( x = lines,split = pattern)),1)

}

 

wc.reduce =function(word, counts ) {

keyval(word, sum(counts))

}

 

mapreduce(input = input ,output = output, input.format = "text",

map = wc.map, reduce = wc.reduce,combine = T)

}

 

wordcount(input)

 

> library(rmr2)

> input<- '/user/hadoop/tmp.txt'

> wordcount = function(input, output = NULL, pattern = " "){

+ wc.map = function(., lines) {

+ keyval(unlist( strsplit( x = lines,split = pattern)),1)

+ }

+

+ wc.reduce =function(word, counts ) {

+ keyval(word, sum(counts))

+ }

+

+ mapreduce(input = input ,output = output, input.format = "text",

+ map = wc.map, reduce = wc.reduce,combine = T)

+ }

>

> wordcount(input)

...

14/05/15 10:18:40 INFO mapreduce.Job: Job job_1399887026053_0013 completed successfully

14/05/15 10:18:40 INFO mapreduce.Job: Counters: 45

File System Counters

FILE: Number of bytes read=11018

FILE: Number of bytes written=278566

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=2004

HDFS: Number of bytes written=11583

HDFS: Number of read operations=9

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Failed reduce tasks=1

Launched map tasks=2

Launched reduce tasks=2

Data-local map tasks=2

Total time spent by all maps in occupied slots (ms)=23412

Total time spent by all reduces in occupied slots (ms)=13859

Map-Reduce Framework

Map input records=24

Map output records=112

Map output bytes=10522

Map output materialized bytes=11024

Input split bytes=208

Combine input records=112

Combine output records=114

Reduce input groups=105

Reduce shuffle bytes=11024

Reduce input records=114

Reduce output records=112

Spilled Records=228

Shuffled Maps =2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms)=569

CPU time spent (ms)=3700

Physical memory (bytes) snapshot=574214144

Virtual memory (bytes) snapshot=6258499584

Total committed heap usage (bytes)=365953024

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=1796

File Output Format Counters

Bytes Written=11583

rmr

reduce calls=110

14/05/15 10:18:40 INFO streaming.StreamJob: Output directory: /tmp/file612355aa2e35

function ()

{

fname

}

<environment: 0x37d70d0>

>

>

> from.dfs("/tmp/file612355aa2e35")

$key

[1] "-"

[2] "of"

[3] "Hong"

[4] "Paul's"

[5] "School"

[6] "College"

[7] "Graduate"

...

References

https://s3.amazonaws.com/RHadoop/RHadoop2.0.2u2_Installation_Configuration_for_RedHat.pdf

http://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Installing-R-under-Unix_002dalikes

 

http://www.rdatamining.com/tutorials/rhadoop

http://blog.fens.me/rhadoop-rhadoop/

http://datamgmt.com/installing-r-and-rstudio-on-redhat-or-centos-linux/

 

https://github.com/RevolutionAnalytics/RHadoop/wiki

https://github.com/RevolutionAnalytics/RHadoop/wiki/Which-Hadoop-for-rmr

Install RHadoop with Hadoop 2.2 – Red Hat Linux的更多相关文章

  1. Red hat Linux(Centos 5/6)安装R语言

    Red hat Linux(Centos 5/6)安装R语言1 wget http://cran.rstudio.com/src/base/R-3/R-3.0.2.tar.gz2 tar xzvf R ...

  2. red hat Linux 使用CentOS yum源更新

    red hat linux是商业版软件,没有经过注册是无法使用红帽 yum源更新软件的,使用CentOS源更新操作如下: 1.删除red hat linux 原有的yum 源 rpm -aq | gr ...

  3. red hat linux之Samba、DHCP、DNS、FTP、Web的安装与配置

    本教程是在red hat linux 6.0环境下简单测试!教程没有图片演示,需要具有一定Linux基础知识,很多地方的配置需要根据自己的情况修改,照打不一定可以配置成功.(其他不足后续修改添加) y ...

  4. Red Hat Linux认证

    想系统的学习一下Linux,了解了一些关于Red Hat Linux认证的信息.整理如下. 当前比较常见的是RHCE认证,即Red Hat Certified Engineer.最高级别的是RHCA ...

  5. Red Hat linux 如何增加swap空间

    按步骤介绍 Red Hat linux 如何增加swap空间 方法/步骤 第一步:确保系统中有足够的空间来用做swap交换空间,我使用的是KVM,准备在一个独立的文件系统中添加一个swap交换文件,在 ...

  6. 分享red hat linux 6上安装oracle11g时遇到的gcc: error trying to exec 'cc1': execvp: No such file or directory的问题处理过程

    安装环境:Red Hat Linux 6.5_x64.oracle11g 64bit 报错详情: 安装到68%时弹窗报错: 调用makefile '/test/app/Administrators/p ...

  7. Red Hat Linux 挂载外部资源

    在我们安装的Red Hat Linux 中.当中一半机器为最主要的server配置,没有桌面环境.在从U盘上复制文件的时候可就犯难了.在网上查了查才知道.要訪问U盘就必须先将它们挂载到Linux系统的 ...

  8. Red Hat Linux 安装 (本地、网络安装)

    Red Hat Linux 安装 (本地.网络安装) 650) this.width=650;" onclick='window.open("http://blog.51cto.c ...

  9. 在Red Hat Linux服务器端假设NSF Server来进行Linux系统安装全过程

            本教程讲述了通过在Red Hat Linux服务器端假设NSF Server来进行Linux系统安装的过程,并详细介绍了如何制作网络启动盘的细节.演示直观,讲解通俗易懂,特别适合初学者 ...

随机推荐

  1. 第一节:Scrapy开源框架初探

       Scrapy是一个为了爬取网站数据,提取结构性数据而编写的应用框架. 可以应用在包括数据挖掘,信息处理或存储历史数据等一系列的程序中.  具体开发流程如下:   一.确定待抓取网站 当您需要从某 ...

  2. FC和SCSI

    IDE(Integrated Drive Electronics)即"电子集成驱动器",它的本意是指把"硬盘控制器"与"盘体"集成在一起的硬 ...

  3. js点击弹出div层

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/ ...

  4. Android 4.4(KitKat)中VSync信号的虚拟化

    原文地址:http://blog.csdn.net/jinzhuojun/article/details/17293325 Android 4.1(Jelly Bean)引入了Vsync(Vertic ...

  5. angular细节整理

    记录angularjs中比较容易忽视的问题 1.关于动态生成ui-sref的问题 ui-route中ui-sref中的路径无法动态生成的,如果要实现动态生成ui-sref路径,可以使用$state.g ...

  6. (Excel导出失败)检索COM类工厂中CLSID为{00024500-0000-0000-C000-000000000046}的组件时失

    在DCOM 中不存在WORD.EXCEL等OFFICE组件   最近在做一个关于office转存PDF的Web项目.开发过程一切顺利. 起初在网上找到一些Word,PPT转PDF的代码.很好用.一切顺 ...

  7. Excel 2007中的新文件格式

    *.xlsx:基于XML文件格式的Excel 2007工作簿缺省格式 *.xlsm:基于XML且启用宏的Excel 2007工作簿 *.xltx:Excel2007模板格式 *.xltm:Excel ...

  8. (转)php中global和$GLOBALS[]的分析之一

    PHP 的全局变量和 C 语言有一点点不同,在 C 语言中,全局变量在函数中自动生效,除非被局部变量覆盖     这可能引起一些问题,有些人可能漫不经心的改变一个全局变量.PHP 中全局变量在函数中使 ...

  9. DownloadProvider调试

    由于身边的同事离职,最近又接手了一个模块,DownloadProvider, 也就是安卓中自带的下载管理.此模块的代码量比较少,但是最近阅读代码却发现还是由不少知识点的.之前同事在此模块做了一个关于D ...

  10. $(document).ready(function(){}),$().ready(function(){})和$(function(){})三个有区别么

    三者都是一样的,最完整的写法是:$(document).ready(function(){})ready() 函数仅能用于当前文档,因此无需选择器.所以document选择器可以不要,那么就可以写成: ...