Install RHadoop with Hadoop 2.2 – Red Hat Linux
Prerequisite
Hadoop 2.2 is already installed. The installation steps below must be applied on every Hadoop node.
Step 1. Install R (by yum)
[hadoop@c0046220 yum.repos.d]$ sudo yum update
[hadoop@c0046220 yum.repos.d]$ yum search r-project
[hadoop@c0046220 yum.repos.d]$ sudo yum install R
...
Installed:
R.x86_64 0:3.0.2-1.el6
Dependency Installed:
R-core.x86_64 0:3.0.2-1.el6 R-core-devel.x86_64 0:3.0.2-1.el6 R-devel.x86_64 0:3.0.2-1.el6 R-java.x86_64 0:3.0.2-1.el6
R-java-devel.x86_64 0:3.0.2-1.el6 bzip2-devel.x86_64 0:1.0.5-7.el6_0 fontconfig-devel.x86_64 0:2.8.0-3.el6 freetype-devel.x86_64 0:2.3.11-14.el6_3.1
java-1.6.0-openjdk-devel.x86_64 1:1.6.0.0-1.62.1.11.11.90.el6_4 kpathsea.x86_64 0:2007-57.el6_2 libRmath.x86_64 0:3.0.2-1.el6 libRmath-devel.x86_64 0:3.0.2-1.el6
libXft-devel.x86_64 0:2.3.1-2.el6 libXmu.x86_64 0:1.1.1-2.el6 libXrender-devel.x86_64 0:0.9.7-2.el6 libicu.x86_64 0:4.2.1-9.1.el6_2
netpbm.x86_64 0:10.47.05-11.el6 netpbm-progs.x86_64 0:10.47.05-11.el6 pcre-devel.x86_64 0:7.8-6.el6 psutils.x86_64 0:1.17-34.el6
tcl.x86_64 1:8.5.7-6.el6 tcl-devel.x86_64 1:8.5.7-6.el6 tex-preview.noarch 0:11.85-10.el6 texinfo.x86_64 0:4.13a-8.el6
texinfo-tex.x86_64 0:4.13a-8.el6 texlive.x86_64 0:2007-57.el6_2 texlive-dvips.x86_64 0:2007-57.el6_2 texlive-latex.x86_64 0:2007-57.el6_2
texlive-texmf.noarch 0:2007-38.el6 texlive-texmf-dvips.noarch 0:2007-38.el6 texlive-texmf-errata.noarch 0:2007-7.1.el6 texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6
texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6 texlive-texmf-errata-latex.noarch 0:2007-7.1.el6 texlive-texmf-fonts.noarch 0:2007-38.el6 texlive-texmf-latex.noarch 0:2007-38.el6
texlive-utils.x86_64 0:2007-57.el6_2 tk.x86_64 1:8.5.7-5.el6 tk-devel.x86_64 1:8.5.7-5.el6 zlib-devel.x86_64 0:1.2.3-29.el6
Complete!
Validation:
[hadoop@c0046220 yum.repos.d]$ R
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
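Because the installation has to be repeated on every Hadoop node, it can help to check the R version without opening an interactive session. The sketch below is one possible way to do that. `parse_r_version` is a hypothetical helper name, and it assumes the first banner line has the form shown in the transcript above.

```shell
# Hypothetical helper: pull the version number out of an R banner line
# such as: R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
parse_r_version() {
    sed -n 's/^R version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example using the banner line from the transcript above.
printf 'R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"\n' | parse_r_version
# On a node with R installed, the helper could instead be fed by: R --version
```

Comparing the printed version across nodes is a quick way to spot a node where a different R release was installed.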
Step 2. Install RHadoop
2.1 Getting RHadoop Packages
Download the rhdfs, rhbase and rmr2 packages from https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads, or fetch them directly with wget as in the commands below.
[hadoop@c0046220 RHadoop]$ cd /tmp
[hadoop@c0046220 tmp]$ mkdir RHadoop
[hadoop@c0046220 tmp]$ cd RHadoop
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhdfs/master/build/rhdfs_1.0.8.tar.gz
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rmr2/3.1.0/build/rmr2_3.1.0.tar.gz
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhbase/master/build/rhbase_1.2.0.tar.gz
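A failed download can leave an HTML error page saved under the `.tar.gz` name, which only shows up later as a confusing error in `install.packages`. As a rough sketch, the archives can be checked right after the wget calls. The function names `is_valid_tarball` and `check_downloads` are hypothetical. The file names are the ones from the wget commands above.

```shell
# Return 0 if the file exists and is a readable gzip-compressed tar archive.
is_valid_tarball() {
    [ -f "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Check the three RHadoop archives inside the directory given as $1.
check_downloads() {
    dir=$1
    status=0
    for f in rhdfs_1.0.8.tar.gz rmr2_3.1.0.tar.gz rhbase_1.2.0.tar.gz; do
        if is_valid_tarball "$dir/$f"; then
            echo "ok      $f"
        else
            echo "BROKEN  $f"
            status=1
        fi
    done
    return $status
}
```

With the layout used in this post, the check would be run as `check_downloads /tmp/RHadoop`.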
2.2 Install R packages that RHadoop depends on.
[hadoop@c0046220 java]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_05
[hadoop@c0046220 java]$ sudo -i
[root@c0046220 ~]# export JAVA_HOME=/usr/java/jdk1.8.0_05
[root@c0046220 ~]# R CMD javareconf
[root@c0046220 ~]# R
...
> .libPaths();
[1] "/usr/lib64/R/library" "/usr/share/R/library"
> install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))
> #install.packages("caTools") #needed for rmr2
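Since the same dependencies are needed on every node, the interactive session above could be replaced by a scripted call. The sketch below only builds the R expression. `build_install_expr` is a hypothetical helper, and the `CRAN_MIRROR` default is an assumption that should be replaced by a mirror reachable from the cluster.

```shell
# Assumption: any CRAN mirror reachable from the node can be used here.
CRAN_MIRROR=${CRAN_MIRROR:-https://cloud.r-project.org}

# Hypothetical helper: turn package names into an install.packages() call.
build_install_expr() {
    list=""
    for pkg in "$@"; do
        # Append the quoted name, comma-separated after the first one.
        list="${list:+$list, }\"$pkg\""
    done
    printf 'install.packages(c(%s), repos="%s")\n' "$list" "$CRAN_MIRROR"
}

# Show the expression for the packages listed in the session above.
build_install_expr rJava Rcpp RJSONIO bitops digest functional \
    stringr plyr reshape2 caTools
```

On a node, the printed expression could then be passed to `Rscript -e` as root, after `JAVA_HOME` has been exported and `R CMD javareconf` has been run as in the session above.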
2.3 Install RHadoop
Set environment variables
[hadoop@c0046220 ~]$ vi ~/.bashrc
# set HADOOP locations for RHADOOP
export HADOOP_CMD=$HADOOP_HOME/bin/hadoop
export HADOOP_STREAMING=/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
[hadoop@c0046220 ~]$ source .bashrc
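rhdfs relies on `HADOOP_CMD` and rmr2 additionally on `HADOOP_STREAMING`, so a typo in either path tends to surface only when the packages are loaded. A minimal sketch of a pre-flight check follows. `check_rhadoop_env` is a hypothetical name, and the check only tests that the files exist, not that Hadoop itself works.

```shell
# Hypothetical pre-flight check for the two variables set in ~/.bashrc.
check_rhadoop_env() {
    status=0
    # HADOOP_CMD must be an existing, executable file.
    if [ -f "${HADOOP_CMD:-}" ] && [ -x "${HADOOP_CMD:-}" ]; then
        echo "HADOOP_CMD ok: $HADOOP_CMD"
    else
        echo "HADOOP_CMD missing or not executable: ${HADOOP_CMD:-<unset>}"
        status=1
    fi
    # HADOOP_STREAMING must point at an existing jar file.
    if [ -f "${HADOOP_STREAMING:-}" ]; then
        echo "HADOOP_STREAMING ok: $HADOOP_STREAMING"
    else
        echo "HADOOP_STREAMING not found: ${HADOOP_STREAMING:-<unset>}"
        status=1
    fi
    return $status
}
```

Running `check_rhadoop_env` in a fresh login shell on each node confirms that the `.bashrc` changes were picked up.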
[hadoop@c0040084 R]$ sudo -i
[root@c0040084 ~]# R
...
> Sys.setenv(HADOOP_HOME="/opt/hadoop/hadoop-2.2.0");
> Sys.setenv(HADOOP_CMD="/opt/hadoop/hadoop-2.2.0/bin/hadoop");
> Sys.setenv(HADOOP_STREAMING="/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar");
> install.packages(pkgs="/tmp/RHadoop/rhdfs_1.0.8.tar.gz",repos=NULL);
> install.packages(pkgs="/tmp/RHadoop/rmr2_3.1.0.tar.gz",repos=NULL);
Step 3. Validation
Load and initialize the rhdfs package, then run a few simple commands:
library(rhdfs)
hdfs.init()
hdfs.ls("/")
[hadoop@c0046220 ~]$ R
...
> library(rhdfs)
Loading required package: rJava
...
Be sure to run hdfs.init()
> hdfs.init()
14/05/15 10:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> hdfs.ls("/")
permission owner group size modtime file
1 drwxr-xr-x hadoop supergroup 0 2014-05-14 03:05 /apps
2 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:40 /data
3 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:45 /output
4 drwxrwx--- hadoop supergroup 0 2014-05-15 10:02 /tmp
5 drwxr-xr-x hadoop supergroup 0 2014-05-14 05:48 /user
6 drwxr-xr-x hadoop supergroup 0 2014-05-13 06:43 /usr
Load the rmr2 package, then run a few simple commands:
library(rmr2)
from.dfs(to.dfs(1:100))
from.dfs(mapreduce(to.dfs(1:100)))
[hadoop@c0046220 ~]$ R
...
> library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: reshape2
Loading required package: stringr
Loading required package: plyr
Loading required package: caTools
> from.dfs(to.dfs(1:100))
...
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
> from.dfs(mapreduce(to.dfs(1:100)))
...
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
Finally, run a word-count job on a text file stored in HDFS:
library(rmr2)
input <- '/user/hadoop/tmp.txt'
wordcount = function(input, output = NULL, pattern = " ") {
    # map: split each line on the pattern and emit (word, 1)
    wc.map = function(., lines) {
        keyval(unlist(strsplit(x = lines, split = pattern)), 1)
    }
    # reduce: sum the counts collected for each word
    wc.reduce = function(word, counts) {
        keyval(word, sum(counts))
    }
    mapreduce(input = input, output = output, input.format = "text",
              map = wc.map, reduce = wc.reduce, combine = TRUE)
}
wordcount(input)
> library(rmr2)
> input<- '/user/hadoop/tmp.txt'
> wordcount = function(input, output = NULL, pattern = " "){
+ wc.map = function(., lines) {
+ keyval(unlist( strsplit( x = lines,split = pattern)),1)
+ }
+
+ wc.reduce =function(word, counts ) {
+ keyval(word, sum(counts))
+ }
+
+ mapreduce(input = input ,output = output, input.format = "text",
+ map = wc.map, reduce = wc.reduce,combine = T)
+ }
>
> wordcount(input)
...
14/05/15 10:18:40 INFO mapreduce.Job: Job job_1399887026053_0013 completed successfully
14/05/15 10:18:40 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=11018
FILE: Number of bytes written=278566
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2004
HDFS: Number of bytes written=11583
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Failed reduce tasks=1
Launched map tasks=2
Launched reduce tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=23412
Total time spent by all reduces in occupied slots (ms)=13859
Map-Reduce Framework
Map input records=24
Map output records=112
Map output bytes=10522
Map output materialized bytes=11024
Input split bytes=208
Combine input records=112
Combine output records=114
Reduce input groups=105
Reduce shuffle bytes=11024
Reduce input records=114
Reduce output records=112
Spilled Records=228
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=569
CPU time spent (ms)=3700
Physical memory (bytes) snapshot=574214144
Virtual memory (bytes) snapshot=6258499584
Total committed heap usage (bytes)=365953024
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1796
File Output Format Counters
Bytes Written=11583
rmr
reduce calls=110
14/05/15 10:18:40 INFO streaming.StreamJob: Output directory: /tmp/file612355aa2e35
function ()
{
fname
}
<environment: 0x37d70d0>
>
>
> from.dfs("/tmp/file612355aa2e35")
$key
[1] "-"
[2] "of"
[3] "Hong"
[4] "Paul's"
[5] "School"
[6] "College"
[7] "Graduate"
...
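To sanity-check the counts returned by the word-count job on a small input, the same logic can be approximated locally with standard Unix tools. This is only a sketch and not part of RHadoop. `wordcount_local` is a hypothetical name, and unlike `strsplit` it drops the empty tokens produced by consecutive spaces.

```shell
# Local stand-in for the word-count job: split lines on spaces (map),
# then sum the occurrences of each word (reduce). Output: word<TAB>count.
wordcount_local() {
    tr ' ' '\n' |
        awk 'NF { count[$0]++ } END { for (w in count) print w "\t" count[w] }' |
        LC_ALL=C sort
}

# Small example on inline text.
printf 'a b a\nb c\n' | wordcount_local
```

On the cluster, the input file could be piped in with `hadoop fs -cat /user/hadoop/tmp.txt | wordcount_local` and the result compared with the keys and values returned by `from.dfs`.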