Install RHadoop with Hadoop 2.2 – Red Hat Linux
Prerequisite
Hadoop 2.2 is already installed; the installation steps below should be applied on every Hadoop node.
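Before starting, a quick sanity check that the hadoop binary is reachable on the node can save time (a minimal sketch; adjust it if your hadoop lives outside PATH):

```shell
# Check whether the hadoop binary is on PATH and report the result.
if command -v hadoop >/dev/null 2>&1; then
  msg="hadoop found: $(command -v hadoop)"
else
  msg="hadoop not found on PATH; install Hadoop 2.2 on this node first"
fi
echo "$msg"
```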
Step 1. Install R (via yum)
[hadoop@c0046220 yum.repos.d]$ sudo yum update
[hadoop@c0046220 yum.repos.d]$ yum search r-project
[hadoop@c0046220 yum.repos.d]$ sudo yum install R
...
Installed:
R.x86_64 0:3.0.2-1.el6
Dependency Installed:
R-core.x86_64 0:3.0.2-1.el6 R-core-devel.x86_64 0:3.0.2-1.el6 R-devel.x86_64 0:3.0.2-1.el6 R-java.x86_64 0:3.0.2-1.el6
R-java-devel.x86_64 0:3.0.2-1.el6 bzip2-devel.x86_64 0:1.0.5-7.el6_0 fontconfig-devel.x86_64 0:2.8.0-3.el6 freetype-devel.x86_64 0:2.3.11-14.el6_3.1
java-1.6.0-openjdk-devel.x86_64 1:1.6.0.0-1.62.1.11.11.90.el6_4 kpathsea.x86_64 0:2007-57.el6_2 libRmath.x86_64 0:3.0.2-1.el6 libRmath-devel.x86_64 0:3.0.2-1.el6
libXft-devel.x86_64 0:2.3.1-2.el6 libXmu.x86_64 0:1.1.1-2.el6 libXrender-devel.x86_64 0:0.9.7-2.el6 libicu.x86_64 0:4.2.1-9.1.el6_2
netpbm.x86_64 0:10.47.05-11.el6 netpbm-progs.x86_64 0:10.47.05-11.el6 pcre-devel.x86_64 0:7.8-6.el6 psutils.x86_64 0:1.17-34.el6
tcl.x86_64 1:8.5.7-6.el6 tcl-devel.x86_64 1:8.5.7-6.el6 tex-preview.noarch 0:11.85-10.el6 texinfo.x86_64 0:4.13a-8.el6
texinfo-tex.x86_64 0:4.13a-8.el6 texlive.x86_64 0:2007-57.el6_2 texlive-dvips.x86_64 0:2007-57.el6_2 texlive-latex.x86_64 0:2007-57.el6_2
texlive-texmf.noarch 0:2007-38.el6 texlive-texmf-dvips.noarch 0:2007-38.el6 texlive-texmf-errata.noarch 0:2007-7.1.el6 texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6
texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6 texlive-texmf-errata-latex.noarch 0:2007-7.1.el6 texlive-texmf-fonts.noarch 0:2007-38.el6 texlive-texmf-latex.noarch 0:2007-38.el6
texlive-utils.x86_64 0:2007-57.el6_2 tk.x86_64 1:8.5.7-5.el6 tk-devel.x86_64 1:8.5.7-5.el6 zlib-devel.x86_64 0:1.2.3-29.el6
Complete!
Validation:
[hadoop@c0046220 yum.repos.d]$ R
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Step 2. Install RHadoop
2.1 Getting RHadoop Packages
Download the rhdfs, rhbase and rmr2 packages from https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads, or fetch them directly with the shell commands below. Note that rhbase additionally requires an Apache Thrift server; this guide installs only rhdfs and rmr2.
[hadoop@c0046220 RHadoop]$ cd /tmp
[hadoop@c0046220 tmp]$ mkdir RHadoop
[hadoop@c0046220 tmp]$ cd RHadoop
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhdfs/master/build/rhdfs_1.0.8.tar.gz
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rmr2/3.1.0/build/rmr2_3.1.0.tar.gz
[hadoop@c0046220 RHadoop]$ wget https://raw.githubusercontent.com/RevolutionAnalytics/rhbase/master/build/rhbase_1.2.0.tar.gz
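The three downloads share one URL pattern, so they can also be scripted. The sketch below only prints the URLs (a dry run, so it works without network access); pipe each line into wget to actually fetch the files:

```shell
# Build the three download URLs used above; echo instead of fetching.
base=https://raw.githubusercontent.com/RevolutionAnalytics
urls=""
for path in rhdfs/master/build/rhdfs_1.0.8.tar.gz \
            rmr2/3.1.0/build/rmr2_3.1.0.tar.gz \
            rhbase/master/build/rhbase_1.2.0.tar.gz; do
  urls="$urls $base/$path"
  echo "$base/$path"
done
```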
2.2 Install R packages that RHadoop depends on.
[hadoop@c0046220 java]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_05
[hadoop@c0046220 java]$ sudo -i
[root@c0046220 ~]# export JAVA_HOME=/usr/java/jdk1.8.0_05
[root@c0046220 ~]# R CMD javareconf
[root@c0046220 ~]# R
...
> .libPaths();
[1] "/usr/lib64/R/library" "/usr/share/R/library"
> install.packages(c("rJava", "Rcpp", "RJSONIO", "bitops", "digest", "functional", "stringr", "plyr", "reshape2", "caTools"))
> #install.packages("caTools") #needed for rmr2
2.3 Install RHadoop
Set environment variables
[hadoop@c0046220 ~]$ vi ~/.bashrc
# set HADOOP locations for RHadoop
export HADOOP_HOME=/opt/hadoop/hadoop-2.2.0
export HADOOP_CMD=$HADOOP_HOME/bin/hadoop
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar
[hadoop@c0046220 ~]$ source .bashrc
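After sourcing .bashrc, it is worth confirming that the two variables resolve to files that actually exist on this node; a minimal sketch that prints MISSING for anything that still needs fixing:

```shell
# Check that each variable is set and points at an existing file.
for f in "${HADOOP_CMD:-HADOOP_CMD-unset}" "${HADOOP_STREAMING:-HADOOP_STREAMING-unset}"; do
  if [ -e "$f" ]; then status="OK"; else status="MISSING"; fi
  echo "$status: $f"
done
```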
[hadoop@c0040084 R]$ sudo -i
[root@c0040084 ~]# R
...
> Sys.setenv(HADOOP_HOME="/opt/hadoop/hadoop-2.2.0");
> Sys.setenv(HADOOP_CMD="/opt/hadoop/hadoop-2.2.0/bin/hadoop");
> Sys.setenv(HADOOP_STREAMING="/opt/hadoop/hadoop-2.2.0/share/hadoop/tools/lib/hadoop-streaming-2.2.0.jar");
> install.packages(pkgs="/tmp/RHadoop/rhdfs_1.0.8.tar.gz",repos=NULL);
> install.packages(pkgs="/tmp/RHadoop/rmr2_3.1.0.tar.gz",repos=NULL);
Step 3. Validation
Load and initialize the rhdfs package, then run a few simple commands:
library(rhdfs)
hdfs.init()
hdfs.ls("/")
[hadoop@c0046220 ~]$ R
...
> library(rhdfs)
Loading required package: rJava
...
Be sure to run hdfs.init()
> hdfs.init()
14/05/15 10:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> hdfs.ls("/")
permission owner group size modtime file
1 drwxr-xr-x hadoop supergroup 0 2014-05-14 03:05 /apps
2 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:40 /data
3 drwxr-xr-x hadoop supergroup 0 2014-05-12 09:45 /output
4 drwxrwx--- hadoop supergroup 0 2014-05-15 10:02 /tmp
5 drwxr-xr-x hadoop supergroup 0 2014-05-14 05:48 /user
6 drwxr-xr-x hadoop supergroup 0 2014-05-13 06:43 /usr
Load and initialize the rmr2 package, then run a few simple commands:
library(rmr2)
from.dfs(to.dfs(1:100))
from.dfs(mapreduce(to.dfs(1:100)))
[hadoop@c0046220 ~]$ R
...
> library(rmr2)
Loading required package: Rcpp
Loading required package: RJSONIO
Loading required package: bitops
Loading required package: digest
Loading required package: functional
Loading required package: reshape2
Loading required package: stringr
Loading required package: plyr
Loading required package: caTools
> from.dfs(to.dfs(1:100))
...
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
> from.dfs(mapreduce(to.dfs(1:100)))
...
$key
NULL
$val
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
Finally, run a classic word-count job with rmr2 (here input points at /user/hadoop/tmp.txt, a text file that must already exist in HDFS):
library(rmr2)

input <- '/user/hadoop/tmp.txt'

wordcount = function(input, output = NULL, pattern = " ") {
  wc.map = function(., lines) {
    keyval(unlist(strsplit(x = lines, split = pattern)), 1)
  }
  wc.reduce = function(word, counts) {
    keyval(word, sum(counts))
  }
  mapreduce(input = input, output = output, input.format = "text",
            map = wc.map, reduce = wc.reduce, combine = TRUE)
}

wordcount(input)
> library(rmr2)
> input<- '/user/hadoop/tmp.txt'
> wordcount = function(input, output = NULL, pattern = " "){
+ wc.map = function(., lines) {
+ keyval(unlist( strsplit( x = lines,split = pattern)),1)
+ }
+
+ wc.reduce =function(word, counts ) {
+ keyval(word, sum(counts))
+ }
+
+ mapreduce(input = input ,output = output, input.format = "text",
+ map = wc.map, reduce = wc.reduce,combine = T)
+ }
>
> wordcount(input)
...
14/05/15 10:18:40 INFO mapreduce.Job: Job job_1399887026053_0013 completed successfully
14/05/15 10:18:40 INFO mapreduce.Job: Counters: 45
File System Counters
FILE: Number of bytes read=11018
FILE: Number of bytes written=278566
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2004
HDFS: Number of bytes written=11583
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Failed reduce tasks=1
Launched map tasks=2
Launched reduce tasks=2
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=23412
Total time spent by all reduces in occupied slots (ms)=13859
Map-Reduce Framework
Map input records=24
Map output records=112
Map output bytes=10522
Map output materialized bytes=11024
Input split bytes=208
Combine input records=112
Combine output records=114
Reduce input groups=105
Reduce shuffle bytes=11024
Reduce input records=114
Reduce output records=112
Spilled Records=228
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=569
CPU time spent (ms)=3700
Physical memory (bytes) snapshot=574214144
Virtual memory (bytes) snapshot=6258499584
Total committed heap usage (bytes)=365953024
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1796
File Output Format Counters
Bytes Written=11583
rmr
reduce calls=110
14/05/15 10:18:40 INFO streaming.StreamJob: Output directory: /tmp/file612355aa2e35
function ()
{
fname
}
<environment: 0x37d70d0>
>
>
> from.dfs("/tmp/file612355aa2e35")
$key
[1] "-"
[2] "of"
[3] "Hong"
[4] "Paul's"
[5] "School"
[6] "College"
[7] "Graduate"
...
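The map/shuffle/reduce flow of this word count can also be emulated locally with ordinary shell tools, which is handy for sanity-checking the logic without a cluster: tr plays the mapper (split lines into words), sort the shuffle, and uniq -c the reducer. The input line is just an illustrative sample:

```shell
# Local emulation of word count: tr = map, sort = shuffle, uniq -c = reduce.
counts=$(printf 'to be or not to be\n' | tr ' ' '\n' | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
# prints:
# be 2
# not 1
# or 1
# to 2
```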