R Language Study Notes: A Collection of Problems Encountered on Linux

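Standalone mode: a Spark cluster running in standalone mode schedules applications in first-in, first-out (FIFO) order. By default, each application claims the resources of all available nodes.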

These notes run SparkR only in local and standalone mode (Spark 1.5.2).

Problem 1: Installation

Building R from source requires a Fortran compiler, so the gcc-gfortran package must be installed.

Installation steps:
1) Unpack R-3.2.3.tar.gz
2) ./configure
3) make
4) make install (optional)
5) Set the environment variables: vi .bash_profile

./configure may stop with the following errors:

--with-readline=yes (default) and headers/libs are not available
Cause: the readline-devel package is missing. Fix: yum install readline-devel

configure: error: cannot compile a simple Fortran program
Cause: the gcc-gfortran package is missing. Fix: yum install gcc-gfortran

configure: error: --with-x=yes (default) and X11 headers/libs are not available
Cause: the libXt-devel package is missing. Fix: yum install libXt-devel

In total, the steps above depend on these packages: ①gcc ②gcc-c++ ③readline-devel ④gcc-gfortran ⑤libXt-devel
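
The error-to-package pairs above can be captured in a small lookup. This is only a sketch: the function name `pkg_for_configure_error` is hypothetical, and it matches on the keywords of the three messages quoted in these notes, nothing more.

```shell
# Hypothetical helper: map a ./configure error message from the R build
# to the yum package that these notes say fixes it.
pkg_for_configure_error() {
  case "$1" in
    *readline*) echo "readline-devel" ;;
    *Fortran*)  echo "gcc-gfortran" ;;
    *X11*)      echo "libXt-devel" ;;
    *)          echo "unknown" ;;
  esac
}

# Example: look up the package for the Fortran error quoted above.
pkg_for_configure_error "configure: error: cannot compile a simple Fortran program"
```

Any message not covered by the notes yields "unknown" rather than a guess.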

yum install libXt-devel
yum install readline-devel

yum install gcc
yum install gcc-c++
yum install gcc-gfortran
tar -zxvf R-3.2.3.tar.gz
cd R-3.2.3
./configure
make
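
The notes do not show what goes into .bash_profile in step 5. One plausible minimal sketch is to put the build's bin/ directory on PATH. The helper name `r_path_line` and the location `$HOME/R-3.2.3` are assumptions; adjust them to wherever the tarball was actually unpacked or installed.

```shell
# Sketch: print the line that puts an R build's bin/ directory on PATH.
# r_path_line is a hypothetical helper, not part of R.
r_path_line() {
  # $PATH stays literal here so that it is expanded at login time.
  printf 'export PATH="%s/bin:$PATH"\n' "$1"
}

# Assumed build location; change it to match your system.
r_build_dir="$HOME/R-3.2.3"
r_path_line "$r_build_dir"

# To persist the setting, append the output to ~/.bash_profile and reload it:
#   r_path_line "$r_build_dir" >> ~/.bash_profile && . ~/.bash_profile
```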

Problem 2: Cannot access the package repository

unsupported URL scheme

Warning: unable to access index for repository https://rweb.crmda.ku.edu/cran/src/contrib

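This is a mirror problem: the selected mirror is served over https, which this R build apparently cannot fetch. There are two fixes:
1) Choose a different mirror when prompted.
2) Pass the repository explicitly: install.packages("RODBC", dependencies = TRUE, repos = "http://cran.rstudio.com/")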

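Problem 3: Error while installing an R package

configure: error: "ODBC headers sql.h and sqlext.h not found"

The ODBC packages are not installed on the Linux host. RODBC needs both unixODBC and the unixODBC development package; installing them with yum resolves the error.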

yum install unixODBC

yum install unixODBC-devel

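Then run install.packages("RODBC", dependencies = TRUE, repos = "http://cran.rstudio.com/") again.

If the remote database still cannot be reached, check whether the network is down by pinging the remote host.

SparkR programming example:

# If you are inside the sparkR shell, Sys.setenv and .libPaths are not needed; library(SparkR) is enough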

#Sys.setenv(SPARK_HOME = "D:/StudySoftWare/Spark/spark-1.5.2-bin-hadoop2.6")
#.libPaths(c(file.path(Sys.getenv("SPARK_HOME"),"R","lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master = "local")

# To run against the cluster instead:
# sc <- sparkR.init(master = "spark://192.168.133.11:7077")
sqlContext <- sparkRSQL.init(sc)
DF <- createDataFrame(sqlContext, faithful)
head(DF)
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))
df <- createDataFrame(sqlContext, localDF)
# Print its schema
printSchema(df)
# root
# |-- name: string (nullable = true)
# |-- age: double (nullable = true)
# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- jsonFile(sqlContext, path)
printSchema(peopleDF)
# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")
# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")
# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)
# Print the teenagers in our dataset
print(teenagersLocalDF)
# Stop the SparkContext now
sparkR.stop()

java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory

This error occurs because Rscript cannot be found on the search path, as explained in the following report:

looks like the issue was that code was looking for Rscript under "/usr/bin". Our default installation was /usr/revolutionr.
Just created a link Rscript in /usr/bin that points to /usr/revolution/bin/Revoscript

Alternatively, copy Rscript into /usr/bin. Reference: https://github.com/RevolutionAnalytics/RHadoop/issues/87
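
The symlink fix can be sketched as a small function. `link_rscript` is a hypothetical helper, and both arguments are assumptions about your layout: the first is wherever the real Rscript lives, the second is a directory that is already searched for executables.

```shell
# Sketch: expose an existing Rscript binary under a directory that is searched.
# link_rscript is a hypothetical helper, not part of R or Spark.
link_rscript() {
  src="$1"        # full path to the real Rscript executable
  dest_dir="$2"   # target directory, e.g. /usr/bin
  if [ ! -x "$src" ]; then
    echo "not executable: $src" >&2
    return 1
  fi
  # -s creates a symbolic link; -f replaces a stale link of the same name.
  ln -sf "$src" "$dest_dir/Rscript"
}

# Example (writing to /usr/bin needs root; the source path is an assumption):
#   link_rscript "$HOME/R-3.2.3/bin/Rscript" /usr/bin
```

A link rather than a copy keeps /usr/bin pointing at the same binary after R is rebuilt in place.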

Example 2: wordCount

library(SparkR)
sparkR.stop()
# The sparkR shell initializes a SparkContext automatically (local mode by default),
# hence the sparkR.stop() above before re-initializing against the cluster.
sc <- sparkR.init(master = "spark://192.168.133.11:7077", appName = "WordCount")
# Full signature: sparkR.init(master = "", appName = "SparkR", sparkHome = Sys.getenv("SPARK_HOME"), sparkEnvir = list(), sparkExecutorEnv = list(), sparkJars = "", sparkPackages = "")

# Read the input file from HDFS (replace <namenode-hostname> with your NameNode host)
lines <- SparkR:::textFile(sc, "hdfs://<namenode-hostname>/user/root/test/word.txt")

# Split each line into words
words <- SparkR:::flatMap(lines, function(line) { strsplit(line, " ")[[1]] })

# Pair each word with a count of 1
wordCount <- SparkR:::lapply(words, function(word) { list(word, 1L) })

# Sum the counts per word, using 2 partitions
counts <- SparkR:::reduceByKey(wordCount, "+", 2L)

# To save to HDFS, give the full path, e.g. "hdfs://<namenode-hostname>/user/root/test/sparkR.txt"
SparkR:::saveAsTextFile(counts, "hdfs://<namenode-hostname>/user/root/test/sparkR.txt")

## To store a SparkR DataFrame created with createDataFrame(hc, ...) in Hive as a file, convert it to an RDD first.
## data_in and evo_table_name_lower_with_path are not defined in these notes.
data_in_rdd <- SparkR:::toRDD(data_in)
SparkR:::saveAsTextFile(data_in_rdd, evo_table_name_lower_with_path)

# Bring the word counts back to the driver as a local list
output <- SparkR:::collect(counts)

API documentation 1: http://amplab-extras.github.io/SparkR-pkg/rdocs/1.2/index.html
Functions listed there must be called as SparkR:::functionName.

API documentation 2: http://spark.apache.org/docs/1.5.2/api/R/index.html
Functions listed there can be called directly.
