Yeah, that’s probably because the head() you’re invoking there is defined for SparkR DataFrames
[1] (note how you don’t have to use the SparkR::: namespace prefix for it), but SparkR:::textFile()
returns an RDD object, which, the way you’re applying it to that .md text file, behaves more like
a distributed list of the file’s lines. If you want to look at the first item or first several
items in the RDD, I think you want SparkR:::first() or SparkR:::take(), both of which operate
on RDDs.
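For example, something along these lines should work in the Spark 1.4 sparkR shell. It is a sketch that assumes `sc` is the SparkContext the shell creates and that README.md is in the working directory. Because these RDD functions are private in 1.4, their names and behavior may change in later releases.

```r
# Assumes the Spark 1.4 sparkR shell, where the SparkContext `sc` already exists.
# textFile() returns an RDD whose elements are the lines of the file.
lines <- SparkR:::textFile(sc, "./README.md")

# head() has no RDD method, so the call falls back to R's default head(),
# which tries x[seq_len(n)] and fails on the S4 RDD object.
# The RDD actions below return ordinary R values instead:
firstLine  <- SparkR:::first(lines)    # a single element (one line of text)
firstLines <- SparkR:::take(lines, 5)  # an R list holding up to 5 lines

print(firstLine)
print(unlist(firstLines))
```

Once take() or collect() has brought the data back to the driver as a plain R list, head() works on it as usual.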

Just remember that the functions described in the public SparkR API [2] right now mostly
relate to working with DataFrames. For the private functions you might want, you’ll have to
use the R command-line docs or read the RDD source code [3] (which includes the doc strings
used to generate the R docs), whichever you find easier.

Alek

[1] -- http://spark.apache.org/docs/latest/api/R/head.html
[2] -- https://spark.apache.org/docs/latest/api/R/index.html
[3] -- https://github.com/apache/spark/blob/master/R/pkg/R/RDD.R

From: Wei Zhou <zhweisophie@gmail.com>
Date: Thursday, June 25, 2015 at 3:49 PM
To: Aleksander Eskilson <Alek.Eskilson@cerner.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: sparkR could not find function "textFile"

Hi Alek,

Just a follow-up question. This is what I did in the sparkR shell:

lines <- SparkR:::textFile(sc, "./README.md")
head(lines)

And I am getting error:

"Error in x[seq_len(n)] : object of type 'S4' is not subsettable"

I'm wondering what I did wrong. Thanks in advance.

Wei

2015-06-25 13:44 GMT-07:00 Wei Zhou <zhweisophie@gmail.com>:
Hi Alek,

Thanks for the explanation, it is very helpful.

Cheers,
Wei

2015-06-25 13:40 GMT-07:00 Eskilson,Aleksander <Alek.Eskilson@cerner.com>:
Hi there,

The tutorial you’re reading there was written before SparkR was merged into Spark for 1.4.0.
As part of the merge, the RDD API (which includes the textFile() function) was made private,
because the devs felt many of its functions were too low-level. They focused instead on
finishing the DataFrame API, which supports local, HDFS, and Hive/HBase reads. In the meantime,
the devs are trying to determine which functions of the RDD API, if any, should be made public
again. You can see the rationale behind this decision on the issue’s JIRA [1].

You can still use those now-private RDD functions by prefixing the call with the SparkR
private namespace; for example, you’d use
SparkR:::textFile(…).
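As a rough illustration, here is the word count from the SparkR-pkg tutorial rewritten with the `:::` prefix. It is a sketch that assumes the Spark 1.4 sparkR shell (with `sc` already defined), README.md in the working directory, and that the private RDD functions keep the names and signatures they had in SparkR-pkg; the variable names are just illustrative.

```r
# Assumes the Spark 1.4 sparkR shell (`sc` already defined) and README.md in the
# working directory. Every RDD function below is private in 1.4, hence SparkR:::.
lines <- SparkR:::textFile(sc, "./README.md")

# Split each line on spaces; flatMap flattens the per-line vectors
# into a single RDD of words.
words <- SparkR:::flatMap(lines, function(line) {
  strsplit(line, " ")[[1]]
})

# Pair every word with a count of 1 ...
pairs <- SparkR:::lapply(words, function(word) list(word, 1L))

# ... then sum the counts per word, using 2 partitions for the shuffle.
counts <- SparkR:::reduceByKey(pairs, "+", 2L)

# collect() brings the (word, count) pairs back to the driver as an R list.
output <- SparkR:::collect(counts)
head(output)  # fine here: output is a plain R list, not an RDD
```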

Hope that helps,
Alek

[1] -- https://issues.apache.org/jira/browse/SPARK-7230

From: Wei Zhou <zhweisophie@gmail.com>
Date: Thursday, June 25, 2015 at 3:33 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: sparkR could not find function "textFile"

Hi all,

I am exploring SparkR by starting the shell and following the tutorial here: https://amplab-extras.github.io/SparkR-pkg/

When I tried to read in a local file with textFile(sc, "file_location"), I got the error
could not find function "textFile".

From reading through the SparkR docs for 1.4, it seems that we need a sqlContext to import data,
for example:

people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")

and we also need to specify the file type.

My question is: has SparkR stopped supporting the import of general file types? If not, I would
appreciate any help on how to do this.

PS: I am trying to recreate the word count example in SparkR, and want to import the README.md
file, or really any file, into SparkR.

Thanks in advance.

Best,
Wei

CONFIDENTIALITY NOTICE This message and any included attachments are from Cerner Corporation
and are intended only for the addressee. The information contained in this message is confidential
and may constitute inside or non-public information under international, federal, or state
securities laws. Unauthorized forwarding, printing, copying, distribution, or use of such
information is strictly prohibited and may be unlawful. If you are not the addressee, please
promptly delete this message and notify the sender of the delivery error by e-mail or you
may call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1) (816)221-1024.