CDH Spark command-line test

Part 1

Reference: https://www.cnblogs.com/bovenson/p/5801536.html

[root@node-1 test]# chown hdfs:hdfs /root/test/*

[root@node-1 test]# chown hdfs:hdfs     /root/test

[root@node-1 test]# cd /var/lib/hadoop-hdfs/

[root@node-1 hadoop-hdfs]# ls

[root@node-1 hadoop-hdfs]# mkdir  /var/lib/hadoop-hdfs/test

[root@node-1 hadoop-hdfs]# mv  /root/test/* /var/lib/hadoop-hdfs/test

[root@node-1 hadoop-hdfs]# su hdfs

Attempting to create directory /var/lib/hadoop-hdfs/perl5

[hdfs@node-1 ~]$  hadoop fs  -put test/*.txt /user/tt/me.txt

$ hadoop fs -ls -R /user/tt

-rw-r--r--   3 hdfs supergroup      56302 2018-01-30 20:59 /user/tt/me.txt
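
The `-put` above relies on `test/*.txt` matching exactly one file; with several matches, `hadoop fs -put` expects the destination to be a directory rather than a file name. A minimal sketch of a guarded upload follows, assuming a POSIX shell. The names `run`, `upload_single_txt` and `DRY_RUN` are hypothetical helpers introduced here for illustration, not part of Hadoop; by default the script only prints the `hadoop` commands it would run.

```shell
#!/bin/sh
# Sketch: upload exactly one local .txt file to a fixed HDFS path.
# DRY_RUN=1 (the default here) only prints the hadoop commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper. Usage: upload_single_txt test /user/tt/me.txt
upload_single_txt() {
    src_dir="$1"   # local directory holding the .txt file
    dst="$2"       # target HDFS path

    # Expand the glob into the positional parameters and count the matches.
    set -- "$src_dir"/*.txt
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one .txt file in $src_dir" >&2
        return 1
    fi

    run hadoop fs -mkdir -p "$(dirname "$dst")"
    run hadoop fs -put "$1" "$dst"
    run hadoop fs -ls -R "$(dirname "$dst")"
}
```

Calling `upload_single_txt test /user/tt/me.txt` as the `hdfs` user with `DRY_RUN=0` would perform the same upload as the transcript, but stop early if the directory holds zero or several `.txt` files.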

# spark-shell --master spark://node-1:7077 --executor-memory 100M --driver-memory 1000M 

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_162)

Type in expressions to have them evaluated.

Type :help for more information.

Spark context available as sc (master = spark://node-1:7077, app id = app-20180130211525-0002).

SQL context available as sqlContext.

scala> val lines  = sc.textFile("hdfs://node-1:8020/user/tt/me.txt")

lines: org.apache.spark.rdd.RDD[String] = hdfs://node-1:8020/user/tt/me.txt MapPartitionsRDD[1] at textFile at <console>:27

scala> lines.count()

res0: Long = 10000                                                              

scala> lines.first()

res1: String = 4

scala>

Spark file line count (command line)
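
The results of `lines.count()` and `lines.first()` can be cross-checked against the local copy of the file before it is uploaded. The sketch below assumes a POSIX shell; `local_line_count` and `local_first_line` are hypothetical helper names. One caveat: `wc -l` counts newline characters, so a file whose last line has no trailing newline will report one fewer line than Spark does.

```shell
#!/bin/sh
# Sketch: local equivalents of lines.count() and lines.first().

# Hypothetical helper: number of newline-terminated lines in a file.
local_line_count() {
    # Arithmetic expansion strips the padding some wc implementations add.
    echo $(( $(wc -l < "$1") ))
}

# Hypothetical helper: first line of a file.
local_first_line() {
    head -n 1 "$1"
}
```

For the file used above, both helpers would be run against the local `.txt` file under `/var/lib/hadoop-hdfs/test` and compared with `res0` and `res1`.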

Part 2

Notes:

# spark-shell --help

Usage: ./bin/spark-shell [options]

Options:

--master MASTER_URL  spark://host:port, mesos://host:port, yarn, or local.

--deploy-mode DEPLOY_MODE  Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client).

--class CLASS_NAME  Your application's main class (for Java / Scala apps).

--name NAME  A name of your application.

--jars JARS  Comma-separated list of local jars to include on the driver and executor classpaths.

--packages  Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.

--exclude-packages  Comma-separated list of groupId:artifactId, to exclude while resolving the dependencies provided in --packages to avoid dependency conflicts.

--repositories  Comma-separated list of additional remote repositories to search for the maven coordinates given with --packages.

--py-files PY_FILES  Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.

--files FILES  Comma-separated list of files to be placed in the working directory of each executor.

--conf PROP=VALUE  Arbitrary Spark configuration property.

--properties-file FILE  Path to a file from which to load extra properties. If not specified, this will look for conf/spark-defaults.conf.

--driver-memory MEM  Memory for driver (e.g. 1000M, 2G) (Default: 1024M).

--driver-java-options  Extra Java options to pass to the driver.

--driver-library-path  Extra library path entries to pass to the driver.

--driver-class-path  Extra class path entries to pass to the driver. Note that jars added with --jars are automatically included in the classpath.

--executor-memory MEM  Memory per executor (e.g. 1000M, 2G) (Default: 1G).

--proxy-user NAME  User to impersonate when submitting the application.

--help, -h  Show this help message and exit

--verbose, -v  Print additional debug output

--version,  Print the version of current Spark

Spark standalone with cluster deploy mode only:

--driver-cores NUM  Cores for driver (Default: ).

Spark standalone or Mesos with cluster deploy mode only:

--supervise  If given, restarts the driver on failure.

--kill SUBMISSION_ID  If given, kills the driver specified.

--status SUBMISSION_ID  If given, requests the status of the driver specified.

Spark standalone and Mesos only:

--total-executor-cores NUM  Total cores for all executors.

Spark standalone and YARN only:

--executor-cores NUM  Number of cores per executor. (Default: in YARN mode, or all available cores on the worker in standalone mode)

YARN-only:

--driver-cores NUM  Number of cores used by the driver, only in cluster mode (Default: ).

--queue QUEUE_NAME  The YARN queue to submit to (Default: "default").

--num-executors NUM  Number of executors to launch (Default: ).

--archives ARCHIVES  Comma separated list of archives to be extracted into the working directory of each executor.

--principal PRINCIPAL  Principal to be used to login to KDC, while running on secure HDFS.

--keytab KEYTAB  The full path to the file that contains the keytab for the principal specified above. This keytab will be copied to the node running the Application Master via the Secure Distributed Cache, for renewing the login tickets and the delegation tokens periodically.

spark-shell --help
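
As an illustration of how these options combine, the sketch below assembles a `spark-shell` command line and prints it for review instead of running it. It assumes a POSIX shell; the default values are the ones from the session earlier in this post, and `spark_shell_cmd` plus the `MASTER`, `EXECUTOR_MEMORY` and `DRIVER_MEMORY` variables are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: assemble a spark-shell command from a few documented options.
# Defaults mirror the session earlier in this post; override via environment.
MASTER="${MASTER:-spark://node-1:7077}"
EXECUTOR_MEMORY="${EXECUTOR_MEMORY:-100M}"
DRIVER_MEMORY="${DRIVER_MEMORY:-1000M}"

# Hypothetical helper: print the command so it can be checked before use.
spark_shell_cmd() {
    echo spark-shell \
        --master "$MASTER" \
        --executor-memory "$EXECUTOR_MEMORY" \
        --driver-memory "$DRIVER_MEMORY" \
        "$@"
}
```

For example, `spark_shell_cmd --total-executor-cores 2 --name line-count` prints the base command followed by the two extra standalone-mode options, which can then be pasted into the terminal.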
