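Installing the Spark Runtime Environment
1. Prepare the software
scala-2.9.3: a programming language. Download: http://www.scala-lang.org/download/
spark-1.4.0: this must be a prebuilt Spark package. If you download the source instead, you have to build it yourself for your environment with SBT or Maven before you can use it.
Prebuilt Spark packages can be downloaded from http://spark.apache.org/downloads.html.
Note that the spark-shell banner in step 5 reports Scala 2.10.4: the prebuilt Spark 1.4.0 ships its own Scala libraries, so spark-shell does not use the standalone scala-2.9.3 installed here.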
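2. Install scala-2.9.3
# Extract scala-2.9.3.tgz
tar -zxvf scala-2.9.3.tgz
# Configure SCALA_HOME
vi /etc/profile
# Add the following environment variables
export SCALA_HOME=/home/apps/scala-2.9.3
export PATH=.:$SCALA_HOME/bin:$PATH
# Reload the profile so the new variables take effect in the current shell
source /etc/profile
# Test whether Scala was installed successfully
# by simply running
scala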
3. Install spark-1.4.0
# Extract spark-1.4.0.tgz
tar -zxvf spark-1.4.0.tgz
# Configure SPARK_HOME
vi /etc/profile
# Add the following environment variables
export SPARK_HOME=/home/apps/spark-1.4.0
export PATH=.:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
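A quick sanity check can catch a mistyped variable name or path in /etc/profile before moving on to the configuration files. The sketch below is only an illustration: check_home is a hypothetical helper name, and it assumes that a valid installation directory contains a bin subdirectory.

```shell
#!/bin/sh
# check_home NAME: succeed if the variable called NAME is set
# and the directory it points to contains a bin subdirectory.
# (check_home is a hypothetical helper, not part of Scala or Spark.)
check_home() {
  name=$1
  # Indirectly read the value of the variable whose name was passed in.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -d "$value/bin" ]; then
    echo "$name=$value has no bin directory" >&2
    return 1
  fi
  echo "$name=$value looks OK"
}
```

After reloading the profile, running check_home SCALA_HOME and check_home SPARK_HOME should report both directories as OK.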
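4. Edit the Spark configuration files
# In $SPARK_HOME/conf, make a copy of slaves.template and of spark-env.sh.template
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
# slaves: this file lists the worker hosts; add the worker hostnames, one per line
Append the following lines to the end of spark-env.sh:
# JDK installation path
export JAVA_HOME=/root/app/jdk
# Scala installation path (step 2 used /home/apps/scala-2.9.3; use whichever directory Scala was actually extracted to)
export SCALA_HOME=/root/app/scala-2.9.3
# IP address of the master node
export SPARK_MASTER_IP=192.168.1.200
# Memory allocated to the worker
export SPARK_WORKER_MEMORY=200m
# Hadoop configuration directory
export HADOOP_CONF_DIR=/root/app/hadoop/etc/hadoop
# Number of CPU cores the worker may use
export SPARK_WORKER_CORES=1
# Number of worker instances per node; one is usually enough
export SPARK_WORKER_INSTANCES=1
# JVM options. Since Spark 1.0 there is a default configuration file, spark-defaults.conf, and these options are normally set there instead
export SPARK_JAVA_OPTS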
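spark-defaults.conf also accepts the following parameters:
spark.master //spark://hostname:7077 (7077 is the default standalone master port; 8080 is the master web UI)
spark.local.dir //Spark scratch directory (used for shuffle data)
spark.executor.memory //Spark 1.0 dropped the SPARK_MEM parameter in favor of this one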
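As an illustration of how the three parameters look in an actual file, the sketch below writes a minimal spark-defaults.conf. The helper name write_spark_defaults, the scratch path /tmp/spark-local and the 512m value are placeholders chosen for this example, not values from the original setup.

```shell
#!/bin/sh
# write_spark_defaults FILE MASTER_HOST: write a minimal spark-defaults.conf.
# (Hypothetical helper; the path and memory value below are placeholders.)
write_spark_defaults() {
  file=$1
  master_host=$2
  cat > "$file" <<EOF
# Standalone master URL (7077 is the default master port)
spark.master            spark://$master_host:7077
# Scratch directory used for shuffle data (placeholder path)
spark.local.dir         /tmp/spark-local
# Memory per executor (placeholder value)
spark.executor.memory   512m
EOF
}
```

Calling write_spark_defaults "$SPARK_HOME/conf/spark-defaults.conf" 192.168.1.200 would produce a file pointing at the master used in this article; each line is a property name followed by whitespace and its value.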
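5. Test whether Spark was installed successfully
Start the services on the master node in this order:
1. Start HDFS first (./sbin/start-dfs.sh, from the Hadoop directory)
2. Start the Spark master (./sbin/start-master.sh)
3. Start the Spark workers (./sbin/start-slaves.sh)
4. Check the running processes with jps:
Master node: NameNode, SecondaryNameNode, Master
Worker nodes: DataNode, Worker
5. Start spark-shell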
15/06/21 21:23:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/21 21:23:47 INFO spark.SecurityManager: Changing view acls to: root
15/06/21 21:23:47 INFO spark.SecurityManager: Changing modify acls to: root
15/06/21 21:23:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/06/21 21:23:47 INFO spark.HttpServer: Starting HTTP Server
15/06/21 21:23:47 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/06/21 21:23:47 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38651
15/06/21 21:23:47 INFO util.Utils: Successfully started service 'HTTP class server' on port 38651.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
15/06/21 21:23:54 INFO spark.SparkContext: Running Spark version 1.4.0
15/06/21 21:23:54 INFO spark.SecurityManager: Changing view acls to: root
15/06/21 21:23:54 INFO spark.SecurityManager: Changing modify acls to: root
15/06/21 21:23:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/06/21 21:23:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/06/21 21:23:56 INFO Remoting: Starting remoting
15/06/21 21:23:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.200:57658]
15/06/21 21:23:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 57658.
15/06/21 21:23:58 INFO spark.SparkEnv: Registering MapOutputTracker
15/06/21 21:23:58 INFO spark.SparkEnv: Registering BlockManagerMaster
15/06/21 21:23:58 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-4f1badf6-1e92-47ca-98a2-6d82f4882f15/blockmgr-530e4335-9e59-45d4-b9fb-6014089f5a00
15/06/21 21:23:58 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/06/21 21:23:59 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-4f1badf6-1e92-47ca-98a2-6d82f4882f15/httpd-4b2cca3c-e8d4-4ab3-9c3d-38ec579ec873
15/06/21 21:23:59 INFO spark.HttpServer: Starting HTTP Server
15/06/21 21:23:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/06/21 21:23:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51899
15/06/21 21:23:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 51899.
15/06/21 21:23:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/06/21 21:23:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/06/21 21:23:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/06/21 21:23:59 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/06/21 21:23:59 INFO ui.SparkUI: Started SparkUI at http://192.168.1.200:4040
15/06/21 21:24:00 INFO executor.Executor: Starting executor ID driver on host localhost
15/06/21 21:24:00 INFO executor.Executor: Using REPL class URI: http://192.168.1.200:38651
15/06/21 21:24:01 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59385.
15/06/21 21:24:01 INFO netty.NettyBlockTransferService: Server created on 59385
15/06/21 21:24:01 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/06/21 21:24:01 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:59385 with 267.3 MB RAM, BlockManagerId(driver, localhost, 59385)
15/06/21 21:24:01 INFO storage.BlockManagerMaster: Registered BlockManager
15/06/21 21:24:02 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
15/06/21 21:24:03 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
15/06/21 21:24:04 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/06/21 21:24:04 INFO metastore.ObjectStore: ObjectStore, initialize called
15/06/21 21:24:04 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/06/21 21:24:04 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/06/21 21:24:05 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/06/21 21:24:07 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/06/21 21:24:14 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/06/21 21:24:14 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
15/06/21 21:24:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/06/21 21:24:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/06/21 21:24:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/06/21 21:24:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/06/21 21:24:19 INFO metastore.ObjectStore: Initialized ObjectStore
15/06/21 21:24:20 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/06/21 21:24:24 INFO metastore.HiveMetaStore: Added admin role in metastore
15/06/21 21:24:24 INFO metastore.HiveMetaStore: Added public role in metastore
15/06/21 21:24:24 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
15/06/21 21:24:25 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/06/21 21:24:25 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
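The jps check from step 5 can also be scripted, so a missing daemon is noticed before any job is submitted. This is a rough sketch: missing_daemons is a hypothetical helper, and it assumes that jps prints one "<pid> <MainClass>" pair per line.

```shell
#!/bin/sh
# missing_daemons NAME...: read jps output on stdin and print every expected
# daemon name that is absent. Returns 0 only when nothing is missing.
# (Hypothetical helper, not part of Hadoop or Spark.)
missing_daemons() {
  listing=$(cat)
  status=0
  for daemon in "$@"; do
    # Match the daemon name against the whole second field of each jps line.
    if ! printf '%s\n' "$listing" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $daemon"
      status=1
    fi
  done
  return $status
}
```

On the master node this could be run as jps | missing_daemons NameNode SecondaryNameNode Master, and on a worker node as jps | missing_daemons DataNode Worker.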
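6. Test with the word count example. Upload a file to HDFS before starting spark-shell.
7. Code:
// Read the input file from HDFS as an RDD of lines
val file = sc.textFile("hdfs://hadoop.master:9000/data/intput/wordcount.data")
// Split each line on single spaces, pair every word with 1, and sum the counts per word
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Bring the results back to the driver
count.collect()
// Write the counts to HDFS
count.saveAsTextFile("hdfs://hadoop.master:9000/data/output")
You need to know some Scala to understand the code above.
Print the result directly: hadoop dfs -cat /data/output/p*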
(im,1)
(are,1)
(yes,1)
(hi,2)
(do,1)
(no,3)
(to,1)
(lll,1)
(,3)
(hello,3)
(xiaoming,1)
(ga,1)
(world,1)
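To cross-check the Spark result on a small file without a cluster, roughly the same count can be reproduced with standard shell tools. The sketch below is an approximation, not the Spark job itself: word_count is a hypothetical helper, and unlike Scala's split(" ") it also counts an empty token for a trailing space. It does show where an entry such as (,3) most likely comes from: empty lines or consecutive spaces produce empty tokens.

```shell
#!/bin/sh
# word_count: read text on stdin and print one "(token,count)" line per token.
# (Hypothetical helper that approximates the Scala pipeline above.)
word_count() {
  # Turn every single space into a newline so each token sits on its own line;
  # consecutive spaces and empty lines therefore yield empty tokens.
  tr ' ' '\n' |
    awk '{ counts[$0]++ } END { for (w in counts) printf "(%s,%d)\n", w, counts[w] }' |
    LC_ALL=C sort
}
```

For example, hadoop dfs -cat /data/intput/wordcount.data | word_count prints the counts in the same (word,count) form as the Spark output, sorted by token rather than in partition order.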