Hadoop+Spark:集群环境搭建
- 环境准备:
在虚拟机下,大家三台Linux ubuntu 14.04 server x64 系统(下载地址:http://releases.ubuntu.com/14.04.2/ubuntu-14.04.2-server-amd64.iso):
192.168.1.200 master
192.168.1.201 node1
192.168.1.202 node2
- 在Master上安装Spark环境:
Spark集群环境搭建:
搭建hadoop集群使用hadoop版本是hadoop2.6.4(这里hadoop我已经安装完成,具体如何安装hadoop具体请参考我的文章:《Hadoop:搭建hadoop集群》)
搭建spark这里使用spark版本是spark1.6.2(spark-1.6.2-bin-hadoop2.6.tgz)
1、下载安装包到master虚拟服务器:
在线下载:
hadoop@master:~/ wget http://mirror.bit.edu.cn/apache/spark/spark-1.6.2/spark-1.6.2-bin-hadoop2.6.tgz
离线上传到集群:
2、解压spark安装包到master虚拟服务器/usr/local/spark下,并分配权限:
#解压到/usr/local/下
hadoop@master:~$ sudo tar -zxvf spark-1.6.2-bin-hadoop2.6.tgz -C /usr/local/
hadoop@master:~$ cd /usr/local/
hadoop@master:/usr/local$ ls
bin games include man share src
etc hadoop lib sbin spark-1.6.2-bin-hadoop2.6
#重命名 为spark
hadoop@master:/usr/local$ sudo mv spark-1.6.2-bin-hadoop2.6/ spark/
hadoop@master:/usr/local$ ls
bin etc games hadoop include lib man sbin share spark src
#分配权限
hadoop@master:/usr/local$ sudo chown -R hadoop:hadoop spark
hadoop@master:/usr/local$
3、在master虚拟服务器/etc/profile中添加Spark环境变量:
编辑/etc/profile文件
sudo vim /etc/profile
在尾部添加$SPARK_HOME变量,添加后,目前我的/etc/profile文件尾部内容如下:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/opt/scala/scala-2.10.5
# add hadoop bin/ directory to PATH
export HADOOP_HOME=/usr/local/hadoop
export SPARK_HOME=/usr/local/spark
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$JAVA_HOME:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
export CLASSPATH=$CLASS_PATH::$JAVA_HOME/lib:$JAVA_HOME/jre/lib
生效:
source /etc/profile
- 在Master配置Spark:
1、配置master虚拟服务器hadoop-env.sh文件:
sudo vim /usr/local/spark/conf/hadoop-env.sh
注意:默认情况下没有hadoop-env.sh和slaves文件,而是*.template文件,需要重命名:
hadoop@master:/usr/local/spark/conf$ ls
docker.properties.template metrics.properties.template spark-env.sh
fairscheduler.xml.template slaves.template
log4j.properties.template spark-defaults.conf.template
hadoop@master:/usr/local/spark/conf$ sudo vim spark-env.sh
hadoop@master:/usr/local/spark/conf$ mv slaves.template slaves
在文件末尾追加如下内容:
export STANDALONE_SPARK_MASTER_HOST=192.168.1.200
export SPARK_MASTER_IP=192.168.1.200
export SPARK_WORKER_CORES=1
#every slave node start work instance count
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=1g
export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
export SCALA_HOME=/opt/scala/scala-2.10.5
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://192.168.1.200:9000/SparkEventLog"
export SPARK_WORKDER_OPTS="-Dspark.worker.cleanup.enabled=true"
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
2、配置master虚拟服务器下slaves文件:
sudo vim /usr/local/spark/conf/slaves
在slaves文件中内容如下:
192.168.1.201
192.168.1.202
注意:每行写一个机器的ip。
3、Master虚拟机下/usr/local/spark/目录下创建logs文件夹,并分配777权限:
hadoop@master:/usr/local/spark$ mkdir logs
hadoop@master:/usr/local/spark$ chmod 777 logs
- 复制Master虚拟服务器上的/usr/loca/spark下文件到所有slaves节点(node1、node2)下:
1、复制Master虚拟服务器上/usr/local/spark/安装文件到各个salves(node1、node2)上:
注意:拷贝钱需要在ssh到所有salves节点(node1、node2)上,创建/usr/local/spark/目录,并分配777权限。
hadoop@master:/usr/local/spark/conf$ cd ~/
hadoop@master:~$ sudo chmod 777 /usr/local/spark
hadoop@master:~$ scp -r /usr/local/spark hadoop@node1:/usr/local
scp: /usr/local/spark: Permission denied
hadoop@master:~$ sudo scp -r /usr/local/spark hadoop@node1:/usr/local
hadoop@node1's password:
scp: /usr/local/spark: Permission denied
hadoop@master:~$ sudo chmod 777 /usr/local/spark
hadoop@master:~$ ssh node1
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64) * Documentation: https://help.ubuntu.com/ System information as of Fri Sep 23 16:40:31 UTC 2016 System load: 0.08 Processes: 400
Usage of /: 12.2% of 17.34GB Users logged in: 0
Memory usage: 5% IP address for eth0: 192.168.1.201
Swap usage: 0% Graph this data and manage this system at:
https://landscape.canonical.com/ New release '16.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it. Last login: Wed Sep 21 16:19:25 2016 from master
hadoop@node1:~$ cd /usr/local/
hadoop@node1:/usr/local$ sudo mkdir spark
[sudo] password for hadoop:
hadoop@node1:/usr/local$ ls
bin etc games hadoop include lib man sbin share spark src
hadoop@node1:/usr/local$ sudo chmod 777 ./spark
hadoop@node1:/usr/local$ exit
hadoop@master:~$ scp -r /usr/local/spark hadoop@node1:/usr/local
...........
hadoop@master:~$ ssh node2
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64) * Documentation: https://help.ubuntu.com/ System information as of Fri Sep 23 16:15:03 UTC 2016 System load: 0.08 Processes: 435
Usage of /: 13.0% of 17.34GB Users logged in: 0
Memory usage: 6% IP address for eth0: 192.168.1.202
Swap usage: 0% Graph this data and manage this system at:
https://landscape.canonical.com/ Last login: Wed Sep 21 16:19:47 2016 from master
hadoop@node2:~$ cd /usr/local
hadoop@node2:/usr/local$ sudo mkdir spark
[sudo] password for hadoop:
hadoop@node2:/usr/local$ sudo chmod 777 ./spark
hadoop@node2:/usr/local$ exit
logout
Connection to node2 closed.
hadoop@master:~$ scp -r /usr/local/spark hadoop@node2:/usr/local
...........
2、修改所有salves节点(node1、node2)上/etc/profile,追加$SPARK_HOME环境变量:
注意:一般都会遇到权限问题。最好登录到各个salves节点(node1、node2)上手动编辑/etc/profile。
hadoop@master:~$ ssh node1
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64) * Documentation: https://help.ubuntu.com/ System information as of Fri Sep 23 16:42:44 UTC 2016 System load: 0.01 Processes: 400
Usage of /: 12.2% of 17.34GB Users logged in: 0
Memory usage: 5% IP address for eth0: 192.168.1.201
Swap usage: 0% Graph this data and manage this system at:
https://landscape.canonical.com/ New release '16.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it. Last login: Fri Sep 23 16:40:52 2016 from master
hadoop@node1:~$ sudo vim /etc/profile
[sudo] password for hadoop:
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~$ ssh node2
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 3.16.0-30-generic x86_64) * Documentation: https://help.ubuntu.com/ System information as of Fri Sep 23 16:44:42 UTC 2016 System load: 0.0 Processes: 400
Usage of /: 13.0% of 17.34GB Users logged in: 0
Memory usage: 5% IP address for eth0: 192.168.1.202
Swap usage: 0% Graph this data and manage this system at:
https://landscape.canonical.com/ New release '16.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it. Last login: Fri Sep 23 16:43:31 2016 from master
hadoop@node2:~$ sudo vim /etc/profile
[sudo] password for hadoop:
hadoop@node2:~$ exit
logout
Connection to node2 closed.
hadoop@master:~$
修改后的所有salves上/etc/profile文件与master节点上/etc/profile文件配置一致。
- 在Master启动spark并验证是否配置成功:
1、启动命令:
一般要确保hadoop已经启动,之后才启动spark
hadoop@master:~$ cd /usr/local/spark/
hadoop@master:/usr/local/spark$ ./sbin/start-all.sh
2、验证是否启动成功:
方法一、jps
hadoop@master:/usr/local/spark$ ./sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master--master.out
192.168.1.201: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker--node1.out
192.168.1.202: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker--node2.out
hadoop@master:/usr/local/spark$ jps
NameNode
SecondaryNameNode
Jps
ResourceManager
Master
hadoop@master:/usr/local/spark$ cd ~/
hadoop@master:~$ ssh node1
Welcome to Ubuntu 14.04. LTS (GNU/Linux 3.16.--generic x86_64) * Documentation: https://help.ubuntu.com/ System information as of Fri Sep :: UTC System load: 0.06 Processes:
Usage of /: 13.9% of .34GB Users logged in:
Memory usage: % IP address for eth0: 192.168.1.201
Swap usage: % Graph this data and manage this system at:
https://landscape.canonical.com/ New release '16.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it. Last login: Fri Sep :: from master
hadoop@node1:~$ jps
1392 DataNode
2449 Jps
2330 Worker
2079 NodeManager
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~$ ssh node2
Welcome to Ubuntu 14.04. LTS (GNU/Linux 3.16.--generic x86_64) * Documentation: https://help.ubuntu.com/ System information as of Fri Sep :: UTC System load: 0.07 Processes:
Usage of /: 14.7% of .34GB Users logged in:
Memory usage: % IP address for eth0: 192.168.1.202
Swap usage: % Graph this data and manage this system at:
https://landscape.canonical.com/ New release '16.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it. Last login: Fri Sep :: from master
hadoop@node2:~$ jps
Worker
NodeManager
DataNode
Jps
hadoop@node2:~$
方法二、web方式http://192.168.1.200:8080看是否正常:
Hadoop+Spark:集群环境搭建的更多相关文章
- Spark集群环境搭建——Hadoop集群环境搭建
Spark其实是Hadoop生态圈的一部分,需要用到Hadoop的HDFS.YARN等组件. 为了方便我们的使用,Spark官方已经为我们将Hadoop与scala组件集成到spark里的安装包,解压 ...
- Spark 集群环境搭建
思路: ①先在主机s0上安装Scala和Spark,然后复制到其它两台主机s1.s2 ②分别配置三台主机环境变量,并使用source命令使之立即生效 主机映射信息如下: 192.168.32.100 ...
- Spark集群环境搭建——部署Spark集群
在前面我们已经准备了三台服务器,并做好初始化,配置好jdk与免密登录等.并且已经安装好了hadoop集群. 如果还没有配置好的,参考我前面两篇博客: Spark集群环境搭建--服务器环境初始化:htt ...
- Spark集群环境搭建——服务器环境初始化
Spark也是属于Hadoop生态圈的一部分,需要用到Hadoop框架里的HDFS存储和YARN调度,可以用Spark来替换MR做分布式计算引擎. 接下来,讲解一下spark集群环境的搭建部署. 一. ...
- Hadoop、Spark 集群环境搭建问题汇总
Hadoop 问题1: Hadoop Slave节点 NodeManager 无法启动 解决方法: yarn-site.xml reducer取数据的方式是mapreduce_shuffle 问题2: ...
- Hadoop、Spark 集群环境搭建
1.基础环境搭建 1.1运行环境说明 1.1.1硬软件环境 主机操作系统:Windows 64位,四核8线程,主频3.2G,8G内存 虚拟软件:VMware Workstation Pro 虚拟机操作 ...
- Hadoop,HBase集群环境搭建的问题集锦(四)
21.Schema.xml和solrconfig.xml配置文件里參数说明: 參考资料:http://www.hipony.com/post-610.html 22.执行时报错: 23., /comm ...
- hadoop分布式集群环境搭建
参考 http://www.cnblogs.com/zhijianliutang/p/5736103.html 1 wget http://mirrors.shu.edu.cn/apache/hado ...
- Hadoop,HBase集群环境搭建的问题集锦(二)
10.艾玛, Datanode也启动不了了? 找到log: Caused by: java.net.UnknownHostException: Invalid host name: local hos ...
随机推荐
- 洛谷 P1033 自由落体 Label:模拟&&非学习区警告
题目描述 在高为 H 的天花板上有 n 个小球,体积不计,位置分别为 0,1,2,….n-1.在地面上有一个小车(长为 L,高为 K,距原点距离为 S1).已知小球下落距离计算公式为 d=1/2*g* ...
- 洛谷 P1014 Cantor表 Label:续命模拟QAQ
题目描述 现代数学的著名证明之一是Georg Cantor证明了有理数是可枚举的.他是用下面这一张表来证明这一命题的: 1/1 1/2 1/3 1/4 1/5 … 2/1 2/2 2/3 2/4 … ...
- 第1章 ZigBee协议栈初始化网络启动流程
作者:宋老师,华清远见嵌入式学院讲师. ZigBee的基本流程:由协调器的组网(创建PAN ID),终端设备和路由设备发现网络以及加入网络. 基本流程:main()->osal_init_sys ...
- Oracle 游标使用全解(转)
-- 声明游标:CURSOR cursor_name IS select_statement --For 循环游标 --(1)定义游标 --(2)定义游标变量 --(3)使用for循环来使用这个游标 ...
- 关于VSS上的项目源码管理的注意问题
1.将项目添加到vss上面去 如果项目取的名字没有问题,则不需要去vss上面去新建项目,直接在解决方案那里右击“添加到vss”中,把第一个输入框中的名字(xxxx.root)全部清除掉.确定即可. 2 ...
- (转)as3效率优化
1.改进算法无论对于那一种程序,好的算法总是非常重要的,而且能够极大地提高程序性能,所以任何性能的优化第一步就是从算法或者说程序逻辑的优化开始,检查自己的程序是否有多余的运算,是否在没有必要的时候做了 ...
- Hadoop.2.x_源码编译
一.基本环境搭建 1. 准备 hadoop-2.5.0-src.tar.gz apache-maven-3.0.5-bin.tar.gz jdk-7u67-linux-x64.tar.gz proto ...
- springMVC搭建
springMVC搭建 1.Spring特点: 方便耦合,简化开发,提升性能 AOP面向切面的编程 声明式事务支持 方便程序的调试 方便集成各大优秀的框架 Java源代码学习的典范 2.Java的面向 ...
- Varnish安装使用(初学)
本人对varnish也是新手,这里记录一下安装步骤! 环境:centos6.6 varnish安装包下载:wget https://repo.varnish-cache.org/source/varn ...
- jquery-validation 使用
jquery-validation 使用 一.用前必备 官方网站:http://bassistance.de/jquery-plugins/jquery-plugin-validation/ API: ...