大数据-spark HA集群搭建
一、安装scala
我们安装的是scala-2.11.8 5台机器全部安装
下载需要的安装包,放到特定的目录下/opt/workspace/并进行解压
1、解压缩
[root@master1 ~]# cd /opt/workspace
[root@master1 workspace]#tar -zxvf scala-2.11..tar.gz
2、配置环境变量 /etc/profile文件中添加spark配置
[root@master1 ~]# vi /etc/profile
# Scala Config
export SCALA_HOME=/opt/software/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
[root@master1 ~]# source /etc/profile
3、启动scala
[root@master1 workspace]# vim /etc/profile
[root@master1 workspace]# scala -version
-bash: /opt/workspace/scala-2.11.8/bin/scala: 权限不够
[root@master1 workspace]# chmod +x /opt/workspace/scala-2.11.8/bin/scala
[root@master1 workspace]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
[root@master1 workspace]# scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
Type in expressions for evaluation. Or try :help.
scala>
二、安装spark
1、下载spark对应版本
因为后期需要安装Hive,并且会运行hive on spark模式,为避免jar冲突,我们去掉了spark中的hive部分。
我们应用的是spark-2.3.0-bin-hadoop2-without-hive.tgz 自己编译的版本
可参考https://blog.csdn.net/sinat_25943197/article/details/81906060进行编译
2、文件解压
[root@master1 workspace]# tar -zxvf spark-2.3.0-bin-hadoop2-without-hive.tgz
3、配置文件 spark-env.sh slaves、/etc/profile
/etc/profile文件中添加
# Spark Config
export SPARK_HOME=/opt/workspace/spark-2.3.-bin-hadoop2-without-hive
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${MAVEN_HOME}/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:${SPARK_HOME}/bin:$SQOOP_HOME/bin:${ZK_HOME}/bin:$PATH
source /etc/profile
spark-env.sh.template重新命名为spark-env.sh文件、配置如下:
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# # This file is sourced when running various Spark programs.
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program # Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos # Options read in YARN client/cluster mode
# - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: ).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
#export SPARK_MASTER_IP=master1
export SPARK_SSH_OPTS="-p 61333"
export SPARK_MASTER_PORT=
export SPARK_WORKER_INSTANCES=
export SCALA_HOME=/opt/workspace/scala-2.11.
export JAVA_HOME=/opt/workspace/jdk1.
export HADOOP_HOME=/opt/workspace/hadoop-2.9.
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/workspace/spark-2.3.-bin-hadoop2-without-hive
export SPARK_CONF_DIR=$SPARK_HOME/conf
export SPARK_EXECUTOR_MEMORY=5120M
export SPARK_DIST_CLASSPATH=$(/opt/workspace/hadoop-2.9./bin/hadoop classpath)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181 -Dspark.deploy.zookeeper.dir=/spark"
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers # Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: )
# - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-).
# - MKL_NUM_THREADS= Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS= Disable multi-threading of OpenBLAS
slaves.template文件重新命名为slaves、配置如下:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# # A Spark Worker will be started on each of the machines listed below.
slave1
slave2
slave3
4、启动spark
[root@master1 workspace]# ./spark-2.3.0-bin-hadoop2-without-hive/sbin/start-all.sh
报错:默认是22端口,进行ssh端口修改
解决:在spark-env.sh中增加端口
export SPARK_SSH_OPTS="-p 61333"
重新启动spark
启动成功
5、手动启动备用master
[root@master2 workspace]# ./spark-2.3.0-bin-hadoop2-without-hive/sbin/start-master.sh
参考:https://blog.csdn.net/sinat_25943197/article/details/81906060
大数据-spark HA集群搭建的更多相关文章
- 大数据-HBase HA集群搭建
1.下载对应版本的Hbase,在我们搭建的集群环境中选用的是hbase-1.4.6 将下载完成的hbase压缩包放到对应的目录下,此处我们的目录为/opt/workspace/ 2.对已经有的压缩包进 ...
- 大数据-hadoop HA集群搭建
一.安装hadoop.HA及配置journalnode 实现namenode HA 实现resourcemanager HA namenode节点之间通过journalnode同步元数据 首先下载需要 ...
- 大数据学习——HADOOP集群搭建
4.1 HADOOP集群搭建 4.1.1集群简介 HADOOP集群具体来说包含两个集群:HDFS集群和YARN集群,两者逻辑上分离,但物理上常在一起 HDFS集群: 负责海量数据的存储,集群中的角色主 ...
- 大数据中Hadoop集群搭建与配置
前提环境是之前搭建的4台Linux虚拟机,详情参见 Linux集群搭建 该环境对应4台服务器,192.168.1.60.61.62.63,其中60为主机,其余为从机 软件版本选择: Java:JDK1 ...
- 大数据中HBase集群搭建与配置
hbase是分布式列式存储数据库,前提条件是需要搭建hadoop集群,需要Zookeeper集群提供znode锁机制,hadoop集群已经搭建,参考 Hadoop集群搭建 ,该文主要介绍Zookeep ...
- 大数据平台Hadoop集群搭建
一.概念 Hadoop是由java语言编写的,在分布式服务器集群上存储海量数据并运行分布式分析应用的开源框架,其核心部件是HDFS与MapReduce.HDFS是一个分布式文件系统,类似mogilef ...
- 大数据学习——Storm集群搭建
安装storm之前要安装zookeeper 一.安装storm步骤 1.下载安装包 2.解压安装包 .tar.gz storm 3.修改配置文件 mv /root/apps/storm/conf/st ...
- 大数据中Linux集群搭建与配置
因测试需要,一共安装4台linux系统,在windows上用vm搭建. 对应4个IP为192.168.1.60.61.62.63,这里记录其中一台的搭建过程,其余的可以直接复制虚拟机,并修改相关配置即 ...
- 大数据学习——hadoop集群搭建2.X
1.准备Linux环境 1.0先将虚拟机的网络模式选为NAT 1.1修改主机名 vi /etc/sysconfig/network NETWORKING=yes HOSTNAME=itcast ### ...
随机推荐
- #error用法
#error命令是C/C++语言的预处理命令之一,当预处理器预处理到#error命令时将停止编译并输出用户自定义的错误消息. 语法: #error [用户自定义的错误消息] 注:上述语法成份中的方括号 ...
- C++编程语言学习资料
C++ How to Program, 7/e (C++大学教程 第7版) 英文原版 全彩页 C++大学教程(第五版)中文版高清PDF下载 C++大学教程 第五版 (C++ How to Progra ...
- pcl知识
1.pcl/io/pcd_io.h pcl/io/ply_io.h pcl::PLYReader reader; pcl::PCDWriter pcdwriter; 读取ply pcd 2.voidl ...
- Java 读取jar内的文件的超简便方法
坑爹的java课程设计,偏要用jar来运行 读取.存储jar内文件的支持也好低 存储方法: 进入jar文件其实没有说的那么困难,jar文件本质是一个zip格式的压缩文件,只是把文件后缀名改了,要用Ja ...
- PHP发红包程序限制红包的大小
我们先来分析下规律. 设定总金额为10元,有N个人随机领取: N=1 第一个 则红包金额=X元: N=2 第二个 为保证第二个红包可以正常发出,第一个红包金额=0.01至9.99之间的某个随机数. 第 ...
- 多校训练4——Hehe
递推题: dp[i]表示字符串第i个字母前有多少种不同的方法 1.出现一个hehe:dp[i]=dp[i-4]+dp[i-2] 意思是dp[i]=当前的hehe换成wqnmlgb+当前的hehe不换成 ...
- Web测试实践-任务进度-Day02
小组成员 华同学.郭同学.覃同学.刘同学.穆同学.沈同学 任务进度 在经过任务分配阶段后,大家都投入到了各自的任务中,以下是大家今天任务的进度情况汇总. 华同学 & 刘同学(任务1) 1.对爱 ...
- js实现二级菜单显示和收缩
window.onload=function(){ var aLi=document.getElementsByTagName('li'); for(var i=0; i<aLi.length; ...
- 37 有n个人围成一圈,顺序排号,从第一个人开始报数(从1到3报数),凡报到3的人退出圈子,问最后留下的是原来第几号那位.
题目:有n个人围成一圈,顺序排号,从第一个人开始报数(从1到3报数),凡报到3的人退出圈子,问最后留下的是原来第几号那位. public class _037NumberOff { public st ...
- Smart3d和3dsmax结合做人脸建模
1.拍摄几张照片(或视频 我是拍摄的视频然后截图,因为自拍照20张总是不方便) 2.导入smart3d 3.空三匹配 4.重建,并保存格式为.obj常用格式 5.将上一步重建的结果导入3dsmax做进 ...