一、安装scala

我们安装的是scala-2.11.8  5台机器全部安装

下载需要的安装包,放到特定的目录下/opt/workspace/并进行解压

1、解压缩

[root@master1 ~]# cd /opt/workspace
[root@master1 workspace]#tar -zxvf scala-2.11..tar.gz

2、配置环境变量  /etc/profile文件中添加spark配置

[root@master1 ~]# vi /etc/profile
# Scala Config
export SCALA_HOME=/opt/software/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
[root@master1 ~]# source /etc/profile

3、启动scala

[root@master1 workspace]# vim /etc/profile
[root@master1 workspace]# scala -version
-bash: /opt/workspace/scala-2.11.8/bin/scala: 权限不够
[root@master1 workspace]# chmod +x /opt/workspace/scala-2.11.8/bin/scala
[root@master1 workspace]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
[root@master1 workspace]# scala
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
Type in expressions for evaluation. Or try :help.

scala>

二、安装spark

1、下载spark对应版本

因为后期需要安装Hive,并且会运行hive on spark模式,为避免jar冲突,我们去掉了spark中的hive部分。

我们应用的是spark-2.3.0-bin-hadoop2-without-hive.tgz 自己编译的版本

可参考https://blog.csdn.net/sinat_25943197/article/details/81906060进行编译

2、文件解压

[root@master1 workspace]# tar -zxvf spark-2.3.0-bin-hadoop2-without-hive.tgz 

 3、配置文件 spark-env.sh  slaves、/etc/profile

/etc/profile文件中添加

# Spark Config
export SPARK_HOME=/opt/workspace/spark-2.3.-bin-hadoop2-without-hive
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${MAVEN_HOME}/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:${SPARK_HOME}/bin:$SQOOP_HOME/bin:${ZK_HOME}/bin:$PATH
source /etc/profile

spark-env.sh.template重新命名为spark-env.sh文件、配置如下:

#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# # This file is sourced when running various Spark programs.
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public dns name of the driver program # Options read by executors and drivers running inside the cluster
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
# - SPARK_PUBLIC_DNS, to set the public DNS name of the driver program
# - SPARK_LOCAL_DIRS, storage directories to use on this node for shuffle and RDD data
# - MESOS_NATIVE_JAVA_LIBRARY, to point to your libmesos.so if you use Mesos # Options read in YARN client/cluster mode
# - SPARK_CONF_DIR, Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - YARN_CONF_DIR, to point Spark towards YARN configuration files when you use YARN
# - SPARK_EXECUTOR_CORES, Number of cores for the executors (Default: ).
# - SPARK_EXECUTOR_MEMORY, Memory per Executor (e.g. 1000M, 2G) (Default: 1G)
# - SPARK_DRIVER_MEMORY, Memory for Driver (e.g. 1000M, 2G) (Default: 1G)
#export SPARK_MASTER_IP=master1
export SPARK_SSH_OPTS="-p 61333"
export SPARK_MASTER_PORT=
export SPARK_WORKER_INSTANCES=
export SCALA_HOME=/opt/workspace/scala-2.11.
export JAVA_HOME=/opt/workspace/jdk1.
export HADOOP_HOME=/opt/workspace/hadoop-2.9.
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/workspace/spark-2.3.-bin-hadoop2-without-hive
export SPARK_CONF_DIR=$SPARK_HOME/conf
export SPARK_EXECUTOR_MEMORY=5120M
export SPARK_DIST_CLASSPATH=$(/opt/workspace/hadoop-2.9./bin/hadoop classpath)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181 -Dspark.deploy.zookeeper.dir=/spark"
# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_DAEMON_CLASSPATH, to set the classpath for all daemons
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers # Generic options for the daemons used in the standalone deploy mode
# - SPARK_CONF_DIR Alternate conf dir. (Default: ${SPARK_HOME}/conf)
# - SPARK_LOG_DIR Where log files are stored. (Default: ${SPARK_HOME}/logs)
# - SPARK_PID_DIR Where the pid file is stored. (Default: /tmp)
# - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
# - SPARK_NICENESS The scheduling priority for daemons. (Default: )
# - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
# Options for native BLAS, like Intel MKL, OpenBLAS, and so on.
# You might get better performance to enable these options if using native BLAS (see SPARK-).
# - MKL_NUM_THREADS= Disable multi-threading of Intel MKL
# - OPENBLAS_NUM_THREADS= Disable multi-threading of OpenBLAS

slaves.template文件重新命名为slaves、配置如下:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# # A Spark Worker will be started on each of the machines listed below.
slave1
slave2
slave3

4、启动spark

[root@master1 workspace]# ./spark-2.3.0-bin-hadoop2-without-hive/sbin/start-all.sh

报错:默认是22端口,进行ssh端口修改

解决:在spark-env.sh中增加端口

export SPARK_SSH_OPTS="-p 61333"

重新启动spark

启动成功

5、手动启动备用master

[root@master2 workspace]# ./spark-2.3.0-bin-hadoop2-without-hive/sbin/start-master.sh

参考:https://blog.csdn.net/sinat_25943197/article/details/81906060

大数据-spark HA集群搭建的更多相关文章

  1. 大数据-HBase HA集群搭建

    1.下载对应版本的Hbase,在我们搭建的集群环境中选用的是hbase-1.4.6 将下载完成的hbase压缩包放到对应的目录下,此处我们的目录为/opt/workspace/ 2.对已经有的压缩包进 ...

  2. 大数据-hadoop HA集群搭建

    一.安装hadoop.HA及配置journalnode 实现namenode HA 实现resourcemanager HA namenode节点之间通过journalnode同步元数据 首先下载需要 ...

  3. 大数据学习——HADOOP集群搭建

    4.1 HADOOP集群搭建 4.1.1集群简介 HADOOP集群具体来说包含两个集群:HDFS集群和YARN集群,两者逻辑上分离,但物理上常在一起 HDFS集群: 负责海量数据的存储,集群中的角色主 ...

  4. 大数据中Hadoop集群搭建与配置

    前提环境是之前搭建的4台Linux虚拟机,详情参见 Linux集群搭建 该环境对应4台服务器,192.168.1.60.61.62.63,其中60为主机,其余为从机 软件版本选择: Java:JDK1 ...

  5. 大数据中HBase集群搭建与配置

    hbase是分布式列式存储数据库,前提条件是需要搭建hadoop集群,需要Zookeeper集群提供znode锁机制,hadoop集群已经搭建,参考 Hadoop集群搭建 ,该文主要介绍Zookeep ...

  6. 大数据平台Hadoop集群搭建

    一.概念 Hadoop是由java语言编写的,在分布式服务器集群上存储海量数据并运行分布式分析应用的开源框架,其核心部件是HDFS与MapReduce.HDFS是一个分布式文件系统,类似mogilef ...

  7. 大数据学习——Storm集群搭建

    安装storm之前要安装zookeeper 一.安装storm步骤 1.下载安装包 2.解压安装包 .tar.gz storm 3.修改配置文件 mv /root/apps/storm/conf/st ...

  8. 大数据中Linux集群搭建与配置

    因测试需要,一共安装4台linux系统,在windows上用vm搭建. 对应4个IP为192.168.1.60.61.62.63,这里记录其中一台的搭建过程,其余的可以直接复制虚拟机,并修改相关配置即 ...

  9. 大数据学习——hadoop集群搭建2.X

    1.准备Linux环境 1.0先将虚拟机的网络模式选为NAT 1.1修改主机名 vi /etc/sysconfig/network NETWORKING=yes HOSTNAME=itcast ### ...

随机推荐

  1. [C++] any number to binary (Bit manipulation)

    any number to binary  (Bit manipulation)

  2. pcl文档库

    http://docs.pointclouds.org/trunk/structpcl_1_1_polygon_mesh.html

  3. Use formatter to format your JAVA code

    In order to make the codes looks unified and make it easy to understand, it's better to use the same ...

  4. 设计模式(java的23种设计模式)

    转自:leshui http://blog.csdn.net/leshui/article/details/11951 在java版看见了这篇文章,作者以轻松的语言比喻了java的32种模式,有很好的 ...

  5. 白盒测试实践项目(day5)

    在这几天的工作下,小组成员都基本完成了各自所负责的内容. 李建文同学完成提交了代码复审相关文档后,也经过小组的补充,彻底完成. 汪鸿同学使用FIndBugs工具完成了静态代码的测试,并且也完成了静态代 ...

  6. servler配置

    <?xml version="1.0" encoding="UTF-8"?><web-app xmlns:xsi="http://w ...

  7. wireshark问题

    上一篇wireshark编译成功了,生成了相应的wireshark.exe,dumpcap.exe等可执行文件(这些文件都是可以运行的),编译工具用的是VS2010,但是新生成的文件和文件夹中没有找到 ...

  8. mongoTemplate查询

    1.精确查询用“is” Criteria criteria=new Criteria("namespaceName"); criteria.is(namespaceName); Q ...

  9. Web图片编辑控件发布-Xproer.ImageEditor

    版权所有 2009-2014 荆门泽优软件有限公司 保留所有权利 官方网站:http://www.ncmem.com 产品首页:http://www.ncmem.com/webplug/image-e ...

  10. 编写高质量代码改善C#程序的157个建议——建议143:方法抽象级别应在同一层次

    建议143:方法抽象级别应在同一层次 看下面代码: class SampleClass { public void Init() { //本地初始化代码1 //本地初始化代码2 RemoteInit( ...