(4) Spark Cluster Setup - Spark for Java & Python
Spark cluster setup
Video tutorials
1. Youku
2. YouTube
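Install the Scala environment
Download: http://www.scala-lang.org/download/
Upload scala-2.10.5.tgz to the installer directory of the hadoop user on both master and slave.
Both machines need this step. Note that the prebuilt Spark 2.0.0 package used below bundles its own Scala 2.11 (see the spark-shell banner later on). This separate Scala installation therefore only matters for compiling your own Scala applications, and for Spark 2.0 those must target Scala 2.11 rather than 2.10.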
[hadoop@master installer]$ ls
hadoop2 hadoop-2.6.0.tar.gz scala-2.10.5.tgz
Extract
[hadoop@master installer]$ tar -zxvf scala-2.10.5.tgz
[hadoop@master installer]$ mv scala-2.10.5 scala
[hadoop@master installer]$ cd scala
[hadoop@master scala]$ pwd
/home/hadoop/installer/scala
Configure the environment variables:
[hadoop@master ~]$ vim .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/home/hadoop/installer/hadoop2
export SCALA_HOME=/home/hadoop/installer/scala
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$JAVA_HOME/lib:$SCALA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin
[hadoop@master ~]$ . .bashrc
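Before moving on, it helps to confirm that the new variables are visible in the current shell. The sketch below uses only the Python standard library. `check_home_vars` is a hypothetical helper and is not part of Hadoop or Scala. Given an environment mapping, it reports each listed *_HOME variable that is unset, or whose bin directory is not an entry of PATH. It only compares strings and does not check that the directories exist.

```python
import os


def check_home_vars(env, names):
    """Return a list of problems for the given *_HOME variables.

    A variable is reported if it is unset or empty, or if "<value>/bin"
    is not one of the PATH entries. Only string comparisons are made;
    the directories themselves are not checked for existence.
    """
    path_entries = [p.rstrip("/") for p in env.get("PATH", "").split(os.pathsep) if p]
    problems = []
    for name in names:
        home = env.get(name, "")
        if not home:
            problems.append("%s is not set" % name)
            continue
        bin_dir = os.path.join(home, "bin").rstrip("/")
        if bin_dir not in path_entries:
            problems.append("%s/bin is not on PATH" % name)
    return problems


if __name__ == "__main__":
    # Run after ". .bashrc" so that os.environ reflects the new settings.
    for line in check_home_vars(os.environ, ["JAVA_HOME", "HADOOP_HOME", "SCALA_HOME"]) or ["OK"]:
        print(line)
```

Run it in the same shell right after `. .bashrc`. When every variable is set and its bin directory is on PATH, it prints OK.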
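Install Python
Install gcc (required to compile Python from source; here it is installed from a local yum repository created from the RHEL 5.4 installation DVD)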
[root@master ~]# mkdir /RHEL5U4
[root@master ~]# mount /dev/cdrom /media/
[root@master ~]# cd /media
[root@master media]# cp -r * /RHEL5U4/
[root@master ~]# vim /etc/yum.repos.d/iso.repo
[rhel-Server]
name=5u4_Server
baseurl=file:///RHEL5U4/Server
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[root@master ~]# yum clean all
[root@master ~]# yum install gcc
Build and install Python
[root@master installer]# tar -zxvf Python-2.7.12.tgz
Upload zlib-1.2.8.tar.gz and use it to replace the zlib in /root/installer/Python-2.7.12/Modules
[root@master installer]# cd Python-2.7.12
[root@master Python-2.7.12]# ./configure --prefix=/usr/local/python27
[root@master Python-2.7.12]# make
[root@master Python-2.7.12]# make install
[root@master Python-2.7.12]# mv /usr/bin/python /usr/bin/python_old
[root@master Python-2.7.12]# ln -s /usr/local/python27/bin/python /usr/bin/
[root@master Python-2.7.12]# python
Python 2.7.12 (default, Nov 7 2016, 21:42:16)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
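The zlib replacement above appears to be there so that Python's zlib module builds correctly. To confirm that the new interpreter really has a working zlib, you can round-trip some bytes through it. The sketch below uses only the standard library, and `check_interpreter` is a hypothetical name. If `import zlib` itself fails with ImportError, the module was not built.

```python
import sys
import zlib


def check_interpreter(min_version=(2, 7)):
    """Return True if this interpreter is new enough and zlib round-trips data."""
    if sys.version_info[:2] < min_version:
        return False
    data = b"hello spark " * 100
    compressed = zlib.compress(data)
    # Repetitive input must shrink when compressed and decompress back unchanged.
    return len(compressed) < len(data) and zlib.decompress(compressed) == data


if __name__ == "__main__":
    print(sys.version.split()[0])
    print("zlib %s OK: %s" % (zlib.ZLIB_VERSION, check_interpreter()))
```

Save it as a file and run it with the `python` now on the path. The first line should report 2.7.12.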
Install the Spark environment
Download: http://spark.apache.org/downloads.html
Upload spark-2.0.0-bin-hadoop2.6.tgz to the installer directory of the hadoop user on master
Extract
[hadoop@master installer]$ tar -zxvf spark-2.0.0-bin-hadoop2.6.tgz
[hadoop@master installer]$ mv spark-2.0.0-bin-hadoop2.6 spark2
[hadoop@master installer]$ cd spark2/
[hadoop@master spark2]$ ls
bin conf data examples jars LICENSE licenses NOTICE python R README.md RELEASE sbin yarn
[hadoop@master spark2]$ pwd
/home/hadoop/installer/spark2
[hadoop@master ~]$ vim .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/home/hadoop/installer/hadoop2
export SCALA_HOME=/home/hadoop/installer/scala
export SPARK_HOME=/home/hadoop/installer/spark2
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$JAVA_HOME/lib:$SCALA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
[hadoop@master ~]$ . .bashrc
[hadoop@master ~]$ scp .bashrc slave:~
.bashrc 100% 621 0.6KB/s 00:00
Run on the slave machine:
[hadoop@slave ~]$ . .bashrc
Configure Spark
[hadoop@master conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@master conf]$ vim spark-env.sh
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
export JAVA_HOME=/usr/java/jdk1.7.0_79
export SCALA_HOME=/home/hadoop/installer/scala
export SPARK_MASTER_HOST=master
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_EXECUTOR_MEMORY=600M
export SPARK_DRIVER_MEMORY=600M
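spark-env.sh is a plain shell file, so a misspelled variable name fails silently. A hypothetical helper such as `parse_exports` below (a sketch, standard library only) lists what the file actually exports. You can then confirm that SPARK_MASTER_HOST and the memory settings are spelled as intended. It handles only simple `export NAME=value` lines and does not expand references such as $HADOOP_HOME.

```python
import re

# Matches simple lines of the form: export NAME=value
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_exports(text):
    """Collect NAME -> value pairs from 'export' lines, skipping comment lines.

    Surrounding quotes are stripped from values. Shell expansion
    (e.g. $HADOOP_HOME) is NOT performed; values are kept verbatim.
    """
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = EXPORT_RE.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip().strip("\"'")
    return settings
```

For the file above, `parse_exports(open("/home/hadoop/installer/spark2/conf/spark-env.sh").read())["SPARK_MASTER_HOST"]` should give "master".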
[hadoop@master conf]$ vim slaves
master
slave
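start-slaves.sh starts one Worker on every host listed in conf/slaves and reaches each host over SSH. The hadoop user on master therefore needs passwordless SSH to every listed host, including master itself. The hypothetical helpers below (a sketch, standard library only) do two things. First, they list the hosts, treating text after # as a comment and skipping blank lines, which is my understanding of how Spark's sbin scripts read the file. Second, they build one ssh command per host that fails instead of prompting if key-based login is not set up.

```python
def read_hosts(text):
    """Return worker hostnames from the text of a conf/slaves file.

    Anything after '#' on a line is treated as a comment; blank lines are skipped.
    """
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def ssh_check_commands(hosts):
    """Build one non-interactive ssh test command per host.

    BatchMode=yes makes ssh fail instead of asking for a password,
    so each command succeeds only if key-based login already works.
    """
    return ["ssh -o BatchMode=yes %s true" % host for host in hosts]
```

Run each generated command as the hadoop user on master. Each should return immediately, with no password prompt.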
[hadoop@master installer]$ scp -r spark2 slave:~/installer/
Start the Spark cluster
[hadoop@master ~]$ start-master.sh
[hadoop@master ~]$ start-slaves.sh
[hadoop@master ~]$ jps
17769 ResourceManager
20192 Master
20275 Worker
17443 NameNode
20521 Jps
17631 SecondaryNameNode
[hadoop@slave ~]$ jps
13297 DataNode
15367 Worker
13408 NodeManager
16245 Jps
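Because master is listed in conf/slaves, master should show both a Master and a Worker process, and slave should show a Worker. The jps output above matches this. To check it without reading the output by eye, the hypothetical helpers below parse jps text (each line is a PID followed by a main class name) and return any expected names that are missing. This is a sketch, standard library only.

```python
def running_daemons(jps_output):
    """Return the set of JVM main-class names in `jps` output, excluding Jps itself."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each line looks like "<pid> <MainClassName>"; lines without a name are skipped.
        if len(parts) >= 2 and parts[1] != "Jps":
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected):
    """Return the expected daemon names not present in the jps output, sorted."""
    return sorted(set(expected) - running_daemons(jps_output))
```

For example, pass the output of `jps` on slave with `["Worker", "DataNode"]`. An empty list means everything expected is running.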
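Spark wordcount
Note: spark-shell started without --master runs in local mode, as the line master = local[*] in the output below shows, so this session does not use the Workers started above. To run the job on the standalone cluster, start it with spark-shell --master spark://master:7077 (7077 is the standalone master's default port).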
[hadoop@master ~]$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/11/04 11:05:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/04 11:05:09 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://192.168.3.100:4040
Spark context available as 'sc' (master = local[*], app id = local-1478228709028).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val file = sc.textFile("hdfs://master:9000/data/input/wordcount")
16/11/04 11:05:14 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
file: org.apache.spark.rdd.RDD[String] = hdfs://master:9000/data/input/wordcount MapPartitionsRDD[1] at textFile at <console>:24
scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26
scala> count.collect()
res0: Array[(String, Int)] = Array((package,1), (this,1), (Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1), (Because,1), (Python,2), (cluster.,1), (its,1), ([run,1), (general,2), (have,1), (pre-built,1), (YARN,,1), (locally,2), (changed,1), (locally.,1), (sc.parallelize(1,1), (only,1), (Configuration,1), (This,2), (basic,1), (first,1), (learning,,1), ([Eclipse](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse),1), (documentation,3), (graph,1), (Hive,2), (several,1), (["Specifying,1), ("yarn",1), (page](http://spark.apache.org/documentation.html),1), ([params]`.,1), ([project,2), (prefer,1), (SparkPi,2), (<http://spark.apache.org/>,1), (engine,1), (version,1), (file,1), (documentation...
scala>
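To know which counts to expect before running on a large file, you can check the same logic on a small sample. The sketch below is plain Python, not Spark. `word_count` mirrors the flatMap/map/reduceByKey pipeline from the Scala session above. `java_split_space` imitates Java's String.split(" "), which drops trailing empty strings, whereas Python's str.split(" ") keeps them. Both names are hypothetical. Because the split is only on spaces, punctuation stays attached to words, which is why the output above contains both (locally,2) and (locally.,1). In PySpark (bin/pyspark) the same pipeline uses the same RDD method names: textFile, flatMap, map, reduceByKey and collect.

```python
from collections import defaultdict


def java_split_space(line):
    """Mimic Java's line.split(" "): split on single spaces, then drop
    trailing empty strings. An empty line still yields [""], as in Java."""
    if line == "":
        return [""]
    tokens = line.split(" ")
    while tokens and tokens[-1] == "":
        tokens.pop()
    return tokens


def word_count(lines):
    """Plain-Python equivalent of the spark-shell word count above.

    flatMap(line => line.split(" ")) -> every space-separated token
    map(word => (word, 1))           -> one (token, 1) pair per token
    reduceByKey(_ + _)               -> the 1s summed per distinct token
    """
    counts = defaultdict(int)
    for line in lines:
        for word in java_split_space(line):
            counts[word] += 1
    return dict(counts)
```

Comparing `word_count(open("sample.txt"))` (with newlines stripped from each line) against `count.collect()` on the same small file is a quick sanity check. Here sample.txt is a hypothetical local file.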
代码: private void button5_Click(object sender, EventArgs e) { ; Task.Factory.StartNew(() => { Mess ...