1.官方网站下载spark 1.5.0的源码

2.根据官方编译即可。

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package

./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Pyarn

如你使用的版本是scala2.11 可以做以下操作

./dev/change-scala-version.sh 2.11 mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package

不用再执行 ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.4 -Pyarn

然后将./assembly/target/scala-2.11/ spark-assembly-1.5.0-hadoop2.6.0.jar  大约137MB 将其拷贝到$HIVE_HOME/lib下 hive 启动后,

可以执行 set hive.execution.engine=spark; 即可

调试中遇到的问题: 一定要调YARN的内存,否则会获取不到资源

YARN: Diagnostic Messages for this Task: Container [pid=7830,containerID=container_1397098636321_27548_01_000297] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used;  2.7 GB of 4.2 GB virtual memory used. Killing container. Dump of the process-tree for container_1397098636321_27548_01_000297 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 7830 7816 7830 7830 (java) 2547 390 2924818432 539150 /export/servers/jdk1.6.0_25/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx2224m -Djava.io.tmpdir=/data2/nm/local/usercache/admin/appcache/application_1397098636321_27548/container_1397098636321_27548_01_000297/tmp -Dlog4j.configuration=container-log4j.properties......

检查yarn-site-xml job内存限制 <property>   <name>yarn.scheduler.minimum-allocation-mb</name>   <value>2048</value> </property>

解决方法: 1.增加yarn.scheduler.minimum-allocation-mb内存上限。 2.--hiveconf mapred.child.java.opts=-Xmx????m  一定要小于yarn.scheduler.minimum-allocation-mb

如果是vm超了,如下:调整yarn.nodemanager.vmem-pmem-ratio

查看log没有明显的ERROR,但存在类似以下描述的日志 2012-05-16 13:08:20,876 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:  Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 18, cluster_timestamp: 1337134318909, },  attemptId: 1, }, id: 6, }, state: C_COMPLETE, diagnostics: "Container [pid=15641,containerID=container_1337134318909_0018_01_000006]  is running beyond virtual memory limits. Current usage: 32.1mb of 1.0gb physical memory used; 6.2gb of 2.1gb virtual memory used.  Killing container.\nDump of the process-tree for container_1337134318909_0018_01_000006 :\n\t|- PID PPID PGRPID  SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE\n\t|  - 15641 26354 15641 15641 (java) 36 2 6686339072 8207 /home/zhouchen.zm/jdk1.6.0_23/bin/java   原因: 该错误是YARN的虚拟内存计算方式导致,上例中用户程序申请的内存为1Gb,YARN根据此值乘以一个比例(默认为2.1)得出申请的虚拟内存的值, 当YARN计算的用户程序所需虚拟内存值大于计算出来的值时,就会报出以上错误。调节比例值可以解决该问题。具体参数为:yarn-site.xml中的yarn.nodemanager.vmem-pmem-ratio

------QIN XIAO YAN -------------- <!-- Site specific YARN configuration properties -->

<!-- Site specific YARN configuration properties -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>qxy1</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nm-local-dir</value>
</property> <property>
<description>Amount of physical memory, in MB, that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property> <property>
<description>Ratio between virtual memory to physical memory when
setting memory limits for containers. Container allocations are
expressed in terms of physical memory, and virtual memory usage
is allowed to exceed this allocation by this ratio.
</description>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property> <property>
<description>Number of vcores that can be allocated
for containers. This is used by the RM scheduler when allocating
resources for containers. This is not used to limit the number of
physical cores used by YARN containers.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
</property> <property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property> <property>
<description>The minimum allocation for every container request at the RM,
in MBs. Memory requests lower than this will throw a
InvalidResourceRequestException.</description>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2048</value>
</property> <property>
<description>The maximum allocation for every container request at the RM,
in MBs. Memory requests higher than this will throw a
InvalidResourceRequestException.</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property> <property>
<description>Path to file with nodes to include.</description>
<name>yarn.resourcemanager.nodes.include-path</name>
<value></value>
</property> <property>
<description>
Where to store container logs. An application's localized log directory
will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
Individual containers' log directories will be below this, in directories
named container_{$contid}. Each container directory will contain the files
stderr, stdin, and syslog generated by that container.
</description>
<name>yarn.nodemanager.log-dirs</name>
<value>${yarn.log.dir}/userlogs</value>
</property> <property>
<description>Time in seconds to retain user logs. Only applicable if
log aggregation is disabled
</description>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
</property> <property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<description>The remote log dir will be created at
{yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
</description>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

SRARK 启动时报如下错误: Error: A JNI error has occurred, please check your installation and try again

1. SPARK_DIST_CLASSPATH=$(/home/hadoop/hadoop-2.7.2/bin/hadoop classpath)

2. 解决办法:

3.  export SCALA_HOME=/opt/scala-2.11.8  4. export SPARK_MASTER_IP=192.168.233.159  5. export SPARK_WORKER_MEMORY=1g  6. export HADOOP_CONF_DIR=/opt/hadoop-2.6.2/etc/hadoop  7. export JAVA_HOME=/opt/jdk1.8.0_77  8. export SPARK_DIST_CLASSPATH=$(/opt/hadoop-2.6.2/bin/hadoop  classpath)     ##加这条

hive on spark 编译时遇到的问题的更多相关文章

  1. Hive扩展功能(七)--Hive On Spark

    软件环境: linux系统: CentOS6.7 Hadoop版本: 2.6.5 zookeeper版本: 3.4.8 主机配置: 一共m1, m2, m3这五部机, 每部主机的用户名都为centos ...

  2. Hive On Spark环境搭建

    Spark源码编译与环境搭建 Note that you must have a version of Spark which does not include the Hive jars; Spar ...

  3. Spark记录-源码编译spark2.2.0(结合Hive on Spark/Hive on MR2/Spark on Yarn)

    #spark2.2.0源码编译 #组件:mvn-3.3.9 jdk-1.8 #wget http://mirror.bit.edu.cn/apache/spark/spark-2.2.0/spark- ...

  4. Spark入门实战系列--2.Spark编译与部署(下)--Spark编译安装

    [注]该系列文章以及使用到安装包/测试数据 可以在<倾情大奉送--Spark入门实战系列>获取 .编译Spark .时间不一样,SBT是白天编译,Maven是深夜进行的,获取依赖包速度不同 ...

  5. Spark编译及spark开发环境搭建

    最近需要将生产环境的spark1.3版本升级到spark1.6(尽管spark2.0已经发布一段时间了,稳定可靠起见,还是选择了spark1.6),同时需要基于spark开发一些中间件,因此需要搭建一 ...

  6. 【原创】大数据基础之Hive(5)hive on spark

    hive 2.3.4 on spark 2.4.0 Hive on Spark provides Hive with the ability to utilize Apache Spark as it ...

  7. hive使用spark引擎的几种情况

    使用spark引擎查询hive有以下几种方式:1>使用spark-sql(spark sql cli)2>使用spark-thrift提交查询sql3>使用hive on spark ...

  8. Hive数据分析——Spark是一种基于rdd(弹性数据集)的内存分布式并行处理框架,比于Hadoop将大量的中间结果写入HDFS,Spark避免了中间结果的持久化

    转自:http://blog.csdn.net/wh_springer/article/details/51842496 近十年来,随着Hadoop生态系统的不断完善,Hadoop早已成为大数据事实上 ...

  9. Hive On Spark保姆级攻略

    声明: 此博客参考了官网的配置方式,并结合笔者在实践网上部分帖子时的踩坑经历整理而成 这里贴上官方配置说明: [官方]: https://cwiki.apache.org//confluence/di ...

随机推荐

  1. Java 设置Word页边距、页面大小、页面方向、页面边框

    本文将通过Java示例介绍如何设置Word页边距(包括上.下.左.右).页面大小(可设置Letter/A3/A4/A5/A6/B4/B5/B6/Envelop DL/Half Letter/Lette ...

  2. MySQL查询基础

    MySQL查询 DQL(Data Query Language ) 1.排序查询 # 语法: select 字段 from 表名 order by 字段1 [降序/升序],字段2 [降序/升序],.. ...

  3. ORM执行原生SQL语句

    # 1.connectionfrom django.db import connection, connections cursor = connection.cursor() # cursor = ...

  4. GoldenGate DB11gr2配置手册

    GoldenGate DB11gr2配置手册 源端数据库配置 1.1源端数据库打开Archive Log: SQL>shutdown immediate; SQL>startup moun ...

  5. 高通量计算框架HTCondor(五)——分布计算

    目录 1. 正文 1.1. 任务描述文件 1.2. 提交任务 1.3. 返回结果 2. 相关 1. 正文 1.1. 任务描述文件 前文提到过,HTCondor是通过condor_submit命令将提交 ...

  6. HCNA网络技术学习指南

    网络通信基础 网络与通信 OSI模型和TCP/IP模型 网络类型 传输介质及通信方式 2 VRP基础 VRP简介 VRP命令行 登录设备 基本配置 配置文件管理 通过Telnet登录设备 文件管理 基 ...

  7. Centos 7中安装svn服务器,史上最详细

    最近上头安排了帮客户安装svn服务器,用了两种方式安装,yum命令安装,快速简洁容易上手,但是源码安装就比较繁琐,两种方式都试了一下,yum命令基本一个多小时就安装完了,但是源码安装弄了我两天的时间, ...

  8. C语言博客作业5

    本周作业头 这个作业属于那个课程 C语言程序设计II 这个作业要求在哪里 作业链接 我在这个课程的目标是 学会函数函数的编写与自定义函数 这个作业在那个具体方面帮助我实现目标 通过pta作业练习 参考 ...

  9. kali-2019.4中文乱码问题的解决

    1.安装完kali-2019.4版出现乱码问题 2.更新源,用vi编辑器,在/etc/apt/resources.list中添加清华源 #清华大学 [更新源]deb https://mirrors.t ...

  10. SpringBoot消息篇Ⅲ --- 整合RabbitMQ

    知识储备:  关于消息队列的基本概念我已经在上一篇文章介绍过了(传送门),本篇文章主要讲述的是SpringBoot与RabbitMQ的整合以及简单的使用. 一.安装RabbitMQ 1.在linux上 ...