Running Spark Streaming Jobs on a Kerberos-Enabled Cluster

Use the following steps to run a Spark Streaming job on a Kerberos-enabled cluster.

Select or create a user account to be used as principal.

This should not be the kafka or spark service account.
Generate a keytab for the user.
Create a Java Authentication and Authorization Service (JAAS) login configuration file: for example, key.conf.
Add configuration settings that specify the user keytab.
The keytab and configuration files are distributed using YARN local resources. Because they reside in the current directory of the Spark YARN container, you should specify the location as ./v.keytab.

The following example specifies keytab location ./v.keytab for principal vagrant@example.com:
```
KafkaClient {

   com.sun.security.auth.module.Krb5LoginModule required

   useKeyTab=true

   keyTab="./v.keytab"

   storeKey=true

   useTicketCache=false

   serviceName="kafka"

   principal="vagrant@EXAMPLE.COM";

};
```

In your spark-submit command, pass the JAAS configuration file and keytab as local resource files, using the --filesoption, and specify the JAAS configuration file options to the JVM options specified for the driver and executor:

spark-submit \

    --files key.conf#key.conf,v.keytab#v.keytab \

    --driver-java-options "-Djava.security.auth.login.config=./key.conf" \

    --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf" \

...

Pass any relevant Kafka security options to your streaming application.
For example, the KafkaWordCount example accepts PLAINTEXTSASL as the last option in the command line:
```
KafkaWordCount /vagrant/spark-examples.jar c6402:2181 abc ts 1 PLAINTEXTSASL
```

Parent topic: Using Spark Streaming

Running Spark Streaming Jobs on a Kerberos-Enabled Cluster的更多相关文章

Kafka：ZK+Kafka+Spark Streaming集群环境搭建（十三）kafka+spark streaming打包好的程序提交时提示虚拟内存不足（Container is running beyond virtual memory limits. Current usage: 119.5 MB of 1 GB physical memory used; 2.2 GB of 2.1 G）
异常问题:Container is running beyond virtual memory limits. Current usage: 119.5 MB of 1 GB physical mem ...
Spark Streaming官方文档学习--下
Accumulators and Broadcast Variables 这些不能从checkpoint重新恢复如果想启动检查点的时候使用这两个变量,就需要创建这写变量的懒惰的singleton实例 ...
Spark Streaming Backpressure分析
1.为什么引入Backpressure 默认情况下,Spark Streaming通过Receiver以生产者生产数据的速率接收数据,计算过程中会出现batch processing time > ...
Spark Streaming编程指南
Overview A Quick Example Basic Concepts Linking Initializing StreamingContext Discretized Streams (D ...
<Spark><Spark Streaming><作业分析><JobHistory>
Intro 这篇是对一个Spark (Streaming)作业的log进行分析.用来加深对Spark application运行过程,优化空间的各种理解. Here to Start 从我这个初学者写 ...
Spark Streaming性能优化: 如何在生产环境下应对流数据峰值巨变
1.为什么引入Backpressure 默认情况下,Spark Streaming通过Receiver以生产者生产数据的速率接收数据,计算过程中会出现batch processing time > ...
Spark Streaming job的生成及数据清理总结
关于这次总结还是要从一个bug说起....... 场景描述:项目的基本处理流程为:从文件系统读取每隔一分钟上传的日志并由Spark Streaming进行计算消费,最后将结果写入InfluxDB中,然 ...
Spark Streaming数据清理内幕彻底解密
本讲从二个方面阐述: 数据清理原因和现象数据清理代码解析 Spark Core从技术研究的角度讲对Spark Streaming研究的彻底,没有你搞不定的Spark应用程序. Spark Stre ...
spark第六篇：Spark Streaming Programming Guide
预览 Spark Streaming是Spark核心API的扩展,支持高扩展,高吞吐量,实时数据流的容错流处理.数据可以从Kafka,Flume或TCP socket等许多来源获取,并且可以使用复杂的 ...

随机推荐

【译】MongoDb vs Mysql—以NodeJs为例
亲爱的读者,您可能想知道为什么要写关于MongoDb和MySql这篇文章.那是因为我与NodeJs开发人员讨论在应用程序中使用哪种数据存储作为主要的数据存储方式. 我看过很多评论都在争论这个问题. 有 ...
FPGA高速ADC接口实战——250MSPS采样率ADC9481
一.前言最近忙于硕士毕业设计和论文,没有太多时间编写博客,现总结下之前在某个项目中用到的一个高速ADC接口设计部分.ADC这一器件经常用于无线通信.传感.测试测量等领域.目前数字系统对高速数据采集的 ...
windowns10安装httpd
下载页面:https://www.apachehaus.com/cgi-bin/download.plx 下载内容:httpd-2.4.38-o102r-x64-vc14-r2.zip 解压到本地磁盘 ...
SOAP webserivce 和 RESTful webservice 对比及区别（转载）
简单对象访问协议(Simple Object Access Protocol,SOAP)是一种基于 XML 的协议,可以和现存的许多因特网协议和格式结合使用,包括超文本传输协议(HTTP),简单邮件传 ...
Sitecore® 8.2 Professional Developer考试心得
因工作原因入了Sitecore的坑.. 不了解Sitecore认证考试的同学请移步: http://www.cnblogs.com/edisonchou/archive/2018/08/17/9488 ...
49个Spring经典面试题总结，附带答案，赶紧收藏
1. 一般问题 1.1. 不同版本的 Spring Framework 有哪些主要功能? Version Feature Spring 2.5 发布于 2007 年.这是第一个支持注解的版本. Spr ...
网卡的 Ring Buffer 详解
1. 网卡处理数据包流程网卡处理网络数据流程图: 图片来自参考链接1 上图中虚线步骤的解释: DMA 将 NIC 接收的数据包逐个写入 sk_buff ,一个数据包可能占用多个 sk_buff , ...
android学习笔记--Scanner
private static List<String> getxxxx(Context ctx) { try { Scanner sc = new Scanner( ctx.openFil ...
java线程通信与协作小结多线程中篇（十六）
在锁与监视器中我们对Object中的方法进行了简单介绍以监视器原理为核心,三个方法:wait,notify.notifyAll,可以完成线程之间的通信当然,不会像“语言”似的,有多种多样的沟通 ...
【憩园】C#并发编程之异步编程(三)
写在前面本篇是异步编程系列的第三篇,本来计划第三篇的内容是介绍异步编程中常用的几个方法,但是前两篇写出来后,身边的朋友总是会有其他问题,所以决定再续写一篇,作为异步编程(一)和异步编程(二)的补 ...

Running Spark Streaming Jobs on a Kerberos-Enabled Cluster

Running Spark Streaming Jobs on a Kerberos-Enabled Cluster的更多相关文章

随机推荐

热门专题