Running Spark Streaming Jobs on a Kerberos-Enabled Cluster

Use the following steps to run a Spark Streaming job on a Kerberos-enabled cluster.

Select or create a user account to be used as principal.

This should not be the kafka or spark service account.
Generate a keytab for the user.
Create a Java Authentication and Authorization Service (JAAS) login configuration file: for example, key.conf.
Add configuration settings that specify the user keytab.
The keytab and configuration files are distributed using YARN local resources. Because they reside in the current directory of the Spark YARN container, you should specify the location as ./v.keytab.

The following example specifies keytab location ./v.keytab for principal vagrant@example.com:
```
KafkaClient {

   com.sun.security.auth.module.Krb5LoginModule required

   useKeyTab=true

   keyTab="./v.keytab"

   storeKey=true

   useTicketCache=false

   serviceName="kafka"

   principal="vagrant@EXAMPLE.COM";

};
```

In your spark-submit command, pass the JAAS configuration file and keytab as local resource files, using the --filesoption, and specify the JAAS configuration file options to the JVM options specified for the driver and executor:

spark-submit \

    --files key.conf#key.conf,v.keytab#v.keytab \

    --driver-java-options "-Djava.security.auth.login.config=./key.conf" \

    --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf" \

...

Pass any relevant Kafka security options to your streaming application.
For example, the KafkaWordCount example accepts PLAINTEXTSASL as the last option in the command line:
```
KafkaWordCount /vagrant/spark-examples.jar c6402:2181 abc ts 1 PLAINTEXTSASL
```

Parent topic: Using Spark Streaming

Running Spark Streaming Jobs on a Kerberos-Enabled Cluster的更多相关文章

Kafka：ZK+Kafka+Spark Streaming集群环境搭建（十三）kafka+spark streaming打包好的程序提交时提示虚拟内存不足（Container is running beyond virtual memory limits. Current usage: 119.5 MB of 1 GB physical memory used; 2.2 GB of 2.1 G）
异常问题:Container is running beyond virtual memory limits. Current usage: 119.5 MB of 1 GB physical mem ...
Spark Streaming官方文档学习--下
Accumulators and Broadcast Variables 这些不能从checkpoint重新恢复如果想启动检查点的时候使用这两个变量,就需要创建这写变量的懒惰的singleton实例 ...
Spark Streaming Backpressure分析
1.为什么引入Backpressure 默认情况下,Spark Streaming通过Receiver以生产者生产数据的速率接收数据,计算过程中会出现batch processing time > ...
Spark Streaming编程指南
Overview A Quick Example Basic Concepts Linking Initializing StreamingContext Discretized Streams (D ...
<Spark><Spark Streaming><作业分析><JobHistory>
Intro 这篇是对一个Spark (Streaming)作业的log进行分析.用来加深对Spark application运行过程,优化空间的各种理解. Here to Start 从我这个初学者写 ...
Spark Streaming性能优化: 如何在生产环境下应对流数据峰值巨变
1.为什么引入Backpressure 默认情况下,Spark Streaming通过Receiver以生产者生产数据的速率接收数据,计算过程中会出现batch processing time > ...
Spark Streaming job的生成及数据清理总结
关于这次总结还是要从一个bug说起....... 场景描述:项目的基本处理流程为:从文件系统读取每隔一分钟上传的日志并由Spark Streaming进行计算消费,最后将结果写入InfluxDB中,然 ...
Spark Streaming数据清理内幕彻底解密
本讲从二个方面阐述: 数据清理原因和现象数据清理代码解析 Spark Core从技术研究的角度讲对Spark Streaming研究的彻底,没有你搞不定的Spark应用程序. Spark Stre ...
spark第六篇：Spark Streaming Programming Guide
预览 Spark Streaming是Spark核心API的扩展,支持高扩展,高吞吐量,实时数据流的容错流处理.数据可以从Kafka,Flume或TCP socket等许多来源获取,并且可以使用复杂的 ...

随机推荐

Kafka系列2-producer和consumer报错
1. 使用127.0.0.1启动生产和消费进程: 1)启动生产者进程: bin/kafka-console-producer.sh --broker-list 127.0.0.1:9092 --top ...
Windows 2008 打开声音重定向来听到远程主机音频
Windows 2008 Server在默认情况下,是禁止使用桌面桌面连接后播放声音的(提示音频服务未启用).可以通过下面的方法启用音频: 管理工具->远程桌面服务->远程桌面会话主机配置 ...
注解ConfigurationProperties注入yml配置文件中的数据
在使用SpringBoot开发中需要将一些配置参数放在yml文件中定义,再通过Java类来引入这些配置参数 SpringBoot提供了一些注解来实现这个功能 ConfigurationProperti ...
iFace Chain [ 爱妃链 ] 或将凭借人脸密钥技术成为安全领域最大的赢家
前段时间iFace Chain [ 中文音译名称: 爱妃链 ] 安全专家揭密了区块链领域,数字资产存放于无信用钱包中的一些风险,并为区块链玩家解密如何安全保护资产私钥,我们再来回顾分析一下目前跑路钱包 ...
mysql优化一之查询优化
这一篇笔记的mysql优化是注重于查询优化,根据mysql的执行情况,判断mysql什么时候需要优化,关于数据库开始阶段的数据库逻辑.物理结构的设计结构优化不是本文重点,下次再谈查看mysql语句的 ...
Linux知识要点大全（第三章）
第三章 Linux基本操作 *主要内容 1:认识root用户 2:Linux下命令的写法 3:Linux关机和重启 4:忘记root密码的处理方法 5. Linux下的目录结构 6. 查看信息 ...
Android屏幕适配和方案【整理】
版权声明:本文为HaiyuKing原创文章,转载请注明出处! 前言这里只是根据参考资料整理下,具体内容请阅读参考资料. 原型设计图推荐1倍效果图,即采用 720 * 360 大小( 1280 *7 ...
SVN问题解决--Attempted to lock an already-locked dir
今天上午更新uap(uap就是基于eclipse开发的软件,可以当eclipse来使用)上的代码时,发现在svn上更新不了,一直报这个Attempted to lock an already-lock ...
Python：bs4中 string 属性和 text 属性的区别及背后的原理
刚开始接触 bs4 的时候,我也很迷茫,觉得 string 属性和 text 属性是一样的,不明白为什么要分成两个属性. html = '<p>hello world</p>' ...
一套代码小程序&Web&Native运行的探索06——组件系统
接上文:一套代码小程序&Web&Native运行的探索05——snabbdom 对应Git代码地址请见:https://github.com/yexiaochai/wxdemo/tre ...

Running Spark Streaming Jobs on a Kerberos-Enabled Cluster

Running Spark Streaming Jobs on a Kerberos-Enabled Cluster的更多相关文章

随机推荐

热门专题