Redis on Spark:Task not serializable
|
We use Redis on Spark to cache our key-value pairs.This is the code:
But compiler gave me feedback like this:
Could somebody tell me how to serialize the data get from Redis.Thanks a lot. |
||
|
In Spark, the functions on The Redis connection here is not serializable as it opens TCP connections to the target DB that are bound to the machine where it's created. The solution is to create those connections on the executors, in the local execution context. There're few ways to do that. Two that pop to mind are:
A singleton connection manager can be modeled with an object that holds a lazy reference to a connection (note: a mutable ref will also work).
This object can then be used to instantiate 1 connection per worker JVM and is used as a
The advantage of using the singleton object is less overhead as connections are created only once by JVM (as opposed to 1 per RDD partition) There're also some disadvantages:
(*) code provided for illustration purposes. Not compiled or tested. |
|||||||||||||
|
|
You're trying to serialize the client. You have one |
|||||||||||||
|
Redis on Spark:Task not serializable的更多相关文章
- spark2.1注册内部函数spark.udf.register("xx", xxx _),运行时抛出异常:Task not serializable
函数代码: class MySparkJob{ def entry(spark:SparkSession):Unit={ def getInnerRsrp(outer_rsrp: Double, we ...
- spark出现task不能序列化错误的解决方法 org.apache.spark.SparkException: Task not serializable
import org.elasticsearch.cluster.routing.Murmur3HashFunction; import org.elasticsearch.common.math.M ...
- Spark运行程序异常信息: org.apache.spark.SparkException: Task not serializable 解决办法
错误信息: 17/05/20 18:51:39 ERROR JobScheduler: Error running job streaming job 1495277499000 ms.0 org.a ...
- 【原创】大叔问题定位分享(19)spark task在executors上分布不均
最近提交一个spark应用之后发现执行非常慢,点开spark web ui之后发现卡在一个job的一个stage上,这个stage有100000个task,但是绝大部分task都分配到两个execut ...
- Hadoop MapReduce Task的进程模型与Spark Task的线程模型
Hadoop的MapReduce的Map Task和Reduce Task都是进程级别的:而Spark Task则是基于线程模型的. 多进程模型和多线程模型 所谓的多进程模型和多线程模型,指的是同一个 ...
- Kafka Topic ISR不全,个别Spark task处理时间长
现象 Spark streaming读kafka数据做业务处理时,同一个stage的task,有个别task的运行时间比多数task时间都长,造成业务延迟增大. 查看业务对应的topic发现当topi ...
- Spark Task 概述
Task的执行流程: 1. Driver端中的 CoarseGrainSchedulerBackend 给 CoarseGrainExecutorBacken 发送 LaunchTask 消息 2. ...
- 大数据学习day34---spark14------1 redis的事务(pipeline)测试 ,2. 利用redis的pipeline实现数据统计的exactlyonce ,3 SparkStreaming中数据写入Hbase实现ExactlyOnce, 4.Spark StandAlone的执行模式,5 spark on yarn
1 redis的事务(pipeline)测试 Redis本身对数据进行操作,单条命令是原子性的,但事务不保证原子性,且没有回滚.事务中任何命令执行失败,其余的命令仍会被执行,将Redis的多个操作放到 ...
- 【原】 Spark中Task的提交源码解读
版权声明:本文为原创文章,未经允许不得转载. 复习内容: Spark中Stage的提交 http://www.cnblogs.com/yourarebest/p/5356769.html Spark中 ...
随机推荐
- C++使用hiredis连接带密码的redis服务
c = redisConnect((char*)redis_host, redis_port); if (c->err) { /* Error flags, 0 when there is no ...
- An application icon
The application icon is a small image which is usually displayed in the top left corner of the title ...
- Java Lombok 减少代码冗余 get set
1.下载 2.安装 java -jar Users\uatww990393\Desktop\lombok-1.16.16.jar a. 直接添加jar包到lib中 在java中项目中使用lombok ...
- spring mysql多数据源配置
spring mysql多数据源配置 @Configuration public class QuartzConfig { @Autowired private AutowireJobFactory ...
- HTML5 CANVAS 弹幕插件
概述 修改了普通弹幕运动的算法,新增了部分功能 详细 代码下载:http://www.demodashi.com/demo/10595.html 修改了普通弹幕运动的算法,新增了部分功能,具体请参看附 ...
- SDUT 2608 Alice and Bob (巧妙的二进制)
Alice and Bob Time Limit: 1000ms Memory limit: 65536K 有疑问?点这里^_^ 题目描述 Alice and Bob like playing ...
- Linux命令-工作管理命令:&,ctrl+z,jobs,fg,bg
在linux下面将一个进程放入后台执行,有两种方式: 第一种方式:&表示命令在后台执行程序,等同于windows里面的程序最小化. 第二种方式:执行某一个命令,例如:top,然后按ctrl+z ...
- WEB网络问题的排查【转】
Browser/Server结构主要是利用了不断成熟的Web浏览器技术:结合浏览器的多种脚本语言和ActiveX技术,用通用浏览器实现原来需要复杂专用软件才能实现的强大功能,同时节约了开发成本.B/S ...
- IP数据报格式和IP地址路由
一.IP数据报格式 IP数据报格式如下: 注:需要注意的是网络数据包以大端字节序传输,当然头部也得是大端字节序,也就是说: The most significant bit is numbered 0 ...
- 平衡二叉树AVL - 插入节点后旋转方法分析
平衡二叉树 AVL( 发明者为Adel'son-Vel'skii 和 Landis)是一种二叉排序树,其中每一个节点的左子树和右子树的高度差至多等于1. 首先我们知道,当插入一个节点,从此插入点到树根 ...