RabbitMQ - RabbitMQ and batch processing
http://rabbitmq.1065348.n5.nabble.com/RabbitMQ-and-batch-processing-td35634.html

May 19, 2014; 2:55am

RabbitMQ and batch processing

I mentioned this on Twitter and a couple of people have requested that I bring this up on the mailing list.

 
It seems to be a given that RabbitMQ was not designed for the batch processing use case (i.e. using RabbitMQ as a buffer between large serial steps). We have a system in place that attempts to do just that, however.
 
I have been working with the developers of the software involved to help them redesign around a more appropriate use of RabbitMQ, or to move to a different bus altogether (a database, or something like Kafka). Some of them have been able to simply operate in smaller batch sizes, thus keeping their queues relatively small.
 
However, I cannot stem the tide of improper RabbitMQ use.
 
When things go poorly, millions of messages end up in the queues. 
 
In 3.1.x we saw this regularly cause our clusters to partition.
 
In 3.1.x and 3.2.x when we would delete large queues (5+ million messages enqueued), this would cause the cluster to become unresponsive, run out of memory, and then crash.
 
During the 3.1 -> 3.2 upgrade, we had to completely rebuild our clusters. When 3.2 came up, it soon crashed.
 
In the most recent upgrade, we saw a 3.2.3 cluster in our dev environment crash. I performed an opportunistic upgrade to 3.3.1, because hey... downtime already, so let's see if 3.3.1 addresses some of the issues we've been seeing.
 
 
After the upgrade, 3.3.1 would not start up at all. I removed /var/lib/rabbitmq/mnesia on all of the nodes and brought RabbitMQ back up.
 
3.3.1 has been up and running alright so far, but we haven't done another end-to-end test in our development environment in a while. One of these tests typically builds up at least a million messages in the queue over the course of a run.
 
So, I guess my question is:
 
If I know that I have people using RabbitMQ like this, and there is nothing I can do to change that fact... what do I do?

_______________________________________________ 
rabbitmq-discuss mailing list 
[hidden email] 
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

May 19, 2014; 7:50am

Re: RabbitMQ and batch processing

I'll respond inline with our experience:

On Sun, May 18, 2014 at 2:55 PM, Greg Poirier <[hidden email]> wrote:

I mentioned this on Twitter and a couple of people have requested that I bring this up on the mailing list.

 
It seems to be a given that RabbitMQ was not designed for the batch processing use case (i.e. using RabbitMQ as a buffer between large serial steps). We have a system in place that attempts to do just that, however.
 
It is not a 'given' as far as we are concerned. We have some processes that result in a million or more messages being queued within a minute or so. These messages are processed over the ensuing several minutes (for 'dismissals' of news items from individual devices) to several hours (for lower-priority individualized 'offers'). This is the new 'batch'.
 
 
I have been working with the developers of the software involved to help them redesign around a more appropriate use of RabbitMQ, or to move to a different bus altogether (a database, or something like Kafka). Some of them have been able to simply operate in smaller batch sizes, thus keeping their queues relatively small.
 
We put large message bodies in S3 and pass them by reference. We never use RabbitMQ persistence and compensate for that with replication. For 'real' persistence we use Cassandra. Most importantly, none of our internal users know this, as we provide them with an abstracted interface.
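
For illustration, a minimal sketch of that claim-check pattern: the large body goes to S3, only a reference is published, and the message is marked transient (delivery_mode=1). The bucket, exchange, and envelope format are invented for the example, not the actual setup described above:

    import json
    import uuid

    import boto3
    import pika

    s3 = boto3.client("s3")
    BUCKET = "example-message-bodies"  # hypothetical bucket name

    def publish_by_reference(channel, exchange, routing_key, body_bytes):
        # Store the large payload in S3 and publish only a small reference.
        key = str(uuid.uuid4())
        s3.put_object(Bucket=BUCKET, Key=key, Body=body_bytes)
        envelope = json.dumps({"s3_bucket": BUCKET, "s3_key": key})
        channel.basic_publish(
            exchange=exchange,
            routing_key=routing_key,
            body=envelope,
            # delivery_mode=1 keeps the message transient, matching the
            # "no RabbitMQ persistence" approach described above.
            properties=pika.BasicProperties(delivery_mode=1),
        )

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    publish_by_reference(conn.channel(), "offers", "offers.individualized",
                         b"...large payload...")
    conn.close()

Consumers would read the envelope and fetch the body from S3 by key; anything needing durable storage would go to Cassandra (or whatever the real persistence layer is), since the broker keeps nothing durable in this scheme.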
 
 
However, I cannot stem the tide of improper RabbitMQ use.
 
We try to make it easier to use us than not. We work hard to be the most reliable, fastest, most scalable, most flexible, and cheapest component of our customers' technology mix.
 
 
When things go poorly, millions of messages end up in the queues. 
 
We target zero-length queues. If they grow unexpectedly we 1) autoscale, 2) shift load, and 3) start new regions, usually all three. Then we diagnose.
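
A rough sketch of one way to watch for that kind of unexpected growth, using the management plugin's HTTP API as the signal that feeds the autoscale/shift-load decision. The host, credentials, and threshold are placeholders:

    import requests

    MGMT = "http://rabbit-host:15672"      # management plugin endpoint (placeholder)
    AUTH = ("guest", "guest")              # placeholder credentials
    BACKLOG_THRESHOLD = 100_000            # arbitrary example threshold

    def queues_with_backlog():
        # /api/queues lists every queue along with its current depth ("messages").
        resp = requests.get(f"{MGMT}/api/queues", auth=AUTH, timeout=10)
        resp.raise_for_status()
        return [(q["vhost"], q["name"], q.get("messages", 0))
                for q in resp.json()
                if q.get("messages", 0) > BACKLOG_THRESHOLD]

    for vhost, name, depth in queues_with_backlog():
        # Hook in whatever scaling or load-shifting mechanism you actually use.
        print(f"{vhost}/{name} has {depth} messages queued")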
 
 
In 3.1.x we saw this regularly cause our clusters to partition.
 
We have never had a partition in production because we always overprovision RabbitMQ so it can maintain cluster communications. We basically avoid disk I/O because of the risk that I/O wait interferes with the cluster heartbeat.
 
 
In 3.1.x and 3.2.x when we would delete large queues (5+ million messages enqueued), this would cause the cluster to become unresponsive, run out of memory, and then crash.
 
When we have tested situations like this, we found it best to just wipe out the cluster and restart. Before doing this, we shift the load to other regions operating in parallel.
 
 
During the 3.1 -> 3.2 upgrade, we had to completely rebuild our clusters. When 3.2 came up, it soon crashed.
 
We have not had that problem.
 
 
In the most recent upgrade, we saw a 3.2.3 cluster in our dev environment crash. I performed an opportunistic upgrade to 3.3.1, because hey... downtime already, so let's see if 3.3.1 addresses some of the issues we've been seeing.
 
 
After the upgrade, 3.3.1 would not start up at all. I removed /var/lib/rabbitmq/mnesia on all of the nodes and brought RabbitMQ back up.
 
We are not yet in production with 3.3.1, but 3.2.4 is running solidly in stage and we will upgrade stage to 3.3.1 this coming week.
 
 
3.3.1 has been up and running alright so far, but we haven't done another end-to-end test in our development environment in a while. One of these tests typically builds up at least a million messages in the queue over the course of a run.
 
A million is not that many, depending on message size of course. As I said, our target is zero, but really the question is: what's your rate of change? I try to have enough 'headroom' to easily handle the surges; volumes can vary 20 to 1 depending on the news of the moment and so on. If a queue builds and stays high, we add resources until it goes down and then investigate.
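
To make the "rate of change" question concrete, here is a sketch that compares a queue's publish rate with its ack rate via the management API's message_stats; a persistently positive difference means the backlog is growing and more headroom is needed. The host, vhost, and queue name are placeholders:

    import requests

    MGMT = "http://rabbit-host:15672"
    AUTH = ("guest", "guest")

    def queue_rates(vhost, queue):
        # message_stats exposes moving-average rates for publishes and acks.
        resp = requests.get(f"{MGMT}/api/queues/{vhost}/{queue}",
                            auth=AUTH, timeout=10)
        resp.raise_for_status()
        stats = resp.json().get("message_stats", {})
        publish_rate = stats.get("publish_details", {}).get("rate", 0.0)
        ack_rate = stats.get("ack_details", {}).get("rate", 0.0)
        return publish_rate, ack_rate

    pub, ack = queue_rates("%2F", "offers")   # "%2F" is the default vhost "/"
    print(f"publish {pub:.1f}/s, ack {ack:.1f}/s, net {pub - ack:+.1f}/s")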
 
 
So, I guess my question is:
 
If I know that I have people using RabbitMQ like this, and there is nothing I can do to change that fact... what do I do?
 
You need enough resources. And it is good to be able to autoscale. 
 
A specific suggestion I would make for any internal service provider is to use an AMQP proxy. We locate proxy clusters that we control in our internal customers' computing environments. They publish to and subscribe from these proxies. We control the shoveling/federation of the proxies to/from our core pipelines in each region, redirecting as needed. The proxies are an additional buffer and also allow us to 'launder' incoming messages, e.g. by forcing persistence off.
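
As a sketch of that "launder on the way through" idea, a dynamic shovel could be declared from the proxy cluster to the core via the management API, republishing everything as transient. This assumes the rabbitmq_shovel and rabbitmq_shovel_management plugins are enabled; the URIs, vhost, and names are made up, and the parameter keys are the dynamic shovel keys from the 3.x documentation:

    import json
    import requests

    PROXY_MGMT = "http://proxy-host:15672"   # proxy cluster's management API (placeholder)
    AUTH = ("guest", "guest")
    VHOST = "%2F"                            # default vhost "/"

    shovel = {
        "value": {
            "src-uri": "amqp://proxy-host",
            "src-queue": "ingest",
            "dest-uri": "amqp://core-host",
            "dest-queue": "ingest",
            # Republish as transient regardless of how the producer marked it.
            "publish-properties": {"delivery_mode": 1},
            "ack-mode": "on-confirm",
        }
    }

    resp = requests.put(
        f"{PROXY_MGMT}/api/parameters/shovel/{VHOST}/ingest-to-core",
        auth=AUTH,
        headers={"content-type": "application/json"},
        data=json.dumps(shovel),
    )
    resp.raise_for_status()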
 
We also track and account for every message using metadata, and can charge back... We are cheap but not free.
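
A small sketch of what per-message accounting metadata could look like: each publish is stamped with an id, timestamp, and tenant header that downstream accounting can aggregate on. The header names and tenant field are purely illustrative:

    import time
    import uuid

    import pika

    def publish_tracked(channel, exchange, routing_key, body, tenant):
        props = pika.BasicProperties(
            message_id=str(uuid.uuid4()),          # unique id for tracking
            timestamp=int(time.time()),
            headers={"x-tenant": tenant},          # who to charge back (illustrative)
            delivery_mode=1,
        )
        channel.basic_publish(exchange=exchange, routing_key=routing_key,
                              body=body, properties=props)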
 
Anyway, I hope this helps.
 
ml
 

_______________________________________________ 
rabbitmq-discuss mailing list 
[hidden email] 
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss
