Original source: http://www.lornajane.net/posts/2014/working-with-php-and-beanstalkd

Working with PHP and Beanstalkd

I have just introduced Beanstalkd into my current PHP project; it was super-easy so I thought I'd share some examples and my thoughts on how a job queue fits in with a PHP web application.

The Scenario

I have an API backend and a web frontend on this project (there may be apps later; it's a startup, so there could be anything later). Both the front and back ends are PHP Slim Framework applications, and they talk to each other using a sort of JSON-RPC.

The job queue will handle a few things we don't want to do in real time in the application, such as:

  • updating counts of things like comments: when a comment is made, a job gets created and we can return to the user straight away. Later, the job gets processed, updating the counts of how many comments are on that thing and how many comments the user has made, adding to a news feed of activities ... you get the idea.
  • cleaning up: we had a few cron jobs running to clean up old data, but now those cron jobs put jobs into beanstalkd instead. This gives us a bit more visibility and control over them, and also means that those big jobs don't run on the web servers (we have a separate worker server).
  • other periodic things, like updating incoming data/content feeds or talking to some of the third-party APIs we use, such as Mailchimp and Bit.ly

Adding Jobs to the Queue

There are two ends to this process; let's start by adding jobs to the queue. Anything you don't want to make a user wait for is a good candidate for a job. As I mentioned, some of our jobs are created periodically by cron, but since they are just beanstalkd jobs I can easily add an admin interface to trigger them manually as well. In this case, I'm just making a job to process the things we update when a user makes a comment.

A good job is very self-contained: a bit like a stateless web request, it should contain everything needed to process it and not rely on anything that went before. On a live platform you would typically have many workers all consuming jobs from a single queue, so there is no guarantee that one job will be completed before the next one starts being processed! You can put any data you like into a job; for example, you could include all the data fields needed to fill in and send an email template.

In this example I need to talk to the database anyway so I'm just storing information about which task should be done and including the comment ID with it.

I'm using an excellent library called Pheanstalk, which is well documented and available via Composer. These are the lines I added to my composer.json:

  "require": {
      "pda/pheanstalk": "2.1.0"
  }

I start by creating an object which connects to the job server and allows me to put jobs on the queue:

    $queue = new Pheanstalk_Pheanstalk(
        $config['beanstalkd']['host'],
        $config['beanstalkd']['port']
    );

The config settings there will change between platforms but for my development version of this project, beanstalkd is just running on my laptop so my settings are the defaults:

[beanstalkd]
host=127.0.0.1
port=11300
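The post doesn't show how `$config` gets loaded. Assuming the settings live in an INI file like the excerpt above, PHP's built-in `parse_ini_string()` (or `parse_ini_file()` for a real file) with sections enabled produces the nested `$config['beanstalkd']['host']` array used in the constructor call. This is a minimal sketch; the inline heredoc stands in for a hypothetical config file.

```php
<?php
// Assumption: the settings are stored as INI, as in the excerpt above.
// The heredoc stands in for a real (hypothetical) config file.
$ini = <<<INI
[beanstalkd]
host=127.0.0.1
port=11300
INI;

// true = process sections, so [beanstalkd] becomes a nested array key
$config = parse_ini_string($ini, true);

$host = $config['beanstalkd']['host'];
// INI values come back as strings, so cast the port to an integer
$port = (int) $config['beanstalkd']['port'];

echo "$host:$port\n"; // prints 127.0.0.1:11300
```

For a file on disk, `parse_ini_file($path, true)` returns the same structure.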

Once you have the object created ($queue in my example), we can easily add jobs with the put() command, but first you specify which "tube" to use. Tubes are what other tools would call queues: just a way of putting jobs into different areas. Workers can be told to listen on specific tubes, so you can have specialised workers if needed. Beanstalkd also supports adding jobs with different priorities.
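Under the hood, choosing a tube and calling put() come down to short text commands in beanstalkd's protocol. The sketch below only builds those command strings, to show where the tube, priority, delay and time-to-run end up; `buildUseCommand()` and `buildPutCommand()` are hypothetical helpers, not part of Pheanstalk, and the default argument values are just examples.

```php
<?php
// Hypothetical helpers that format raw beanstalkd protocol commands;
// a client library such as Pheanstalk does this for you.

// "use <tube>" selects the tube that later put commands go into
function buildUseCommand(string $tube): string
{
    return "use $tube\r\n";
}

// "put <pri> <delay> <ttr> <bytes>", then the job body on its own line
//   pri:   lower numbers are more urgent, 0 is the most urgent
//   delay: seconds to wait before the job becomes ready
//   ttr:   seconds a worker may hold the job before it is released again
// The default values here are only examples.
function buildPutCommand(string $body, int $priority = 1024, int $delay = 0, int $ttr = 60): string
{
    // <bytes> is the body length in bytes, which is what strlen() counts
    return sprintf("put %d %d %d %d\r\n%s\r\n", $priority, $delay, $ttr, strlen($body), $body);
}

// Sends nothing; just prints the commands a client would write to the socket
echo buildUseCommand("mytube"), buildPutCommand("hello", 10);
```

Because priority is just a number on each put, "urgent" and "background" jobs can share a tube and still be reserved in priority order.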

Here's how to add the simple job to the queue; the job data is just a string, so I'm using json_encode() to wrap up a couple of fields:

  $job = array("action" => "comment_added",
"data" => array("comment_id" => $comment_id));
$queue->useTube('mytube')->put(json_encode($job));
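Since the job body is just a string, it can help to wrap the encoding in one small function so every producer builds jobs in the same `{"action": ..., "data": {...}}` shape and fails loudly if encoding goes wrong. This is a sketch; `makeJobPayload()` is a hypothetical helper, not something from the original project.

```php
<?php
// Hypothetical helper (not from the original project): encode a job in the
// {"action": ..., "data": {...}} shape that the worker expects.
function makeJobPayload(string $action, array $data = []): string
{
    $json = json_encode(["action" => $action, "data" => $data]);
    if ($json === false) {
        // e.g. invalid UTF-8 in $data; better to fail here than queue a broken job
        throw new RuntimeException("Could not encode job: " . json_last_error_msg());
    }
    return $json;
}

$comment_id = 42; // example value
echo makeJobPayload("comment_added", ["comment_id" => $comment_id]), "\n";
// prints {"action":"comment_added","data":{"comment_id":42}}
```

The returned string is what you would pass to put(), e.g. `$queue->useTube('mytube')->put(makeJobPayload("comment_added", ["comment_id" => $comment_id]))`.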

I wrote a bit in a previous post about how to check the current number of jobs on beanstalkd, so you can use those instructions to check that you have jobs stacking up. To actually process those jobs, we'll need to write a worker.

Taking Jobs Off the Queue

The main application and the worker scripts don't need to use the same technology stack, since beanstalkd is very lightweight and technology-agnostic. I'm working with an entirely PHP team, though, so both the application and the workers are PHP in this instance. The workers are simply long-running command-line PHP scripts that pick up jobs as they become available.

For my workers I added the Pheanstalk library via Composer again, and my basic worker script looks like this:

require("vendor/autoload.php");

// $config holds the beanstalkd, db and api settings; loading them
// from an INI file called config.ini is an assumption here
$config = parse_ini_file("config.ini", true);

// the Worker class (below) must also be loaded, e.g. by Composer's autoloader
$queue = new Pheanstalk_Pheanstalk($config['beanstalkd']['host'], $config['beanstalkd']['port']);

$worker = new Worker($config);

// Set which queues to bind to: every connection watches "default"
// until told otherwise, so ignore it and listen only on our tube
$queue->watch("mytube")->ignore("default");

// pick a job and process it
while ($job = $queue->reserve()) {
    $received = json_decode($job->getData(), true);
    $action = $received['action'];
    if (isset($received['data'])) {
        $data = $received['data'];
    } else {
        $data = array();
    }
    echo "Received a $action (" . current($data) . ") ...";

    if (method_exists($worker, $action)) {
        // how did it go?
        $outcome = $worker->$action($data);
        if ($outcome) {
            echo "done \n";
            $queue->delete($job);
        } else {
            echo "failed \n";
            $queue->bury($job);
        }
    } else {
        echo "action not found\n";
        $queue->bury($job);
    }
}
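The decision the loop makes for each job can be pulled out into a plain function and tried without a running beanstalkd server. In this sketch `dispatchJob()` and `FakeWorker` are hypothetical; the function returns "delete" or "bury" to say which Pheanstalk call the real loop would make. Unlike the loop, it also buries jobs whose body isn't valid JSON.

```php
<?php
// Hypothetical stand-in for the real Worker class
class FakeWorker
{
    public function comment_added(array $data): bool
    {
        // pretend the job succeeds whenever it carries a comment ID
        return isset($data['comment_id']);
    }
}

// Returns "delete" or "bury": the Pheanstalk call the real loop would make
function dispatchJob(object $worker, string $body): string
{
    $received = json_decode($body, true);
    $action = $received['action'] ?? null;
    if (!is_string($action)) {
        return 'bury'; // malformed job: keep it for inspection
    }
    $data = $received['data'] ?? [];
    if (!is_array($data) || !method_exists($worker, $action)) {
        return 'bury'; // bad data, or no handler for this action
    }
    return $worker->$action($data) ? 'delete' : 'bury';
}

$worker = new FakeWorker();
echo dispatchJob($worker, '{"action":"comment_added","data":{"comment_id":42}}'), "\n"; // delete
echo dispatchJob($worker, '{"action":"no_such_action"}'), "\n"; // bury
echo dispatchJob($worker, 'not json'), "\n"; // bury
```

Because the method name comes straight from the job body, an explicit whitelist of allowed actions would be a safer choice than method_exists() on a production system.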

Here you can see the Pheanstalk object again, but this time we use some different commands:

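  • reserve() picks up a job from the queue and marks it as reserved so that no other worker will pick it up
  • delete() removes the job from the queue once it has been successfully completed
  • bury() sets the job aside as failed; no worker will pick it up again unless it is kicked back into the ready queue (Pheanstalk's kick())

The other possible outcome is that the worker neither deletes nor buries the job. Once the job's time-to-run (TTR) runs out, beanstalkd releases it back into the queue and it will be retried later; a worker can also hand a job back straight away with release().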
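To make these rules concrete, here is a toy, in-memory model of a single tube, including the retry that happens when a reserved job is neither deleted nor buried before its time-to-run expires. It is a simplification under stated assumptions, not Pheanstalk or beanstalkd code: `ToyTube` and its methods are hypothetical, priorities and delays are ignored, kick() moves every buried job rather than a bounded number, and time is passed in as whole seconds.

```php
<?php
// Toy model of the job lifecycle; NOT Pheanstalk's API (names are hypothetical).
class ToyTube
{
    private $ready = [];    // id => [body, ttr], in FIFO order
    private $reserved = []; // id => [body, ttr, deadline]
    private $buried = [];   // id => [body, ttr]
    private $nextId = 1;

    public function put(string $body, int $ttr = 60): int
    {
        $id = $this->nextId++;
        $this->ready[$id] = [$body, $ttr];
        return $id;
    }

    // Hand out the oldest ready job and start its TTR clock
    public function reserve(int $now): ?int
    {
        $this->releaseExpired($now);
        foreach ($this->ready as $id => [$body, $ttr]) {
            unset($this->ready[$id]);
            $this->reserved[$id] = [$body, $ttr, $now + $ttr];
            return $id;
        }
        return null; // nothing ready (the real reserve() would block)
    }

    public function delete(int $id): void
    {
        unset($this->reserved[$id]);
    }

    public function bury(int $id): void
    {
        [$body, $ttr] = $this->reserved[$id];
        unset($this->reserved[$id]);
        $this->buried[$id] = [$body, $ttr];
    }

    // Move every buried job back to the ready queue
    public function kick(): void
    {
        $this->ready += $this->buried;
        $this->buried = [];
    }

    // Reserved jobs not deleted or buried within their TTR go back to ready
    private function releaseExpired(int $now): void
    {
        foreach ($this->reserved as $id => [$body, $ttr, $deadline]) {
            if ($now >= $deadline) {
                unset($this->reserved[$id]);
                $this->ready[$id] = [$body, $ttr];
            }
        }
    }

    // [ready, reserved, buried]
    public function counts(): array
    {
        return [count($this->ready), count($this->reserved), count($this->buried)];
    }
}

$tube = new ToyTube();
$tube->put("job A");
$tube->put("job B");
$first = $tube->reserve(0);  // job A
$tube->bury($first);         // job A is set aside
$second = $tube->reserve(1); // job B, never deleted or buried...
$again = $tube->reserve(61); // ...so at t=61 its TTR has run out and it is handed out again
print_r($tube->counts());    // ready 0, reserved 1, buried 1
```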

Once one job has been processed, the worker will pick up another, and so on. With multiple workers running, they will all just pick up jobs in turn until the queue is empty again.

The Worker class really doesn't have much that is beanstalkd-specific. The constructor connects to MySQL and also instantiates a Guzzle client. Some tasks need the full application framework and config to run, so for those we create endpoints in the backend API; the worker holds an access token and uses the Guzzle client to call them. Here's a snippet from the Worker class:

class Worker
{
    protected $config;
    protected $db;
    protected $client;

    public function __construct($config)
    {
        $this->config = $config;

        // connect to MySQL
        $dsn = 'mysql:host=' . $config['db']['host'] . ';dbname=' . $config['db']['database'];
        $username = $config['db']['username'];
        $password = $config['db']['password'];
        $this->db = new \PDO($dsn, $username, $password,
            array(\PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));

        // HTTP client for calling our own backend API
        $this->client = new \Guzzle\Http\Client($config['api']['url']);
    }

    public function comment_added($data)
    {
        $comment_sql = "select * from comments where comment_id = :comment_id";
        $comment_stmt = $this->db->prepare($comment_sql);
        $comment_stmt->execute(array("comment_id" => $data['comment_id']));
        $comment = $comment_stmt->fetch(\PDO::FETCH_ASSOC);

        if ($comment) {
            // more SQL to update various counts
        }
        return true;
    }
}

The Worker class has various other tasks, which call out to our own API backend, to MySQL as shown here, or to something else.

Other Things You Should Probably Know

When working with workers, I often do one of these:

  1. forget to start the worker and then wonder why nothing is working
  2. forget to restart the worker when I deploy new code and then wonder why nothing is working

Beanstalkd has no real access control, so you will want to lock down what can talk to your server on the port it listens on. It's a deliberately lightweight protocol and I like it, but do double-check that it isn't open to the internet or something!

Long-running PHP scripts aren't the most robust thing in the world. I recommend running them under the tender loving care of supervisord (which I wrote about previously); this has the added advantage of giving you a really easy way to restart your workers, plus good logging. You should probably also include a lot more error handling than I have in the scripts here; I abbreviated them to keep things readable.
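As one example of that extra error handling, a handler call can be wrapped so that an exception buries the job instead of killing the whole worker process. This is a minimal sketch; `handleSafely()` is a hypothetical helper, and the array logger stands in for whatever logging you use. Fatal errors such as running out of memory can't be caught this way, which is where having supervisord restart the process comes in.

```php
<?php
// Hypothetical helper: run one job handler, turning any exception into a
// "bury" decision so a single bad job can't kill the whole worker process.
function handleSafely(callable $handler, array $data, callable $log): string
{
    try {
        return $handler($data) ? 'delete' : 'bury';
    } catch (Throwable $e) {
        // record what went wrong; the buried job stays around for inspection
        $log(get_class($e) . ': ' . $e->getMessage());
        return 'bury';
    }
}

// collect log lines in an array; a real worker might write to STDERR,
// which supervisord can capture in its log files
$messages = [];
$log = function (string $message) use (&$messages): void {
    $messages[] = $message;
};

echo handleSafely(fn(array $data) => true, [], $log), "\n"; // delete
echo handleSafely(function (array $data): bool {
    throw new RuntimeException("database went away");
}, [], $log), "\n"; // bury
echo $messages[0], "\n"; // RuntimeException: database went away
```

In the worker loop, the returned string would decide between `$queue->delete($job)` and `$queue->bury($job)`.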

What did I miss? If you're working with Beanstalkd and PHP and there's something I should have mentioned, please share it in the comments. This was my first beanstalkd implementation but I think it's the first of many - it was super-easy to get started!
