A feature-detection example using the Intel® Threading Building Blocks flow graph

By Michael V. (Intel), Added September 9, 2011

The Intel® Threading Building Blocks (Intel® TBB) flow graph is fully supported in Intel® TBB 4.0. If you are unfamiliar with the flow graph, you can read an introduction here.

Figure 1 below shows a flow graph that implements a simple feature-detection application. A number of images enter the graph, and two alternative feature-detection algorithms are applied to each one. If either algorithm detects a feature of interest, the image is stored for later inspection. In this article, I’ll describe each node used in this graph, and then provide and describe a complete working implementation.

Figure 1: The Intel® TBB flow graph for the feature-detection example.

In the figure, four different types of nodes are used to construct the application: a source_node, a queue_node, two join_nodes, and several function_nodes. Before I provide a sample implementation, I’ll give a brief overview of each node.

The first type of node is a source_node, which is shown pictorially using the symbol below. This type of node has no predecessors, and is used to generate messages that are injected into the graph. It executes a user functor (or lambda expression) to generate its output. The unfilled circle on its right side indicates that it buffers its output and that this buffer can be reserved. The source_node buffers a single item. When a buffer is reserved, a value is held for the caller until the caller either consumes or releases the value. A source_node will only invoke the user functor when there is nothing currently buffered in its single item output buffer.
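
The functor contract described above can be sketched without any TBB dependency. The block below is only an illustration, assuming the body signature that the source_node lambda uses later in this article (fill in the argument and return true, or return false when nothing is left to generate). The names counting_body and drain are hypothetical, and drain merely mimics the node's call pattern; it does not model the single-item buffer or reservations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical body: emits 0, 1, ..., limit-1, then reports exhaustion.
struct counting_body {
    int next;
    int limit;
    counting_body( int n ) : next(0), limit(n) {
        assert( n >= 0 );  // a negative item count makes no sense here
    }
    // Same shape as the lambda passed to source_node later in the article:
    // fill `out` and return true, or return false when no items remain.
    bool operator()( int &out ) {
        if ( next >= limit ) return false;
        out = next++;
        return true;
    }
};

// Hypothetical stand-in for the node: calls the body until it returns false
// and collects every generated item.
template< typename Body >
std::vector<int> drain( Body body ) {
    std::vector<int> items;
    int value = 0;
    while ( body( value ) ) items.push_back( value );
    return items;
}
```

Calling drain( counting_body(3) ) collects the items 0, 1 and 2; a real source_node would instead invoke the body once each time its output buffer becomes empty.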

The second type of node is a queue_node, which is shown using the figure below. A queue_node is an unbounded first-in first-out buffer. Like the source_node, its output is reservable.

The third type of node, of which there are two variants used in the example, is the join_node. A join_node has multiple input ports and generates a single output tuple that contains a value received at each port. A join_node can use one of three policies at its input ports: queueing, reserving or tag_matching. A queueing join_node greedily consumes all messages as they arrive and generates an output whenever it has at least one item in each input queue. A reserving join_node only attempts to generate a tuple when it can successfully reserve an item at each input port. If it cannot successfully reserve all inputs, it releases all of its reservations and will only try again when it receives a message from the port or ports it was previously unable to reserve. Lastly, a tag_matching join_node uses hash tables to buffer messages in its input ports. When it has received messages at each port that have matching keys, it creates an output tuple with these messages. Shown below are the symbols for the reserving and tag_matching join_nodes used in Figure 1.

The final node type used in this example is a function_node; it uses the symbol shown below. A function_node executes a user-provided functor or lambda expression on incoming messages, passing the return value to its successors. A function_node can be constructed with a limited or unlimited allowable concurrency level. A function_node with unlimited concurrency creates a task to apply its functor to each message as it arrives. If a function_node has limited concurrency, it will create tasks only up to its allowed concurrency level, buffering messages at its input as necessary so that they are not dropped.

To save space, I’m going to fake the image processing parts of this example. In particular, each image will simply be an array of characters. An image that contains the character ‘A’ has a feature recognizable by algorithm A, and an image that contains the character ‘B’ has a feature recognizable by algorithm B. In this post, I will provide the complete code to construct and execute a flow graph that has the structure shown in Figure 1, but I’ll replace the actual computations with trivial ones.

Below is the declaration of struct image, as well as the trivial implementations that can be used as the bodies of the function nodes. The function get_next_image will be used by the source_node to generate images for processing. Note that in get_next_image, every 11th image has a feature detectable by algorithm A and every 13th image has a feature detectable by algorithm B. The function preprocess_image adds a simple offset of 32 to each character, which turns 'A' into 'a' and 'B' into 'b'; detect_with_A and detect_with_B then do a trivial search of the preprocessed image for the characters 'a' and 'b', respectively.

#include <cstring>
#include <cstdio>

const int num_image_buffers = 100;
int image_size = 10000000;

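struct image {
   const int N;
   char *data;
   image();
   image( int image_number, bool a, bool b );
   ~image();
};

image::image() : N(image_size) {
    data = new char[N];
}

// Release the character array so that deleting an image does not leak it
image::~image() {
    delete [] data;
}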

image::image( int image_number, bool a, bool b ) : N(image_size) {
    data = new char[N];
    memset( data, '\0', N );
    data[0] = (char)image_number - 32;
    if ( a ) data[N-2] = 'A';
    if ( b ) data[N-1] = 'B';
}

int img_number = 0;
int num_images = 64;
const int a_frequency = 11;
const int b_frequency = 13;

image *get_next_image() {
    bool a = false, b = false;
    if ( img_number < num_images ) {
        if ( img_number%a_frequency == 0 ) a = true;
        if ( img_number%b_frequency == 0 ) b = true;
        return new image( img_number++, a, b );
    } else {
        return NULL;
    }
}

void preprocess_image( image *input_image, image *output_image ) {
    for ( int i = 0; i < input_image->N; ++i ) {
        output_image->data[i] = input_image->data[i] + 32;
    }
}

bool detect_with_A( image *input_image ) {
    for ( int i = 0; i < input_image->N; ++i ) {
        if ( input_image->data[i] == 'a' )
            return true;
    }
    return false;
}

bool detect_with_B( image *input_image ) {
    for ( int i = 0; i < input_image->N; ++i ) {
        if ( input_image->data[i] == 'b' )
            return true;
    }
    return false;
}

void output_image( image *input_image, bool found_a, bool found_b ) {
    bool a = false, b = false;
    int a_i = -1, b_i = -1;
    for ( int i = 0; i < input_image->N; ++i ) {
        if ( input_image->data[i] == 'a' ) { a = true; a_i = i; }
        if ( input_image->data[i] == 'b' ) { b = true; b_i = i; }
    }
    printf("Detected feature (a,b)=(%d,%d)=(%d,%d) at (%d,%d) for image %p:%d\n",
           a, b, found_a, found_b, a_i, b_i, (void *)input_image, input_image->data[0]);
}

The code that implements the flow graph itself is shown in function main below. I will interject text in the middle of the listing of main to describe the use of the flow graph components. If you want to build this example, you can paste the pieces of the listing, from the #include lines that precede struct image through the closing brace of main, in order into a single file.

int num_graph_buffers = 8;

#include "tbb/flow_graph.h"

using namespace tbb;
using namespace tbb::flow;

int main() {

First, a graph g is created. All of the nodes will belong to this single graph. A few typedefs are provided to make it easier to refer to the outputs of the join nodes:

    graph g;

    typedef std::tuple< image *, image * > resource_tuple;
    typedef std::pair< image *, bool > detection_pair;
    typedef std::tuple< detection_pair, detection_pair > detection_tuple;

Next, the queue_node that holds the image buffers is created, along with the two join nodes. Again, note that resource_join uses the reserving policy, while detection_join uses the tag_matching policy. To use tag_matching, the user must provide functors that can extract the tag from the item; these appear as the additional arguments to the constructor.

    queue_node< image * > buffers( g );
    join_node< resource_tuple, reserving > resource_join( g );
    join_node< detection_tuple, tag_matching > detection_join( g,
            [](const detection_pair &p) -> size_t { return (size_t)p.first; },
            [](const detection_pair &p) -> size_t { return (size_t)p.first; } );

Next, the nodes that execute the user’s code are created, including the source_node and the four function_nodes. The user’s code is passed to each node using a C++ lambda expression (a function object could also be used). For the most part, each lambda expression is a bit of wrapper code that calls the functions that were described earlier, obtaining inputs and creating outputs as necessary. The make_edge calls wire together the nodes as shown in Figure 1.

    source_node< image * > src( g,
                                []( image* &next_image ) -> bool {
                                    next_image = get_next_image();
                                    if ( next_image ) return true;
                                    else return false;
                                }
                              );
    make_edge(src, input_port<0>(resource_join) );
    make_edge(buffers, input_port<1>(resource_join) );

    function_node< resource_tuple, image * >
        preprocess_function( g, unlimited,
                             []( const resource_tuple &in ) -> image * {
                                 image *input_image = std::get<0>(in);
                                 image *output_image = std::get<1>(in);
                                 preprocess_image( input_image, output_image );
                                 delete input_image;
                                 return output_image;
                             }
                           );

    make_edge(resource_join, preprocess_function );

    function_node< image *, detection_pair >
        detect_A( g, unlimited,
                 []( image *input_image ) -> detection_pair {
                    bool r = detect_with_A( input_image );
                    return std::make_pair( input_image, r );
                 }
               );

    function_node< image *, detection_pair >
        detect_B( g, unlimited,
                 []( image *input_image ) -> detection_pair {
                    bool r = detect_with_B( input_image );
                    return std::make_pair( input_image, r );
                 }
               );

    make_edge(preprocess_function, detect_A );
    make_edge(detect_A, input_port<0>(detection_join) );
    make_edge(preprocess_function, detect_B );
    make_edge(detect_B, input_port<1>(detection_join) );

    function_node< detection_tuple, image * >
        decide( g, serial,
                 []( const detection_tuple &t ) -> image * {
                     const detection_pair &a = std::get<0>(t);
                     const detection_pair &b = std::get<1>(t);
                     image *img = a.first;
                     if ( a.second || b.second ) {
                         output_image( img, a.second, b.second );
                     }
                     return img;
                 }
               );

    make_edge(detection_join, decide);
    make_edge(decide, buffers);

Because of the reserving join node at the front of the graph, the graph will remain idle until there are image buffers available in the buffers queue. The for-loop below allocates and puts buffers into the queue. After the loop, the call to g.wait_for_all() will block until the graph again becomes idle when all images are processed.

    // Put image buffers into the buffer queue
    for ( int i = 0; i < num_graph_buffers; ++i ) {
        image *img = new image;
        buffers.try_put( img );
    }
    g.wait_for_all();

When the graph is idle, all of the buffers will again be in the buffers queue. The queue_node therefore needs to be drained and the buffers deallocated:

    for ( int i = 0; i < num_graph_buffers; ++i ) {
        image *img = NULL;
        if ( !buffers.try_get(img) )
            printf("ERROR: lost a buffer\n");
        else
            delete img;
    }
    return 0;
}

I hope that this feature-detection example demonstrates how a reasonably complex flow graph that passes messages between nodes can be implemented. To learn more about the new features in Intel® Threading Building Blocks 4.0, visit http://www.threadingbuildingblocks.org. To learn more about the Intel® TBB flow graph, check out the other blog articles at /en-us/blogs/tag/flow_graph/.
