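A processor represents one kind of data-processing operation, and a single pipeline can apply multiple processors.
Depending on the execution mode, processors fall into three groups: standalone, cluster, and edge (agent).
There are also development processors that help with testing.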

Standalone pipelines only

  • Record Deduplicator - Removes duplicate records.
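To make "removes duplicate records" concrete, here is a minimal conceptual sketch in Python. It is not StreamSets code: the `deduplicate` function, its `key_fields` parameter, and the dict-based record shape are assumptions made for illustration only.

```python
import hashlib
import json


def deduplicate(records, key_fields=None):
    """Split records into (unique, duplicates) by comparing field values.

    Hypothetical helper: compares whole records, or only `key_fields` if given.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        # Pick the part of the record that decides whether it is a repeat.
        subset = record if key_fields is None else {k: record.get(k) for k in key_fields}
        # Hash a canonical JSON form so that key order does not matter.
        digest = hashlib.sha256(
            json.dumps(subset, sort_keys=True, default=str).encode("utf-8")
        ).hexdigest()
        if digest in seen:
            duplicates.append(record)
        else:
            seen.add(digest)
            unique.append(record)
    return unique, duplicates


records = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
]
unique, duplicates = deduplicate(records)
```

Keeping the repeats in a separate list, rather than dropping them silently, makes it easy to inspect what was filtered out.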

Standalone and cluster pipelines

  • Aggregator - Performs aggregations and displays the results in Monitor mode and writes the results to events when enabled. This processor does not update the records being evaluated.
  • Base64 Field Decoder - Decodes Base64 encoded data to binary data.
  • Base64 Field Encoder - Encodes binary data using Base64.
  • Data Parser - Parses NetFlow or syslog data embedded in a field.
  • Delay - Delays passing a batch to the rest of the pipeline.
  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Flattener - Flattens nested fields.
  • Field Hasher - Uses an algorithm to encode sensitive data.
  • Field Masker - Masks sensitive string data.
  • Field Merger - Merges fields in complex lists or maps.
  • Field Order - Orders fields in a map or list-map root field type and outputs the fields into a list-map or list root field type.
  • Field Pivoter - Pivots data in a list, map, or list-map field and creates a record for each item in the field.
  • Field Remover - Removes fields from a record.
  • Field Renamer - Renames fields in a record.
  • Field Replacer - Replaces field values.
  • Field Splitter - Splits the string values in a field into different fields.
  • Field Type Converter - Converts the data types of fields.
  • Field Zip - Merges list data from two fields.
  • Geo IP - Returns geolocation and IP intelligence information for a specified IP address.
  • Groovy Evaluator - Processes records based on custom Groovy code.
  • HBase Lookup - Performs key-value lookups in HBase to enrich records with data.
  • Hive Metadata - Works with the Hive Metastore destination as part of the Drift Synchronization Solution for Hive.
  • HTTP Client - Sends requests to an HTTP resource URL and writes the results to a field.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • JDBC Lookup - Performs lookups in a database table through a JDBC connection.
  • JDBC Tee - Writes data to a database table through a JDBC connection, and enriches records with data from generated database columns.
  • JSON Generator - Serializes data from a field to a JSON-encoded string.
  • JSON Parser - Parses a JSON object embedded in a string field.
  • Jython Evaluator - Processes records based on custom Jython code.
  • Kudu Lookup - Performs lookups in Kudu to enrich records with data.
  • Log Parser - Parses log data in a field based on the specified log format.
  • PostgreSQL Metadata - Tracks structural changes in source data then creates and alters PostgreSQL tables as part of the Drift Synchronization Solution for PostgreSQL.
  • Redis Lookup - Performs key-value lookups in Redis to enrich records with data.
  • Salesforce Lookup - Performs lookups in Salesforce to enrich records with data.
  • Schema Generator - Generates a schema for each record and writes the schema to a record header attribute.
  • Spark Evaluator - Processes data based on a custom Spark application.
  • SQL Parser - Parses SQL queries in a string field.
  • Static Lookup - Performs key-value lookups in local memory.
  • Stream Selector - Routes data to different streams based on conditions.
  • Value Replacer (Deprecated) - Replaces existing nulls or specified values with constants or nulls.
  • Whole File Transformer - Transforms Avro files to Parquet.
  • XML Flattener - Flattens XML data in a string field.
  • XML Parser - Parses XML data in a string field.
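Since a pipeline can apply several processors in sequence, the idea can be sketched as a chain of plain Python functions. This is only an illustration of the concept behind three entries in the list above (Base64 Field Decoder, Field Flattener, Field Remover). The function names, the `.` separator, and the dict-based records are assumptions and do not reflect how StreamSets implements or configures these stages.

```python
import base64


def base64_decode_field(record, field):
    """Return a copy with `field` decoded from Base64 text to bytes."""
    out = dict(record)
    out[field] = base64.b64decode(out[field])
    return out


def flatten_fields(record, separator="."):
    """Return a copy with nested dicts collapsed into separator-joined keys."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse, then prefix each nested key with the parent key.
            for sub_key, sub_value in flatten_fields(value, separator).items():
                flat[f"{key}{separator}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def remove_fields(record, fields):
    """Return a copy without the named top-level fields."""
    return {k: v for k, v in record.items() if k not in fields}


def run_pipeline(record, processors):
    """Pass the record through each processor in order."""
    for processor in processors:
        record = processor(record)
    return record


record = {
    "payload": "aGVsbG8=",
    "user": {"name": "ann", "geo": {"city": "Oslo"}},
    "tmp": 1,
}
result = run_pipeline(record, [
    lambda r: base64_decode_field(r, "payload"),
    flatten_fields,
    lambda r: remove_fields(r, {"tmp"}),
])
```

Each function returns a new record instead of mutating its input, so the stages can be reordered or tested independently.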

Edge pipelines

  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Remover - Removes fields from a record.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • Stream Selector - Routes data to different streams based on conditions.
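Routing records to different streams based on conditions can be sketched as follows. This is a conceptual Python illustration, not the Stream Selector's actual behavior or expression syntax. The `select_streams` function and the stream names are hypothetical, and the choices to send a record to every matching stream and to fall back to a default stream are assumptions of this sketch.

```python
def select_streams(records, conditions, default="default"):
    """Route each record to every stream whose predicate it satisfies.

    `conditions` is a list of (stream_name, predicate) pairs. Records that
    match no predicate go to the `default` stream.
    """
    streams = {name: [] for name, _ in conditions}
    streams[default] = []
    for record in records:
        matched = False
        for name, predicate in conditions:
            if predicate(record):
                streams[name].append(record)
                matched = True
        if not matched:
            streams[default].append(record)
    return streams


readings = [
    {"sensor": "s1", "temp": 95},
    {"sensor": "s2", "temp": 20},
    {"sensor": "s3", "temp": -5},
]
streams = select_streams(readings, [
    ("hot", lambda r: r["temp"] > 80),
    ("freezing", lambda r: r["temp"] < 0),
])
```

Here `streams["hot"]` holds the `s1` reading, `streams["freezing"]` holds `s3`, and `s2` falls through to `streams["default"]`.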

Development processors for testing

  • Dev Identity
  • Dev Random Error
  • Dev Record Creator

References

https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Processors/Processors_overview.html#concept_hpr_twm_jq

 
 
 
 
