Processors 表示对于一种数据操作处理,在pipeline中可以应用多个Processors,
同时根据不同的执行模式,可以分为独立模式的,集群模式、边缘模式(agent),以及
帮助测试的测试Processors

独立pipelineonly

  • Record Deduplicator - Removes duplicate records.

独立&&集群pipeline

  • Aggregator - Performs aggregations and displays the results in Monitor mode and writes the results to events when enabled. This processor does not update the records being evaluated.
  • Base64 Field Decoder - Decodes Base64 encoded data to binary data.
  • Base64 Field Encoder - Encodes binary data using Base64.
  • Data Parser - Parses NetFlow or syslog data embedded in a field.
  • Delay - Delays passing a batch to the rest of the pipeline.
  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Flattener - Flattens nested fields.
  • Field Hasher - Uses an algorithm to encode sensitive data.
  • Field Masker - Masks sensitive string data.
  • Field Merger - Merges fields in complex lists or maps.
  • Field Order - Orders fields in a map or list-map root field type and outputs the fields into a list-map or list root field type.
  • Field Pivoter - Pivots data in a list, map, or list-map field and creates a record for each item in the field.
  • Field Remover - Removes fields from a record.
  • Field Renamer - Renames fields in a record.
  • Field Replacer - Replaces field values.
  • Field Splitter - Splits the string values in a field into different fields.
  • Field Type Converter - Converts the data types of fields.
  • Field Zip - Merges list data from two fields.
  • Geo IP- Returns geolocation and IP intelligence information for a specified IP address.
  • Groovy Evaluator - Processes records based on custom Groovy code.
  • HBase Lookup - Performs key-value lookups in HBase to enrich records with data.
  • Hive Metadata - Works with the Hive Metastore destination as part of the Drift Synchronization Solution for Hive.
  • HTTP Client - The HTTP Client processor sends requests to an HTTP resource URL and writes the results to a field.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • JDBC Lookup - Performs lookups in a database table through a JDBC connection.
  • JDBC Tee - Writes data to a database table through a JDBC connection, and enriches records with data from generated database columns.
  • JSON Generator - Serializes data from a field to a JSON-encoded string.
  • JSON Parser - Parses a JSON object embedded in a string field.
  • Jython Evaluator - Processes records based on custom Jython code.
  • Kudu Lookup - Performs lookups in Kudu to enrich records with data.
  • Log Parser - Parses log data in a field based on the specified log format.
  • PostgreSQL Metadata - Tracks structural changes in source data then creates and alters PostgreSQL tables as part of the Drift Synchronization Solution for PostgreSQL.
  • Redis Lookup - Performs key-value lookups in Redis to enrich records with data.
  • Salesforce Lookup - Performs lookups in Salesforce to enrich records with data.
  • Schema Generator - Generates a schema for each record and writes the schema to a record header attribute.
  • Spark Evaluator - Processes data based on a custom Spark application.
  • SQL Parser - Parses SQL queries in a string field.
  • Static Lookup - Performs key-value lookups in local memory.
  • Stream Selector - Routes data to different streams based on conditions.
  • Value Replacer (Deprecated) - Replaces existing nulls or specified values with constants or nulls.
  • Whole File Transformer - Transforms Avro files to Parquet.
  • XML Flattener - Flattens XML data in a string field.
  • XML Parser - Parses XML data in a string field.

边缘pipeline

  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Remover - Removes fields from a record.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • Stream Selector - Routes data to different streams based on conditions.

测试Processors

  • Dev Identity
  • Dev Random Error
  • Dev Record Creator

参考资料

https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Processors/Processors_overview.html#concept_hpr_twm_jq

 
 
 
 

streamsets Processors 说明的更多相关文章

  1. StreamSets 相关文章

    相关streamsets 文章(不按顺序) 学习视频-百度网盘 StreamSets 设计Edge pipeline StreamSets Data Collector Edge 说明 streams ...

  2. streamsets 3.5 的一些新功能

    streamsets 3.5 有了一些新的特性以及增强,总之是越来越方便了,详细的可以 查看官方说明,以下简单例举一些比较有意义的. origins 新的pulsar 消费origin jdbc 多表 ...

  3. streamsets geoip 使用

    geoip 分析对于网站数据分析是很方便的 安装geoip2 下载地址 https://dev.maxmind.com/geoip/geoip2/geolite2/ 配置streamsets geoi ...

  4. streamsets stream selector 使用

    stream selector 就是一个选择器,可以方便的对于不同record 的数据进行区分,并执行不同的处理 pipeline flow stream selector 配置 local fs 配 ...

  5. StreamSets使用指南

    StreamSets使用指南 最近在调研Streamsets,照猫画虎做了几个最简单的Demo鉴于网络上相关资料非常少,做个记录. 1.简介 Streamsets是一款大数据实时采集和ETL工具,可以 ...

  6. lib/sqlalchemy/cextension/processors.c:10:20: 致命错误: Python.h:没有那个文件或目录

    本文地址:http://www.cnblogs.com/yhLinux/p/4063444.html $ sudo easy_install sqlalchemy [sudo] password fo ...

  7. BSS Audio® Introduces Full-Bandwidth Acoustic Echo Cancellation Algorithm for Soundweb London Conferencing Processors

    BSS Audio® Introduces Full-Bandwidth Acoustic Echo Cancellation Algorithm for Soundweb London Confer ...

  8. regardless of how many processors are devoted to a parallelized execution of this program

    https://en.wikipedia.org/wiki/Amdah's_law Amdahl's law is often used in parallel computing to predic ...

  9. using 40 logical processors based on SQL Server licensing SqlServer CPU核心数限制问题

    公司服务器是120核心cpu,但是实际应用中只有40核,原因是业务部门发现服务器cpu承载30%的时候sql 就会卡死: 然后从sqlserver 去查询,cpu核心数: SELECT COUNT(1 ...

随机推荐

  1. bzoj1009 / P3193 [HNOI2008]GT考试

    P3193 [HNOI2008]GT考试 设$f[i][j]$表示主串匹配到第$i$个位置,不吉利数字匹配到第$j$个位置 $g[i][j]$表示加上某数字使子串原来最多能匹配到第$i$个数字,现在只 ...

  2. Java实现文件上传到服务器(FTP方式)

    Java实现文件上传到服务器(FTP方式) 1,jar包:commons-net-3.3.jar 2,实现代码: //FTP传输到数据库服务器 private boolean uploadServer ...

  3. SQL Server 2008 添加约束

    ALTER TABLE Student --主键约束ADD CONSTRAINT PK_StuNo PRIMARY KEY (StudentNo) ALTER TABLE Student --唯一约束 ...

  4. Duilib 创建不规则窗口(转载)

    方法一: 转载:http://blog.csdn.net/chenlycly/article/details/46447297 转载:http://blog.csdn.net/harvic880925 ...

  5. 代理模式:利用JDK原生动态实现AOP

    代理模式:利用JDK原生动态实现AOP http://www.cnblogs.com/qiuyong/p/6412870.html 1.概述 含义:控制对对象的访问. 作用:详细控制某个(某类)某对象 ...

  6. SpringBoot创建多模块方式以及打包方式

    springboot重构多模块的步骤 模型层:model 持久层:persistence 表示层:web 步骤: 正常创建一个springboot项目 修改创建项目的pom文件,将jar修改为pom ...

  7. JAVA异常处理分析(中)

    在使用java异常处理机制时候我们会发现有些异常抛出后可以不需要进行抓取处理,而有些异常必须要进行抓取处理,这是个什么情况呢? 设计理念猜想:      有一些场景的异常,是可以不需要处理或是经常不会 ...

  8. 配置AD RMS的一点心得

    基本上是按照下面的连接配置的,微软写的很好 AD RMS Step-by-Step Guide http://technet.microsoft.com/en-us/library/cc753531( ...

  9. ORA-12505: TNS: 监听程序当前无法识别连接描述符中所给出的SID等错误解决方法

    程序连接orarle报ORA-12505错误 一.异常{ ORA-12505, TNS:listener does not currently know of SID given in connect ...

  10. SQL语言的增删改查

    select(查), update(改), delete(删), insert into(增)   select * from table_name 获取表中所有字段 select id, name, ...