Processors represent data-transformation operations; a pipeline can apply multiple processors. Depending on the execution mode, processors fall into those available only in standalone pipelines, those available in standalone and cluster pipelines, those available in edge (SDC Edge agent) pipelines, and test processors used to aid testing.

Standalone pipelines only

  • Record Deduplicator - Removes duplicate records.

Standalone and cluster pipelines

  • Aggregator - Performs aggregations and displays the results in Monitor mode and writes the results to events when enabled. This processor does not update the records being evaluated.
  • Base64 Field Decoder - Decodes Base64 encoded data to binary data.
  • Base64 Field Encoder - Encodes binary data using Base64.
  • Data Parser - Parses NetFlow or syslog data embedded in a field.
  • Delay - Delays passing a batch to the rest of the pipeline.
  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Flattener - Flattens nested fields.
  • Field Hasher - Uses an algorithm to encode sensitive data.
  • Field Masker - Masks sensitive string data.
  • Field Merger - Merges fields in complex lists or maps.
  • Field Order - Orders fields in a map or list-map root field type and outputs the fields into a list-map or list root field type.
  • Field Pivoter - Pivots data in a list, map, or list-map field and creates a record for each item in the field.
  • Field Remover - Removes fields from a record.
  • Field Renamer - Renames fields in a record.
  • Field Replacer - Replaces field values.
  • Field Splitter - Splits the string values in a field into different fields.
  • Field Type Converter - Converts the data types of fields.
  • Field Zip - Merges list data from two fields.
  • Geo IP - Returns geolocation and IP intelligence information for a specified IP address.
  • Groovy Evaluator - Processes records based on custom Groovy code.
  • HBase Lookup - Performs key-value lookups in HBase to enrich records with data.
  • Hive Metadata - Works with the Hive Metastore destination as part of the Drift Synchronization Solution for Hive.
  • HTTP Client - Sends requests to an HTTP resource URL and writes the results to a field.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • JDBC Lookup - Performs lookups in a database table through a JDBC connection.
  • JDBC Tee - Writes data to a database table through a JDBC connection, and enriches records with data from generated database columns.
  • JSON Generator - Serializes data from a field to a JSON-encoded string.
  • JSON Parser - Parses a JSON object embedded in a string field.
  • Jython Evaluator - Processes records based on custom Jython code (a minimal script sketch follows this list).
  • Kudu Lookup - Performs lookups in Kudu to enrich records with data.
  • Log Parser - Parses log data in a field based on the specified log format.
  • PostgreSQL Metadata - Tracks structural changes in source data, then creates and alters PostgreSQL tables as part of the Drift Synchronization Solution for PostgreSQL.
  • Redis Lookup - Performs key-value lookups in Redis to enrich records with data.
  • Salesforce Lookup - Performs lookups in Salesforce to enrich records with data.
  • Schema Generator - Generates a schema for each record and writes the schema to a record header attribute.
  • Spark Evaluator - Processes data based on a custom Spark application.
  • SQL Parser - Parses SQL queries in a string field.
  • Static Lookup - Performs key-value lookups in local memory.
  • Stream Selector - Routes data to different streams based on conditions.
  • Value Replacer (Deprecated) - Replaces existing nulls or specified values with constants or nulls.
  • Whole File Transformer - Transforms Avro files to Parquet.
  • XML Flattener - Flattens XML data in a string field.
  • XML Parser - Parses XML data in a string field.
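
The scripting processors (Groovy, JavaScript, Jython) all follow the same pattern: the stage hands the script a batch of records, and the script writes each record to the output or error stream. Below is a minimal sketch of a batch-mode Jython Evaluator script; the records, output, and error bindings are supplied by the processor, while the status and normalized_status field names are hypothetical examples, not anything defined in this article.

    # Minimal Jython Evaluator sketch (batch mode).
    # `records`, `output`, and `error` are bindings provided by the processor;
    # the /status and /normalized_status fields are hypothetical examples.
    for record in records:
        try:
            # For a map root field, record.value behaves like a dict.
            status = record.value.get('status', '')

            # Add a derived field alongside the existing data.
            record.value['normalized_status'] = str(status).strip().upper()

            # Pass the record to the next stage in the pipeline.
            output.write(record)
        except Exception as e:
            # Route the record to the stage's error handling.
            error.write(record, str(e))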

Edge pipelines

  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Remover - Removes fields from a record.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • Stream Selector - Routes data to different streams based on conditions.

Test processors

  • Dev Identity
  • Dev Random Error
  • Dev Record Creator

References

https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Processors/Processors_overview.html#concept_hpr_twm_jq