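A processor represents one kind of data-processing operation, and a single pipeline can apply several processors.
Depending on the execution mode, processors are grouped into those for standalone pipelines, cluster pipelines, and edge (agent) pipelines,
plus a set of test processors that help with testing.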
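
As a rough mental model only, a processor can be pictured as a function that takes a batch of records and returns a new batch, and a pipeline as those functions applied in order. The sketch below is plain Python and is not the StreamSets API: `Record`, `Processor`, `remove_field`, `rename_field`, and `run_pipeline` are all hypothetical names made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration; StreamSets defines its own record model.
Record = Dict[str, object]
Processor = Callable[[List[Record]], List[Record]]


def remove_field(name: str) -> Processor:
    """Return a processor that drops one field from every record."""
    def run(batch: List[Record]) -> List[Record]:
        return [{k: v for k, v in rec.items() if k != name} for rec in batch]
    return run


def rename_field(old: str, new: str) -> Processor:
    """Return a processor that renames a field when it is present."""
    def run(batch: List[Record]) -> List[Record]:
        out = []
        for rec in batch:
            rec = dict(rec)  # copy so the input batch is not mutated
            if old in rec:
                rec[new] = rec.pop(old)
            out.append(rec)
        return out
    return run


def run_pipeline(batch: List[Record], processors: List[Processor]) -> List[Record]:
    """Apply each processor to the batch, in the order given."""
    for processor in processors:
        batch = processor(batch)
    return batch


result = run_pipeline(
    [{"id": 1, "pwd": "secret", "nm": "alice"}],
    [remove_field("pwd"), rename_field("nm", "name")],
)
print(result)
```

The two helpers loosely mirror what Field Remover and Field Renamer are described as doing in the lists that follow; the real stages are configured in the pipeline designer rather than written as code.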

Standalone pipelines only

  • Record Deduplicator - Removes duplicate records.
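
To make the idea of removing duplicates concrete, here is a minimal sketch that hashes each record (or a chosen subset of its fields) and remembers the hashes it has already seen. It is an illustration in plain Python, not the processor's implementation: the function name `deduplicate`, the `fields` parameter, and the choice of SHA-256 over a JSON dump are assumptions.

```python
import hashlib
import json


def deduplicate(records, fields=None):
    """Split records into (unique, duplicates).

    If `fields` is given, only those fields are compared; otherwise the
    whole record is compared. Hypothetical helper, not a StreamSets API.
    """
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        subset = rec if fields is None else {f: rec.get(f) for f in fields}
        # sort_keys makes the hash independent of key order
        payload = json.dumps(subset, sort_keys=True, default=str)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append(rec)
        else:
            seen.add(digest)
            unique.append(rec)
    return unique, duplicates


records = [
    {"id": 1, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 2, "v": "a"},
    {"id": 1, "v": "a"},
]
unique_all, dup_all = deduplicate(records)
unique_by_id, dup_by_id = deduplicate(records, fields=["id"])
```

Keeping the seen hashes in memory is one plausible reason a deduplicating stage would be limited to a single standalone instance, but the source does not state why this processor is standalone-only.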

Standalone and cluster pipelines

  • Aggregator - Performs aggregations and displays the results in Monitor mode and writes the results to events when enabled. This processor does not update the records being evaluated.
  • Base64 Field Decoder - Decodes Base64 encoded data to binary data.
  • Base64 Field Encoder - Encodes binary data using Base64.
  • Data Parser - Parses NetFlow or syslog data embedded in a field.
  • Delay - Delays passing a batch to the rest of the pipeline.
  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Flattener - Flattens nested fields.
  • Field Hasher - Uses an algorithm to encode sensitive data.
  • Field Masker - Masks sensitive string data.
  • Field Merger - Merges fields in complex lists or maps.
  • Field Order - Orders fields in a map or list-map root field type and outputs the fields into a list-map or list root field type.
  • Field Pivoter - Pivots data in a list, map, or list-map field and creates a record for each item in the field.
  • Field Remover - Removes fields from a record.
  • Field Renamer - Renames fields in a record.
  • Field Replacer - Replaces field values.
  • Field Splitter - Splits the string values in a field into different fields.
  • Field Type Converter - Converts the data types of fields.
  • Field Zip - Merges list data from two fields.
  • Geo IP - Returns geolocation and IP intelligence information for a specified IP address.
  • Groovy Evaluator - Processes records based on custom Groovy code.
  • HBase Lookup - Performs key-value lookups in HBase to enrich records with data.
  • Hive Metadata - Works with the Hive Metastore destination as part of the Drift Synchronization Solution for Hive.
  • HTTP Client - Sends requests to an HTTP resource URL and writes the results to a field.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • JDBC Lookup - Performs lookups in a database table through a JDBC connection.
  • JDBC Tee - Writes data to a database table through a JDBC connection, and enriches records with data from generated database columns.
  • JSON Generator - Serializes data from a field to a JSON-encoded string.
  • JSON Parser - Parses a JSON object embedded in a string field.
  • Jython Evaluator - Processes records based on custom Jython code.
  • Kudu Lookup - Performs lookups in Kudu to enrich records with data.
  • Log Parser - Parses log data in a field based on the specified log format.
  • PostgreSQL Metadata - Tracks structural changes in source data then creates and alters PostgreSQL tables as part of the Drift Synchronization Solution for PostgreSQL.
  • Redis Lookup - Performs key-value lookups in Redis to enrich records with data.
  • Salesforce Lookup - Performs lookups in Salesforce to enrich records with data.
  • Schema Generator - Generates a schema for each record and writes the schema to a record header attribute.
  • Spark Evaluator - Processes data based on a custom Spark application.
  • SQL Parser - Parses SQL queries in a string field.
  • Static Lookup - Performs key-value lookups in local memory.
  • Stream Selector - Routes data to different streams based on conditions.
  • Value Replacer (Deprecated) - Replaces existing nulls or specified values with constants or nulls.
  • Whole File Transformer - Transforms Avro files to Parquet.
  • XML Flattener - Flattens XML data in a string field.
  • XML Parser - Parses XML data in a string field.
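
A few of the field-level operations listed above are easy to picture with a short sketch. The code below illustrates flattening nested fields, masking a sensitive string, and a Base64 round trip in plain Python. It only shows the general techniques: the helper names `flatten` and `mask`, the `.` separator, and the `x` mask character are assumptions, and the actual processors have their own configuration options.

```python
import base64


def flatten(value, sep=".", prefix=""):
    """Flatten nested dicts and lists into a single-level dict.

    List items are keyed by their index. Empty nested containers
    produce no output keys in this simplified version.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, sep, name))
    return flat


def mask(value, visible=4, mask_char="x"):
    """Hide all but the last `visible` characters of a string."""
    visible = max(visible, 0)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


nested = {"id": 7, "user": {"name": "alice", "tags": ["a", "b"]}}
flat = flatten(nested)
masked = mask("4111111111111111")

# Base64 round trip: binary data -> encoded bytes -> binary data
encoded = base64.b64encode(b"hello")
decoded = base64.b64decode(encoded)
```

With the sample input, `flat` has the keys `id`, `user.name`, `user.tags.0`, and `user.tags.1`.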

Edge pipelines

  • Expression Evaluator - Performs calculations on data. Can also add or modify record header attributes.
  • Field Remover - Removes fields from a record.
  • JavaScript Evaluator - Processes records based on custom JavaScript code.
  • Stream Selector - Routes data to different streams based on conditions.
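
Routing records by condition, as Stream Selector is described as doing, can be sketched as a list of named predicates plus a fallback stream. This is a simplified illustration rather than the processor's behavior as implemented: the function `select_streams`, the stream names, and the rule that a record goes to every stream whose condition matches are assumptions of this sketch, and the real processor evaluates conditions written in the StreamSets expression language rather than Python callables.

```python
def select_streams(records, conditions, default="default"):
    """Route each record to every stream whose predicate matches.

    `conditions` is a list of (stream_name, predicate) pairs. Records
    that match no predicate go to the `default` stream.
    """
    streams = {name: [] for name, _ in conditions}
    streams[default] = []
    for rec in records:
        matched = False
        for name, predicate in conditions:
            if predicate(rec):
                streams[name].append(rec)
                matched = True
        if not matched:
            streams[default].append(rec)
    return streams


events = [
    {"level": "ERROR", "code": 500},
    {"level": "INFO", "code": 200},
    {"level": "WARN", "code": 404},
]
routed = select_streams(
    events,
    [
        ("errors", lambda r: r["level"] == "ERROR"),
        ("client_or_server_fault", lambda r: r["code"] >= 400),
    ],
)
```

In this example the first event lands in both named streams, the third only in `client_or_server_fault`, and the second falls through to `default`.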

Test processors

  • Dev Identity
  • Dev Random Error
  • Dev Record Creator

References

https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Processors/Processors_overview.html#concept_hpr_twm_jq
