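Learning pipelinewise, part 2: creating a simple pipeline

pipelinewise provides a command that scaffolds a simple pipeline. It simplifies pipeline creation and is also a convenient way to learn the tool.

Generate the demo pipeline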

pipelinewise init --name pipelinewise_samples

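Result: a pipelinewise_samples directory containing sample configuration files, including tap_postgres.yml.sample and target_postgres.yml.sample.

A simple pg-to-pg demo

Generate the tap and target YAML configuration
Use the demo configuration files directly: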

cp tap_postgres.yml.sample tap_postgres.yml

cp target_postgres.yml.sample target_postgres.yml
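
The two cp commands above can also be scripted when a project contains more than a couple of sample files. The following is a minimal sketch, not part of pipelinewise: the function name activate_samples and the overwrite flag are hypothetical, and it assumes the samples follow the *.yml.sample naming seen above.

```python
import shutil
import tempfile
from pathlib import Path


def activate_samples(project_dir, overwrite=False):
    """Copy every *.yml.sample file to the same name without the .sample suffix."""
    created = []
    for sample in sorted(Path(project_dir).glob("*.yml.sample")):
        target = sample.with_suffix("")  # drops only the trailing ".sample"
        if target.exists() and not overwrite:
            continue  # keep any config that was already edited by hand
        shutil.copyfile(sample, target)
        created.append(target)
    return created


if __name__ == "__main__":
    # Demo against a throwaway directory that mimics the generated project.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        for name in ("tap_postgres.yml.sample", "target_postgres.yml.sample"):
            (project / name).write_text("---\n")
        print([p.name for p in activate_samples(project)])
```

Existing .yml files are skipped by default so that rerunning the script does not discard manual edits.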

Edit the configuration
tap

---
# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "postgres_sample" # Unique identifier of the tap
name: "Sample Postgres Database" # Name of the tap
type: "tap-postgres" # !! THIS SHOULD NOT CHANGE !!
owner: "somebody@foo.com" # Data owner to contact
# ------------------------------------------------------------------------------
# Source (Tap) - PostgreSQL connection details
# ------------------------------------------------------------------------------
db_conn:
  host: "localhost" # PostgreSQL host
  port: 15432 # PostgreSQL port
  user: "pipelinewise" # PostgreSQL user
  password: "secret" # Plain string or vault encrypted
  dbname: "postgres_source_db" # PostgreSQL database name
  #filter_schemas: "schema1,schema2" # Optional: Scan only the required schemas
                                     # to improve the performance of
                                     # data extraction
# ------------------------------------------------------------------------------
# Destination (Target) - Target properties
# Connection details should be in the relevant target YAML file
# ------------------------------------------------------------------------------
target: "postgres_dwh" # ID of the target connector where the data will be loaded
batch_size_rows: 20000 # Batch size for the stream to optimise load performance
# ------------------------------------------------------------------------------
# Source to target Schema mapping
# ------------------------------------------------------------------------------
schemas:
  - source_schema: "public" # Source schema in postgres with tables
    target_schema: "repl_pg_public" # Target schema in the destination Data Warehouse
    target_schema_select_permissions: # Optional: Grant SELECT on schema and tables that are created
      - grp_stats
    # List of tables to replicate from Postgres to destination Data Warehouse
    # Please check the Replication Strategies section in the documentation to understand the differences.
    # For LOG_BASED replication method you might need to adjust the source mysql/mariadb configuration.
    tables:
      - table_name: "city"
        replication_method: "FULL_TABLE" # One of INCREMENTAL, LOG_BASED and FULL_TABLE
        replication_key: "last_update" # Important: Incremental load always needs replication key
        # OPTIONAL: Load time transformations
        #transformations:
        #  - column: "last_name" # Column to transform
        #    type: "SET-NULL" # Transformation type
      # You can add as many tables as you need...
      - table_name: "country"
        replication_method: "FULL_TABLE" # Important! Log based must be enabled in PostgreSQL
  # You can add as many schemas as you need...
  # Uncomment this if you want replicate tables from multiple schemas
  #- source_schema: "another_schema_in_postgres"
  #  target_schema: "another
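
The comments in the tap file state two rules: replication_method is one of INCREMENTAL, LOG_BASED and FULL_TABLE, and an incremental load always needs a replication key. A small check along those lines might look like the sketch below. It is not part of pipelinewise: check_tables and the TABLES literal are hypothetical, and the warning about an unused replication_key is an assumption based on the import log later in this post, which lists only replication_method for the city table.

```python
# Hypothetical in-memory copy of the "tables" list from the tap file above.
TABLES = [
    {"table_name": "city", "replication_method": "FULL_TABLE", "replication_key": "last_update"},
    {"table_name": "country", "replication_method": "FULL_TABLE"},
]

VALID_METHODS = {"INCREMENTAL", "LOG_BASED", "FULL_TABLE"}


def check_tables(tables):
    """Return a list of human-readable problems found in the table entries."""
    problems = []
    for table in tables:
        name = table.get("table_name", "<unnamed>")
        method = table.get("replication_method")
        key = table.get("replication_key")
        if method not in VALID_METHODS:
            problems.append(f"{name}: unknown replication_method {method!r}")
        elif method == "INCREMENTAL" and not key:
            problems.append(f"{name}: INCREMENTAL needs a replication_key")
        elif method != "INCREMENTAL" and key:
            # Assumption: the key only matters for INCREMENTAL loads.
            problems.append(f"{name}: replication_key is set but unused by {method}")
    return problems


if __name__ == "__main__":
    for problem in check_tables(TABLES):
        print(problem)
```

For the demo configuration this flags only the city table, whose replication_key appears to have no effect while the method stays FULL_TABLE.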

target

---
# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "postgres_dwh" # Unique identifier of the target
name: "Postgres Data Warehouse" # Name of the target
type: "target-postgres" # !! THIS SHOULD NOT CHANGE !!
# ------------------------------------------------------------------------------
# Target - Data Warehouse connection details
# ------------------------------------------------------------------------------
db_conn:
  host: "localhost" # Postgres host
  port: 15433 # Postgres port
  user: "pipelinewise" # Postgres user
  password: "secret" # Plain string or vault encrypted
  dbname: "postgres_dwh" # Postgres database name
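
The tap file refers to its destination through the target field, which has to match the id of a target file ("postgres_dwh" in both files here). A quick consistency check could be sketched as follows. The function find_unknown_targets and the dictionaries are hypothetical stand-ins for the parsed YAML files, not pipelinewise code.

```python
def find_unknown_targets(taps, targets):
    """Return (tap id, target id) pairs whose target id matches no target file."""
    known_ids = {target["id"] for target in targets}
    return [(tap["id"], tap["target"]) for tap in taps if tap["target"] not in known_ids]


if __name__ == "__main__":
    # Hypothetical in-memory copies of the two files above (only the fields used here).
    taps = [{"id": "postgres_sample", "target": "postgres_dwh"}]
    targets = [{"id": "postgres_dwh", "type": "target-postgres"}]
    print(find_unknown_targets(taps, targets))
```

An empty list means every tap points at a known target.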

Activate the pipeline

Import the configuration to activate the deployed service:

pipelinewise import --dir pipelinewise_samples

Result

2019-09-17 05:07:55 INFO: Searching YAML config files in /app/wrk
2019-09-17 05:07:55 INFO: LOADING TARGET: target_postgres.yml
2019-09-17 05:07:55 INFO: LOADING TAP: tap_postgres.yml
2019-09-17 05:07:55 INFO: SAVING CONFIG
2019-09-17 05:07:55 INFO: SAVING MAIN CONFIG JSON to /root/.pipelinewise/config.json
2019-09-17 05:07:55 INFO: SAVING TARGET JSONS to /root/.pipelinewise/postgres_dwh/config.json
2019-09-17 05:07:55 INFO: SAVING TAP JSONS to /root/.pipelinewise/postgres_dwh/postgres_sample
2019-09-17 05:07:55 INFO: ACTIVATING TAP STREAM SELECTIONS...
[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 4 concurrent workers.
2019-09-17 05:07:55 INFO: Discovering postgres_sample (tap-postgres) tap in postgres_dwh (target-postgres) target...
2019-09-17 05:07:56 INFO: Loading pre defined selection from /root/.pipelinewise/postgres_dwh/postgres_sample/selection.json
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-edgydata tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-city tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-city'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-country tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-country'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-countrylanguage tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Loading pre defined selection from /root/.pipelinewise/postgres_dwh/postgres_sample/selection.json
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-edgydata tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-city tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-city'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-country tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-country'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-countrylanguage tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Writing new properties file with changes into /root/.pipelinewise/postgres_dwh/postgres_sample/properties.json
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.3s
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 0.3s finished
2019-09-17 05:07:56 INFO:
            -------------------------------------------------------
            IMPORTING YAML CONFIGS FINISHED
            -------------------------------------------------------
                Total targets to import : 1
                Total taps to import : 1
                Taps imported successfully : 1
                Taps failed to import : []
                Runtime : 0:00:00.409421
            -------------------------------------------------------
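
The "Mark ... tap_stream_id as selected / not selected" lines in the import log show which source tables were picked up from the tap file. To confirm the selection without reading the whole log, the lines can be parsed with a regular expression. This is a minimal sketch that assumes the log wording stays exactly as shown above. The helper stream_selection is hypothetical, and LOG holds a few lines copied from the output.

```python
import re

# A few lines copied from the import output above.
LOG = """\
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-edgydata tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-city tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-city'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-country tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-country'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-countrylanguage tap_stream_id as not selected
"""

# "not selected" is listed first so it is tried before plain "selected".
MARK = re.compile(r"INFO: Mark (?P<stream>\S+) tap_stream_id as (?P<state>not selected|selected)")


def stream_selection(log_text):
    """Map each tap_stream_id to True (selected) or False (not selected)."""
    selection = {}
    for match in MARK.finditer(log_text):
        selection[match.group("stream")] = match.group("state") == "selected"
    return selection


if __name__ == "__main__":
    for stream, selected in stream_selection(LOG).items():
        print(stream, "selected" if selected else "not selected")
```

Repeated lines, such as the second pass in the log above, simply overwrite the earlier entry for the same stream.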

Check the status

pipelinewise status

Result

Tap ID           Tap Type      Target ID     Target Type      Enabled    Status    Last Sync    Last Sync Result
---------------  ------------  ------------  ---------------  ---------  --------  -----------  ------------------
postgres_sample  tap-postgres  postgres_dwh  target-postgres  True       ready                  unknown

Run the pipeline

Run the command:

pipelinewise run_tap --tap postgres_sample --target postgres_dwh

Result:

2019-09-17 05:08:36 INFO: Running postgres_sample tap in postgres_dwh target
2019-09-17 05:08:36 INFO: No table available that needs to be sync by fastsync
2019-09-17 05:08:36 INFO: Table(s) selected to sync by singer: ['postgres_source_db-public-city', 'postgres_source_db-public-country']
2019-09-17 05:08:36 INFO: Writing output into /root/.pipelinewise/postgres_dwh/postgres_sample/log/postgres_dwh-postgres_sample-20190917_050836.singer.log
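
To run the same tap from a script, for example from a scheduler, the command above can be wrapped in a few lines of Python. This is only a sketch: build_run_tap_command and run_tap are hypothetical helpers, and they use nothing beyond the run_tap subcommand with the --tap and --target options shown above.

```python
import shutil
import subprocess


def build_run_tap_command(tap_id, target_id):
    """Build the argument list for the run_tap command shown above."""
    return ["pipelinewise", "run_tap", "--tap", tap_id, "--target", target_id]


def run_tap(tap_id, target_id):
    """Run the tap and return the exit code, or None if the CLI is not installed."""
    if shutil.which("pipelinewise") is None:
        return None  # pipelinewise is not on PATH in this environment
    completed = subprocess.run(build_run_tap_command(tap_id, target_id), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Only prints the command; call run_tap(...) to actually execute it.
    print(" ".join(build_run_tap_command("postgres_sample", "postgres_dwh")))
```

Passing the arguments as a list avoids shell quoting issues if a tap or target id ever contains unusual characters.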

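Result in the database: with the configuration above, the city and country tables are expected in the repl_pg_public schema of the postgres_dwh target.

Check the status

pipelinewise status

References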

https://transferwise.github.io/pipelinewise/installation_guide/creating_pipelines.html
https://transferwise.github.io/pipelinewise/installation_guide/running_pipelines.html
