pipelinewise 提供了方便的创建简单pipeline的命令,可以简化pipeline 的创建,同时也可以帮我们学习

生成demo pipeline

pipelinewise init --name pipelinewise_samples

效果

一个简单的pg 2 pg 的demo

  • 生成tap 以及target yaml 配置
    直接使用的demo 配置文件
cp tap_postgres.yml.sample tap_postgres.yml
cp target_postgres.yml.sample target_postgres.yml
  • 修改配置
    tap
 
 ---
# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "postgres_sample" # Unique identifier of the tap
name: "Sample Postgres Database" # Name of the tap
type: "tap-postgres" # !! THIS SHOULD NOT CHANGE !!
owner: "somebody@foo.com" # Data owner to contact
# ------------------------------------------------------------------------------
# Source (Tap) - PostgreSQL connection details
# ------------------------------------------------------------------------------
db_conn:
  host: "localhost" # PostgreSQL host
  port: 15432 # PostgreSQL port
  user: "pipelinewise" # PostfreSQL user
  password: "secret" # Plain string or vault encrypted
  dbname: "postgres_source_db" # PostgreSQL database name
  #filter_schemas: "schema1,schema2" # Optional: Scan only the required schemas
                                       # to improve the performance of
                                       # data extraction
# ------------------------------------------------------------------------------
# Destination (Target) - Target properties
# Connection details should be in the relevant target YAML file
# ------------------------------------------------------------------------------
target: "postgres_dwh" # ID of the target connector where the data will be loaded
batch_size_rows: 20000 # Batch size for the stream to optimise load performance
# ------------------------------------------------------------------------------
# Source to target Schema mapping
# ------------------------------------------------------------------------------
schemas:
  - source_schema: "public" # Source schema in postgres with tables
    target_schema: "repl_pg_public" # Target schema in the destination Data Warehouse
    target_schema_select_permissions: # Optional: Grant SELECT on schema and tables that created
      - grp_stats
    # List of tables to replicate from Postgres to destination Data Warehouse
    #
    # Please check the Replication Strategies section in the documentation to understand the differences.
    # For LOG_BASED replication method you might need to adjust the source mysql/ mariadb configuration.
    tables:
      - table_name: "city"
        replication_method: "FULL_TABLE" # One of INCREMENTAL, LOG_BASED and FULL_TABLE
        replication_key: "last_update" # Important: Incremental load always needs replication key
        # OPTIONAL: Load time transformations
        #transformations:                    
        # - column: "last_name" # Column to transform
        # type: "SET-NULL" # Transformation type
      # You can add as many tables as you need...
      - table_name: "country"
        replication_method: "FULL_TABLE" # Important! Log based must be enabled in PostgreSQL
  # You can add as many schemas as you need...
  # Uncommend this if you want replicate tables from multiple schemas
  #- source_schema: "another_schema_in_postgres" 
  # target_schema: "another
 
 

target

---
# ------------------------------------------------------------------------------
# General Properties
# ------------------------------------------------------------------------------
id: "postgres_dwh" # Unique identifier of the target
name: "Postgres Data Warehouse" # Name of the target
type: "target-postgres" # !! THIS SHOULD NOT CHANGE !!
# ------------------------------------------------------------------------------
# Target - Data Warehouse connection details
# ------------------------------------------------------------------------------
db_conn:
  host: "localhost" # Postgres host
  port: 15433 # Postgres port
  user: "pipelinewise" # Postgres user
  password: "secret" # Plain string or vault encrypted
  dbname: "postgres_dwh" # Postgres database name
 
 

激活pipeline

  • 激活部署的服务
pipelinewise import --dir pipelinewise_samples

效果

 2019-09-17 05:07:55 INFO: Searching YAML config files in /app/wrk
2019-09-17 05:07:55 INFO: LOADING TARGET: target_postgres.yml
2019-09-17 05:07:55 INFO: LOADING TAP: tap_postgres.yml
2019-09-17 05:07:55 INFO: SAVING CONFIG
2019-09-17 05:07:55 INFO: SAVING MAIN CONFIG JSON to /root/.pipelinewise/config.json
2019-09-17 05:07:55 INFO: SAVING TARGET JSONS to /root/.pipelinewise/postgres_dwh/config.json
2019-09-17 05:07:55 INFO: SAVING TAP JSONS to /root/.pipelinewise/postgres_dwh/postgres_sample
2019-09-17 05:07:55 INFO: ACTIVATING TAP STREAM SELECTIONS...
[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 4 concurrent workers.
2019-09-17 05:07:55 INFO: Discovering postgres_sample (tap-postgres) tap in postgres_dwh (target-postgres) target...
2019-09-17 05:07:56 INFO: Loading pre defined selection from /root/.pipelinewise/postgres_dwh/postgres_sample/selection.json
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-edgydata tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-city tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-city'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-country tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-country'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-countrylanguage tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Loading pre defined selection from /root/.pipelinewise/postgres_dwh/postgres_sample/selection.json
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-edgydata tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-city tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-city'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-country tap_stream_id as selected with properties {'replication_method': 'FULL_TABLE', 'tap_stream_id': 'postgres_source_db-public-country'}
2019-09-17 05:07:56 INFO: Mark postgres_source_db-public-countrylanguage tap_stream_id as not selected
2019-09-17 05:07:56 INFO: Writing new properties file with changes into /root/.pipelinewise/postgres_dwh/postgres_sample/properties.json
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.3s
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 0.3s finished
2019-09-17 05:07:56 INFO: 
            -------------------------------------------------------
            IMPORTING YAML CONFIGS FINISHED
            -------------------------------------------------------
                Total targets to import : 1
                Total taps to import : 1
                Taps imported successfully : 1
                Taps failed to import : []
                Runtime : 0:00:00.409421
            -------------------------------------------------------
 
 
  • 查看状态
pipelinewise status

效果

Tap ID Tap Type Target ID Target Type Enabled Status Last Sync Last Sync Result
--------------- ------------ ------------ --------------- --------- -------- ----------- ------------------
postgres_sample tap-postgres postgres_dwh target-postgres True ready unknown

运行pipeline

  • 执行命令
pipelinewise run_tap --tap postgres_sample --target postgres_dwh

效果:

2019-09-17 05:08:36 INFO: Running postgres_sample tap in postgres_dwh target
2019-09-17 05:08:36 INFO: No table available that needs to be sync by fastsync
2019-09-17 05:08:36 INFO: Table(s) selected to sync by singer: ['postgres_source_db-public-city', 'postgres_source_db-public-country']
2019-09-17 05:08:36 INFO: Writing output into /root/.pipelinewise/postgres_dwh/postgres_sample/log/postgres_dwh-postgres_sample-20190917_050836.singer.log

数据库效果

  • 查看状态
 
pipelinewise status

参考资料

https://transferwise.github.io/pipelinewise/installation_guide/creating_pipelines.html
https://transferwise.github.io/pipelinewise/installation_guide/running_pipelines.html

pipelinewise 学习二 创建一个简单的pipeline的更多相关文章

  1. micronaut 学习 二 创建一个简单的服务

    micronaut 提供的cli 很方便,我们可以快速创建具有所需特性的应用,以下是一个简单的web server app 创建命令 mn create-app hello-world 效果 mn c ...

  2. Python框架学习之用Flask创建一个简单项目

    在前面一篇讲了如何创建一个虚拟环境,今天这一篇就来说说如何创建一个简单的Flask项目.关于Flask的具体介绍就不详细叙述了,我们只要知道它非常简洁.灵活和扩展性强就够了.它不像Django那样集成 ...

  3. 使用ssm(spring+springMVC+mybatis)创建一个简单的查询实例(二)(代码篇)

    这篇是上一篇的延续: 用ssm(spring+springMVC+mybatis)创建一个简单的查询实例(一) 源代码在github上可以下载,地址:https://github.com/guoxia ...

  4. BitAdminCore框架应用篇:(二)创建一个简单的增删改查模块

    NET Core应用框架之BitAdminCore框架应用篇系列 框架演示:http://bit.bitdao.cn 框架源码:https://github.com/chenyinxin/cookie ...

  5. [WCF学习笔记] 我的WCF之旅(1):创建一个简单的WCF程序

    近日学习WCF,找了很多资料,终于找到了Artech这个不错的系列.希望能从中有所收获. 本文用于记录在学习和实践WCF过程中遇到的各种基础问题以及解决方法,以供日后回顾翻阅.可能这些问题都很基础,可 ...

  6. 使用ssm(spring+springMVC+mybatis)创建一个简单的查询实例(三)(错误整理篇)

    使用ssm(spring+springMVC+mybatis)创建一个简单的查询实例(一) 使用ssm(spring+springMVC+mybatis)创建一个简单的查询实例(二) 以上两篇已经把流 ...

  7. 使用ssm(spring+springMVC+mybatis)创建一个简单的查询实例(一)

    梳理下使用spring+springMVC+mybatis 整合后的一个简单实例:输入用户的 ID,之后显示用户的信息(此次由于篇幅问题,会分几次进行说明,此次是工程的创建,逆向生成文件以及这个简单查 ...

  8. django创建一个简单的web站点

    一.新建project 使用Pycharm,File->New Project…,选择Django,给project命名 (project不能用test命名)   新建的project目录如下: ...

  9. LINUX内核分析第三周学习总结——构造一个简单的Linux系统MenuOS

    LINUX内核分析第三周学习总结——构造一个简单的Linux系统MenuOS 张忻(原创作品转载请注明出处) <Linux内核分析>MOOC课程http://mooc.study.163. ...

随机推荐

  1. Java学习:数据类型转换注意事项

    数据类型的转换 当数据类型不一样时,将会发生数据类型转换. 自动类型转换(隐式) 1.特点 :代码不需要进行特殊处理,自动完成. 2.规则 :数据范围从小到大. //左边是long类型,右边是默认的i ...

  2. golang 学习笔记 --基本类型

    字符串值表示了一个一个字符值的集合,在底层,一个字符串值即一个包含了若干字节的序列,长度为0的序列与一个空字符串对应.字符串的长度即底层字节列中字节的个数. 字符串值是不可变的,对字符串的操作只会返回 ...

  3. PTA 甲级 1139

    https://pintia.cn/problem-sets/994805342720868352/problems/994805344776077312 其实这道题目不难,但是有很多坑点! 首先数据 ...

  4. SUSE12Sp3-安装DockerCE和Docker-compose

    最近在写脚本.发现还是很方便的. Docker下载地址:https://download.docker.com/linux/static/stable/x86_64/ 执行以下脚本即可安装完毕. #! ...

  5. 【转】socket通信-C#实现udp通讯

    在日常碰到的项目中,有些场景下不适合使用tcp常连接,而需要靠UDP无连接的数据收发.那么如何使用SharpSocket完成UDP收发数据呢?其中要掌握的关键点是什么呢? 点击查看原博文内容

  6. c#结束练习题

    1.输入一个秒数,输出对应的小时.分钟.秒. 例:输入“4000“(秒),输出“1小时6分40秒”. 2.计算1-1/2+1/3-1/4+...-1/100的值. 3.写一个函数,对一个一维数组排序. ...

  7. linux环境:FTP服务器搭建

    转载及参考至:https://www.linuxprobe.com/chapter-11.html https://www.cnblogs.com/lxwphp/p/8916664.html 感谢原作 ...

  8. ubuntu 开启ip包转发

    1. 开启IP转发 //临时 # echo "1"> /proc/sys/net/ipv4/ip_forward //永久 # nano /etc/sysctl.conf n ...

  9. Impala快速入门

    一.简介 Cloudera公司推出,提供对HDFS.Hbase数据的高性能.低延迟的交互式SQL查询功能.基于Hive使用内存计算,兼顾数据仓库,具有实时.批处理.多并发的优点.是CDH平台首选的PB ...

  10. Nginx学习(二)

    ------------恢复内容开始------------ Nginx配置文件 主配置文件结构:四部分 main block:主配置段,既全局配置段,对Http,mail都有效 event{ }事件 ...