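Introduction

GoldenGate is software that can deliver data to big data platforms in real time. For Apache Cassandra, a small amount of configuration is enough to deliver incremental changes from a relational database to Cassandra in real time. This article walks through that configuration.

Installing Cassandra

Extract apache-cassandra-3.11.1-bin.tar.gz to /opt/cassandra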

sudo mkdir /var/lib/cassandra

sudo mkdir /var/log/cassandra

sudo chown hadoop /var/log/cassandra

sudo chown hadoop /var/lib/cassandra
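The installation steps above can be collected into one repeatable function. This is only a sketch: the name install_cassandra is hypothetical, and it assumes the tarball has a single top-level folder (as the Apache binary release does) and that tar supports --strip-components.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Cassandra: unpack a Cassandra tarball into
# a target directory, then create the data/log directories and hand them to
# the user that will run Cassandra.
# Arguments: <tarball> <target-dir> <owner> <dir> [<dir> ...]
install_cassandra() {
  local tarball="$1" target="$2" owner="$3"
  shift 3
  local dir
  mkdir -p "$target" || return 1
  # --strip-components=1 drops the top-level apache-cassandra-<version>/ folder
  # so that bin/, conf/ and lib/ land directly under the target directory.
  tar -xzf "$tarball" -C "$target" --strip-components=1 || return 1
  for dir in "$@"; do
    mkdir -p "$dir" || return 1
    chown "$owner" "$dir" || return 1
  done
}
```

With the paths used in this article, the call would be `install_cassandra apache-cassandra-3.11.1-bin.tar.gz /opt/cassandra hadoop /var/lib/cassandra /var/log/cassandra`, run as root because of the target locations.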

Start Cassandra

/opt/cassandra/bin/cassandra

Check the node status

$ ./nodetool status

Datacenter: datacenter1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

-- Address Load Tokens Owns (effective) Host ID Rack

UN 127.0.0.1 92.1 KiB 256 100.0% e4f1431e-85ec-483a-a1b4-fe7bc8f7c9d2 rack1
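When the setup is scripted, it can help to wait until the node reports UN (Up/Normal) before opening cqlsh. The following is a minimal sketch, not an official tool: count_up_nodes and wait_for_cassandra are hypothetical names, and the code assumes nodetool is on the PATH and prints rows in the format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of Cassandra or OGG.

# Read `nodetool status` output on stdin and print how many rows
# start with UN (Up/Normal).
count_up_nodes() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Poll `nodetool status` until at least one node is UN, or give up.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 2).
wait_for_cassandra() {
  local attempts="${1:-30}" delay="${2:-2}" i
  for ((i = 0; i < attempts; i++)); do
    if [ "$(nodetool status 2>/dev/null | count_up_nodes)" -ge 1 ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

A typical use would be `wait_for_cassandra 30 2 && /opt/cassandra/bin/cqlsh`.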

Open the Cassandra shell

/opt/cassandra/bin/cqlsh

Connected to Test Cluster at 127.0.0.1:9042.

[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]

Use HELP for help.

cqlsh>  SELECT cluster_name, listen_address FROM system.local;
 cluster_name | listen_address
--------------+----------------
 Test Cluster |      127.0.0.1
 
(1 rows)

Inspect the system.local table in the system keyspace

cqlsh> desc system.local;
 
CREATE TABLE system.local (
    key text PRIMARY KEY,
    bootstrapped text,
    broadcast_address inet,
    cluster_name text,
    cql_version text,
    data_center text,
    gossip_generation int,
    host_id uuid,
    listen_address inet,
    native_protocol_version text,
    partitioner text,
    rack text,
    release_version text,
    rpc_address inet,
    schema_version uuid,
    thrift_version text,
    tokens set<text>,
    truncated_at map<uuid, blob>
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = 'information about the local node'
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 3600000
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

List the existing keyspaces

cqlsh>describe keyspaces;

system_schema system_auth system system_distributed system_traces

Create the keyspace that corresponds to the source schema OGG will deliver later

cqlsh>CREATE KEYSPACE QASOURCE WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

Create a test table to verify that Cassandra works

cqlsh> use qasource;

cqlsh:qasource> create table test

(

id int ,

name varchar,

primary key(id)

);

Insert a few rows

cqlsh:qasource> insert into test(id,name) values(1,'bck');

cqlsh:qasource> insert into test(id,name) values(2,'test');

cqlsh:qasource> select * from test;

id | name

----+------

1 | bck

2 | test

Querying by name fails, because the name column is not part of the primary key and has no index

cqlsh:qasource> select * from test where name='test';

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

Querying by primary key works

cqlsh:qasource> select * from test where id=2;

id | name

----+------

2 | test

(1 rows)

After creating an index on name, the query by name works

cqlsh:qasource> create index idx_test on test(name);

cqlsh:qasource> select * from test where name='test';

id | name

----+------

2 | test

(1 rows)

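An update must identify the row by its primary key in the WHERE clause

cqlsh:qasource> update test set name='newly' where id=2;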

cqlsh:qasource> select * from test where id=2;

id | name

----+-------

2 | newly

(1 rows)

Delete a row

cqlsh:qasource> delete from test where id=2;

cqlsh:qasource> select * from test ;

id | name

----+------

1 | bck

(1 rows)

OGG configuration

Extract the Cassandra Java driver 3.1.2 to /u01/drivers/cassandra-java-driver-3.1.2

Set the environment variable required by OGG for Big Data

export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server
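The Replicat loads a JVM through libggjava.so, so the dynamic loader has to find libjvm.so in one of the LD_LIBRARY_PATH directories. The sketch below checks that before the process is started. The function name find_libjvm is hypothetical. Note that jre/lib/amd64/server is the Java 8 layout; newer JDKs place libjvm.so under $JAVA_HOME/lib/server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first directory in a colon-separated
# search path that contains libjvm.so; return non-zero if none does.
find_libjvm() {
  local search_path="$1" dir
  local -a dirs
  # read -a splits on ':' without performing glob expansion.
  IFS=':' read -r -a dirs <<< "$search_path"
  for dir in "${dirs[@]}"; do
    if [ -n "$dir" ] && [ -f "$dir/libjvm.so" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}
```

For example: `find_libjvm "$LD_LIBRARY_PATH" || echo "libjvm.so not found on LD_LIBRARY_PATH" >&2`.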

Configure the Replicat process

REPLICAT rcass

-- Trail file for this example is located in "AdapterExamples/trail" directory

-- Command to add REPLICAT

-- add replicat rcass, exttrail AdapterExamples/trail/tr

TARGETDB LIBFILE libggjava.so SET property=dirprm/cass.props

REPORTCOUNT EVERY 1 MINUTES, RATE

GROUPTRANSOPS 1000

MAP QASOURCE.*, TARGET QASOURCE.*;

Add the Replicat process, using the sample trail file that ships with OGG

GGSCI>add replicat rcass, exttrail AdapterExamples/trail/tr

The properties file required for delivery to Cassandra, cass.props

gg.handlerlist=cassandra

#The handler properties

gg.handler.cassandra.type=cassandra

gg.handler.cassandra.mode=op

gg.handler.cassandra.contactPoints=localhost

gg.handler.cassandra.ddlHandling=CREATE,ADD,DROP

gg.handler.cassandra.compressedUpdates=true

gg.handler.cassandra.cassandraMode=async

gg.handler.cassandra.consistencyLevel=ONE

goldengate.userexit.timestamp=utc

goldengate.userexit.writers=javawriter

javawriter.stats.display=TRUE

javawriter.stats.full=TRUE

gg.log=log4j

gg.log.level=INFO

gg.report.time=30sec

#Set the classpath here to the Datastax Cassandra Java Driver (3.1 latest)

#Link to the Cassandra drivers website

#http://cassandra.apache.org/doc/latest/getting_started/drivers.html#java

#Link to the Datastax Cassandra Java Driver

#https://github.com/datastax/java-driver

gg.classpath=/opt/cassandra-java-driver-3.1.2/*:/opt/cassandra-java-driver-3.1.2/lib/*

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar:./dirprm
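The gg.classpath entries must point at the directory where the driver jars were actually extracted. The text above extracts the driver under /u01/drivers, while this sample file references /opt, so one of the two has to be adjusted. A minimal sketch of a pre-start check follows. The function name check_gg_classpath is hypothetical, and it assumes each entry is either a plain path or a directory wildcard ending in /*.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read gg.classpath from a properties file and report
# every entry that resolves to nothing. Returns 0 only if all entries resolve.
check_gg_classpath() {
  local props="$1" value entry missing=0
  local -a entries
  # Take the last gg.classpath= line, in case the key is defined more than once.
  value=$(sed -n 's/^gg\.classpath=//p' "$props" | tail -n 1)
  if [ -z "$value" ]; then
    echo "gg.classpath is not set in $props"
    return 1
  fi
  # read -a splits on ':' without expanding the * wildcards.
  IFS=':' read -r -a entries <<< "$value"
  for entry in "${entries[@]}"; do
    if [ "${entry%/\*}" != "$entry" ]; then
      # Wildcard entry: the directory must contain at least one jar.
      if ! ls "${entry%/\*}"/*.jar >/dev/null 2>&1; then
        echo "no jars found for $entry"
        missing=1
      fi
    elif [ ! -e "$entry" ]; then
      echo "missing $entry"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_gg_classpath dirprm/cass.props` from the OGG home directory prints one line per unresolved entry and returns non-zero if any are found.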

Testing

Start the OGG Replicat process

GGSCI (ol73) 12> start rcass

Sending START request to MANAGER ...

REPLICAT RCASS starting

Check the status

GGSCI (ol73) 13> info rcass

REPLICAT RCASS Last Started 2017-12-25 11:29 Status STARTING

Checkpoint Lag 00:00:00 (updated 02:56:41 ago)

Process ID 100794

Log Read Checkpoint File /u01/ogg4bd_12.3/AdapterExamples/trail/tr000000000

First Record RBA 0

GGSCI (ol73) 14> info rcass

REPLICAT RCASS Last Started 2017-12-25 11:30 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:00 ago)

Process ID 100794

Log Read Checkpoint File /u01/ogg4bd_12.3/AdapterExamples/trail/tr000000000

2015-11-06 02:45:39.000000 RBA 5660
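As the two info calls show, the process passes through STARTING before it reaches RUNNING. In a script, the status word can be extracted from the info output and polled. This is a sketch under assumptions: replicat_status and wait_until_running are hypothetical names, the parser relies on the output format shown above, and the polling function assumes it runs from the OGG home directory with ggsci reading commands from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of OGG.

# Read `info <group>` output on stdin and print the word that follows
# "Status" on the REPLICAT line (for example STARTING or RUNNING).
replicat_status() {
  awk '$1 == "REPLICAT" && !found {
         for (i = 1; i < NF; i++)
           if ($i == "Status") { print $(i + 1); found = 1; break }
       }'
}

# Poll until the group reports RUNNING.
# $1 = group name, $2 = maximum attempts (default 20).
wait_until_running() {
  local group="$1" attempts="${2:-20}" i
  for ((i = 0; i < attempts; i++)); do
    if [ "$(echo "info $group" | ./ggsci 2>/dev/null | replicat_status)" = "RUNNING" ]; then
      return 0
    fi
    sleep 3
  done
  return 1
}
```

For example: `wait_until_running rcass || echo "rcass did not reach RUNNING" >&2`.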

The process has started normally and is writing data.

Show statistics for the data written

GGSCI (ol73) 15> stats rcass, total

Sending STATS request to REPLICAT RCASS ...

Start of Statistics at 2017-12-25 11:31:12.

Replicating from QASOURCE.TCUSTMER to QASOURCE.TCUSTMER:

*** Total statistics since 2017-12-25 11:30:43 ***

Total inserts 5.00

Total updates 1.00

Total deletes 0.00

Total discards 0.00

Total operations 6.00

Replicating from QASOURCE.TCUSTORD to QASOURCE.TCUSTORD:

*** Total statistics since 2017-12-25 11:30:43 ***

Total inserts 5.00

Total updates 3.00

Total deletes 2.00

Total discards 0.00

Total operations 10.00

End of Statistics.
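To compare the statistics with what arrives in Cassandra, the per-table totals can be summed from the report. The sketch below uses a hypothetical function name, sum_total_operations, and assumes the "Total operations" line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `stats <group>, total` output on stdin and
# print the sum of all "Total operations" values as an integer.
sum_total_operations() {
  awk '$1 == "Total" && $2 == "operations" { sum += $3 } END { printf "%d\n", sum }'
}
```

For the output above the sum is 16 (6 for TCUSTMER plus 10 for TCUSTORD). The per-table counts also predict the final row counts: TCUSTMER has 5 inserts and no deletes, so 5 rows; TCUSTORD has 5 inserts and 2 deletes, so 3 rows.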

The synchronization succeeded.

Open the Cassandra shell to verify

./cqlsh

cqlsh> desc qasource;

CREATE KEYSPACE qasource WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

CREATE TABLE qasource.tcustord (

cust_code text,

order_date text,

product_code text,

order_id double,

product_amount bigint,

product_price double,

transaction_id double,

PRIMARY KEY (cust_code, order_date, product_code, order_id)

) WITH CLUSTERING ORDER BY (order_date ASC, product_code ASC, order_id ASC)

AND bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

CREATE TABLE qasource.tcustmer (

cust_code text PRIMARY KEY,

city text,

name text,

state text

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

As shown, two tables have been added to the qasource keyspace.

Query the data that OGG wrote to Cassandra

cqlsh> use QASOURCE;

cqlsh:qasource> desc keyspace QASOURCE;

View the table definition

cqlsh:qasource> desc table qasource.tcustmer;

CREATE TABLE qasource.tcustmer (

cust_code text PRIMARY KEY,

city text,

name text,

state text

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

Query the data

cqlsh>select * from qasource.tcustmer ;

cust_code | city | name | state

-----------+-------------+--------------------+-------

WILL | SEATTLE | BG SOFTWARE CO. | WA

JANE | DENVER | ROCKY FLYER INC. | CO

ANN | NEW YORK | ANN'S BOATS | NY

DAVE | TALLAHASSEE | DAVE'S PLANES INC. | FL

BILL | DENVER | BILL'S USED CARS | CO

(5 rows)

cqlsh:qasource> select * from qasource.tcustmer where cust_code='WILL';

cust_code | city | name | state

-----------+---------+-----------------+-------

WILL | SEATTLE | BG SOFTWARE CO. | WA

(1 rows)

cqlsh:qasource> select * from qasource.tcustord;

cust_code | order_date | product_code | order_id | product_amount | product_price | transaction_id

-----------+---------------------+--------------+----------+----------------+---------------+----------------

WILL | 1994-09-30 15:33:00 | CAR | 144 | 3 | 16520 | 100

BILL | 1995-12-31 15:00:00 | CAR | 765 | 3 | 14000 | 100

BILL | 1996-01-01 00:00:00 | TRUCK | 333 | 15 | 25000 | 100

(3 rows)

The data can be queried normally, which completes the test.
