camus gobblin
####Camus is being phased out and replaced by Gobblin. For those using or interested in Camus, we suggest taking a look at Gobblin.
####For instructions on Migrating from Camus to Gobblin, please take a look at Camus → Gobblin Migration.
apache/incubator-gobblin: Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google etc. https://github.com/apache/incubator-gobblin
Apache Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volume of data from a variety of data sources, e.g., databases, rest APIs, FTP/SFTP servers, filers, etc., onto Hadoop. Apache Gobblin handles the common routine tasks required for all data ingestion ETLs, including job/task scheduling, task partitioning, error handling, state management, data quality checking, data publishing, etc. Gobblin ingests data from different data sources in the same execution framework, and manages metadata of different sources all in one place. This, combined with other features such as auto scalability, fault tolerance, data quality assurance, extensibility, and the ability of handling data model evolution, makes Gobblin an easy-to-use, self-serving, and efficient data ingestion framework.
camus gobblin的更多相关文章
- kettle、Oozie、camus、gobblin
kettle简介 http://www.cnblogs.com/limengqiang/archive/2013/01/16/KettleApply1.html Oozie介绍 http://blog ...
- Gobblin编译支持CDH5.4.0
作者:Syn良子 出处:http://www.cnblogs.com/cssdongl 转载请注明出处 Gobblin的前身是linkedin的Camus,好多人也用过,准备用Gobblin的方式来抽 ...
- Gobblin采集kafka数据
作者:Syn良子 出处:http://www.cnblogs.com/cssdongl 转载请注明出处 找时间记录一下利用Gobblin采集kafka数据的过程,话不多说,进入正题 一.Gobblin ...
- 技术名词解释——Camus
由LinkedIn公司开发的消息队列同步框架,提供将Kafka(一种消息队列框架)的数据装载到Hadoop分布式文件系统(HDFS)的功能. 英文版原文出处:http://docs.confluent ...
- Camus导入中文乱码问题(源码修改、编译、部署、任务启动)
Camus使用过程中业务方反映从Kafka导入至HDFS中的数据有中文乱码问题,且业务方确认写入的数据编码为UTF-8,开始跟进. 问题重现: (1)编写代码将带有中文的字符串以编码UTF-8 ...
- 数据采集框架Gobblin简介
问题导读: Gobblin的架构设计是怎样的? Gobblin拥有哪些组建,如何实现可扩展? Gobblin采集执行流程的过程? 前面我们介绍Gobblin是用来整合各种数据源的通用型ETL框架,在某 ...
- 【原创】大数据基础之Gobblin(2)持久化kafka到hdfs
gobblin 0.10 想要持久化kafka到hdfs有很多种方式,比如flume.logstash.gobblin,其中flume和logstash是流式的,gobblin是批处理式的,gobbl ...
- Kafka实战解惑
目录 一. kafka简介二. Kafka架构方案三. Kafka安装四. Kafka Client API 4.1 Producers API 4.2 Consumers API 4.3 消息高可靠 ...
- 在LinkedIn的 Kafka 生态系统
在LinkedIn的 Kafka 生态系统 Apache Kafka是一个高度可扩展的消息传递系统,作为LinkedIn的中央数据管道起着至关重要的作用. Kafka 是在2010年在LinkedIn ...
随机推荐
- 机器学习实战之kNN算法
机器学习实战这本书是基于python的,如果我们想要完成python开发,那么python的开发环境必不可少: (1)python3.52,64位,这是我用的python版本 (2)numpy 1.1 ...
- CentOS下SWAP分区建立及释放内存详解
方法一: 一.查看系统当前的分区情况: >free -m 二.创建用于交换分区的文件: >dd if=/dev/zero of=/whatever/swap bs=block_size ( ...
- NIO Channel的学习笔记总结
摘自:http://blog.csdn.net/tsyj810883979/article/details/6876603 1.1 非阻塞模式 Java NIO非堵塞应用通常适用用在I/O读写等方 ...
- Linux上安装使用SSH
參考博客:http://blog.csdn.net/xqhrs232/article/details/50960520 Ubuntu安装使用SSH ubuntu默认并没有安装ssh服务,如果通过ssh ...
- code forces 1051 d
看的这个题解:http://www.cnblogs.com/tobyw/p/9685639.html 写的比较清楚. 矩阵类型的计数题 比赛时感觉就像是个dp,然后就跳过了. 现在看着题解写一下,感觉 ...
- 多个ajax执行混乱问题
多个ajax执行混乱问题,之前拿ajax取代iframe做响应布局(左侧点击,右侧展示),当执行多个点击事件时会造成一个页面的初始化触发另一个页面的on click的function, 将ajax调为 ...
- 安装 node-sass 的不成功
昨天安装项目依赖的包,差不多都装好了,然后就卡在了node-sass上,各种报错. 报错一.gyp ERR! stack Error: Can't find Python executable &qu ...
- oracle存储过程中使用字符串拼接
1.使用拼接符号“||” v_sql := 'SELECT * FROM UserInfo WHERE ISDELETED = 0 AND ACCOUNT =''' || vAccount || '' ...
- for 循环进化史
ECMAScript 6已经逐渐普及,经过二十多年的改进,很多功能也有了更成熟的语句,比如 for 循环 这篇博客将介绍一下从最初的 for 循环,到 ES6 的 for-of 等四种遍历方法 先定义 ...
- 边看chromium的代码,边想骂人...
这一年一直在看chromium for android的代码,边看边想骂,谷歌这帮人..一开始搞了个牛逼的架构,在安卓4.4上把以前android webkit团队的简单版替换掉了,结果发现性能大不如 ...