Azkaban_Oozie_action
http://azkaban.github.io/azkaban/docs/2.5/
There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.
【solve the problem of Hadoop job dependencies】
Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products.
Initially a single server solution, with the increased number of Hadoop users over the years, Azkaban has evolved to be a more robust solution.
Azkaban consists of 3 key components:
- Relational Database (MySQL)
- AzkabanWebServer
- AzkabanExecutorServer
Relational Database (MySQL)
Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.
How does AzkabanWebServer use the DB?
The web server uses the db for the following reasons:
- Project Management - The projects, the permissions on the projects as well as the uploaded files.
- Executing Flow State - Keep track of executing flows and which Executor is running them.
- Previous Flow/Jobs - Search through previous executions of jobs and flows as well as access their log files.
- Scheduler - Keeps the state of the scheduled jobs.
- SLA - Keeps all the sla rules
How does the AzkabanExecutorServer use the DB?
The executor server uses the db for the following reasons:
- Access the project - Retrieves project files from the db.
- Executing Flows/Jobs - Retrieves and updates data for flows and that are executing
- Logs - Stores the output logs for jobs and flows into the db.
- Interflow dependency - If a flow is running on a different executor, it will take state from the DB.
There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.
AzkabanWebServer
The AzkabanWebServer is the main manager to all of Azkaban. It handles project management, authentication, scheduler, and monitoring of executions. It also serves as the web user interface.
Using Azkaban is easy. Azkaban uses *.job
key-value property files to define individual tasks in a work flow, and the _dependencies_ property to define the dependency chain of the jobs. These job files and associated code can be archived into a *.zip
and uploaded through the web server through the Azkaban UI or through curl.
AzkabanExecutorServer
Previous versions of Azkaban had both the AzkabanWebServer and the AzkabanExecutorServer features in a single server. The Executor has since been separated into its own server. There were several reasons for splitting these services: we will soon be able to scale the number of executions and fall back on operating Executors if one fails. Also, we are able to roll our upgrades of Azkaban with minimal impact on the users. As Azkaban's usage grew, we found that upgrading Azkaban became increasingly more difficult as all times of the day became 'peak'.
【 no cyclical dependencies detected】
Select the archive file of your workflow files that you want to upload. Currently Azkaban only supports *.zip
files. The zip should contain the *.job
files and any files needed to run your jobs. Job names must be unique in a project.
Azkaban will validate the contents of the zip to make sure that dependencies are met and that there's no cyclical dependencies detected. If it finds any invalid flows, the upload will fail.
Uploads overwrite all files in the project. Any changes made to jobs will be wiped out after a new zip file is uploaded.
After a successful upload, you should see all of your flows listed on the screen.
http://oozie.apache.org/
Apache Oozie Workflow Scheduler for Hadoop
Overview
Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.
Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
Oozie is a scalable, reliable and extensible system.
Azkaban_Oozie_action的更多相关文章
随机推荐
- 基于WPF系统框架设计(10)-分页控件设计
背景 最近要求项目组成员开发一个通用的分页组件,要求是这个组件简单易用,通用性,兼容现有框架MVVM模式,可是最后给我提交的成果勉强能够用,却欠少灵活性和框架兼容性. 设计的基本思想 传入数据源,总页 ...
- XCODE 4.5 IOS多语言设置 及NSLocalizedString和NSLocalizedStringFromTable的用法。
前 些天升级到Xcode4.5,现在正在用Xcode4.5+IOS6开发项目,当使用国际化时,遇到了一点问题,之前版本Xcode上新建 Localizable.strings后,添加语言的“+”号不见 ...
- 深入解析hostname
结论:/etc/sysconfig/network 确实是hostname的配置文件,hostname的值跟该配置文件中的HOSTNAME有一定的关联关系,但是没有必然关系,hostname的值来自内 ...
- EasyMvc入门教程-基本控件说明(8)提醒导航
提醒导航顾名思义就是提醒大家注意某些文字了..请看下面的例子: 实现代码如下: @Html.Q().BlockRemind().Text("我可以作为提醒使用") 有的同学会说:这 ...
- SparkStreaming和Drools结合的HelloWord版
关于sparkStreaming的测试Drools框架结合版 package com.dinpay.bdp.rcp.service; import java.math.BigDecimal; impo ...
- TensorFlow笔记一 :测试和TFboard使用
一 .第一个TF python3.6 import tensorflow as tf x=2 y=3 node1=tf.add(x,y,name='node1') node2=tf.multiply ...
- .NET中XML 注释 SandCastle 帮助文件.hhp 使用HTML Help Workshop生成CHM文件
一.摘要 在本系列的第一篇文章介绍了.NET中XML注释的用途, 本篇文章将讲解如何使用XML注释生成与MSDN一样的帮助文件.主要介绍NDoc的继承者:SandCastle. .SandCastle ...
- nginx apache防盗链
要实现防盗链,我们就必须先理解盗链的实现原理,提到防盗链的实现原理就不得不从HTTP协议说起,在HTTP协议中,有一个表头字段叫referer,采用URL的格式来表示从哪儿链接到当前的网页或文件.换句 ...
- python 查询,子查询以及1对多查询
1.添加数据: # 方法1:对象.save() book = Book(**kwargs) book.save() # 方法2:类.create(**kwargs) Book.create(**kwa ...
- Android Design Support Library(2)- TextInputLayout的使用
原创文章,转载请注明 http://blog.csdn.net/leejizhou/article/details/50494634 这篇文章介绍下Android Design Support Lib ...