Azkaban_Oozie_action
http://azkaban.github.io/azkaban/docs/2.5/
There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

【solve the problem of Hadoop job dependencies】
Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products.
Initially a single server solution, with the increased number of Hadoop users over the years, Azkaban has evolved to be a more robust solution.
Azkaban consists of 3 key components:
- Relational Database (MySQL)
- AzkabanWebServer
- AzkabanExecutorServer
Relational Database (MySQL)
Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.
How does AzkabanWebServer use the DB?
The web server uses the db for the following reasons:
- Project Management - The projects, the permissions on the projects as well as the uploaded files.
- Executing Flow State - Keep track of executing flows and which Executor is running them.
- Previous Flow/Jobs - Search through previous executions of jobs and flows as well as access their log files.
- Scheduler - Keeps the state of the scheduled jobs.
- SLA - Keeps all the sla rules
How does the AzkabanExecutorServer use the DB?
The executor server uses the db for the following reasons:
- Access the project - Retrieves project files from the db.
- Executing Flows/Jobs - Retrieves and updates data for flows and that are executing
- Logs - Stores the output logs for jobs and flows into the db.
- Interflow dependency - If a flow is running on a different executor, it will take state from the DB.
There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.
AzkabanWebServer
The AzkabanWebServer is the main manager to all of Azkaban. It handles project management, authentication, scheduler, and monitoring of executions. It also serves as the web user interface.
Using Azkaban is easy. Azkaban uses *.job key-value property files to define individual tasks in a work flow, and the _dependencies_ property to define the dependency chain of the jobs. These job files and associated code can be archived into a *.zip and uploaded through the web server through the Azkaban UI or through curl.
AzkabanExecutorServer
Previous versions of Azkaban had both the AzkabanWebServer and the AzkabanExecutorServer features in a single server. The Executor has since been separated into its own server. There were several reasons for splitting these services: we will soon be able to scale the number of executions and fall back on operating Executors if one fails. Also, we are able to roll our upgrades of Azkaban with minimal impact on the users. As Azkaban's usage grew, we found that upgrading Azkaban became increasingly more difficult as all times of the day became 'peak'.
【 no cyclical dependencies detected】

Select the archive file of your workflow files that you want to upload. Currently Azkaban only supports *.zip files. The zip should contain the *.job files and any files needed to run your jobs. Job names must be unique in a project.
Azkaban will validate the contents of the zip to make sure that dependencies are met and that there's no cyclical dependencies detected. If it finds any invalid flows, the upload will fail.
Uploads overwrite all files in the project. Any changes made to jobs will be wiped out after a new zip file is uploaded.
After a successful upload, you should see all of your flows listed on the screen.
http://oozie.apache.org/
Apache Oozie Workflow Scheduler for Hadoop
Overview
Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.
Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
Oozie is a scalable, reliable and extensible system.
Azkaban_Oozie_action的更多相关文章
随机推荐
- Ueditor 在线编辑器使用
ueditor在线编辑器插件 地址:http://ueditor.baidu.com/website/ 试用体验: 帮助文档:http://fex.baidu.com/ueditor/ 实操 引入 ...
- Linux Redhat7 开机启动python脚本
cd /usr/lib/systemd/system touch proxy.service ##################################################### ...
- POJ 3321 Apple Tree 树状数组+DFS
题意:一棵苹果树有n个结点,编号从1到n,根结点永远是1.该树有n-1条树枝,每条树枝连接两个结点.已知苹果只会结在树的结点处,而且每个结点最多只能结1个苹果.初始时每个结点处都有1个苹果.树的主人接 ...
- Loj #6287 诗歌
link: https://loj.ac/problem/6287 一开始差点写FFT了23333,并且FFT还能算这样的三元组的数量而且还不用要求这是一个排列.... 但这太大材小用了(而且很可能被 ...
- Ruby on rails初体验(三)
继体验一和体验二中的内容,此节将体验二中最开始的目标来实现,体验二中已经将部门添加的部分添加到了公司的show页面,剩下的部分是将部门列表也添加到公司的显示页面,整体思路和体验二中相同,但是还是会有点 ...
- 网络安全---大学霸_ITDaren
http://blog.csdn.net/u014621518/article/category/2191665
- python traceback学习(转)
1. Python中的异常栈跟踪 之前在做Java的时候,异常对象默认就包含stacktrace相关的信息,通过异常对象的相关方法printStackTrace()和getStackTrace()等方 ...
- 修改Linux基本配置
1.修改主机名 vi /etc/sysconfig/network NETWORKING=yes HOSTNAME=server1.cn 2.修改ip地址 vi /etc/sysconfig/netw ...
- 关于 Shiro 的权限匹配器和过滤器
项目源码:https://github.com/weimingge14/Shiro-project演示地址:http://liweiblog.duapp.com/Shiro-project/login ...
- HTML5 Canvas 绘制六叶草
注意: context.arc(横坐标,纵坐标,弧半径,起始角度,终止角度,逆顺时针);这个函数挺难用,主要原因是最后参数和角度的关系.不管文档怎么说,按我的实际经验,逆顺时针=false时,是逆时针 ...