http://azkaban.github.io/azkaban/docs/2.5/

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

【solve the problem of Hadoop job dependencies】

Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products.

Initially a single server solution, with the increased number of Hadoop users over the years, Azkaban has evolved to be a more robust solution.

Azkaban consists of 3 key components:

  • Relational Database (MySQL)
  • AzkabanWebServer
  • AzkabanExecutorServer

Relational Database (MySQL)

Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.

How does AzkabanWebServer use the DB?

The web server uses the db for the following reasons:

  • Project Management - The projects, the permissions on the projects as well as the uploaded files.
  • Executing Flow State - Keep track of executing flows and which Executor is running them.
  • Previous Flow/Jobs - Search through previous executions of jobs and flows as well as access their log files.
  • Scheduler - Keeps the state of the scheduled jobs.
  • SLA - Keeps all the sla rules

How does the AzkabanExecutorServer use the DB?

The executor server uses the db for the following reasons:

  • Access the project - Retrieves project files from the db.
  • Executing Flows/Jobs - Retrieves and updates data for flows and that are executing
  • Logs - Stores the output logs for jobs and flows into the db.
  • Interflow dependency - If a flow is running on a different executor, it will take state from the DB.

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

AzkabanWebServer

The AzkabanWebServer is the main manager to all of Azkaban. It handles project management, authentication, scheduler, and monitoring of executions. It also serves as the web user interface.

Using Azkaban is easy. Azkaban uses *.job key-value property files to define individual tasks in a work flow, and the _dependencies_ property to define the dependency chain of the jobs. These job files and associated code can be archived into a *.zip and uploaded through the web server through the Azkaban UI or through curl.

AzkabanExecutorServer

Previous versions of Azkaban had both the AzkabanWebServer and the AzkabanExecutorServer features in a single server. The Executor has since been separated into its own server. There were several reasons for splitting these services: we will soon be able to scale the number of executions and fall back on operating Executors if one fails. Also, we are able to roll our upgrades of Azkaban with minimal impact on the users. As Azkaban's usage grew, we found that upgrading Azkaban became increasingly more difficult as all times of the day became 'peak'.

【 no cyclical dependencies detected】

Select the archive file of your workflow files that you want to upload. Currently Azkaban only supports *.zip files. The zip should contain the *.job files and any files needed to run your jobs. Job names must be unique in a project.

Azkaban will validate the contents of the zip to make sure that dependencies are met and that there's no cyclical dependencies detected. If it finds any invalid flows, the upload will fail.

Uploads overwrite all files in the project. Any changes made to jobs will be wiped out after a new zip file is uploaded.

After a successful upload, you should see all of your flows listed on the screen.

http://oozie.apache.org/

Apache Oozie Workflow Scheduler for Hadoop

Overview

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

Azkaban_Oozie_action的更多相关文章

随机推荐

  1. TensorFlow——共享变量的使用方法

    1.共享变量用途 在构建模型时,需要使用tf.Variable来创建一个变量(也可以理解成节点).当两个模型一起训练时,一个模型需要使用其他模型创建的变量,比如,对抗网络中的生成器和判别器.如果使用t ...

  2. bzoj 5056: OI游戏

    5056: OI游戏 Time Limit: 1 Sec  Memory Limit: 64 MBSubmit: 204  Solved: 162[Submit][Status][Discuss] D ...

  3. BZOJ 1188 [HNOI2007]分裂游戏

    AC通道:http://www.lydsy.com/JudgeOnline/problem.php?id=1188 学习SG函数的过程中,我先看了一篇叫做 <2008-贾志豪-组合数学略述... ...

  4. maxwell简单部署使用

    详细资料可以参考maxwell官网  (mysql + maxwell + kafka + elasticsearch) 说明:本文主要是关于配置maxwell监听mysql的数据修改并实时将修改内容 ...

  5. Maven出现错误No plugin found for prefix 'jetty' in the current project and in the plugin groups的问题解决

    只需在maven的setting.xml文件上加入如下节点: <pluginGroups> <pluginGroup>org.mortbay.jetty</pluginG ...

  6. 剖析ifstream打开含中文路径名文件失败的原因

    http://blog.csdn.net/yukin_xue/article/details/7543423 最近写程序的时候遇到了使用ifstream打开含中文路径文件时失败的问题,在网上翻了一下, ...

  7. popcount 算法分析

    转载: http://blog.csdn.net/gaochao1900/article/details/5646211 http://www.cnblogs.com/Martinium/archiv ...

  8. 【MVC2】发布到IIS上User.Identity.Name变成空

    VS中运行时通过User.Identity.Name能取到用户名,发布到IIS上后,该值为空. 调查后发现在网站设定→[认证]中同时打开了[Windows认证]和[匿名认证], 关掉[匿名认证]后就能 ...

  9. GEM演唱会

    周六去魔都看邓紫棋演唱会,各位看官可能要问.杭州不是也有嚒.为嘛去魔都-..由于po主是逗比哈哈(- ̄▽ ̄-) 早上睡到自然醒,然后開始做午饭.吃完躺沙发上看电视,看到一点多认为应该要出发了(演唱会7 ...

  10. SQLserver字符串分割函数

    一.按指定符号分割字符串,返回分割后的元素个数,方法很简单,就是看字符串中存在多少个分隔符号,然后再加一,就是要求的结果.CREATE function Get_StrArrayLength(  @s ...