http://azkaban.github.io/azkaban/docs/2.5/

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

【solve the problem of Hadoop job dependencies】

Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products.

Initially a single server solution, with the increased number of Hadoop users over the years, Azkaban has evolved to be a more robust solution.

Azkaban consists of 3 key components:

  • Relational Database (MySQL)
  • AzkabanWebServer
  • AzkabanExecutorServer

Relational Database (MySQL)

Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.

How does AzkabanWebServer use the DB?

The web server uses the db for the following reasons:

  • Project Management - The projects, the permissions on the projects as well as the uploaded files.
  • Executing Flow State - Keep track of executing flows and which Executor is running them.
  • Previous Flow/Jobs - Search through previous executions of jobs and flows as well as access their log files.
  • Scheduler - Keeps the state of the scheduled jobs.
  • SLA - Keeps all the sla rules

How does the AzkabanExecutorServer use the DB?

The executor server uses the db for the following reasons:

  • Access the project - Retrieves project files from the db.
  • Executing Flows/Jobs - Retrieves and updates data for flows and that are executing
  • Logs - Stores the output logs for jobs and flows into the db.
  • Interflow dependency - If a flow is running on a different executor, it will take state from the DB.

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

AzkabanWebServer

The AzkabanWebServer is the main manager to all of Azkaban. It handles project management, authentication, scheduler, and monitoring of executions. It also serves as the web user interface.

Using Azkaban is easy. Azkaban uses *.job key-value property files to define individual tasks in a work flow, and the _dependencies_ property to define the dependency chain of the jobs. These job files and associated code can be archived into a *.zip and uploaded through the web server through the Azkaban UI or through curl.

AzkabanExecutorServer

Previous versions of Azkaban had both the AzkabanWebServer and the AzkabanExecutorServer features in a single server. The Executor has since been separated into its own server. There were several reasons for splitting these services: we will soon be able to scale the number of executions and fall back on operating Executors if one fails. Also, we are able to roll our upgrades of Azkaban with minimal impact on the users. As Azkaban's usage grew, we found that upgrading Azkaban became increasingly more difficult as all times of the day became 'peak'.

【 no cyclical dependencies detected】

Select the archive file of your workflow files that you want to upload. Currently Azkaban only supports *.zip files. The zip should contain the *.job files and any files needed to run your jobs. Job names must be unique in a project.

Azkaban will validate the contents of the zip to make sure that dependencies are met and that there's no cyclical dependencies detected. If it finds any invalid flows, the upload will fail.

Uploads overwrite all files in the project. Any changes made to jobs will be wiped out after a new zip file is uploaded.

After a successful upload, you should see all of your flows listed on the screen.

http://oozie.apache.org/

Apache Oozie Workflow Scheduler for Hadoop

Overview

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

Azkaban_Oozie_action的更多相关文章

随机推荐

  1. tomcat7.0.55配置HTTP强制跳转到HTTPS

    首先需要配置好HTTPS单向或双向链接 参考: tomcat7.0.55配置单向和双向HTTPS连接(二) 然后编辑tomcat的conf目录下的web.xml 在<welcome-file-l ...

  2. 洛谷——1115 最大子段和(区间DP)

    题目描述 给出一段序列,选出其中连续且非空的一段使得这段和最大. 输入输出格式 输入格式: 输入文件maxsum1.in的第一行是一个正整数N,表示了序列的长度. 第2行包含N个绝对值不大于10000 ...

  3. NOI模拟题1 Problem A: sub

    题面 Sample Input 5 7 2 -1 -3 1 1 1 2 1 3 3 4 3 5 2 1 3 0 2 1 2 1 2 1 1 -3 2 Sample Output 2 4 5 2 HIN ...

  4. Oracle数据库搭建

  5. curl的用法

    1.官网:https://curl.haxx.se/ 2.版本安全漏洞:https://curl.haxx.se/docs/security.html 3.github:https://github. ...

  6. 2017.2.16 开涛shiro教程-第十七章-OAuth2集成(二)客户端

    原博客地址:http://jinnianshilongnian.iteye.com/blog/2018398 根据下载的pdf学习. 开涛shiro教程-第十七章-OAuth2集成 3.客户端 客户端 ...

  7. 2016.7.14 generator基于注解和基于xml自动生成代码的区别

    1.generator配置文件generatorConfig.xml的区别 2.生成代码的区别 注:二者的实体类都一样. (1)基于XML 生成的文件有: 后面省略. 也就是说,基于xml的方式,是要 ...

  8. VS (Visual Studio) 快捷键

    Ctrl + M + O: 折叠所有方法 Ctrl + M + M: 折叠或者展开当前方法 Ctrl + M + L:  展开所有方法

  9. JAVA Eclipse创建Android程序界面不显示怎么办

    一般是由于你创建的Android应用程序版本太高导致的,请设置4或以下版本,对于已有的项目,可以在属性-Android中修改目标生成的版本号                               ...

  10. centos DHCP

    yum install dhcp cat /usr/share/doc/dhcp-4.2.5/dhcpd.conf.example > /etc/dhcp/dhcpd.conf vim /etc ...