http://azkaban.github.io/azkaban/docs/2.5/

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

【solve the problem of Hadoop job dependencies】

Azkaban was implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products.

Initially a single server solution, with the increased number of Hadoop users over the years, Azkaban has evolved to be a more robust solution.

Azkaban consists of 3 key components:

  • Relational Database (MySQL)
  • AzkabanWebServer
  • AzkabanExecutorServer

Relational Database (MySQL)

Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.

How does AzkabanWebServer use the DB?

The web server uses the db for the following reasons:

  • Project Management - The projects, the permissions on the projects as well as the uploaded files.
  • Executing Flow State - Keep track of executing flows and which Executor is running them.
  • Previous Flow/Jobs - Search through previous executions of jobs and flows as well as access their log files.
  • Scheduler - Keeps the state of the scheduled jobs.
  • SLA - Keeps all the sla rules

How does the AzkabanExecutorServer use the DB?

The executor server uses the db for the following reasons:

  • Access the project - Retrieves project files from the db.
  • Executing Flows/Jobs - Retrieves and updates data for flows and that are executing
  • Logs - Stores the output logs for jobs and flows into the db.
  • Interflow dependency - If a flow is running on a different executor, it will take state from the DB.

There is no reason why MySQL was chosen except that it is a widely used DB. We are looking to implement compatibility with other DB's, although the search requirement on historically running jobs benefits from a relational data store.

AzkabanWebServer

The AzkabanWebServer is the main manager to all of Azkaban. It handles project management, authentication, scheduler, and monitoring of executions. It also serves as the web user interface.

Using Azkaban is easy. Azkaban uses *.job key-value property files to define individual tasks in a work flow, and the _dependencies_ property to define the dependency chain of the jobs. These job files and associated code can be archived into a *.zip and uploaded through the web server through the Azkaban UI or through curl.

AzkabanExecutorServer

Previous versions of Azkaban had both the AzkabanWebServer and the AzkabanExecutorServer features in a single server. The Executor has since been separated into its own server. There were several reasons for splitting these services: we will soon be able to scale the number of executions and fall back on operating Executors if one fails. Also, we are able to roll our upgrades of Azkaban with minimal impact on the users. As Azkaban's usage grew, we found that upgrading Azkaban became increasingly more difficult as all times of the day became 'peak'.

【 no cyclical dependencies detected】

Select the archive file of your workflow files that you want to upload. Currently Azkaban only supports *.zip files. The zip should contain the *.job files and any files needed to run your jobs. Job names must be unique in a project.

Azkaban will validate the contents of the zip to make sure that dependencies are met and that there's no cyclical dependencies detected. If it finds any invalid flows, the upload will fail.

Uploads overwrite all files in the project. Any changes made to jobs will be wiped out after a new zip file is uploaded.

After a successful upload, you should see all of your flows listed on the screen.

http://oozie.apache.org/

Apache Oozie Workflow Scheduler for Hadoop

Overview

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.

Azkaban_Oozie_action的更多相关文章

随机推荐

  1. 组队训练3回放 ——hnqw1214

    组队训练3回放 练习赛过程回放: 开场先看最后一题, 发现是专题训练时做过的网络流原题, cst照着之前的打一遍,第一遍WA, 发现数组开小了,改大后AC. 这时候qw看B题, 一开始想不到方法, c ...

  2. Codeforces 371B Fox Dividing Cheese(简单数论)

    题目链接 Fox Dividing Cheese 思路:求出两个数a和b的最大公约数g,然后求出a/g,b/g,分别记为c和d. 然后考虑c和d,若c或d中存在不为2,3,5的质因子,则直接输出-1( ...

  3. Akka之Actor生命周期

    我们首先来看一下官方给出的Actor的声明周期的图: 在上图中,Actor系统中的路径代表一个地方,其可能会被活着的Actor占据.最初路径都是空的.在调用actorOf()时,将会为指定的路径分配根 ...

  4. windows内核实现的34个关键问题

    http://book.kongfz.com/237217/670391178/#bookComm

  5. mysql之创建数据库,创建数据表

    写在前面 项目中用到mysql数据库,之前也没用过mysql,今天就学下mysql的常用的语法,发现跟sql server的语法极其相似.用起来还是蛮简单的. 一个例子 1.创建一个名为School的 ...

  6. IIS7.5上安装Git服务器

    系统要求: IIS 7及以上 .NET FrameWork 4.5 ASP.NET 4以上 安装步骤: 从Bonobo官网下载最新版本的BonoboService: 解压下载的zip包: 在IIS中新 ...

  7. 泽熙学到的 z

    叶展,原泽熙投资总经理助理,现任齐鲁证券资产管理公司总裁助理,齐鲁星空.星汉等集合理财投资经理. 导读:三年前,我加入了泽熙投资,正式成为一名职业投资者.做职业投资者一直是我的理想.在股市中用眼光和头 ...

  8. dedecms 调取当前栏目的链接和 栏目名称

    <a href="{dede:field name='typeurl' function=”GetTypeName(@me)”/}" target="_blank& ...

  9. 一种全新的自动调用ajax方法

    <!DOCTYPE html> <html> <head> <meta charset="utf-8"> <meta http ...

  10. 3747: [POI2015]Kinoman|线段树

    枚举左区间线段树维护最大值 #include<algorithm> #include<iostream> #include<cstdlib> #include< ...