Introduction

I've had a lot of questions about exactly how notifications work. This section attempts to explain exactly when and how host and service notifications are sent out, as well as who receives them.

Notification escalations are explained here.

When Do Notifications Occur?

The decision to send out notifications is made in the service check and host check logic. The calculations for whether a notification is to be sent are only triggered when processing a host or service check corresponding to that notification; they are not triggered simply because the <notification_interval> has passed since a previous notification was sent. Host and service notifications occur in the following instances...

  • When a hard state change occurs. More information on state types and hard state changes can be found here.
  • When a host or service remains in a hard non-OK state and the time specified by the <notification_interval> option in the host or service definition has passed since the last notification was sent out (for that specified host or service).
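The two triggering conditions above can be sketched in Python. This is only an illustration of the rule as described here, not Nagios' actual code: the CheckResult class, its field names, and the use of seconds for the interval are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    """Hypothetical summary of one processed host or service check."""
    hard_state_changed: bool             # did this check cause a hard state change?
    in_hard_problem_state: bool          # is the object now in a hard non-OK state?
    now: float                           # time the check was processed (epoch seconds)
    last_notification: Optional[float]   # time of the previous notification, if any
    notification_interval: float         # seconds here (an assumed unit for this sketch)


def notification_needed(result: CheckResult) -> bool:
    """Return True if this check result calls for a notification attempt."""
    # Instance 1: a hard state change occurred.
    if result.hard_state_changed:
        return True
    # Instance 2: still in a hard non-OK state and the interval has elapsed
    # since the last notification for this host or service.
    if result.in_hard_problem_state and result.last_notification is not None:
        elapsed = result.now - result.last_notification
        return elapsed >= result.notification_interval
    return False
```

Note that the function only runs when a check result is processed, which mirrors the point that the passing of the interval alone does not trigger anything.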

Who Gets Notified?

Each host and service definition has a <contact_groups> option that specifies what contact groups receive notifications for that particular host or service. Contact groups can contain one or more individual contacts.

When Nagios sends out a host or service notification, it will notify each contact that is a member of any contact group specified in the <contact_groups> option of the host or service definition. Nagios realizes that a contact may be a member of more than one contact group, so it removes duplicate contact notifications before it sends anything.
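As a rough illustration of the duplicate removal, the following Python sketch collects the members of several contact groups and keeps each contact only once. The function name and the dictionary that maps group names to members are hypothetical. They are not part of Nagios.

```python
from typing import Dict, Iterable, List


def contacts_to_notify(contact_groups: Iterable[str],
                       group_members: Dict[str, List[str]]) -> List[str]:
    """Return each contact once, in first-seen order, across all given groups."""
    seen = set()
    ordered = []
    for group in contact_groups:
        # Unknown group names simply contribute no contacts in this sketch.
        for contact in group_members.get(group, []):
            if contact not in seen:
                seen.add(contact)
                ordered.append(contact)
    return ordered


# Example: "bob" is in both groups but appears only once in the result.
groups = {"admins": ["alice", "bob"], "oncall": ["bob", "carol"]}
recipients = contacts_to_notify(["admins", "oncall"], groups)
```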

What Filters Must Be Passed In Order For Notifications To Be Sent?

Just because there is a need to send out a host or service notification doesn't mean that any contacts are going to get notified. There are several filters that potential notifications must pass before they are deemed worthy enough to be sent out. Even then, specific contacts may not be notified if their notification filters do not allow for the notification to be sent to them. Let's go into the filters that have to be passed in more detail...

Program-Wide Filter:

The first filter that notifications must pass is a test of whether or not notifications are enabled on a program-wide basis. This is initially determined by the enable_notifications directive in the main config file, but may be changed during runtime from the web interface. If notifications are disabled on a program-wide basis, no host or service notifications can be sent out - period. If they are enabled on a program-wide basis, there are still other tests that must be passed...

Service and Host Filters:

The first filter for host or service notifications is a check to see if the host or service is in a period of scheduled downtime. If it is in a scheduled downtime, no one gets notified. If it isn't in a period of downtime, it gets passed on to the next filter. As a side note, notifications for services are suppressed if the host they're associated with is in a period of scheduled downtime.

The second filter for host or service notification is a check to see if the host or service is flapping (if you enabled flap detection). If the service or host is currently flapping, no one gets notified. Otherwise it gets passed to the next filter.

The third host or service filter that must be passed is the host- or service-specific notification options. Each service definition contains options that determine whether or not notifications can be sent out for warning states, critical states, and recoveries. Similarly, each host definition contains options that determine whether or not notifications can be sent out when the host goes down, becomes unreachable, or recovers. If the host or service notification does not pass these options, no one gets notified. If it does pass these options, the notification gets passed to the next filter... Note: Notifications about host or service recoveries are only sent out if a notification was sent out for the original problem. It doesn't make sense to get a recovery notification for something you never knew was a problem.

The fourth host or service filter that must be passed is the time period test. Each host and service definition has a <notification_period> option that specifies which time period contains valid notification times for the host or service. If the time that the notification is being made does not fall within a valid time range in the specified time period, no one gets contacted. If it falls within a valid time range, the notification gets passed to the next filter... Note: If the time period filter is not passed, Nagios will reschedule the next notification for the host or service (if it's in a non-OK state) for the next valid time present in the time period. This helps ensure that contacts are notified of problems as soon as possible when the next valid time in the time period arrives.
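To make the rescheduling idea concrete, here is a deliberately simplified Python sketch. It assumes the notification period is a single daily window that starts and ends on the same day. Real time period definitions are richer than that, and the function name is hypothetical.

```python
from datetime import datetime, time, timedelta


def next_valid_time(now: datetime, start: time, end: time) -> datetime:
    """Return `now` if it lies in the daily window [start, end),
    otherwise the start of the next window.

    Assumes start < end, i.e. the window does not cross midnight.
    """
    window_start = datetime.combine(now.date(), start)
    window_end = datetime.combine(now.date(), end)
    if window_start <= now < window_end:
        return now                      # filter passed, notify right away
    if now < window_start:
        return window_start             # later today
    return window_start + timedelta(days=1)  # tomorrow's window


# Example: a problem at 20:00 with a 09:00-17:00 period waits until 09:00 the next day.
rescheduled = next_valid_time(datetime(2024, 1, 1, 20, 0), time(9), time(17))
```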

The last set of host or service filters is conditional upon two things: (1) a notification was already sent out about a problem with the host or service at some point in the past and (2) the host or service has remained in the same non-OK state that it was when the last notification went out. If these two criteria are met, then Nagios will check and make sure the time that has passed since the last notification went out either meets or exceeds the value specified by the <notification_interval> option in the host or service definition. If not enough time has passed since the last notification, no one gets contacted. If either enough time has passed since the last notification or the two criteria for this filter were not met, the notification will be sent out! Whether or not it actually is sent to individual contacts is up to another set of filters...
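The program-wide filter and the host or service filters described so far can be summarized as one ordered chain. The Python sketch below is an illustration under stated assumptions, not Nagios' implementation: the ObjectState class, its field names, the event labels (such as "critical" or "recovery"), and the use of seconds are all invented for the example. The rule that recoveries are only sent after a problem notification is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class ObjectState:
    """Hypothetical snapshot of a host or service when a notification is due."""
    in_downtime: bool                       # the object itself is in scheduled downtime
    host_in_downtime: bool                  # for services: the associated host is
    flap_detection_enabled: bool
    is_flapping: bool
    notification_options: Set[str]          # events this object may notify about
    in_notification_period: bool            # "now" is inside <notification_period>
    last_notification: Optional[float]      # epoch seconds, None if never notified
    same_state_as_last_notification: bool
    notification_interval: float            # seconds (assumed unit)


def passes_object_filters(notifications_enabled: bool, obj: ObjectState,
                          event: str, now: float) -> Tuple[bool, str]:
    """Apply the filters in order; return (passed, reason)."""
    if not notifications_enabled:
        return False, "notifications disabled program-wide"
    if obj.in_downtime or obj.host_in_downtime:
        return False, "scheduled downtime"
    if obj.flap_detection_enabled and obj.is_flapping:
        return False, "flapping"
    if event not in obj.notification_options:
        return False, "notification options"
    if not obj.in_notification_period:
        return False, "outside notification period"
    # The interval test only applies to repeat notifications for the same state.
    if obj.last_notification is not None and obj.same_state_as_last_notification:
        if now - obj.last_notification < obj.notification_interval:
            return False, "notification interval not yet elapsed"
    return True, "passed"
```

Returning the reason alongside the result is a design choice for the sketch. It makes it easy to see which filter stopped a notification.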

Contact Filters:

At this point the notification has passed the program-wide filter and all host or service filters and Nagios starts to notify all the people it should. Does this mean that each contact is going to receive the notification? No! Each contact has their own set of filters that the notification must pass before they receive it. Note: Contact filters are specific to each contact and do not affect whether or not other contacts receive notifications.

The first filter that must be passed for each contact is the notification options test. Each contact definition contains options that determine whether or not service notifications can be sent out for warning states, critical states, and recoveries. Each contact definition also contains options that determine whether or not host notifications can be sent out when the host goes down, becomes unreachable, or recovers. If the host or service notification does not pass these options, the contact will not be notified. If it does pass these options, the notification gets passed to the next filter... Note: Notifications about host or service recoveries are only sent out if a notification was sent out for the original problem. It doesn't make sense to get a recovery notification for something you never knew was a problem.

The last filter that must be passed for each contact is the time period test. Each contact definition has a <notification_period> option that specifies which time period contains valid notification times for the contact. If the time that the notification is being made does not fall within a valid time range in the specified time period, the contact will not be notified. If it falls within a valid time range, the contact gets notified!
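The per-contact filters can be sketched the same way. In this Python illustration, the Contact class, its fields, and the event labels are assumptions made for the example. The point it shows is that each contact is evaluated independently, so one contact failing a filter does not affect the others.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Contact:
    """Hypothetical view of a contact definition at notification time."""
    name: str
    notification_options: Set[str]   # events this contact wants to hear about
    in_notification_period: bool     # "now" is inside the contact's <notification_period>


def recipients(contacts: List[Contact], event: str) -> List[str]:
    """Return the names of the contacts that pass both contact filters."""
    notified = []
    for contact in contacts:
        # Filter 1: the contact's notification options.
        if event not in contact.notification_options:
            continue
        # Filter 2: the contact's time period test.
        if not contact.in_notification_period:
            continue
        notified.append(contact.name)
    return notified


team = [
    Contact("alice", {"critical", "recovery"}, True),
    Contact("bob", {"critical"}, False),       # off hours, filtered out
    Contact("carol", {"warning"}, True),       # does not want critical alerts
]
```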

Notification Methods

You can have Nagios notify you of problems and recoveries pretty much any way you want: pager, cellphone, email, instant message, audio alert, electric shocker, etc. How notifications are sent depends on the notification commands that are defined in your object definition files.

 Note: If you install Nagios according to the quickstart guide, it should be configured to send email notifications. You can see the email notification commands that are used by viewing the contents of the following file: /usr/local/nagios/etc/objects/commands.cfg.

Specific notification methods (paging, etc.) are not directly incorporated into the Nagios code as it just doesn't make much sense. The "core" of Nagios is not designed to be an all-in-one application. If service checks were embedded in Nagios' core it would be very difficult for users to add new check methods, modify existing checks, etc. Notifications work in a similar manner. There are a thousand different ways to do notifications and there are already a lot of packages out there that handle the dirty work, so why re-invent the wheel and limit yourself to a bike tire? It's much easier to let an external entity (i.e. a simple script or a full-blown messaging system) do the messy stuff. Some messaging packages that can handle notifications for pagers and cellphones are listed below in the resource section.

Notification Type Macro

When crafting your notification commands, you need to take into account what type of notification is occurring. The $NOTIFICATIONTYPE$ macro contains a string that identifies exactly that. The table below lists the possible values for the macro and their respective descriptions:

Value: Description
PROBLEM: A service or host has just entered (or is still in) a problem state. If this is a service notification, it means the service is either in a WARNING, UNKNOWN or CRITICAL state. If this is a host notification, it means the host is in a DOWN or UNREACHABLE state.
RECOVERY: A service or host recovery has occurred. If this is a service notification, it means the service has just returned to an OK state. If it is a host notification, it means the host has just returned to an UP state.
ACKNOWLEDGEMENT: This notification is an acknowledgement notification for a host or service problem. Acknowledgement notifications are initiated via the web interface by contacts for the particular host or service.
FLAPPINGSTART: The host or service has just started flapping.
FLAPPINGSTOP: The host or service has just stopped flapping.
FLAPPINGDISABLED: The host or service has just stopped flapping because flap detection was disabled.
DOWNTIMESTART: The host or service has just entered a period of scheduled downtime. Future notifications will be suppressed.
DOWNTIMESTOP: The host or service has just exited from a period of scheduled downtime. Notifications about problems can now resume.
DOWNTIMECANCELLED: The period of scheduled downtime for the host or service was just cancelled. Notifications about problems can now resume.
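As an example of taking the notification type into account, the Python sketch below builds a message subject from the value of $NOTIFICATIONTYPE$. It assumes a hypothetical notification script that receives the macro value, the host name, and the state as plain strings. The short labels and the subject format are invented for the illustration.

```python
# Short labels for each documented $NOTIFICATIONTYPE$ value (labels are arbitrary).
LABELS = {
    "PROBLEM": "ALERT",
    "RECOVERY": "RESOLVED",
    "ACKNOWLEDGEMENT": "ACK",
    "FLAPPINGSTART": "FLAPPING",
    "FLAPPINGSTOP": "FLAPPING ENDED",
    "FLAPPINGDISABLED": "FLAPPING ENDED",
    "DOWNTIMESTART": "DOWNTIME",
    "DOWNTIMESTOP": "DOWNTIME ENDED",
    "DOWNTIMECANCELLED": "DOWNTIME ENDED",
}


def build_subject(notification_type: str, host: str, state: str) -> str:
    """Build a subject line; unknown types fall back to the raw macro value."""
    label = LABELS.get(notification_type, notification_type)
    return f"[{label}] {host}: {state}"


subject = build_subject("RECOVERY", "web01", "UP")
```

Falling back to the raw value keeps the script usable if it ever receives a type that is not in the table.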

Helpful Resources

There are many ways you could configure Nagios to send notifications out. It's up to you to decide which method(s) you want to use. Once you do that you'll have to install any necessary software and configure notification commands in your config files before you can use them. Here are just a few possible notification methods:

  • Email
  • Pager
  • Phone (SMS)
  • WinPopup message
  • Yahoo, ICQ, or MSN instant message
  • Audio alerts
  • etc...

Basically anything you can do from a command line can be tailored for use as a notification command.

If you're looking for an alternative to using email for sending messages to your pager or cellphone, check out these packages. They could be used in conjunction with Nagios to send out a notification via a modem when a problem arises. That way you don't have to rely on email to send notifications out (remember, email may *not* work if there are network problems). I haven't actually tried these packages myself, but others have reported success using them...

  • Gnokii (SMS software for contacting Nokia phones via GSM network)
  • QuickPage (alphanumeric pager software)
  • Sendpage (paging software)
  • SMS Client (command line utility for sending messages to pagers and mobile phones)

If you want to try out a non-traditional method of notification, you might want to mess around with audio alerts. If you want to have audio alerts played on the monitoring server (with synthesized speech), check out Festival. If you'd rather leave the monitoring box alone and have audio alerts played on another box, check out the Network Audio System (NAS) and rplay projects.
