Alarms

An “alarm” is an event that is the result of an OmniCenter check entering a CRITICAL state. An alarm is not the same thing as an alert notification, although the terms “alarm” and “alert” are often confused.

Alarms display as red elements in the OmniCenter UI and indicate that a check is currently experiencing a critical failure.

When an OmniCenter check initially fails, it enters what is called a “soft” CRITICAL state. In this state, OmniCenter is waiting to see whether the failure is an actual problem or just a momentary glitch. When the check has failed for long enough, or a certain number of times (depending on the check), it enters a “hard” CRITICAL state. This is the point at which an alarm event is generated.
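The soft-to-hard transition described above can be sketched as a small state machine. This is an illustrative model only, not OmniCenter code: the class name `CheckState`, the `max_attempts` threshold, and the state labels are assumptions, and the sketch counts consecutive failures only (it does not model checks that go hard after a length of time).

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Hypothetical model of one check's soft/hard CRITICAL handling."""
    max_attempts: int = 3   # assumed threshold; the real value depends on the check
    failures: int = 0       # consecutive failures seen so far
    state: str = "OK"       # "OK", "SOFT_CRITICAL", or "HARD_CRITICAL"

    def record(self, passed: bool) -> bool:
        """Record one check result. Return True only when an alarm is generated."""
        if passed:
            # Any success resets the check to normal.
            self.failures = 0
            self.state = "OK"
            return False
        self.failures += 1
        if self.failures >= self.max_attempts:
            # An alarm is generated only on the transition into the hard state.
            alarm_generated = self.state != "HARD_CRITICAL"
            self.state = "HARD_CRITICAL"
            return alarm_generated
        # Still waiting to see whether this is a real problem or a glitch.
        self.state = "SOFT_CRITICAL"
        return False
```

In this sketch, the first two failures leave the check in the soft state and produce no alarm; the third failure moves it to the hard state and produces exactly one alarm.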

A newly generated alarm causes immediate, out-of-band host checks to be performed on the host device that is the source of the alarm and all of its layer 3 parents (parenting must be properly set up in OmniCenter for this to work).

If the host and all of its parents are determined to be operational (that is, neither down nor unreachable), the alarm opens a new OmniCenter incident for the failed check. This incident details the failure of the check and provides a starting point for troubleshooting.

If the host or any of its parents is determined to be down or unreachable, the failed host check of the topmost downed parent generates its own alarm, which in turn opens its own incident.

Within the resulting incident, the failed host check alarm becomes what is called the “primary alarm.” Any other alarms that have occurred as a result of that primary alarm (for example, the initial alarm that triggered the host checks) are bundled into the primary alarm's incident as “related alarms.” This prevents each related alarm from opening its own separate, redundant incident.

This bundling of alarms into one incident is called “incident management.” Any action groups associated with the related alarms are suppressed, and only the action groups associated with the primary alarm are run. Suppressing the action groups of related alarms potentially avoids redundant alert notifications and execution of commands that would otherwise be ineffective on an unreachable device.
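The correlation logic described above (host checks up the parent chain, selection of a primary alarm, and suppression of related action groups) can be sketched as follows. This is a simplified illustration under stated assumptions, not OmniCenter's implementation: the names `Host`, `Incident`, `topmost_down`, `open_incident`, and `actions_to_run` are hypothetical, each host is assumed to have a single parent, and alarms are represented as plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    reachable: bool = True            # result of the out-of-band host check
    parent: Optional["Host"] = None   # assumption: one layer 3 parent per host


@dataclass
class Incident:
    primary_alarm: str
    related_alarms: List[str] = field(default_factory=list)


def topmost_down(host: Host) -> Optional[Host]:
    """Walk from the host up through its parents and return the
    highest device whose host check failed, or None if all passed."""
    found = None
    node: Optional[Host] = host
    while node is not None:
        if not node.reachable:
            found = node
        node = node.parent
    return found


def open_incident(check_name: str, host: Host) -> Incident:
    """Build the incident for a newly generated check alarm."""
    alarm = f"{host.name}/{check_name}"
    culprit = topmost_down(host)
    if culprit is None:
        # Host and all parents are operational: the check alarm stands alone.
        return Incident(primary_alarm=alarm)
    # A device in the chain is down: its host check alarm becomes primary,
    # and the original check alarm is bundled in as a related alarm.
    return Incident(primary_alarm=f"{culprit.name}/host-check",
                    related_alarms=[alarm])


def actions_to_run(incident: Incident,
                   action_groups: Dict[str, List[str]]) -> List[str]:
    """Only the primary alarm's action groups run; related ones are suppressed."""
    return action_groups.get(incident.primary_alarm, [])
```

For example, if a hypothetical `web01` sits behind `switch`, which sits behind `core-router`, and all three host checks fail, the incident's primary alarm belongs to `core-router` and only its action groups run.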

Administrators can also create custom incident management rules to forcibly correlate otherwise unrelated alarms into a single incident.

Alarms are collected in incidents. Incidents trigger actions (such as alert notifications and active response commands).

When the condition that generated an alarm returns to normal, the alarm will eventually clear without further action by a user.

If the resulting incident contains only a single alarm, the incident closes automatically, and a record of it is entered into the OmniCenter incident log.

If, however, the resulting incident contains multiple alarms, it will not close until the primary alarm and all related alarms have cleared. The incident then closes and is recorded in the OmniCenter incident log.
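The closing rule above amounts to "an incident closes once every alarm it contains has cleared." A minimal sketch, assuming a hypothetical `IncidentTracker` class that is not part of OmniCenter:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IncidentTracker:
    """Hypothetical tracker for the alarms bundled into one incident."""
    cleared: Dict[str, bool] = field(default_factory=dict)  # alarm name -> cleared?
    closed: bool = False

    def add_alarm(self, name: str) -> None:
        """Register a primary or related alarm as still active."""
        self.cleared[name] = False

    def clear_alarm(self, name: str) -> bool:
        """Mark one alarm as cleared; return True if the incident is now closed."""
        self.cleared[name] = True
        if all(self.cleared.values()):
            # Every alarm has cleared, so the incident closes.
            self.closed = True
        return self.closed
```

With a single alarm, clearing it closes the incident immediately. With several alarms, the incident stays open until the last one clears.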

The conditions required to cause an alarm are set in the configuration settings of each individual OmniCenter check. Checks added to devices through device templates use the settings configured for that check in the template.

Updated on December 10, 2019
