The device template baselining tool lets an OmniCenter administrator evaluate and tune the performance of the service checks and threshold checks that device templates apply to devices in your network.
It lets you see which device templates and which template-based service and threshold checks are having the greatest impact on the number of alarms you see for a given time period. Using the information the tool provides, you can then tune the settings for individual service and threshold checks being applied to your devices to reduce the number of alarms while still being alerted to real problems.
The tool is accessed from the device template administration page and is available to admins and super admins only. From the main menu, select Administration → Templates to get to the Device Templates Administration page, then click the Baseline Templates button to open the tool.
The main page of the baselining tool provides a table with a list of the device templates that are actively controlling service and threshold checks for managed devices (templates that are not in use will not be displayed here), along with indicators on how those templates are performing.
By default, the table is sorted with the overall worst performing device templates towards the top, but can be sorted by any column. The worst performing templates are those with the greatest number of checks in a failed state for a given time period. Click anywhere inside a row to open that device template in the template page of the tool (see below).
Above the table is a drop-down selector to change the evaluation time period by which the impact of the templates is measured. The following selections are available:
- Yesterday
- Last 7 Days
- Last 30 Days
The selection defaults to Yesterday. If you change the selection for Time Period, click the View New Report button to update the display.
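As a rough illustration of how the Time Period selections could translate into date ranges, the sketch below maps each label to an inclusive range of days. The `PERIOD_DAYS` mapping and the `evaluation_window` function are hypothetical names, and the assumption that every window ends with yesterday (rather than including today) is not confirmed by OmniCenter; treat it as a way to reason about the selector, not as the tool's actual logic.

```python
from datetime import date, timedelta
from typing import Tuple

# Hypothetical mapping from the selector labels to a number of whole days.
PERIOD_DAYS = {"Yesterday": 1, "Last 7 Days": 7, "Last 30 Days": 30}


def evaluation_window(period: str, today: date) -> Tuple[date, date]:
    """Return (first_day, last_day), inclusive.

    Assumption: every window ends with yesterday and excludes today.
    """
    days = PERIOD_DAYS[period]
    last_day = today - timedelta(days=1)
    first_day = today - timedelta(days=days)
    return first_day, last_day


# Example: the default selection evaluated on 10 March 2024.
first, last = evaluation_window("Yesterday", date(2024, 3, 10))
print(first, last)
```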
To the right of the device template name in the template table are three columns (aggregate, thresholds and services) containing thermometer-style performance indicators. The values these columns represent are outlined below. If there are no checks of a given type configured on the template, the indicator will show “Not configured” for that column instead of a performance indicator.
The indicator columns are:
- Aggregate: This column aggregates the performance metrics from the other two columns to provide an overall performance value. The percentage value displayed is how many combined service and threshold checks (out of all service and threshold checks controlled by this template) remained in an OK state for the selected time period. Use this column to quickly see the worst performing templates. By default, the table is sorted by this column.
- Thresholds: This column focuses on threshold check performance only. The percentage value displayed is how many threshold checks (out of all threshold checks controlled by this template) remained in an OK state for the selected time period. Click this column’s header to sort the worst performing templates (for threshold checks only) to the top.
- Services: This column focuses on service check performance only. The percentage value displayed is how many service checks (out of all service checks controlled by this template) remained in an OK state for the selected time period. Click this column’s header to sort the worst performing templates (for service checks only) to the top.
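The arithmetic behind the three indicator columns can be sketched as follows. This is a minimal illustration, assuming each percentage is simply the number of checks that stayed OK divided by the number of checks of that kind; the `TemplateStats` class and the function names are hypothetical and are not part of OmniCenter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemplateStats:
    """Hypothetical per-template counts for the selected time period."""
    name: str
    threshold_ok: int
    threshold_total: int
    service_ok: int
    service_total: int


def pct_ok(ok: int, total: int) -> Optional[float]:
    """Percentage of checks that remained OK; None stands for "Not configured"."""
    if total == 0:
        return None
    return 100.0 * ok / total


def aggregate_pct(t: TemplateStats) -> Optional[float]:
    """Combined service and threshold checks that remained OK."""
    return pct_ok(t.threshold_ok + t.service_ok,
                  t.threshold_total + t.service_total)


def worst_first(templates: List[TemplateStats]) -> List[TemplateStats]:
    """Sort by aggregate percentage, lowest (worst) first."""
    def key(t: TemplateStats):
        pct = aggregate_pct(t)
        # Templates with no checks at all sort last.
        return (pct is None, pct if pct is not None else 0.0)
    return sorted(templates, key=key)


routers = TemplateStats("Routers", 8, 10, 5, 10)    # aggregate 13/20
printers = TemplateStats("Printers", 0, 0, 9, 10)   # no threshold checks
for t in worst_first([printers, routers]):
    print(t.name, aggregate_pct(t))
```

Returning `None` when a template has no checks of a given type mirrors the “Not configured” case, so that an empty column is never mistaken for a 0% result.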
The template page of the baselining tool shows a more detailed view of the performance of an individual template.
At the top is the 30-day baseline chart. This chart shows, for each of the past 30 days, the average percentage of threshold and service checks that were in the OK or failed/degraded state for that device template. Move the mouse over a day on the chart to see detailed information about that day. The chart can be printed or exported as an image using the icon at the top right of the chart. This chart always shows 30 days, regardless of what time period is selected in the section below it.
Below the chart are two tables, one for threshold checks and the other for service checks. These tables display performance information broken down by individual check “type,” and together they represent all of the different types of threshold and service checks being applied to your managed devices by this template. Both tables display information in exactly the same way for their respective checks. As with all tables in this tool, the worst performing elements are sorted to the top by default, but each table can be sorted independently by any column.
The tables contain the following columns:
- The name of the type of threshold or service check.
- The number of individual instances of this type of check that are currently applied to any managed devices.
- The impact that the instances of this check type are having on the performance of this template. The percentage value displayed is how many checks of this type remained in an OK state for the selected time period. Think of “impact” as any red on the green bar, which appears when the percentage value is below 100%.
Click anywhere inside a row to open that check type in the instance page of the tool (see below).
The instance page of the baselining tool shows a more detailed view of the performance of a particular check type.
At the top is a 30-day baseline chart, similar to the one on the template page. In this case, however, the chart shows the average percentage of instances of the respective check that were in the OK or failed/degraded state.
Below the chart is a table of individual instances of a check. This table represents all of the instances of this type of check applied to any managed devices using this template. Clicking anywhere on a row will open the Device Dashboard of the device listed in the DEVICE column.
The table contains the following columns:
- The name of the specific managed device that the check instance listed in the DESCRIPTION column is applied to.
- The name of the check as displayed in the DESCRIPTION field of the check’s configuration settings.
- The impact this instance of this check type is having on the performance of this template. The percentage value displayed is how much of the selected time period the check instance remained in an OK state.
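Unlike the template-level percentages, the instance percentage is time-based: it reflects how much of the selected period a single check spent in an OK state. The sketch below shows one way to reason about that figure, assuming you have a list of non-overlapping intervals during which the check was not OK. The `instance_ok_pct` function and its inputs are hypothetical and are not an OmniCenter API.

```python
from datetime import datetime
from typing import List, Tuple


def instance_ok_pct(window_start: datetime,
                    window_end: datetime,
                    failed_intervals: List[Tuple[datetime, datetime]]) -> float:
    """Percentage of the window that the check instance spent in an OK state.

    Assumption: failed_intervals do not overlap one another.
    """
    window = (window_end - window_start).total_seconds()
    if window <= 0:
        raise ValueError("window_end must be after window_start")
    failed = 0.0
    for start, end in failed_intervals:
        # Clip each failed interval to the evaluation window.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            failed += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (window - failed) / window


# A one-day window with three hours of failure inside it.
pct = instance_ok_pct(
    datetime(2024, 3, 9), datetime(2024, 3, 10),
    [(datetime(2024, 3, 8, 23), datetime(2024, 3, 9, 1)),    # 1 hour inside the window
     (datetime(2024, 3, 9, 12), datetime(2024, 3, 9, 14))],  # 2 hours inside the window
)
print(pct)  # 87.5
```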