...A place for sharing IT monitoring knowledge

Thursday, 12 October 2017

Plugin Of The Month: Checking UPS alarms

NOTE: This post covers a Nagios/Icinga/Centreon Core compatible plugin designed to monitor the active alarms of a UPS. If you are interested in the background information behind the plugin, see the post Monitoring UPS Devices: UPS-MIB.

Background

Monitoring UPS health is critical to keeping your IT infrastructure alive. Rather than relying on proprietary solutions, one of the most widely supported ways of monitoring UPSs is SNMP via the UPS MIB (RFC 1628), which most devices on the market implement.

Specifically, the UPS MIB (RFC 1628) defines a table of active alarms on the managed device (upsAlarmTable, OID 1.3.6.1.2.1.33.1.6.2). Each table entry stores:
  • upsAlarmId: A unique identifier for the active alarm
  • upsAlarmDescr: A reference to an alarm description object
  • upsAlarmTime: The value of sysUpTime when the alarm condition was detected
The MIB also defines a list of alarm objects called Well-Known Alarms:

ID | Object (OID) | Description
1 | upsAlarmBatteryBad (1.3.6.1.2.1.33.1.6.3.1) | One or more batteries have been determined to require replacement.
2 | upsAlarmOnBattery (1.3.6.1.2.1.33.1.6.3.2) | The UPS is drawing power from the batteries.
3 | upsAlarmLowBattery (1.3.6.1.2.1.33.1.6.3.3) | The remaining battery run-time is less than or equal to upsConfigLowBattTime.
4 | upsAlarmDepletedBattery (1.3.6.1.2.1.33.1.6.3.4) | The UPS will be unable to sustain the present load when and if the utility power is lost.
5 | upsAlarmTempBad (1.3.6.1.2.1.33.1.6.3.5) | A temperature is out of tolerance.
6 | upsAlarmInputBad (1.3.6.1.2.1.33.1.6.3.6) | An input condition is out of tolerance.
7 | upsAlarmOutputBad (1.3.6.1.2.1.33.1.6.3.7) | An output condition (other than OutputOverload) is out of tolerance.
8 | upsAlarmOutputOverload (1.3.6.1.2.1.33.1.6.3.8) | The output load exceeds the UPS output capacity.
9 | upsAlarmOnBypass (1.3.6.1.2.1.33.1.6.3.9) | The Bypass is presently engaged on the UPS.
10 | upsAlarmBypassBad (1.3.6.1.2.1.33.1.6.3.10) | The Bypass is out of tolerance.
11 | upsAlarmOutputOffAsRequested (1.3.6.1.2.1.33.1.6.3.11) | The UPS has shutdown as requested, i.e., the output is off.
12 | upsAlarmUpsOffAsRequested (1.3.6.1.2.1.33.1.6.3.12) | The entire UPS has shutdown as commanded.
13 | upsAlarmChargerFailed (1.3.6.1.2.1.33.1.6.3.13) | An uncorrected problem has been detected within the UPS charger subsystem.
14 | upsAlarmUpsOutputOff (1.3.6.1.2.1.33.1.6.3.14) | The output of the UPS is in the off state.
15 | upsAlarmUpsSystemOff (1.3.6.1.2.1.33.1.6.3.15) | The UPS system is in the off state.
16 | upsAlarmFanFailure (1.3.6.1.2.1.33.1.6.3.16) | The failure of one or more fans in the UPS has been detected.
17 | upsAlarmFuseFailure (1.3.6.1.2.1.33.1.6.3.17) | The failure of one or more fuses has been detected.
18 | upsAlarmGeneralFault (1.3.6.1.2.1.33.1.6.3.18) | A general fault in the UPS has been detected.
19 | upsAlarmDiagnosticTestFailed (1.3.6.1.2.1.33.1.6.3.19) | The result of the last diagnostic test indicates a failure.
20 | upsAlarmCommunicationsLost (1.3.6.1.2.1.33.1.6.3.20) | A problem has been encountered in the communications between the agent and the UPS.
21 | upsAlarmAwaitingPower (1.3.6.1.2.1.33.1.6.3.21) | The UPS output is off and the UPS is awaiting the return of input power.
22 | upsAlarmShutdownPending (1.3.6.1.2.1.33.1.6.3.22) | A upsShutdownAfterDelay countdown is underway.
23 | upsAlarmShutdownImminent (1.3.6.1.2.1.33.1.6.3.23) | The UPS will turn off power to the load in less than 5 seconds; this may be either a timed shutdown or a low battery shutdown.
24 | upsAlarmTestInProgress (1.3.6.1.2.1.33.1.6.3.24) | A test is in progress, as initiated and indicated by the Test Group. Tests initiated via other implementation-specific mechanisms can indicate the presence of the testing in the alarm table, if desired, via a OBJECT-IDENTITY macro in the MIB document specific to that implementation and are outside the scope of this OBJECT-IDENTITY.
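
To illustrate how the table can be used: every Well-Known Alarm shares the OID prefix 1.3.6.1.2.1.33.1.6.3, so the alarm ID is simply the last sub-identifier of the object that upsAlarmDescr points at. Here is a minimal Python sketch of that mapping, assuming the OID arrives as a dotted string. The function name well_known_alarm_id is hypothetical and is not taken from the plugin.

```python
# Prefix shared by all Well-Known Alarms in the UPS MIB (RFC 1628).
WELL_KNOWN_ALARMS_PREFIX = "1.3.6.1.2.1.33.1.6.3"


def well_known_alarm_id(descr_oid):
    """Return the alarm ID (1..24) for a upsAlarmDescr OID, or None.

    None means the OID is not one of the 24 well-known alarms,
    for example a vendor-specific alarm object.
    """
    oid = descr_oid.strip().lstrip(".")
    prefix = WELL_KNOWN_ALARMS_PREFIX + "."
    if not oid.startswith(prefix):
        return None
    suffix = oid[len(prefix):]
    if not suffix.isdigit():
        return None
    alarm_id = int(suffix)
    return alarm_id if 1 <= alarm_id <= 24 else None
```

For instance, well_known_alarm_id(".1.3.6.1.2.1.33.1.6.3.2") yields 2, which corresponds to upsAlarmOnBattery.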

Plugin description

check_ups_alarms is a Nagios/Icinga/Centreon Core compatible plugin for checking the active alarms in a UPS MIB compliant device.

The alarms to be checked (see table above) can be defined as a list of IDs in the warning and critical plugin arguments. If one of these alarms becomes active, the plugin returns the status associated with the list in which the active alarm is defined.
check_ups_alarms fetches alarm data using SNMP v1/2c, so the device being checked must support this protocol and serve the information managed by the UPS MIB (RFC 1628).
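
As an illustration of what fetching the alarm data over SNMP could look like, the following Python sketch walks the upsAlarmDescr column (1.3.6.1.2.1.33.1.6.2.1.2) with the Net-SNMP snmpwalk command and parses its numeric output. It is not the plugin's own code. It assumes snmpwalk is installed and on the PATH, and the function names parse_alarm_descr and walk_active_alarms are hypothetical.

```python
import re
import subprocess

# upsAlarmDescr column of upsAlarmTable (RFC 1628).
UPS_ALARM_DESCR = "1.3.6.1.2.1.33.1.6.2.1.2"

# Matches numeric snmpwalk lines such as:
#   .1.3.6.1.2.1.33.1.6.2.1.2.7 = OID: .1.3.6.1.2.1.33.1.6.3.2
LINE_RE = re.compile(r"^\.?([\d.]+) = OID: \.?([\d.]+)\s*$")


def parse_alarm_descr(walk_output):
    """Map each upsAlarmId (row index) to the OID its upsAlarmDescr points at."""
    active = {}
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not carry an OID value
        row_oid, descr_oid = match.groups()
        if row_oid.startswith(UPS_ALARM_DESCR + "."):
            alarm_row = int(row_oid.rsplit(".", 1)[1])
            active[alarm_row] = descr_oid
    return active


def walk_active_alarms(host, community="public", version="1"):
    """Run snmpwalk (assumed to be on PATH) and return the parsed alarm table."""
    cmd = ["snmpwalk", "-v", version, "-c", community, "-On", host, UPS_ALARM_DESCR]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=30)
    return parse_alarm_descr(result.stdout)
```

An empty dictionary means the device reports no active alarms. Keeping the parsing separate from the network call makes it possible to test the parser against captured output.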

You can get detailed help and usage examples by running the script with the --help option.

Usage examples

check_UPS_alarms -H 192.168.0.1
Checks the active alarms on the host with address 192.168.0.1 using SNMP protocol version 1 and 'public' as the community. With no warning or critical list defined, the plugin always returns OK.

check_UPS_alarms -H 192.168.0.1 -w 1..4,11 -c 5..10
Similar to the previous example, but returns WARNING if any alarm with ID 1 to 4 or 11 is active and CRITICAL if any alarm with ID 5 to 10 is active.
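
The list syntax used in the example above (single IDs and first..last ranges separated by commas) can be expanded and matched against the active alarms along these lines. This is a minimal Python sketch, not the plugin's own code. The function names are hypothetical, and giving the critical list precedence over the warning list is an assumption based on the usual Nagios convention.

```python
def parse_alarm_list(spec):
    """Expand a list such as "1..4,11" into the set {1, 2, 3, 4, 11}."""
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if ".." in part:
            first, last = part.split("..", 1)
            ids.update(range(int(first), int(last) + 1))
        else:
            ids.add(int(part))
    return ids


def evaluate(active_ids, warning_spec="", critical_spec=""):
    """Return a Nagios exit code: 0 (OK), 1 (WARNING) or 2 (CRITICAL)."""
    active = set(active_ids)
    # Assumption: an alarm in the critical list outranks any warning match.
    if active & parse_alarm_list(critical_spec):
        return 2
    if active & parse_alarm_list(warning_spec):
        return 1
    return 0
```

With the thresholds from the example, evaluate([2], "1..4,11", "5..10") gives 1 (WARNING), and adding alarm 6 to the active list raises the result to 2 (CRITICAL).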

Download

You can download the latest version of the plugin from GitHub.

Developing this plugin, which is now freely released, took hours of reading technical documentation, programming and testing. I would be more than glad if you supported this effort by clicking on some of the interesting advertisements that you can find on this website.

Last but not least, if you find a bug, don't hesitate to contact me so I can fix it quickly. Feedback and comments are welcome too!

Sunday, 24 September 2017

Plugin Of The Month: Check Cisco VPN active sessions

NOTE: This post covers a Nagios/Icinga/Centreon Core compatible plugin designed to monitor the VPN sessions open on a Cisco device. If you are interested in the background information behind the plugin, see the post Monitoring Cisco VPN sessions.

Description

check_cisco_cras_sessions is a Nagios/Icinga/Centreon Core compatible plugin for checking the active sessions on a Cisco Remote Access Server (cras) device.

It can check all sessions together or by type; the supported types are email, ipsec, LAN to LAN (l2l), load balancing (lb), SSL VPN Client (svc) and Web VPN sessions. Sessions can be checked as absolute values (counts) or as relative values, taking the maximum number of sessions the device supports as the base. Finally, it can totalize (sum) sessions before comparing them against the thresholds.

Based on these capabilities, the plugin can be used in different ways:
  • To check whether a device is reaching its limits, by checking all sessions in relative mode, i.e., comparing the overall sessions with the maximum the device supports and returning the result as a percentage.
  • To check whether a device is reaching its license limits, by checking a given set of session types in totalized mode (Cisco ASA licensing restricts the number of SSL VPN Client + Web VPN sessions).
  • Finally, for fine-grained control of sessions by type, by restricting the check to a single session type.
check_cisco_cras_sessions fetches session data using SNMP v1/2c, so the device being checked must support this protocol and serve the information managed by the ciscoRemoteAccessMonitorMIB MIB.
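
The three checking modes described above (by type, totalized and relative) amount to simple arithmetic on the per-type session counts. The following Python sketch shows one way to express it. It is not the plugin's own code. The function names, the "total" key and the sample numbers are hypothetical, and exactly how the plugin combines the modes is an assumption.

```python
def session_values(counts, types=None, totalize=False, percent=False, max_sessions=None):
    """Return the values that would be compared against the thresholds.

    counts: per-type session counts, e.g. {"email": 3, "ipsec": 10}.
    types: session types to check; None means every type in counts.
    totalize: sum the selected types into a single "total" value.
    percent: express each value as a percentage of max_sessions.
    """
    selected = {t: counts[t] for t in (types or counts)}
    if totalize:
        selected = {"total": sum(selected.values())}
    if percent:
        if not max_sessions:
            raise ValueError("percent mode needs the device's max sessions")
        selected = {t: 100.0 * n / max_sessions for t, n in selected.items()}
    return selected


def status(values, warning, critical):
    """Nagios exit code: 2 if any value exceeds critical, 1 if any exceeds warning."""
    worst = max(values.values(), default=0)
    if worst > critical:
        return 2
    if worst > warning:
        return 1
    return 0
```

For example, with 5 email and 28 ipsec sessions and thresholds of 30 and 50, neither type alone exceeds the warning threshold, but the totalized value of 33 does, so status returns 1 (WARNING) in totalized mode.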

You can get detailed help and usage examples by running the script with the --help option.

Usage examples

check_cisco_cras_sessions -H 192.168.0.12
Checks the number of sessions on the host with address 192.168.0.12 using SNMP protocol version 1 and 'public' as the community. With no thresholds defined, the plugin always returns OK.

check_cisco_cras_sessions -H 192.168.0.12 -w 30 -c 50
Similar to the previous example but returning WARNING if the number of sessions of any kind is higher than 30 and CRITICAL if it's higher than 50.

check_cisco_cras_sessions -H 192.168.0.12 -s email -s ipsec -w 30 -c 50
Similar to the previous example but just checking the Email and IPSec sessions.

check_cisco_cras_sessions -H 192.168.0.12 -s email -s ipsec -T -w 30 -c 50
Similar to the previous example but totalizing the sessions, i.e., returning WARNING if the sum of email and ipsec sessions is higher than 30 and CRITICAL if it's higher than 50.

check_cisco_cras_sessions -H 192.168.0.12 -p -w 30 -c 50
Sessions of any kind are checked and their total is expressed as a percentage of the maximum number of sessions the device supports. Both thresholds and results are treated as percentages.

Download

You can download the latest version of the plugin here.

Developing this plugin, which is now freely released, took hours of reading technical documentation, programming and testing. I would be more than glad if you supported this effort by clicking on some of the interesting advertisements that you can find on this website.

Last but not least, if you find a bug, don't hesitate to contact me so I can fix it quickly. Feedback and comments are welcome too!

 