The post Monitoring Windows services covered how to remotely check the status of a service (running, stopped) using WMI, a powerful framework for fetching information from Windows-based systems.
All of us who manage monitoring systems know how important it is to give the system proactive capabilities, so that it can fix simple problems as a first step once an incident is detected. Perhaps the best example is a stopped Windows service: it is desirable for the monitoring system to try to restart it and, only after a given number of unsuccessful tries, notify the administrators so they can handle the problem by hand.
Sadly, WMI is not so useful when it comes to interacting with the remote system. Outside of its SQL-like query syntax, WMI can call a local script when a given condition is true (for instance when a service is stopped), but the Linux WMI client (wmic) only supports SQL-like queries. Moreover, even if those queries could run commands under certain circumstances, a script would have to exist on the Windows server in order to be run (and its presence might be a problem when dealing with strict remote server administrators).
Let's dance
SAMBA is the Linux implementation of the Windows SMB protocol. Among other things, it supports Remote Procedure Call transport (RPC over SMB), and RPC allows us to call Windows procedures remotely, which seems a good solution for our purpose.
samba-client is a package available for different platforms (it is called smbclient on Debian-like platforms) that groups several utilities for interacting from Linux hosts with remote SMB-compatible systems (such as Windows servers). One of these utilities is net, which is meant to work just like the net utility available for Windows and DOS.
On a Windows system, we can start a stopped service by calling net in this way:
net start my_windows_service
Using the samba net utility, we can do the same action from a remote Linux system in this way:
net rpc service start my_windows_service \
-I 192.168.0.64 \
-U myDomain/jdoe%jdoe_password
The only difference is that, while on Windows you can use either the long service name (quoted) or the short one, on Linux you can use only the short service name.
The previous command started a service called my_windows_service on a remote Windows server with address 192.168.0.64 using the privileges of the user jdoe (authenticated with password jdoe_password) belonging to the Active Directory domain myDomain. A local user can be used instead by omitting the domain name (and the slash):
net rpc service start my_windows_service \
-I 192.168.0.64 \
-U jdoe%jdoe_password
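The two credential forms used above (with and without a domain) can be built by a small helper. This is only a sketch: build_credentials is a hypothetical name, not part of Samba, and it simply assembles the DOMAIN/user%password string that the -U option receives in the commands above.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of Samba): build the value
# passed to -U, with or without a domain.

# build_credentials USER PASSWORD [DOMAIN]
build_credentials() {
    if [ -n "$3" ]; then
        # Domain user: DOMAIN/user%password
        printf '%s/%s%%%s\n' "$3" "$1" "$2"
    else
        # Local user: user%password
        printf '%s%%%s\n' "$1" "$2"
    fi
}

build_credentials jdoe jdoe_password myDomain   # myDomain/jdoe%jdoe_password
build_credentials jdoe jdoe_password            # jdoe%jdoe_password
```

The result can then be passed as -U "$(build_credentials jdoe jdoe_password myDomain)"; quoting it keeps the shell from splitting a password that contains spaces.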
Finally, net can also check whether a given service is running, which is useful for validating that a service restart succeeded:
net rpc service status my_windows_service \
-I 192.168.0.64 \
-U myDomain/jdoe%jdoe_password
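The start and status commands can be combined into a "restart and verify" step. The sketch below rests on an assumption: that the status subcommand prints a line containing "is running" for a started service. Check the output of your Samba version before relying on it. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: start a remote Windows service, then confirm it is running.
# Assumption: "net rpc service status" prints a line containing
# "is running" for a started service; verify this on your Samba version.

# service_is_running SERVER CREDENTIALS SERVICE
service_is_running() {
    net rpc service status "$3" -I "$1" -U "$2" 2>/dev/null |
        grep -q 'is running'
}

# restart_and_verify SERVER CREDENTIALS SERVICE
# Returns 0 only if the service reports as running after the start request.
restart_and_verify() {
    net rpc service start "$3" -I "$1" -U "$2" >/dev/null 2>&1
    service_is_running "$1" "$2" "$3"
}
```

For example, restart_and_verify 192.168.0.64 'myDomain/jdoe%jdoe_password' my_windows_service && echo "service is up" prints the message only when the status check succeeds.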
In practice
Let's assume we are managing a Nagios Core based system that monitors the status of some services running on remote Windows servers. The way to do it was covered in the post Monitoring Windows services.
Now we want to give our monitoring system proactive capabilities in this way: once a monitored Windows service is detected as stopped, the monitoring system must try to restart it a given number of times and, if that fails, stop trying and notify the incident to the defined contacts (or contactgroup members).
That can be achieved by defining an event handler bound to the service check. Since an event handler executes a command every time a service or host is in a soft state and the first time it goes to a hard (OK or non-OK) state, we will create a command that restarts the Windows service if the Nagios service check is in a non-OK, soft state. Since the service property max_check_attempts defines how many checks are run before going to a hard state, it also sets how many restart tries are performed before the service goes to a hard state and a notification is sent. Let's see it step by step:
1.- Create the event handler script (here called restart_win-service, stored in the directory that $USER1$ points to):
#!/bin/sh
#
# restart_win-service
# Restarts a remote Windows service if the Nagios service is
# in a non-OK, soft state
# Arguments:
#   $1 service_status  $2 service_status_type  $3 user_id
#   $4 server_address  $5 service_name
#
if [ "$1" != 'OK' ] && [ "$2" = 'SOFT' ]; then
    # We are in a soft, non-OK state:
    # restart the service
    net rpc service start "$5" -I "$4" -U "$3" > /dev/null 2>&1
fi
2.- Define a Nagios command representing the previous script:
define command{
command_name restart_win_service
command_line $USER1$/restart_win-service $SERVICESTATE$ $SERVICESTATETYPE$ $ARG1$ $HOSTADDRESS$ $ARG2$
}
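3.- Configure the service to use the command restart_win_service as its event handler. With max_check_attempts set to 3, the first two failed checks leave the service in a soft state (and each triggers a restart attempt), while the third failed check moves it to a hard state and triggers the notification: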
define service {
...
event_handler_enabled 1
event_handler restart_win_service!myDomain/jdoe%jdoe_password!my_windows_service
max_check_attempts 3
...
}
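To see how max_check_attempts interacts with the handler's "non-OK and SOFT" condition, the sequence of states can be simulated without touching Nagios or any server. This is a simplified sketch that assumes every check fails and that the service was previously OK; simulate_attempts is a hypothetical name.

```shell
#!/bin/sh
# Sketch: which failed checks would trigger a restart attempt?
# Assumption: every check fails and the service was previously OK.

# simulate_attempts MAX_CHECK_ATTEMPTS
simulate_attempts() {
    max=$1
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # Failed checks are SOFT until the last allowed attempt, which is HARD.
        if [ "$attempt" -lt "$max" ]; then
            state_type=SOFT
        else
            state_type=HARD
        fi
        # Same condition as the event handler script (the state is non-OK here).
        if [ "$state_type" = 'SOFT' ]; then
            action='restart attempted'
        else
            action='no restart, notification sent'
        fi
        echo "check $attempt: CRITICAL $state_type -> $action"
        attempt=$((attempt + 1))
    done
}

simulate_attempts 3
```

With the value 3 used above, this prints two lines ending in "restart attempted" followed by one HARD line, so raising max_check_attempts is the way to allow more restart tries before the notification.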
this is a long shot but is there any way to make this work with Check_MK Raw 1.5.0.p6? I've been trying to get event handlers going in it (nagios 3 core) and i'm about at the end of my rope.