Monitoring and alerting system

Complex IT infrastructures can be monitored with the software Nagios (Network + Hagios), formerly called NetSaint. Nagios offers a collection of modules for monitoring networks, hosts, and especially services, together with a web interface for querying the collected data. Nagios is licensed under the GPL and is therefore free software; it runs on many Unix-like operating systems. Nagios and the Nagios logo are registered trademarks of Ethan Galstad.


The hosts and services to be monitored are defined in configuration files and thereby made known to Nagios. Monitoring only takes place, however, if the corresponding check commands have been defined. Hosts, services, and contacts can also be combined into groups.


Nagios can query and evaluate the status of various services (e.g. SSH, FTP, HTTP) as well as disk space, memory and CPU utilization, uptime, and so on, by means of various modules (plug-ins). Some test methods work at the protocol level (TCP, UDP, SNMP, ...), which makes it possible to monitor a variety of operating systems. For more specific tasks, additional programs are used that are also freely available (NC_Net, NSClient). With the appropriate additional hardware, it is even possible to monitor environmental conditions (e.g. temperature, humidity, fill levels of liquid tanks, ...).
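To illustrate how a plug-in reports a status, here is a minimal Python sketch of a disk space check. It assumes the common Nagios plug-in convention of one status line on standard output plus an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function names, the "DISK ..." wording, and the 80 % / 90 % thresholds are illustrative assumptions, not part of Nagios itself.

```python
import shutil
import sys

# Exit codes of the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_disk(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit_code, status_line).

    The default thresholds are illustrative assumptions.
    """
    if used_percent >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used_percent >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"DISK {label} - {used_percent:.1f}% used"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Measure disk usage of `path` and evaluate it against the thresholds."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself failed, so the state is unknown.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - file system reports zero size"
    used_percent = 100.0 * usage.used / usage.total
    return evaluate_disk(used_percent, warn, crit)


def main():
    """Print the status line and return the exit code for the caller."""
    code, line = check_disk()
    print(line)
    return code

# To run this as a plug-in script, one would end the file with:
#     sys.exit(main())
```

The evaluation is kept separate from the measurement so that the threshold logic can be tried out with fixed values, without depending on the disk of the machine it runs on.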

As soon as a service or host reaches a critical value (some thresholds are adjustable), or is no longer available or reachable, Nagios alerts the responsible contacts over one or more channels (e.g. e-mail, SMS, pager, instant messages, phone calls). It is possible to define the order in which further contacts are notified if a fault is still unresolved after the first notifications (escalation management). Existing dependencies between monitored objects can also be taken into account. If, for example, both the reachability of a computer and a program running on it are monitored, then when the entire computer fails, the message about the program no longer running is suppressed.
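The dependency rule described above can be sketched in a few lines of Python. This is a simplified illustration of the idea only, not Nagios's actual dependency logic; the function name and the data layout (a state per check and a single parent per check) are assumptions made for the example.

```python
def notifications_to_send(states, depends_on):
    """Return the failed checks that should trigger a notification.

    states:     dict mapping check name -> True (OK) or False (failed)
    depends_on: dict mapping check name -> name of the check it depends on

    A failed check is suppressed if any check further up its dependency
    chain has failed as well.
    """
    notify = []
    for name, ok in states.items():
        if ok:
            continue
        suppressed = False
        seen = set()  # guards against accidental dependency cycles
        parent = depends_on.get(name)
        while parent is not None and parent not in seen:
            seen.add(parent)
            if not states.get(parent, True):
                suppressed = True
                break
            parent = depends_on.get(parent)
        if not suppressed:
            notify.append(name)
    return notify
```

With a hypothetical host check "computer" and a program check "program" that depends on it, a failure of both yields a notification for "computer" only, while a failure of "program" alone is reported as usual.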


To make a Nagios installation more fail-safe, more redundant, and less prone to false alarms, Nagios supports distributed monitoring and redundant/failover monitoring setups. In distributed monitoring, several peripheral Nagios instances send their results via NSCA to a central Nagios server, which processes them as passive checks. In redundant monitoring, two Nagios instances run in parallel, like nodes in a cluster, and keep each other up to date via a heartbeat.
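As a rough sketch of what a peripheral instance hands over to NSCA, the following Python function builds one passive service check result. It assumes the tab-separated input format commonly documented for the send_nsca client (host name, service description, return code, plug-in output, terminated by a newline); this should be checked against the installed NSCA version. The function name and the sample values are hypothetical.

```python
def format_nsca_line(host, service, return_code, output):
    """Build one passive service check result line for send_nsca.

    Assumed format: host<TAB>service<TAB>return_code<TAB>output<NEWLINE>
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0, 1, 2 or 3")
    for field in (host, service, output):
        # Tabs and newlines would break the field separation.
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")
    return f"{host}\t{service}\t{return_code}\t{output}\n"


# Hypothetical example values:
line = format_nsca_line("web01", "HTTP", 0, "HTTP OK - 0.2 s response time")
```

A line built this way would typically be written to the standard input of the send_nsca client on the peripheral machine, which transmits it to the central server.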

Using the Nagios Remote Plugin Executor (NRPE) or SSH, plug-ins can also be run on remote computers, which then report the results of their checks to the Nagios server. Monitoring via SNMP is more elegant, however, although it is harder to configure if passive monitoring (SNMP traps) is required as well.