Nagios installation and configuration - The quick and DIRTY guide
------------------------------------------------------------------

Nagios is a commonly used free tool for monitoring systems and their devices over the network. It's probably one of the best, simplest, most expandable, and cheapest network monitoring tools out there.

This install covers FreeBSD 6.2 with Nagios 2.6.

Install nagios from ports:-

    # cd /usr/ports/net-mgmt/nagios
    # make install

The port creates a nagios user and group for itself. To start nagios at boot time, add the following entry to /etc/rc.conf:-

    nagios_enable="YES"

Nagios relies on a webserver (commonly apache) to display its monitoring results. The following configuration lines in httpd.conf are important for nagios to work. Nagios should be secure and ONLY accessed by people with the proper authority to do so, so we add some extra lines that require a valid login from an htpasswd file:-

----------------------------------------------------------
DocumentRoot /usr/local/www/nagios

<Directory "/usr/local/www/nagios">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/etc/nagios/htpasswd.users
    Require valid-user
</Directory>

<Directory "/usr/local/www/nagios/cgi-bin">
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/etc/nagios/htpasswd.users
    Require valid-user
</Directory>

ScriptAlias /nagios/cgi-bin/ /usr/local/www/nagios/cgi-bin/
Alias /nagios/ /usr/local/www/nagios/
----------------------------------------------------------

We will create a password file in the nagios configuration directory. First we create the nagiosadmin user, mainly because the default configuration expects this user to be available for full CGI control of nagios (should you allow it later on).

    # htpasswd -c /usr/local/etc/nagios/htpasswd.users nagiosadmin

You can create more users with the same syntax. Just drop the -c option, because it creates a new password file (overwriting any existing one) and we already have one now:-

    # htpasswd /usr/local/etc/nagios/htpasswd.users <username>

Nagios is configured by default to use authentication.

Apache will need a restart. Make sure you add apache_enable="YES" (or apache22_enable="YES" if you are using Apache 2.2) to /etc/rc.conf, and restart apache with the start script in /usr/local/etc/rc.d. If log files appear in /var/log, apache is up and running. If nothing happens at all, you've probably mistyped the enable setting in rc.conf.

From here, a bunch of sample configuration files can be found in /usr/local/etc/nagios. Most of them need to be "de-sampled" before nagios can work in earnest, so the first step is to copy each sample to its real name. (Don't delete the samples. We may need them if we accidentally delete our real configurations.)

    # cd /usr/local/etc/nagios
    # cp cgi.cfg-sample cgi.cfg
    # cp nagios.cfg-sample nagios.cfg
    # cp commands.cfg-sample commands.cfg
    # cp localhost.cfg-sample localhost.cfg
    # cp resource.cfg-sample resource.cfg

Nagios itself can be started with the start script in /usr/local/etc/rc.d (make sure that the nagios_enable="YES" line has been added to /etc/rc.conf first).

Of course, at this point nagios is up and working, but you have no configuration for any servers or services yet. So that's the next job.

nagios.cfg
---------------

The main configuration file for nagios. It mainly handles the locations of log files and configuration files. A few changes will probably need to be made here.

Nagios' configuration can be made incredibly complex. You can either have everything in a single file, or you can split up your configuration into separate directories with separate log files.
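As a rough illustration of those two layouts, the fragment below shows how nagios.cfg might point at its object files. cfg_file and cfg_dir are real nagios.cfg directives; the "objects" directory is a hypothetical name chosen for this sketch and is not created by the port.

```
# nagios.cfg (fragment) - a sketch, adjust paths to your own layout

# Single-file style: name each object definition file explicitly
cfg_file=/usr/local/etc/nagios/commands.cfg
cfg_file=/usr/local/etc/nagios/localhost.cfg

# Split style: read every .cfg file found in a directory
# (hypothetical directory, you would have to create it yourself)
cfg_dir=/usr/local/etc/nagios/objects
```

Both styles can be mixed in the same nagios.cfg, so it is possible to start with single files and move definitions into a directory later.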

Deciding how to lay all of this out in nagios.cfg can be a little overwhelming. In older versions of Nagios, the default install gave you lots of sample configuration files and templates to rely on when setting up hosts, groups, notifications, etc. Well, those days are gone. Have a look at the templates in localhost.cfg to see what can be done.

You can tell nagios to read specific files containing object definitions (with the cfg_file option), or you can point it at whole directories of config files (with the cfg_dir option). You can list multiple files and directories, which means you can organise your object definitions any way you like. Nagios doesn't care what a file is called, as long as the file extension is .cfg, the definitions inside it are valid, and nagios is configured to read the file. More on object definitions later.

check_external_commands
    Specifies whether or not Nagios should check for external commands. This will most likely need to be set to "1" if we intend to let users execute commands through nagios. The other default options for external commands are good enough to leave alone.

cgi.cfg
------------

nagios_check_command
    This is something you may want to uncomment, because it checks the status of the nagios process. If you don't use it, you'll see warning messages in the CGIs about the Nagios process not running, and you won't be able to execute any commands from the web interface.

The following directives give privileges to users. For the most part you will want to uncomment and edit these, because by default nagios won't let any user see any of the host information:-

authorized_for_system_information
    List of users authorised to see Nagios process information from the web page. By default, nobody is allowed to.

authorized_for_configuration
    List of users who have full access to view all configuration information. By default, nobody is allowed to.

authorized_for_all_services
authorized_for_all_hosts
    List of users who can view information for all hosts and services monitored. By default, users can only see info for hosts and services they are contacts for.

authorized_for_all_service_commands
authorized_for_all_host_commands
    List of users who can issue all commands from the CGIs. By default, users can only issue commands for hosts or services they are contacts for.

ping_syntax
    For FreeBSD ports and Linux the ping syntax does not normally need editing, but for other BSDs, Solaris, and other weird UNIXes it may need to be changed.

refresh_rate
    By default, the nagios CGIs refresh every 90 seconds, which is good enough. Adjust this option if you want a different interval.

resource.cfg
-------------

For most users, this file will remain untouched. If you are doing some fancy monitoring of MySQL or something similar that needs rather sensitive information (such as passwords) in order to work, this is the file for it. Otherwise, if you are just doing some basic ping and disk monitoring, ignore this file.

commands.cfg
-----------------------

This file houses the command definitions for all of the host and service checks that Nagios can carry out. Each command is defined by a command_name (which nagios remembers its tests by) and a command_line (which is the command and parameters that are run to complete a particular test).

This file holds all the default nagios commands. It's best to leave it untouched and create an extra config file where you can set up all your other 3rd party commands and plugins.

**** Templates and object inheritance ****

Nagios allows you to define values within a template, and then to use that template within other definitions of the same type to save a lot of typing. Understanding this is crucial to configuring nagios.

For example, there is a "generic-service" service template in the localhost.cfg file in the nagios configuration directory. This service template (and consequently all its values) can be used within other service definitions, which allows one template to be reused for many objects. The generic-service template holds values such as "active_checks_enabled" and "notifications_enabled", and these values are automatically inherited by the real service definitions that use it, which saves typing the same values for each and every service on our system.

Note that the "generic-service" service template has the line "register 0" as its final line. This line exists so that Nagios KNOWS that generic-service is only a TEMPLATE, and NOT an actual service! The generic-service template does not carry a service_description or check_command, nor the other options which would make it a fully fledged service. Without the "register 0" option, nagios would produce configuration errors upon startup (unable to find the service name).

**** Object definition files ****

Object definition files and their locations are defined in nagios.cfg. All object definition files should have the file extension .cfg, or nagios will most likely skip over them.

The file localhost.cfg is an object definition file which contains many different examples and templates of definitions you can use to create your own configurations. You can either edit this file directly, or create separate files to ease administration. Already, there are generic-host, generic-service, and local-service templates available for use in our own configurations.
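A minimal sketch of how inheritance fits together is shown below. The directives name, use, and register are the ones Nagios uses for templates; the template name "ping-template" and the host "web1" are hypothetical, and the values are illustrative only.

```
# Template: holds shared values and is never monitored itself
define service{
    name                    ping-template    ; hypothetical template name
    use                     generic-service  ; inherit from the stock template
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    notification_interval   240
    notification_period     24x7
    notification_options    c,r
    register                0                ; template only, not a real service
    }

# Real service: inherits everything above and adds what is unique to it
define service{
    use                     ping-template
    host_name               web1             ; hypothetical host, defined elsewhere
    service_description     PING
    check_command           check_ping!80.0,20%!500.0,50%
    contact_groups          sysadmin
    }
```

A value set in the real service overrides the same value inherited from the template, so the template only needs to carry sensible defaults.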

Here are the main object definitions we need to concern ourselves with:-

* define host - You define the names of hosts and what their IP addresses are
* define hostgroup - You create groups of hosts to better organise your machines on the nagios display
* define contact - You define individual administrators and their contact information for notifications
* define contactgroup - You define groups of administrators for notification (so they all get the same notification)
* define command - You define commands, based upon actual unix command line tools
* define service - You define services to test, the commands to test them with, and what hosts to run the tests on
* define servicegroup - You define groups of services so they all group together

There are a bunch of other extension, dependency, and escalation defines, but for the most part they are advanced settings and beyond the scope of normal nagios usage (so I won't cover them). There is also a timeperiod define, but a "24x7" timeperiod is already available by default, so unless we intend to not monitor 24 hours a day, 7 days a week, we can mostly ignore it.

In many cases, "services" are the highest ranking defines, which draw all the other defines in to actually do the work. To define a service, you need hosts, hostgroups, contacts and contact groups, timeperiods, commands, etc. to carry out the service testing.

Also, as we have already seen, there are a lot of default commands for services set up in commands.cfg. I recommend making a separate file for customized commands (and other 3rd party plugins) to avoid confusion.

**** Really basic configuration - just as a demo ****

Ok, here I have a basic configuration to simply PING two hosts at a normal interval. Please note that I'm making use of the "generic-host" and "generic-service" templates from the "localhost.cfg" file in the default nagios configuration directory.
Also note that a "nagiosadmin" contact and an "admin" contact group are available by default. (As you can guess from the config, I set up this test environment on vmware.)

HOSTS

define host{
    host_name               dummy1
    alias                   vmware
    address                 192.168.217.129
    max_check_attempts      20
    check_period            24x7
    contact_groups          sysadmin
    notification_interval   60
    notification_period     24x7
    notification_options    d,u,r
    use                     generic-host
    }

define host{
    host_name               dummy2
    alias                   othervmware
    address                 192.168.217.132
    max_check_attempts      20
    check_period            24x7
    contact_groups          sysadmin
    notification_interval   60
    notification_period     24x7
    notification_options    d,u,r
    use                     generic-host
    }

HOSTGROUPS

define hostgroup{
    hostgroup_name          virtualmachines
    alias                   Dummy Group!
    members                 dummy1,dummy2
    }

CONTACT

define contact{
    contact_name                    naynay
    alias                           Nathan
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-by-email
    host_notification_commands      host-notify-by-email
    email                           naynay@localhost
    }

CONTACTGROUPS

define contactgroup{
    contactgroup_name       sysadmin
    alias                   Default admin group
    members                 naynay
    }

SERVICE

define service{
    host_name               dummy1,dummy2
    service_description     PING
    check_command           check_ping!80.0,20%!500.0,50%
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          sysadmin
    notification_interval   240
    notification_period     24x7
    notification_options    c,r
    use                     generic-service
    }

Nagios should now produce some nice displays about the ping status of the hosts.

**** Adding in extra plugins ****

Nagios comes with many standard plugins that can perform numerous kinds of checks on the hosts on your system. Nagios is very expandable in the sense that anyone can design and implement plugins for Nagios, in addition to the ones that come with it.
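To give an idea of how little a plugin needs, here is a minimal sketch in sh. A plugin is just an executable that prints a line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function name check_value and its three arguments are made up for this example and do not correspond to any shipped plugin.

```shell
#!/bin/sh
# Hypothetical plugin sketch: compares a number against warning and
# critical thresholds and reports using the standard plugin exit codes
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).

check_value() {
    value=${1:-}
    warn=${2:-}
    crit=${3:-}

    # Anything that is not a plain non-negative integer is UNKNOWN
    for n in "$value" "$warn" "$crit"; do
        case $n in
            ''|*[!0-9]*)
                echo "UNKNOWN - usage: check_value VALUE WARN CRIT"
                return 3
                ;;
        esac
    done

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi

    echo "OK - value is $value"
    return 0
}
```

A real plugin would measure something itself (disk usage, queue length, and so on), pass the number to the function, and finish with "exit $?" so that Nagios sees the status code.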

You can download and install the most useful plugins (which require compiling) with the nagios-plugins port (find it at /usr/ports/net-mgmt/nagios-plugins). This is strongly recommended.

Over the years, people have made all kinds of new plugins for Nagios, to check all sorts of programs and devices. There's a plugin I made for Nagios to check the status of FreeBSD GEOM devices on hosts within my network.

Implementing extra 3rd party plugins is as simple as copying the files to /usr/local/libexec/nagios and making them executable:-

    # cp <plugin> /usr/local/libexec/nagios
    # chmod 555 /usr/local/libexec/nagios/<plugin>

From there, you can define individual commands that use these plugins with command definitions. How you configure them depends on the plugin in question.

***** NRPE - Doing things remotely *****

Nagios can easily do network tests which don't require access to the machine (such as ping tests and http tests). However, there will be times when you need information from a server that can only be obtained internally with a valid login (such as disk size, process availability, etc). When you need to do tests like these, you need "NRPE".

NRPE is a separate port at /usr/ports/net-mgmt/nrpe2. (The reason it is called nrpe2 is that nrpe is the old version that worked with "netsaint", a precursor to nagios. Needless to say, nrpe and nrpe2 are not compatible. Make sure you get nrpe2 for nagios.)

nrpe runs on a client machine as a daemon that the Nagios server can talk to. Note that you don't need to run an nrpe daemon on your Nagios server itself (nagios can access its own hosting server quite adequately). Your Nagios server then uses its own "check_nrpe2" plugin to communicate with the nrpe daemons on client machines.

On your Nagios server, you need to make sure that check_nrpe2 is a recognized command, so you need to create a command define:-

COMMAND

define command{
    command_name    check_nrpe2
    command_line    /usr/local/libexec/nagios/check_nrpe2 -H $HOSTADDRESS$ -c $ARG1$
    }

Of course, you can define other local 3rd party plugins in exactly the same way. Each plugin has different arguments, however. check_nrpe2 expects a target host address (-H) and the command to run through nrpe on that host (-c).

nrpe on client machines is set up with the /usr/ports/net-mgmt/nrpe2 port, as well as the nagios-plugins port. First, make sure you enable it at boot time in /etc/rc.conf:-

    nrpe2_enable="YES"

Copy the sample config file in /usr/local/etc:-

    # cp /usr/local/etc/nrpe.cfg-sample /usr/local/etc/nrpe.cfg

Configuring nrpe is not hard. First you should edit the allowed_hosts parameter. Set this so that it lists only localhost and your Nagios server. This helps prevent attacks on the daemon.

    allowed_hosts=127.0.0.1,<nagios server IP>

Towards the bottom of the config file you get to configure commands that will run locally on the client, but can be called into operation by the nagios server. This is done using the "command" parameter. Below we register the vaguely named example plugin "someplugin" under the remote command name "check_someplugin". Note that the local someplugin command has a preset argument. It is possible to allow arguments from the nagios server if you enable the aptly named dont_blame_nrpe option.

    command[check_someplugin]=/usr/local/libexec/nagios/someplugin someargument

Finally, you can set up service checks for this remote command by creating a service definition on the Nagios server. All you have to do is remember to add the following line to your service config:-

    check_command check_nrpe2!check_someplugin

And that does the job!

**** Manipulating sudo on the Nagios host server *****

Nagios usually runs as the underprivileged "nagios" user. As a result, any commands which require root to run just won't.
Of course, anything running with root privileges constitutes a security risk. The sudo command is your trade-off. You can install it from /usr/ports/security/sudo.

It's more of a hassle (and a larger security risk) to have nagios run as root, so it is far better to selectively use sudo for only the commands that really need it.

What we have to do is create a sub-directory in /usr/local/libexec/nagios (I'll call it "sudo") and copy all the scripts that need sudo there. Then we lock the sudo directory down so that only root has access. You don't want your plugins to be hijacked and used by an underprivileged user to run commands as root, so the permissions on this directory need to be tightened. Make sure root owns the directory and its contents, otherwise mode 700 does not keep other users out.

    # chown -R root:wheel /usr/local/libexec/nagios/sudo
    # chmod -R 700 /usr/local/libexec/nagios/sudo

Then add the sudo rule for that special directory with visudo (don't forget the trailing slash on the path):-

    nagios ALL=(ALL) NOPASSWD: /usr/local/libexec/nagios/sudo/

Then define your command so that it calls sudo first on its command line:-

define command{
    command_name    check_needsroot
    command_line    /usr/local/bin/sudo /usr/local/libexec/nagios/sudo/check_needsroot
    }

**** Enabling sudo for nrpe *****

In the nrpe.cfg file there is an option for enabling sudo (command_prefix=/usr/local/bin/sudo), but this will execute every nrpe2 plugin as root, which isn't necessary. So forget about it.

Otherwise, having remote servers run nrpe and execute sudo scripts is almost identical to the way you would set it up on the local nagios server. You create a special nagios sudo plugin directory, add the nagios user to sudo, place the scripts that need sudo in there, and lock it with chmod.

The only special thing you have to do is edit nrpe.cfg and ensure that the command definition requiring root calls sudo before the plugin. The full path to sudo is used here because a daemon's PATH may not include /usr/local/bin.

    command[check_needsroot]=/usr/local/bin/sudo /usr/local/libexec/nagios/sudo/check_needsroot
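
If granting a whole directory feels too broad, a tighter variant is to name each script individually in sudoers. This is only a sketch of an alternative, not part of the setup described above; it reuses the example check_needsroot script and limits the target user to root.

```
# sudoers fragment (edit with visudo) - one line per script that needs root
nagios ALL=(root) NOPASSWD: /usr/local/libexec/nagios/sudo/check_needsroot
```

The trade-off is that every new root-requiring plugin then needs its own sudoers line, whereas the directory rule picks up new scripts automatically.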