Nagios basics

Nagios installation and configuration - The quick and DIRTY guide
------------------------------------------------------------------

Nagios is a commonly used free tool to monitor systems and their devices over
the network. It's probably one of the best, simplest, most expandable, and
cheapest network monitoring tools out there.

This install will cover FreeBSD 6.2, with Nagios 2.6.

Install nagios from ports:-

# cd /usr/ports/net-mgmt/nagios
# make install

Nagios will then create a nagios user and group for itself.
You have to ensure that nagios is enabled at boot time with the following
rc.conf entry:-

nagios_enable="YES"
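A quick way to confirm the entry is really there is to grep for it. The helper
below is only a sketch: the name rc_enabled is made up for this guide, and it
takes the path of the rc.conf file as an argument so you can try it on a copy
first.

```shell
# rc_enabled FILE NAME
# Succeeds (exit 0) if FILE contains an uncommented NAME_enable="YES" line.
# rc_enabled is a hypothetical helper, not part of FreeBSD or Nagios.
rc_enabled() {
    grep -Eq "^[[:space:]]*${2}_enable=\"?[Yy][Ee][Ss]\"?" "$1"
}

# Example (the file is only read, never changed):
#   rc_enabled /etc/rc.conf nagios && echo "nagios will start at boot"
```

The same check works for the other enable lines used later in this guide.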

Nagios relies on a webserver (commonly apache) to display its monitoring
results.

The following configuration lines in httpd.conf are important for nagios to
work.
Nagios should be secure and ONLY accessed by people with the proper authority to
do so.

So first we add some extra lines to implement HTTP basic authentication:-

----------------------------------------------------------
DocumentRoot /usr/local/www/nagios

<Directory "/usr/local/www/nagios">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/etc/nagios/htpasswd.users
Require valid-user
</Directory>

<Directory "/usr/local/www/nagios/cgi-bin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/etc/nagios/htpasswd.users
Require valid-user
</Directory>

ScriptAlias /nagios/cgi-bin/ /usr/local/www/nagios/cgi-bin/
Alias /nagios/ /usr/local/www/nagios/

----------------------------------------------------------

We will create a password file in the nagios configuration directory.
First we create the nagiosadmin user, mainly because the default configuration
expects this user to be available for full CGI control of nagios (should you
allow it later on).

# htpasswd -c /usr/local/etc/nagios/htpasswd.users nagiosadmin

and you can create more users with the same syntax (just drop the -c option,
because it creates a new user file, and we already have one now). Replace
USERNAME with the name of the user you want to add:-

# htpasswd /usr/local/etc/nagios/htpasswd.users USERNAME

Nagios is configured by default to use authentication.

Apache will need a restart. Make sure you add apache_enable="YES" (or
apache22_enable="YES" if you are using Apache 2.2) to /etc/rc.conf, and
restart apache with the start script in /usr/local/etc/rc.d.
If log files are created in /var/log, you'll be able to check that it's up
and running. If nothing happens at all, you've probably mistyped the enable
setting in rc.conf.

From here, a bunch of sample configuration files are found in
/usr/local/etc/nagios.

Most of them will need to be "de-sampled" before nagios can work in earnest, so
configuring nagios starts with copying each sample file to its real name.
(Don't delete the samples. We may need them if we accidentally delete our real
configurations.)

# cp cgi.cfg-sample cgi.cfg
# cp nagios.cfg-sample nagios.cfg
# cp commands.cfg-sample commands.cfg
# cp localhost.cfg-sample localhost.cfg
# cp resource.cfg-sample resource.cfg
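If you'd rather not type each cp by hand, the copies above can be sketched as
a small loop. The function name desample is made up for this guide. It refuses
to overwrite a file that already exists, which is an extra safety step of mine
and not something the port does for you.

```shell
# desample DIR NAME...
# For each NAME, copies DIR/NAME-sample to DIR/NAME unless DIR/NAME already
# exists, so a real configuration is never overwritten.
# desample is a hypothetical helper name used only in this guide.
desample() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            echo "skipping $name (already exists)"
        elif [ -e "$dir/$name-sample" ]; then
            cp "$dir/$name-sample" "$dir/$name" && echo "created $name"
        else
            echo "no sample found for $name"
        fi
    done
}

# Example (as root), using the five files listed above:
#   desample /usr/local/etc/nagios cgi.cfg nagios.cfg commands.cfg \
#       localhost.cfg resource.cfg
```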

Nagios itself can be started with the start script in /usr/local/etc/rc.d
(make sure that the nagios_enable="YES" /etc/rc.conf line has been added first)

Of course at this point, nagios is up and working but you have no configuration
for any servers or services yet. So that's the next job.

nagios.cfg
---------------

The main configuration file for nagios. It mainly handles the location of log
files and configuration files. A few changes will probably need to be made here.

Nagios' configuration files can be made incredibly complex. You can either
have everything in a single file, or you can split up your configurations into
separate directories with separate log files. Deciding in nagios.cfg how you
want to lay out the rest of your configuration can be a little overwhelming.

In older versions of Nagios, the default install gave you lots of sample
configuration files and templates to rely on when setting up hosts, groups,
notifications, etc. Well, those days are gone.
Have a look at the templates in localhost.cfg to see what can be done.

You can tell nagios to read specific files with object definition templates in
them (with the cfg_file option) or you can specify whole directories of config
files (with the cfg_dir option). You can list multiple files and directories for
nagios to search, which means that you can organise object templates anywhere
you like. Nagios can read whole directories of random files, and if it finds a
template it recognises, it deals with it. As long as the template is a valid
one, it doesn't care what the file is called - just as long as the file
extension is .cfg and nagios is configured to read the file.
More on object definitions later..
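Since only files ending in .cfg get read from a config directory, it can be
handy to spot the strays (backups, notes, editor leftovers) that nagios will
quietly ignore. A rough sketch, with a made-up function name:

```shell
# list_skipped DIR
# Prints regular files under DIR whose names do not end in .cfg, i.e. the
# files nagios would not pick up from a config directory.
# find also descends into sub-directories of DIR.
# list_skipped is a hypothetical helper name used only in this guide.
list_skipped() {
    find "$1" -type f ! -name '*.cfg' | sort
}

# Example:
#   list_skipped /usr/local/etc/nagios
# After editing, the whole configuration can be checked with nagios itself
# (binary path assumed to be the FreeBSD port default):
#   /usr/local/bin/nagios -v /usr/local/etc/nagios/nagios.cfg
```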

check_external_commands

Allows you to specify whether or not Nagios should check for external commands.
This will most likely need to be set to "1" if we intend to enable users in
nagios to execute commands. The default options for external commands are good
enough to leave alone.

cgi.cfg
------------

nagios_check_command

This is something you may want to uncomment because it checks the
status of the nagios process. If you don't use it, you'll see warning messages
in the CGIs about the Nagios process not running, and you won't be able to
execute any commands from the web interface.

The following directives give privileges to users on the system. For the most
part, you will want to uncomment and edit these, because by default, nagios
won't let any user see any of the host information:-

authorized_for_system_information

List of users authorised to see Nagios process information from webpage.
By default, nobody is allowed to.

authorized_for_configuration

List of users who have full access to view all configuration information.
By default, nobody is allowed to.

authorized_for_all_services
authorized_for_all_hosts

List of users that can view information for all hosts and services monitored.
By default, users can only see info for hosts and services they are contacts
for.

authorized_for_all_service_commands
authorized_for_all_host_commands

List of users who can get at all commands from the CGI.
By default, users can only issue commands for hosts or services they are
contacts for.

ping_syntax

For FreeBSD ports and Linux, the ping syntax may not need to be edited, but for
other BSDs, Solaris, and other weird UNIXes this may need to be changed.

refresh_rate

By default, the nagios CGIs refresh every 90 seconds, which is good enough. This
option can be manhandled if you want to change it.
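All of the cgi.cfg options above are plain key=value lines, so uncommenting
and editing them can be scripted. The sketch below is an assumption-laden
helper (set_cfg_option is a made-up name): it rewrites an existing or
commented-out key, or appends the key if it is missing. Try it on a copy of
cgi.cfg before trusting it with the real one.

```shell
# set_cfg_option FILE KEY VALUE
# Replaces a "KEY=..." or "#KEY=..." line in FILE with "KEY=VALUE", or appends
# "KEY=VALUE" when no such line exists.
# set_cfg_option is a hypothetical helper name used only in this guide.
set_cfg_option() {
    file=$1
    key=$2
    value=$3
    tmp="$file.tmp.$$"
    awk -v k="$key" -v v="$value" '
        # An existing setting, optionally commented out with a leading "#"
        $0 ~ "^[ \t]*#?[ \t]*" k "=" { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example (as root):
#   set_cfg_option /usr/local/etc/nagios/cgi.cfg \
#       authorized_for_system_information nagiosadmin
#   set_cfg_option /usr/local/etc/nagios/cgi.cfg refresh_rate 60
```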

resource.cfg
-------------

For most users, this file will remain untouched. If you are doing some fancy
monitoring of MySQL or something like that which requires rather sensitive
information in order to make it all work, this is the file for it. Otherwise if
you are just doing some basic ping and disk monitoring, ignore this file.

commands.cfg
-----------------------

This file houses all the command definitions for all of the host and service
checks that Nagios can carry out. The commands are defined by a command_name
(which nagios remembers its tests by), and a command_line (which is the command
and parameters that are run to complete a particular test).
This file holds all the default nagios commands. It's best to leave it untouched
and create an extra config file where you can set up all your other 3rd party
commands and plugins.

**** Templates and object inheritance ****

Nagios allows you to define values within a specific template, and then to use
those templates within other templates of the same type to save a lot of typing.
Understanding this is crucial to configuring nagios.

For example, there is a "generic-service" service template in the localhost.cfg
file in the nagios configuration directory. This service template (and
subsequently all its values) can be used within other service templates.
This allows templates to be reused for multiple objects.

The generic-service template holds values such as "active_checks_enabled" and
"notifications_enabled", and these values are automatically inherited by the
service definitions that use the template (the real services). This saves typing
in the same values for each and every service on our system.

Note that the "generic-service" service template has the line "register 0" as a
final line. This line exists so that Nagios KNOWS that generic-service is only a
TEMPLATE define, and NOT an actual service! The generic-service template does
not carry a service_description or check_command, nor other options which would
make it a fully fledged service. Without the "register 0" option, nagios would
produce configuration errors upon startup (unable to find the service name).

**** Object definition files ****

Object definition files and their locations are defined in nagios.cfg.
All object definition files should have the file extension of .cfg or nagios
will most likely skip over them.

The file localhost.cfg is an object definition file which contains many
different examples and templates of definitions you can use to create your own
configurations. Either you can edit this file directly, or create separate
files to ease administration.

Already, there are generic-host, generic-service, and local-service templates
available for use in our own configurations.

Here are the main templates we need to concern ourselves with:-

* define host - You define the names of hosts and what their IP addresses are
* define hostgroup - You create groups of hosts to better organise your
  machines on the nagios display
* define contact - You define individual administrators and their contact
  information for notifications
* define contactgroup - You define groups of administrators for notification
  (so they all get the same notification)
* define command - You define commands, based upon actual unix command line
  tools
* define service - You define services to test with, commands, and what hosts
  to run the tests on
* define servicegroup - You define groups of services so they all group
  together

There are a bunch of other extension, dependency, and escalation defines, but
for the most part they are all advanced settings and beyond the scope of normal
nagios usage (so I won't cover them). Also, there is a timeperiod define, but
a "24x7" template is already available by default - so unless we intend
to not monitor 24 hours a day, 7 days a week... we can mostly ignore it.

In many cases, "services" are the highest ranking defines that draw all the
other defines in to actually do the work.
To define a service, you need hosts, hostgroups, contacts and contact groups,
timeperiods, commands, etc to carry out the service testing.

Also, as we have already seen, there are a lot of default commands for services
set up in commands.cfg. I recommend making another separate file for customized
commands (and other 3rd party plugins) to avoid confusion.

**** Really basic configuration - just as a demo ****

Ok, here I have a basic configuration to simply PING two hosts at a normal
interval. Please note that I'm making use of the "generic-host" and
"generic-service" templates from the "localhost.cfg" file in the default nagios
configuration directory. Also note that there is a default "nagiosadmin" contact
and "admin" contact group available by default.

(As you can guess from the config, I set up this test environment on vmware)

HOSTS

define host{
host_name dummy1
alias vmware
address 192.168.217.129
max_check_attempts 20
check_period 24x7
contact_groups sysadmin
notification_interval 60
notification_period 24x7
notification_options d,u,r
use generic-host
}

define host{
host_name dummy2
alias othervmware
address 192.168.217.132
max_check_attempts 20
check_period 24x7
contact_groups sysadmin
notification_interval 60
notification_period 24x7
notification_options d,u,r
use generic-host
}

HOSTGROUPS

define hostgroup{
hostgroup_name virtualmachines
alias Dummy Group!
members dummy1,dummy2
}

CONTACT

define contact{
contact_name naynay
alias Nathan
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email naynay@localhost
}

CONTACTGROUPS

define contactgroup{
contactgroup_name sysadmin
alias Default admin group
members naynay
}

SERVICE

define service{
host_name dummy1,dummy2
service_description PING
check_command check_ping!80.0,20%!500.0,50%
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups sysadmin
notification_interval 240
notification_period 24x7
notification_options c,r
use generic-service
}

Nagios should produce some nice displays about the ping status of the hosts now.
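The two host blocks in the demo differ only in name, alias, and address, so
adding more machines gets repetitive fast. Here is a rough sketch of a
generator that prints a host define with the same values as the demo. The
function name emit_host is made up, and the third host in the example (dummy3,
thirdvmware, 192.168.217.133) is purely hypothetical.

```shell
# emit_host NAME ALIAS ADDRESS
# Prints a host definition using the same settings as the demo hosts above.
# emit_host is a hypothetical helper name used only in this guide.
emit_host() {
    cat <<EOF
define host{
host_name $1
alias $2
address $3
max_check_attempts 20
check_period 24x7
contact_groups sysadmin
notification_interval 60
notification_period 24x7
notification_options d,u,r
use generic-host
}

EOF
}

# Example (hypothetical third host; the output file is whichever .cfg file
# you keep your own host definitions in):
#   emit_host dummy3 thirdvmware 192.168.217.133 >> myhosts.cfg
```

Remember that a new host also has to be added to the host_name line of any
service (and the members line of any hostgroup) you want it to belong to.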

**** adding in extra-plugins ****

Nagios comes with many standard plugins that can perform numerous kinds of
checks on the hosts on your system. Nagios is very expandable in the sense that
anyone can design and implement plugins for Nagios, in addition to the ones that
come with it. You can download and install the most useful plugins (which
require compiling) with the nagios-plugins port (find it at
/usr/ports/net-mgmt/nagios-plugins). This is strongly recommended.

Over the years, people have made all kinds of new plugins for Nagios, to check
all sorts of programs and devices. There's a plugin I made for Nagios to check
the status of FreeBSD GEOM devices on hosts within my network.
Implementing extra 3rd party plugins is as simple as copying these files to the
/usr/local/libexec/nagios directory and making them executable (replace
PLUGIN_FILE with the name of the plugin you are installing):-

# cp PLUGIN_FILE /usr/local/libexec/nagios/
# chmod 555 /usr/local/libexec/nagios/PLUGIN_FILE

From there, you can define individual commands that use these plugins with the
command templates. How you configure them depends on the plugin in
question.
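If you want to write a plugin of your own, the contract is small: print one
line of status text and exit with 0 for OK, 1 for WARNING, 2 for CRITICAL, or
3 for UNKNOWN. The sketch below shows just that skeleton as a shell function.
The name check_threshold and its arguments are made up for this guide; a real
plugin would measure something itself instead of being handed the value.

```shell
# check_threshold LABEL VALUE WARN CRIT
# Prints a one-line status and returns the matching plugin exit code:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
# check_threshold is a hypothetical example, not a shipped Nagios plugin.
check_threshold() {
    label=$1
    value=$2
    warn=$3
    crit=$4
    # Anything that is not a plain non-negative integer is reported as UNKNOWN
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "$label UNKNOWN - expected integer arguments"
                return 3
                ;;
        esac
    done
    if [ "$value" -ge "$crit" ]; then
        echo "$label CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "$label WARNING - value is $value"
        return 1
    fi
    echo "$label OK - value is $value"
    return 0
}

# Example: warn at 80, go critical at 90
#   check_threshold DISK 85 80 90
```

In a standalone plugin script you would end with "exit" rather than "return",
so that nagios sees the code as the exit status of the process.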

***** NRPE - Doing things remotely *****

Nagios can easily do network tests which don't require access to the machine
(such as ping tests and http tests). However, there will be times when you need
information from a server that can only be accessed internally with a valid
log in (such as disk size, process availability, etc). When you need to do tests
like these, you need "NRPE".

NRPE is a separate port at /usr/ports/net-mgmt/nrpe2. (The reason it is called
nrpe2 is that nrpe is the old version that worked with "netsaint" - a precursor
to nagios. Needless to say, nrpe and nrpe2 are not compatible. Make sure you get
nrpe2 for nagios.)

nrpe functions on a client machine as a daemon that the Nagios server can talk
with. Note that you don't need to run a nrpe daemon on your Nagios server
(nagios can access its own hosting server quite adequately). Your Nagios server
then uses its own "check_nrpe2" plugin to communicate with nrpe daemons on
client machines. On your Nagios server, you need to make sure that check_nrpe2
is a recognized command, so you need to create a command define:-

COMMAND

define command{
command_name check_nrpe2
command_line /usr/local/libexec/nagios/check_nrpe2 -H $HOSTADDRESS$ -c $ARG1$
}

Of course, you can define other local 3rd party plugins in exactly the same way.
Each plugin has different arguments however. check_nrpe2 expects a target host
address (-H) and the command to run on nrpe on that host (-c).

nrpe on client machines is set up with the /usr/ports/net-mgmt/nrpe2 port, as
well as the nagios-plugins port.

First, make sure you enable it at boot time from /etc/rc.conf

nrpe2_enable="YES"

Copy the sample config file in /usr/local/etc

# cp /usr/local/etc/nrpe.cfg-sample /usr/local/etc/nrpe.cfg

Configuring nrpe is not hard. First you should edit the allowed_hosts parameter.
Set this so that it reflects only localhost and your Nagios server (replace
NAGIOS_SERVER_IP below with its address). This helps prevent attacks on the
daemon.

allowed_hosts=127.0.0.1,NAGIOS_SERVER_IP

Towards the bottom of the config file you get to configure commands that will
run locally on the client, but can be called into operation by the nagios
server. This is done using the "command" parameter. Below we register the
vaguely named example "someplugin" to the remote command "check_someplugin".
Note that the someplugin local command has a preset argument. It is possible to
allow arguments from the nagios server, if you enable the aptly named
dont_blame_nrpe option.

command[check_someplugin]=/usr/local/libexec/nagios/someplugin someargument

Finally, you can set up service checks to this remote command by creating a
service template on the Nagios server. All you have to do is remember to add the
following line to your service config:-

check_command check_nrpe2!check_someplugin

And that does the job!
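A common slip is a service on the Nagios server asking for a remote command
name that the client's nrpe.cfg never defines. As a rough sketch, the helper
below lists the command names an nrpe.cfg actually registers, so you can
compare them with what follows the "!" in your check_command lines. The
function name nrpe_commands is made up for this guide.

```shell
# nrpe_commands FILE
# Prints the name inside each uncommented command[NAME]=... line of FILE,
# one per line, in file order.
# nrpe_commands is a hypothetical helper name used only in this guide.
nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)\]=.*/\1/p' "$1"
}

# Example (on the client machine):
#   nrpe_commands /usr/local/etc/nrpe.cfg
```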

**** Manipulating sudo on the Nagios host server ****

Nagios usually executes while running as the underprivileged "nagios" user. As a
result, any commands which require root to run just won't. Of course,
anything running with root privileges constitutes a security risk. The sudo
command is your trade off. You can install that from /usr/ports/security/sudo

It's more of a hassle (and a larger security risk) to have nagios run as root,
so it is far better to selectively use sudo for only the commands that
really need it.

What we have to do is create a sub-directory in /usr/local/libexec/nagios
(I'll call it "sudo") and copy all your sudo-requiring scripts there. Then
lock the sudo directory down so that only root has access. You don't want an
underprivileged user to hijack your plugin commands and use them to run
commands as root, so security on this directory needs to be tightened.

# chmod -R 700 /usr/local/libexec/nagios/sudo

Then, add the sudo rule for that special directory with visudo (don't forget the
trailing slash on the path)

nagios ALL=(ALL) NOPASSWD:/usr/local/libexec/nagios/sudo/

Then define your command to use sudo first on your command line:-

define command {
command_name check_needsroot
command_line /usr/local/bin/sudo /usr/local/libexec/nagios/sudo/check_needsroot
}

**** Enabling sudo for nrpe ****

In the nrpe.cfg file, there is an option for enabling sudo
(command_prefix=/usr/local/bin/sudo), but this will execute every nrpe2 plugin
as root, which isn't necessary. So forget about it.

Otherwise, having remote servers run nrpe and execute sudo scripts is almost
identical to the way you would set it up on the local nagios server.
You create a special nagios sudo plugin directory, add the nagios user to sudo,
place your sudo-requiring scripts in there, and lock it with chmod.

The only special thing you have to do is to edit nrpe.cfg, and ensure that the
command definition requiring root calls sudo first, before the plugin.

command[check_needsroot]=sudo /usr/local/libexec/nagios/sudo/check_needsroot
