Author Topic: Nagios monitoring environment  (Read 6745 times)


Offline RedBullAddicted

  • VIP
  • Sir
  • *
  • Posts: 519
  • Cookies: 189
    • View Profile
Nagios monitoring environment
« on: March 06, 2013, 03:16:33 pm »
Hi all,

At work I am using two Debian machines running Nagios in a failover configuration to monitor critical services and hosts. At the moment we are working on a separate environment for production-related machines (SPS/PLC, machine control, ...). Because this environment is separated and I don't want to open a lot of ports on the firewall between the office and the production infrastructure, I was looking for a different monitoring setup. In addition, the Nagios servers have a very high load (monitoring 300 hosts with 7590 services), so I also want to set up some kind of distributed monitoring. After some research I created the following design, and I wanted to know whether any of you have worked with Nagios and can give me some additional ideas. What do you think about the concept?


Deep into that darkness peering, long I stood there, wondering, fearing, doubting, dreaming dreams no mortal ever dared to dream before. - Edgar Allan Poe

Offline Mordred

  • Knight
  • **
  • Posts: 360
  • Cookies: 135
  • Nvllivs in Verba
    • View Profile
Re: Nagios monitoring environment
« Reply #1 on: March 06, 2013, 03:47:16 pm »
I haven't personally used Nagios, however I have a very close friend and classmate who's also doing his Bachelor's final thesis right now and he's working to develop a monitoring framework with Nagios and some other stuff. If you want I can forward this question to him and see what he says.

Offline RedBullAddicted

Re: Nagios monitoring environment
« Reply #2 on: March 06, 2013, 03:56:44 pm »
Hi Mordred,

that would be awesome :) I don't know anyone who is good with these things and it would be nice to have someone I could talk to.

EDIT: I started to set this up today but haven't had much time for it yet. I thought, why not share it? It is still a work in progress, but this is the basic Nagios installation (from source) and it is working. I will update it every time I add something new :)

EDIT: Added postfix and ndoutils configuration (I still have an error with ndoutils and I hope I can solve it on Monday)
ndomod: Could not open data sink!  I'll keep trying, but some output may get lost...

EDIT: NDOutils error fixed (btw., regarding the image above: I decided to use ndoutils instead of perfparse as it seems to be the better choice.)

EDIT: added rsync to sync the configuration files

EDIT: added nagios server checks with nrpe and started a python script for the sync process

EDIT: finished sync script and started failover script

EDIT: added snmptt and nsti configuration to use the nagios server as snmp trap receiver

Code:
I used the following image to install Debian without a mirror (only the base system)
http://cdimage.debian.org/debian-cd/6.0.7/i386/iso-cd/debian-6.0.7-i386-netinst.iso

#########################
#setup network connection
#########################

nano /etc/network/interfaces
iface eth0 inet static
address <IP Address>
netmask <netmask>
gateway <gateway>

echo nameserver <IP of DNS server> > /etc/resolv.conf

service networking restart

ifconfig (check your configuration)
ping www.google.com (check connectivity)

#########################
#Update and SSH-Server
#########################

apt-get update
apt-get upgrade && apt-get dist-upgrade

apt-get install openssh-server
If the package is not found you need to update /etc/apt/sources.list
A good generator can be found here: http://debgen.simplylinux.ch/
After updating the list you should run apt-get update again

Afterwards we can ssh to the box and don't need to use that crappy VMware ESX console

#########################
#VMware Tools installation
#########################

install the dependencies first
apt-get install psmisc gcc make  build-essential libglib2.0-0 linux-headers-`uname -r`

In the vSphere client select your machine, right-click on it, select "guest" and choose to install VMware Tools; this inserts the virtual CD
mount /dev/cdrom /mnt
cd /mnt
cp VMwareTools-8.3.12-493255.tar.gz /tmp
cd /tmp
tar xvfz VMwareTools-8.3.12-493255.tar.gz
cd vmware-tools-distrib/
./vmware-install.pl
(answer all questions with the defaults and you should be fine)
rm VMwareTools-8.3.12-493255.tar.gz

#########################
#Prepare the system
#########################

I haven't found a good list of all dependencies for the nagios installation, so this is what I found out through trial and error.
I would be more than happy if someone could give me a link or any other pointer

apt-get install tree (always useful to have that)
apt-get install dnsutils (includes nslookup command)

php5 with mysql libraries installation
apt-get install php5
apt-get install php5-mysql

perl
apt-get install perl
cpan -i Net::SNMP

required graphic libraries (network map, performance graphs, etc.)
http://support.nagios.com/knowledgebase/faqs/index.php?option=com_content&view=article&id=52&catid=35&faq_id=55&expand=false&showdesc=true
apt-get install libgd2-noxpm
apt-get install libgd2-noxpm-dev
apt-get install libgd-gd2-noxpm-perl
apt-get install graphviz
apt-get install libgraphviz-dev
apt-get install libgd-gd2-perl
apt-get install libgd-graph-perl
apt-get install libgd-graph3d-perl
apt-get install php5-gd
apt-get install libpng12-0
apt-get install libpng12-dev
apt-get install libjpeg8
apt-get install libjpeg8-dev
apt-get install zlib-bin
apt-get install zlib1g-dev

apache2 installation
apt-get install apache2
apt-get install libapache2-mod-perl2
apt-get install libapache2-mod-scgi
apt-get install libapache2-mod-php5

snmp (without it the nagios-plugins won't compile the check_snmp module)
apt-get install snmp snmpd

additional packages needed for some plugins I use
apt-get install libssl-dev
apt-get install libmysqld-dev
apt-get install libldap-2.4-2
apt-get install libldap2-dev

mysql installation (only required if you want to use it.. lol)
apt-get install mysql-server-5.1
apt-get install mysql-client-5.1
apt-get install mysql-admin
apt-get install php5-mysql
apt-get install libmysql++-dev

postfix installation
apt-get install postfix
(I installed it in satellite mode because postfix should only relay to our Exchange server)

#########################
#Compile Nagios
#########################

create nagios user
/usr/sbin/useradd -m nagios
passwd nagios

create group and add user
(-a appends the group; without it usermod -G replaces all existing supplementary groups)
/usr/sbin/groupadd nagios
/usr/sbin/usermod -a -G nagios nagios

create group for executing commands from the nagios webinterface
/usr/sbin/groupadd nagcmd
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd www-data

get nagios and plugins
cd /tmp
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.4.4.tar.gz
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.16.tar.gz
tar xzvf nagios-3.4.4.tar.gz
tar xzvf nagios-plugins-1.4.16.tar.gz

compile nagios
cd /tmp/nagios
./configure --with-command-group=nagcmd > log.txt
(I redirected the output to a text file for further reference. You should review the file to see if you got errors or missing dependencies.
If you installed all of the packages mentioned above you should be fine)
If you have errors and need to install additional packages, be sure to run make clean before you run the configure script again.
these "no" results are ok:
checking pthreads.h usability... no
checking pthreads.h presence... no
checking for pthreads.h... no
checking socket.h usability... no
checking socket.h presence... no
checking for socket.h... no
checking uio.h usability... no
checking uio.h presence... no
checking for uio.h... no
checking for pthread_create in -lcma... no
checking for main in -liconv... no
checking for gdImagePng in -lgd (order 1)... no
checking ltdl.h usability... no
checking ltdl.h presence... no
checking for ltdl.h... no

make all
make install
make install-init
make install-commandmode
make install-config
make install-webconf
make install-exfoliation (you can use make install-classicui if you prefer the classic webinterface)

create Webinterface user
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
service apache2 restart

#########################
#Compile Nagios Plugins
#########################

cd /tmp/nagios-plugins-1.4.16
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

#########################
#Finish Nagios installation
#########################

check config file
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors
service nagios restart

#########################
#Postfix configuration
#########################

First we need to edit /usr/local/nagios/etc/objects/contacts.cfg
Change the email address to the one you want to use. The part has the following comment: <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******

To test if your system is able to send mails (via the host you set as relay during postfix installation):
echo "testbody" | mail -s "testsubject" email@address.com

postfix already created a config file for us which is located in:
/etc/postfix/main.cf
open it and do the following customizations

queue_directory = /var/spool/postfix
command_directory = /usr/sbin
data_directory = /var/lib/postfix
mail_owner = postfix
mydomain = domain.local
mydestination = $myhostname, localhost.$mydomain, localhost
unknown_local_recipient_reject_code = 550
mynetworks = 127.0.0.0/8
relay_domains = $mydestination, hash:/etc/postfix/relay
debug_peer_level = 2
newaliases_path = /usr/bin/newaliases
mailq_path = /usr/bin/mailq
setgid_group = mail
html_directory = /usr/share/doc/packages/postfix-doc/html
manpage_directory = /usr/share/man
sample_directory = /usr/share/doc/packages/postfix-doc/samples
inet_protocols = ipv4
transport_maps = hash:/etc/postfix/transport
sender_canonical_maps = hash:/etc/postfix/sender_canonical

afterwards we need to create the three files we are referencing in main.cf
(I use two different relays because we have an SMS gateway which turns emails into SMS messages and sends them.)

filename: /etc/postfix/relay
domain.com OK    #primary email domain used by Exchange
sms.local OK    #secondary email domain used by sms gateway

filename: /etc/postfix/sender_canonical (define the send from addresses)
nagios nagios.hostname@domain.com
root root.hostname@domain.com

filename: /etc/postfix/transport
domain.com smtp:[IP_EXCHANGE]
sms.local smtp:[IP_SMS_GATEWAY]

some postfix tips
show all configurations: postconf
To see how you differ from the defaults: postconf -n
create db from configuration: postmap /etc/postfix/filename
empty mail queue: postsuper -d ALL
This is a great resource: https://wiki.archlinux.org/index.php/Postfix
check your config: postfix check

Create the db's from the newly created files
postmap /etc/postfix/relay
postmap /etc/postfix/sender_canonical
postmap /etc/postfix/transport

running the postfix check command gave me the following warnings
postfix/postfix-script: warning: not owned by postfix: /var/spool/postfix/public
postfix/postfix-script: warning: not owned by group mail: /var/spool/postfix/maildrop
postfix/postfix-script: warning: not set-gid or not owner+group+world executable: /usr/sbin/postqueue
postfix/postfix-script: warning: not set-gid or not owner+group+world executable: /usr/sbin/postdrop

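Fix them with (set the set-gid bit only on the two binaries named in the warnings, not on every /usr/sbin/po*):
chown -R postfix:mail /var/spool/postfix/public
chown -R postfix:mail /var/spool/postfix/maildrop
chmod g+s /usr/sbin/postqueue /usr/sbin/postdrop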

you can now try to send a custom service notification from the nagios webinterface

#########################
#Install NDOutils
#########################

Download NDOUtils from here: http://sourceforge.net/projects/nagios/files/ndoutils-1.x/ndoutils-1.5.2/ndoutils-1.5.2.tar.gz/download
tar xvzf ndoutils-1.5.2.tar.gz
cd ndoutils-1.5.2
./configure --with-mysql-lib=/usr/lib64/mysql --prefix=/usr/local/ndoutils-1.5.2 --with-ndo2db-user=nagios --with-ndo2db-group=nagios

make
make install
cd src
cp ndomod-3x.o ndo2db-3x file2sock log2ndo /usr/local/nagios/bin
cd ..
cd config
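cp ndo2db.cfg-sample /usr/local/nagios/etc/ndo2db.cfg
cp ndomod.cfg-sample /usr/local/nagios/etc/ndomod.cfg

change ownership and permissions of the binaries and the config files
(the config files contain the database password, so they should not be world-readable or world-writable)
chown nagios:nagios /usr/local/nagios/bin/ndo*
chmod 744 /usr/local/nagios/bin/ndo*
chown nagios:nagios /usr/local/nagios/etc/ndo2db.cfg /usr/local/nagios/etc/ndomod.cfg
chmod 660 /usr/local/nagios/etc/ndo2db.cfg /usr/local/nagios/etc/ndomod.cfg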

create MySQL DB and initialize
mysql -u root -p
mysql> create user 'ndouser'@'localhost';
mysql> create database nagios;
mysql> grant all on nagios.* to 'ndouser'@'localhost' identified by 'ndopassword';
mysql> flush privileges;
mysql> exit

cd <download directory>/ndoutils-1.5.2/db
mysql -u root -p nagios < mysql.sql

a couple of mysql commands: http://www.pantz.org/software/mysql/mysqlcommands.html

Edit nagios.cfg (nano /usr/local/nagios/etc/nagios.cfg)
add the following lines:
broker_module=/usr/local/nagios/bin/ndomod-3x.o config_file=/usr/local/nagios/etc/ndomod.cfg

make sure you also have the following line:
event_broker_options=-1

Edit ndomod.cfg and ndo2db.cfg (in /usr/local/nagios/etc) and enter the mysql data,
then change the paths to your nagios installation (basically replace ndoutils-1.5.2 with nagios)

install init script
cp /download/ndoutils-1.5.2/daemon-init /etc/init.d/ndo2db
Edit the following lines in the init script
servicename=ndo2db
prefix=/usr/local/ndoutils-1.5.2
exec_prefix=/usr/local/ndoutils-1.5.2
Ndo2dbBin=/usr/local/ndoutils-1.5.2/bin/ndo2db
Ndo2dbCfgFile=/usr/local/nagios/etc/ndo2db.cfg
Ndo2dbVarDir=/usr/local/nagios/var
Ndo2dbRunFile=$Ndo2dbVarDir/ndo2db.lock
Ndo2dbLockDir=/var/lock/subsys
Ndo2dbLockFile=ndo2db
Ndo2dbUser=nagios
Ndo2dbGroup=nagios
chmod +x /etc/init.d/ndo2db
/etc/init.d/ndo2db start

check if daemon is running
ps -ef | grep ndo2db

check config file
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors
service nagios restart

You can verify your setup by looking into the db.
mysql -u root -p
mysql> use nagios;
mysql> select * from nagios_hosts;

check for errors:
cat /usr/local/nagios/var/nagios.log

#########################
#Install second Nagios (Backup)
#########################

Repeat all steps mentioned above first to have the exact same system!!!

#########################
#setup rsync
#########################

To synchronize the configuration files between both machines we will use rsync
In the following I will refer to them as NagiosMaster and NagiosBackup

Create a new folder where you want to store your host configuration files
mkdir /usr/local/nagios/etc/sites

Add the new path to nagios.cfg
cfg_dir=/usr/local/nagios/etc/sites

Install rsync on both machines
apt-get install rsync

make sure you have the following lines in /etc/ssh/sshd_config on NagiosBackup
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile      %h/.ssh/authorized_keys

create the public/private ssh key pair on NagiosMaster
ssh-keygen -t rsa
!!! Don't enter a passphrase, just hit enter!!!

now we transfer the public key from NagiosMaster to NagiosBackup
scp /root/.ssh/id_rsa.pub IP_NAGIOS_BACKUP:/root/.ssh

Rename the key on NagiosBackup and set file/folder permissions
cd /root/.ssh
mv id_rsa.pub authorized_keys
chmod 600 /root/.ssh/authorized_keys
chmod 700 /root/.ssh

to test the keys we can connect from NagiosMaster to NagiosBackup
ssh root@IP_NAGIOSBACKUP
if you are prompted for a password something is wrong :)

we create a pretty simple rsync script on NagiosMaster and place it in /root (rsync.sh)
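#!/bin/bash
# Sync the host configuration directory from NagiosMaster to NagiosBackup
SOURCEPATH='/usr/local/nagios/etc/sites'
DESTPATH='/usr/local/nagios/etc/'
DESTHOST='10.132.72.171'
LOGFILE='/var/log/rsync.log'
echo $'\n\n' >> "$LOGFILE"
# -a already implies -p; the rsync output is appended to the log file as well
rsync -e 'ssh -p 22' -avz "$SOURCEPATH" "$DESTHOST:$DESTPATH" >> "$LOGFILE" 2>&1
echo "Completed at: $(/bin/date)" >> "$LOGFILE"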

chmod +x /root/rsync.sh

Now we place a sample file in /usr/local/nagios/etc/sites and run the script to see if the file is synced to NagiosBackup

Last but not least we create a cron job to execute the script every hour (NagiosMaster)
crontab -e
0 * * * * /root/rsync.sh
service cron restart
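
As a rough illustration of how the same sync could be driven from Python (the thread later moves to a Python script for this), here is a minimal sketch. It is written for Python 3, and the helper names build_rsync_cmd and run_sync are hypothetical; only the paths, host and rsync options are taken from rsync.sh above.

```python
import shlex
import subprocess


def build_rsync_cmd(source, dest_host, dest_path, ssh_port=22):
    """Return the rsync argument list equivalent to the call in rsync.sh."""
    return [
        "rsync",
        "-e", "ssh -p %d" % ssh_port,
        "-avz",
        source,
        "%s:%s" % (dest_host, dest_path),
    ]


def run_sync(cmd, logfile):
    """Run cmd, append its output to logfile and return the exit code."""
    with open(logfile, "a") as log:
        log.write("\n\nRunning: %s\n" % " ".join(shlex.quote(a) for a in cmd))
        log.flush()  # keep our header line in front of the command output
        return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)


# Example call (values from rsync.sh):
# cmd = build_rsync_cmd("/usr/local/nagios/etc/sites",
#                       "10.132.72.171", "/usr/local/nagios/etc/")
# run_sync(cmd, "/var/log/rsync.log")
```

Returning the exit code makes it easy for a cron wrapper to notice a failed sync instead of silently continuing.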

#########################
#mutual check
#########################

Now we need to make both Nagios servers check each other. These host configuration files should not be synced between
the boxes, so we place them in another directory. We also move the default localhost.cfg into this directory

mkdir /usr/local/nagios/etc/nosync
mv /usr/local/nagios/etc/objects/localhost.cfg /usr/local/nagios/etc/nosync/localhost.cfg

now we need to tell nagios that this folder contains host config files too by adding the second line below to nagios.cfg (the first one is already there)
cfg_dir=/usr/local/nagios/etc/sites
cfg_dir=/usr/local/nagios/etc/nosync

and we change the following
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg (comment that out)

now we can create a host configuration in /usr/local/nagios/etc/nosync on both servers so that they check each other

on NagiosMaster we create the file NagiosBackup.cfg with the following checks:

###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define a host for the local machine

define host{
        use                     linux-server
        host_name               NagiosBackup
        alias                   Nagios Backup
        address                 NAGIOSBACKUP_IP
        }

###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

define service{
        use                             generic-service
        host_name                       NagiosBackup
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }

define service{
        use                             generic-service
        host_name                       NagiosBackup
        service_description             CPU Load
        check_command                   check_nrpe!check_load
        }

define service{
        use                             generic-service
        host_name                       NagiosBackup
        service_description             Current Users
        check_command                   check_nrpe!check_users
        }

define service{
        use                             generic-service
        host_name                       NagiosBackup
        service_description             Total Processes
        check_command                   check_nrpe!check_total_procs
        }

define service{
        use                             generic-service
        host_name                       NagiosBackup
        service_description             Zombie Processes
        check_command                   check_nrpe!check_zombie_procs
        }

define service{
        use                             generic-service
        host_name                       NagiosBackup
        service_description             Nagios Process
        check_command                   check_nrpe!check_nagios_proc
        }


On NagiosBackup we create the file NagiosMaster.cfg with the same checks, just pointing at NagiosMaster.
We haven't used any custom hostgroups, servicegroups or templates so far. We will add those later.
You may have noticed that most checks are performed via NRPE, so now we need to install and configure it

#########################
#Check_nrpe install/configuration
#########################

The following steps need to be done on both Nagios machines

we need xinetd for this add-on
apt-get install xinetd

Download from here:
http://sourceforge.net/projects/nagios/files/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz/download?use_mirror=freefr

tar xzvf nrpe-2.13.tar.gz
cd nrpe-2.13
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd

Edit /etc/xinetd.d/nrpe
only_from       = 127.0.0.1 NAGIOSBACKUP_IP
(on NagiosMaster allow NAGIOSBACKUP_IP, on NagiosBackup allow NAGIOSMASTER_IP)

add the following entry to /etc/services
nrpe            5666/tcp                        # nrpe

Restart xinetd service
service xinetd restart

check if the service is running and listening
netstat -at | grep nrp
Output should be:
tcp        0      0 *:nrpe                  *:*                     LISTEN

To check the functionality we can use the new plugin
/usr/local/nagios/libexec/check_nrpe -H localhost (you can change localhost to the IP of one host to check functionality)
should return: NRPE v2.13

Last but not least we need to define the check commands in /usr/local/nagios/etc/objects/commands.cfg
# check_nrpe
define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    }

and add the following to /usr/local/nagios/etc/nrpe.cfg
command[check_nagios_proc]=/usr/local/nagios/libexec/check_procs -C nagios -c 1:1000000 -p 1

service xinetd restart
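
To make the relation between a service's check_command (e.g. check_nrpe!check_load) and the command_line defined above a bit more tangible, here is a simplified Python 3 sketch of the macro substitution. The function name expand_check_command and the example IP are made up for illustration, and the value of $USER1$ is assumed to be the libexec directory of a source install; real Nagios handles many more macros and escaping rules.

```python
def expand_check_command(check_command, command_lines, macros):
    """Expand 'name!arg1!arg2' into the final command line.

    command_lines maps command_name -> command_line (as in commands.cfg),
    macros maps macro names such as '$HOSTADDRESS$' to their values.
    """
    name, *args = check_command.split("!")
    line = command_lines[name]
    values = dict(macros)
    # The parts after the command name become $ARG1$, $ARG2$, ...
    for index, arg in enumerate(args, start=1):
        values["$ARG%d$" % index] = arg
    for macro, value in values.items():
        line = line.replace(macro, value)
    return line


commands = {"check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"}
macros = {
    "$USER1$": "/usr/local/nagios/libexec",  # assumed resource.cfg value
    "$HOSTADDRESS$": "192.0.2.10",           # placeholder address
}
print(expand_check_command("check_nrpe!check_load", commands, macros))
```

The NRPE daemon on the remote side then looks up check_load (or check_nagios_proc) in its own nrpe.cfg, which is why the command[...] entry has to exist there.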

After completing these steps we can check the nagios config files and restart the service
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors
service nagios restart

Afterwards you should be able to see the new host in the nagios webinterface. We now have two nagios servers monitoring each other

#########################
#sync status information
#########################

First of all we disable notifications, service checks and host checks on the NagiosBackup server
Webinterface -> Process Info -> turn off all of the above.

Next we create a host configuration with only a ping check, so that the Master has some data the Backup machine does not have yet

nano /usr/local/nagios/etc/sites/somehost.cfg

###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define a host for the local machine

define host{
        use                     linux-server
        host_name               SomeHost
        alias                   Test Data Host
        address                 SomeHost_IP
        }

###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

define service{
        use                             generic-service
        host_name                       SomeHost
        service_description             PING
        check_command                   check_ping!100.0,20%!500.0,60%
        }

check the config files and restart Nagios Process
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors
service nagios restart

Edit nagios.cfg on NagiosMaster and NagiosBackup
retain_state_information=1 (seems to be default)

I decided to switch to Python at this point and combine both rsync tasks in one script. To get the status information synced we need
to copy the file retention.dat. This file contains all state information (including the entries for NagiosBackup, NagiosMaster and localhost),
and some of it shouldn't be synchronized.
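
One way to leave out the entries that shouldn't be synchronized is to filter retention.dat before copying it. The following Python 3 sketch assumes the usual block layout of that file (a line such as "host {" or "service {", then key=value lines including host_name=..., then a closing "}"). The function name filter_retention is hypothetical and this is not necessarily how the script in this thread does it.

```python
def filter_retention(text, excluded_hosts):
    """Return text without the blocks that belong to excluded_hosts.

    Blocks without a host_name line (e.g. 'info {' or 'program {')
    are always kept.
    """
    kept = []
    block = None  # lines of the block being read, or None outside a block
    for line in text.splitlines():
        stripped = line.strip()
        if block is None:
            if stripped.endswith("{"):
                block = [line]
            else:
                kept.append(line)
            continue
        block.append(line)
        if stripped == "}":
            host = None
            for entry in block:
                if entry.strip().startswith("host_name="):
                    host = entry.strip().split("=", 1)[1]
                    break
            if host not in excluded_hosts:
                kept.extend(block)
            block = None
    if block:
        kept.extend(block)  # unterminated block at end of file: keep it
    return "\n".join(kept) + "\n"


# Example:
# with open("/usr/local/nagios/var/retention.dat") as fh:
#     cleaned = filter_retention(
#         fh.read(), {"localhost", "NagiosMaster", "NagiosBackup"})
```

Filtering on the Master before the transfer keeps each server's own state about itself and its peer untouched.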

After I created and tested the script I realized that it is not working like I expected it to work. After consulting the official Nagios documentation I found this:

Format:     state_retention_file=<file_name>
Example:     state_retention_file=/usr/local/nagios/var/retention.dat

This is the file that Nagios will use for storing status, downtime, and comment information before it shuts down. When Nagios is restarted it will use the information stored in this file for setting the initial states of services and hosts before it starts monitoring anything. In order to make Nagios retain state information between program restarts, you must enable the retain_state_information option.

So I edited the script to first stop the nagios process on the backup machine, copy over the new retention file and start nagios process afterwards.
As we are doing the restart during the sync process now we don't need the restart function anymore. I left it in the script for future use.. lol

You can copy the finished and working script from the code box below :)

Last but not least we change our crontab to execute this script every 30 minutes
crontab -e
0,30 * * * * /usr/local/nagios/etc/nagiossync_v1.py
service cron restart

#########################
#Configure Failover
#########################

from the official nagios documentation (http://nagios.sourceforge.net/docs/3_0/redundancy.html)

Set up a cron job on the slave host that periodically (say every minute) runs a script that checks the status of the Nagios process on the master host (using the check_nrpe plugin on the slave host and the nrpe daemon and check_nagios plugin on the master host). The script should check the return code of the check_nrpe plugin. If it returns a non-OK state, the script should send the appropriate commands to the external command file to enable both notifications and active service checks. If the plugin returns an OK state, the script should send commands to the external command file to disable both notifications and active checks.
By doing this you end up with only one process monitoring hosts and services at a time, which is much more efficient than monitoring everything twice.

First we need to set some directives on NagiosBackup in nagios.cfg
execute_service_checks=0
enable_notifications=0
check_external_commands=1
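
The decision described in the documentation excerpt can be sketched in a few lines of Python 3. The external command names are standard Nagios commands; the function names and the command file path (the usual default of a source install) are assumptions, so check the command_file setting in your nagios.cfg.

```python
import time

# Assumed default location for a source install under /usr/local/nagios.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def failover_commands(master_ok, now=None):
    """Return the external command lines the backup should submit."""
    timestamp = int(time.time() if now is None else now)
    if master_ok:
        # Master is healthy: the backup stays passive.
        names = ["DISABLE_NOTIFICATIONS", "STOP_EXECUTING_SVC_CHECKS"]
    else:
        # Master is down: the backup takes over.
        names = ["ENABLE_NOTIFICATIONS", "START_EXECUTING_SVC_CHECKS"]
    return ["[%d] %s\n" % (timestamp, name) for name in names]


def send_commands(lines, command_file=COMMAND_FILE):
    """Write the command lines to the Nagios external command file."""
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)


# Example: send_commands(failover_commands(master_ok=False))
```

Sending the same commands on every run is harmless, so the script does not need to remember which state it was in before.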

Let's create another script, this time on NagiosBackup, and call it failovercheck.py

I will use the following two nagios plugins to perform the "is master alive" checks

root@NagiosBackup:/usr/local/nagios/libexec# ./check_nrpe -H IP_NAGIOSMASTER -c check_nagios_proc
PROCS OK: 1 process with command name 'nagios', PPID = 1
PROCS CRITICAL: 0 processes with command name 'nagios', PPID = 1

root@NagiosBackup:/usr/local/nagios/libexec# ./check_ping -H IP_NAGIOSMASTER -w 100.0,20% -c 500.0,60%
PING OK - Packet loss = 0%, RTA = 2.38 ms|rta=2.375000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0
PING CRITICAL - Packet loss = 100%|rta=500.000000ms;100.000000;500.000000;0.000000 pl=100%;20;60;0
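
For the "is master alive" decision only the exit code of each plugin matters (0 = OK, 1 = WARNING, 2 = CRITICAL, anything else = UNKNOWN). Here is a small Python 3 sketch of running the two plugins and combining the results; run_plugin and master_alive are hypothetical helper names, not taken from the script in this thread.

```python
import subprocess

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def run_plugin(argv, timeout=30):
    """Run a Nagios plugin and return (state, first line of output)."""
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Plugin missing or hanging: treat as UNKNOWN.
        return "UNKNOWN", str(exc)
    lines = proc.stdout.splitlines()
    return STATES.get(proc.returncode, "UNKNOWN"), lines[0] if lines else ""


def master_alive(results):
    """The master counts as alive only if every check returned OK."""
    return all(state == "OK" for state, _ in results)


# Example (IP_NAGIOSMASTER is a placeholder):
# libexec = "/usr/local/nagios/libexec"
# results = [
#     run_plugin([libexec + "/check_nrpe", "-H", "IP_NAGIOSMASTER",
#                 "-c", "check_nagios_proc"]),
#     run_plugin([libexec + "/check_ping", "-H", "IP_NAGIOSMASTER",
#                 "-w", "100.0,20%", "-c", "500.0,60%"]),
# ]
# print(master_alive(results))
```

Treating WARNING and UNKNOWN as "not alive" follows the "non-OK state" wording of the documentation. A more tolerant script could require several failed runs in a row before taking over.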

you can copy the script from the code box below :)

To make the script work completely you need to allow NagiosBackup to ssh to NagiosMaster with an RSA key.
This can be done by following the steps described in the rsync section. You just need to swap the hosts.

Last but not least you should create a cron job to execute the script every 5 minutes.

#########################
#Nagios SNMP Trap Receiver
#########################

Install snmptt (http://snmptt.sourceforge.net/docs/snmptt.shtml)
cpan -i Text::ParseWords
cpan -i Getopt::Long
cpan -i Config::IniFiles
cpan -i Time::HiRes
cpan -i Text::Balanced
cpan -i Sys::Syslog
cpan -i DBI
cpan -i DBD::mysql

Download SNMPTT, extract and compile/install
http://sourceforge.net/projects/snmptt/files/snmptt/snmptt_1.3/snmptt_1.3.tgz/download
tar xvfz snmptt_1.3.tgz
cd snmptt_1.3
cp snmptt /usr/sbin
chmod +x /usr/sbin/snmptt
cp snmptthandler-embedded /usr/sbin/
cp snmptthandler /usr/sbin
cp snmptt.ini /etc/snmp/
mkdir /var/log/snmptt

Configure snmptrapd and install the service
nano /etc/snmp/snmptrapd.conf
insert this: perl do "/usr/sbin/snmptthandler-embedded";
mkdir /var/spool/snmptt
cp snmptt-init.d /etc/init.d/snmptt

the following must be changed in the init.d file
# source function library
. /etc/init.d/functions
Change to:
# source function library
. /lib/lsb/init-functions

And
daemon /usr/sbin/snmptt $OPTIONS
needs to be changed to:
start_daemon /usr/sbin/snmptt $OPTIONS

create /etc/snmp/snmptt.conf

Edit snmptt.ini according to your needs. This is a sample of what I changed:
dns_enable = 1
resolve_value_ip_addresses = 1
net_snmp_perl_enable = 1
translate_log_trap_oid = 4
translate_value_oids = 4
translate_enterprise_oid_format = 4
translate_trap_oid_format = 4
translate_varname_oid_format = 4
daemon_uid = root
log_system_enable = 1
db_translate_enterprise = 1
mysql_dbi_enable = 1
mysql_dbi_username = snmptt
mysql_dbi_password = snmpttpassword
DEBUGGING = 2
DEBUGGING_FILE = /var/log/snmptt/snmptt.debug
DEBUGGING_FILE_HANDLER = /var/log/snmptt/snmptthandler.debug

create MySQL DB and initialize
mysql -u root -p
mysql> create user 'snmptt'@'localhost';
mysql> create database snmptt;
mysql> grant all on snmptt.* to 'snmptt'@'localhost' identified by 'snmpttpassword';
mysql> flush privileges;
mysql> exit

cp snmptt.logrotate /etc/logrotate.d/snmptt

service snmptt start
snmptrapd -On -C -c /etc/snmp/snmptrapd.conf

Download NSTI
http://exchange.nagios.org/directory/Addons/SNMP/Nagios-SNMP-Trap-Interface-%28NSTI%29/details
tar xzvf nsti-rc1.4.tar.gz
cd nsti/dist
mysql -u root -p snmptt < snmptt-1.2.sql
mysql -u root -p snmptt < snmptt_unknown.sql
mysql -u root -p snmptt < snmptt_test.sql

edit nsti/etc/config.ini to your needs. I only edited the mysql user and password.

add this to your /etc/apache2/conf.d/nagios.conf

Alias /nsti "/usr/local/nsti/"

<Directory "/usr/local/nsti">
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
   AuthName "Nagios Access"
   Authtype Basic
   AuthUserFile /usr/local/nagios/etc/htpasswd.users
   Require valid-user
</Directory>

service apache2 restart
service snmptt restart
cp -R /<path_to_download_nsti> /usr/local/
chmod -R +r /usr/local/nsti
chmod -R +x /usr/local/nsti

Now you should be able to access the webinterface http://IP_NAGIOSMASTER/nsti

To receive traps add the following lines to /etc/snmp/snmptrapd.conf
authCommunity log,execute,net COMMUNITY
traphandle default /usr/sbin/snmptt

and restart the service

Create a link to nsti on the Nagios Webinterface
edit /usr/local/nagios/share/side.php and add the following lines

<div class="navsection">
<div class="navsectiontitle">Custom</div>
<div class="navsectionlinks">
<ul class="navsectionlinks">
<li><a href="/nsti" target="<?php echo $link_target;?>">SNMP Traps</a></li>
</ul>
</div>
</div>

Translate mib for snmptt
cp /home/administrator/Downloads/snmptt_1.3/snmpttconvertmib /etc/snmp
place some mib file in /usr/share/mibs (I placed the Riverbed Steelhead MIB there)

apt-get install snmp-mibs-downloader
MIBDIRS=/usr/share/mibs:/usr/share/mibs/ietf:/usr/share/mibs/iana:/usr/share/mibs/netsnmp
export MIBDIRS
MIBS=ALL;export MIBS

edit /etc/default/snmpd
export MIBS=UCD-SNMP-MIB

convert the MIB for snmptt
/etc/snmp# ./snmpttconvertmib --in=/usr/share/mibs/STEELHEAD-MIB.txt --out=/etc/snmp/snmptt_steelhead.conf

now we need to add the new mib to snmptt.ini

[TrapFiles]
# A list of snmptt.conf files (this is NOT the snmptrapd.conf file).  The COMPLETE path
# and filename.  Ex: '/etc/snmp/snmptt.conf'
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf
/etc/snmp/snmptt_steelhead.conf
END

service snmptt restart
You can now send a test trap to see that it is working :)

FYI, the Cisco MIBs come in another format (*.my). To translate them you need something like this:
/etc/snmp/snmpttconvertmib --format_desc=6 --net_snmp_perl --in=/usr/share/mibs/CISCO-CIDS-MIB.my --out=/etc/snmp/snmptt_cisco_cids.conf

to convert all standard mibs for snmp you can do the following:
for i in /usr/share/mibs/ietf/*
do
/etc/snmp/snmpttconvertmib --in="$i" --out=snmptt_ietf.conf
done

#########################
#Updating Nagios Core
#########################

During the setup a new Nagios Core version was released. Here are the steps to update the core:

wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.5.0.tar.gz
tar xzvf nagios-3.5.0.tar.gz

cd nagios
./configure --with-command-group=nagcmd

make all
make install

After completing these steps we can check the nagios config files and restart the service
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors
service nagios restart
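
The "check first, restart only if there are no errors" step can also be scripted. This is a minimal sketch, assuming the paths used in this guide and that nagios -v exits with 0 when the pre-flight check finds no errors. The run parameter is only there so the function can be tried without touching the real service.

```python
import subprocess

NAGIOS_BIN = '/usr/local/nagios/bin/nagios'
NAGIOS_CFG = '/usr/local/nagios/etc/nagios.cfg'

def verify_and_restart(run=subprocess.call):
    # step 1: verify the configuration, a non-zero exit status means errors
    if run([NAGIOS_BIN, '-v', NAGIOS_CFG]) != 0:
        return False
    # step 2: restart only after a clean verification
    return run(['service', 'nagios', 'restart']) == 0
```

If it returns False, nothing was restarted and the old process keeps running with the old configuration.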

#########################
#Troubleshooting
#########################

After some time I received the following message:
PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/lib/php5/20090626+lfs/gd.so'
But only on one Nagios server. I reinstalled php5-gd and the issue was fixed. I am pretty sure I installed it before ??

foo {\nhost_name=H and ends with }
« Last Edit: March 25, 2013, 05:54:54 pm by RedBullAddicted »

Offline RedBullAddicted

  • VIP
  • Sir
  • *
  • Posts: 519
  • Cookies: 189
    • View Profile
Re: Nagios monitoring environment
« Reply #3 on: March 19, 2013, 05:11:53 pm »
Hi,

I hope this is not considered a double post. I just wanted to have an extra post for the scripts I made.

The installation instructions above refer to these scripts.

nagiossync.py (synchronize host configuration and status information (check results, downtime, comments ...) from NagiosMaster to NagiosBackup)

Code: (python) [Select]
#!/usr/bin/env python

import commands
import os
import time
import re

def logfilesize(logfile):
    if os.path.exists(logfile):
        logsize = os.path.getsize(logfile)
        if logsize >= 10485760:
            os.system('rm %s' %logfile)

def SyncConfigDir(sourcepath, desthost, destpath, logfile):
    localtime = time.asctime(time.localtime(time.time()))
    logfile = open(logfile, 'a')
    logfile.write('\n### starting config sync at %s ###\n' %localtime)
    status, output = commands.getstatusoutput("rsync -e 'ssh -p 22' -azvp %s %s:%s" %(sourcepath, desthost, destpath))
    logfile.write(output)   
    logfile.write('\n### End of config sync ###\n')
    logfile.close()

def SyncStatusInfo(sourcepath, destpath, desthost, retentionpath, logfile):
    retentionfile = retentionpath+'/retention.dat'
    localtime = time.asctime(time.localtime(time.time()))
    logfile = open(logfile, 'a')
    logfile.write('\n### starting to sync status information at %s ###\n' %localtime)
   
    #get retention.dat from NagiosBackup
    status, output = commands.getstatusoutput("rsync -chavzP --stats root@%s:%s/retention.dat /tmp" %(desthost, retentionpath))   
    if os.path.exists("/tmp/retention.dat"):
        logfile.write('received retention.dat from %s\n' %desthost)
    else:
        logfile.write('error receiving retention.dat from %s\n' %desthost)           

    #open retention.dat from NagiosMaster
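    try:
        openretfile = open(retentionfile, 'r')
        retfilecontent = openretfile.read()
        openretfile.close()
    except IOError:
        #without the master retention data there is nothing to sync, so stop here
        logfile.write('could not open %s\n' %retentionfile)
        logfile.close()
        return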
   
    #remove unwanted information from retfilecontent
    pattern = re.compile('(info\s\{)(.*?)(\})', re.DOTALL)
    retfilecontent = pattern.sub('', retfilecontent)

    pattern = re.compile('(program\s\{)(.*?)(\})', re.DOTALL)
    retfilecontent = pattern.sub('', retfilecontent)

    pattern = re.compile('(contact\s\{)(.*?)(\})', re.DOTALL)
    retfilecontent = pattern.sub('', retfilecontent)

    pattern = re.compile('[A-Za-z]*\s\{\shost_name=NagiosBackup(.*?)(\})', re.DOTALL)
    retfilecontent = pattern.sub('', retfilecontent)

    pattern = re.compile('[A-Za-z]*\s\{\shost_name=localhost(.*?)(\})', re.DOTALL)
    retfilecontent = pattern.sub('', retfilecontent)

    #read retention.dat file from NagiosBackup
   
    logfile.write('\n### writing status state information to NagiosBackup retention.dat file ###\n')
    try:
        backupretfile = open('/tmp/retention.dat', 'r')
        backupretfilecontent = backupretfile.read()
        backupretfile.close()
    except IOError:
        logfile.write('Could not open retention.dat file from /tmp')

    #remove unwanted state information (My hosts all start with an h or an m except the localhost and NagiosBackup/NagiosMaster)

    pattern = re.compile('[A-Za-z]*\s\{\shost_name=H(.*?)(\})', re.DOTALL)
    backupretfilecontent = pattern.sub('', backupretfilecontent)
   
    pattern = re.compile('[A-Za-z]*\s\{\shost_name=M(.*?)(\})', re.DOTALL)
    backupretfilecontent = pattern.sub('', backupretfilecontent)

    #create new retention.dat file and send it to NagiosBackup

    try:
        newbackupretfile = open('/tmp/newretention.dat', 'w')
    except IOError:
        logfile.write('could not create new retention.dat file /tmp/newretention.dat')

    newbackupretfile.write(backupretfilecontent)
    newbackupretfile.write('\n# synced data from NagiosMaster\n')
    newbackupretfile.write(retfilecontent)
    newbackupretfile.close()
    logfile.write('successfully added service state information to retention.dat file\n')
   
    #set correct permissions to new retention file, rename, send and do the cleanup
    os.system('rm /tmp/retention.dat')
    os.system('mv /tmp/newretention.dat /tmp/retention.dat')
    logfile.write('\n### stopping Nagios process on NagiosBackup ###\n')
    status, output = commands.getstatusoutput("ssh root@%s 'service nagios stop'" %desthost)
    logfile.write(output)
    status, output = commands.getstatusoutput("ssh root@%s 'mv %s /usr/local/nagios/var/retention.dat_orig'" %(desthost, retentionfile))
    logfile.write('\n### trying to copy retention.dat to NagiosBackup ###\n')
    status, output = commands.getstatusoutput("rsync -e 'ssh -p 22' -azvp /tmp/retention.dat %s:%s" %(desthost, retentionpath))
    logfile.write(output)
    status, output = commands.getstatusoutput("ssh root@%s 'chown nagios:nagios %s'" %(desthost, retentionfile))
    logfile.write('\n### trying to start nagios process in NagiosBackup again ###\n')
    status, output = commands.getstatusoutput("ssh root@%s 'service nagios start'" %desthost)
    logfile.write(output)
    os.system('rm /tmp/retention.dat')

    logfile.close()   

def Restart(logfile, desthost):
    #send service nagios restart to NagiosBackup to make him read the new synced configuration
    localtime = time.asctime(time.localtime(time.time()))
    logfile = open(logfile, 'a')
    logfile.write('\n### sending restart command to %s at %s ###\n' %(desthost, localtime))
    status, output = commands.getstatusoutput("ssh root@%s 'service nagios restart'" %desthost)
    logfile.write(output)
    logfile.write('\n### Nagios Process has been restarted ###\n')
    logfile.close()
   

def checkFailover(desthost):
    #check if a failover happened (failfile.log exists on NagiosBackup).
    #returns True if NO failover happened. After a failover we don't copy the status information
    status, output = commands.getstatusoutput("ssh root@%s 'cat /usr/local/nagios/etc/failfile.log'" %desthost)
    if "cat: /usr/local/nagios/etc/failfile.log: No such file or directory" in output:
        return True
    else:
        return False   
   
if __name__ == "__main__":

    sourcepath = '/usr/local/nagios/etc/sites'
    destpath = '/usr/local/nagios/etc/'
    desthost = '10.132.72.171'
    retentionpath = '/usr/local/nagios/var'
    logfile = '/var/log/nagiossync.log'

    logfilesize(logfile)
    SyncConfigDir(sourcepath, desthost, destpath, logfile)
    if checkFailover(desthost):
        SyncStatusInfo(sourcepath, destpath, desthost, retentionpath, logfile)
    #Restart(logfile, desthost)
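
The block removal in SyncStatusInfo can be tried on its own with a small helper. This is a sketch only: strip_blocks is a hypothetical function that is not part of the script above, and it assumes the retention.dat layout the patterns above rely on (block type, a space, an opening brace, host_name on the next line, closing brace at the start of a line). Anchoring the closing brace to the start of a line avoids cutting a block short when a plugin output contains a } character.

```python
import re

def strip_blocks(content, block_types=(), host_prefixes=()):
    # remove whole blocks by type name, e.g. info, program, contact
    for block_type in block_types:
        pattern = re.compile(r'^%s\s\{\s.*?^\}\n?' % re.escape(block_type),
                             re.DOTALL | re.MULTILINE)
        content = pattern.sub('', content)
    # remove blocks of any type whose host_name starts with one of the prefixes
    for prefix in host_prefixes:
        pattern = re.compile(r'^[A-Za-z]+\s\{\shost_name=%s.*?^\}\n?' % re.escape(prefix),
                             re.DOTALL | re.MULTILINE)
        content = pattern.sub('', content)
    return content

sample = (
    'info {\ncreated=1\n}\n'
    'host {\nhost_name=H1\nplugin_output=a } b\n}\n'
    'service {\nhost_name=H1\nservice_description=PING\n}\n'
    'host {\nhost_name=localhost\nplugin_output=OK\n}\n'
)
cleaned = strip_blocks(sample, block_types=('info',), host_prefixes=('H',))
```

With this sample only the localhost block is left in cleaned, which is what the backup side should keep from its own retention.dat.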


failovercheck.py (check if NagiosMaster is alive and the nagios process is running. If not do a failover to NagiosBackup)

Code: (python) [Select]
#!/usr/bin/env python

import commands
import sys
import time
import datetime
import os

def NagiosProcess(check_nrpe, nagiosmasterip):
    status, output = commands.getstatusoutput("%s -H %s -c check_nagios_proc" %(check_nrpe, nagiosmasterip))
    if output.startswith('PROCS CRITICAL'):
        return True
    else:
        return False

def NagiosPing(check_ping, nagiosmasterip):
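    #check_ping needs -w/-c thresholds in the form <rta>,<pl>% (the values here are only examples, adjust them)
    status, output = commands.getstatusoutput("%s -H %s -w 100.0,20%% -c 500.0,60%%" %(check_ping, nagiosmasterip))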
    if output.startswith('PING CRITICAL'):
        return True
    else:
        return False

def failover(procRun, pingReply, nagiosmasterip, nagiossync, nagioscfg, logfile, failfile):
    if procRun and not pingReply:
        time.sleep(30)
        procRun2 = NagiosProcess(check_nrpe, nagiosmasterip)
        if procRun2:
            logfile.write('\nNagios process on %s is not running! Performed two checks\n' %nagiosmasterip)
            logfile.write('enabled execute_service_checks and enable_notifications in nagios.cfg\n')
            os.system("sed -i 's/execute_service_checks=0/execute_service_checks=1/' %s" %nagioscfg)
            os.system("sed -i 's/enable_notifications=0/enable_notifications=1/' %s" %nagioscfg)
            os.system("ssh root@%s '%s'" %(nagiosmasterip, nagiossync))
            logfile.write('executed nagiossync_v1.py on %s\n' %nagiosmasterip)           
            faillog = open(failfile, 'w')
            faillog.write('failover')
            faillog.close()

    #pingReply is True when check_ping reports CRITICAL, i.e. the master does not answer (procRun does not matter here)
    if pingReply:
        time.sleep(30)
        pingReply2 = NagiosPing(check_ping, nagiosmasterip)
        if pingReply2:
            logfile.write('\nNagiosMaster does not respond! Performed two checks\n')
            logfile.write('enabled execute_service_checks and enable_notifications in nagios.cfg\n')
            os.system("sed -i 's/execute_service_checks=0/execute_service_checks=1/' %s" %nagioscfg)
            os.system("sed -i 's/enable_notifications=0/enable_notifications=1/' %s" %nagioscfg)
            logfile.write('restarting Nagios process')
            status, output = commands.getstatusoutput('service nagios restart')
            logfile.write(output)
            faillog = open(failfile, 'w')
            faillog.write('failover')
            faillog.close()
   
def failback(procRun, pingReply, nagioscfg, failfile, logfile):
    if not procRun and not pingReply:
        logfile.write('\nNagiosMaster is back online again! disabling checks and notifications now\n')
        os.system("sed -i 's/execute_service_checks=1/execute_service_checks=0/' %s" %nagioscfg)
        os.system("sed -i 's/enable_notifications=1/enable_notifications=0/' %s" %nagioscfg)
        logfile.write('restarting Nagios process')
        status, output = commands.getstatusoutput('service nagios restart')
        logfile.write(output)
        os.system('rm %s' %failfile)

if __name__ == "__main__":

    check_nrpe = '/usr/local/nagios/libexec/check_nrpe'
    check_ping = '/usr/local/nagios/libexec/check_ping'
    faillog = '/var/log/failover.log'
    nagiossync = '/usr/local/nagios/etc/nagiossync_v1.py'
    nagioscfg =  '/usr/local/nagios/etc/nagios.cfg'
    nagiosmasterip = '10.132.72.170'
    failfile = '/usr/local/nagios/etc/failfile.log'

    logfile = open(faillog, 'a')

    procRun = NagiosProcess(check_nrpe, nagiosmasterip)
    pingReply = NagiosPing(check_ping, nagiosmasterip)   

    if (not procRun and not pingReply) and os.path.exists(failfile):
        failback(procRun, pingReply, nagioscfg, failfile, logfile)

    if (procRun or pingReply) and not os.path.exists(failfile):
        failover(procRun, pingReply, nagiosmasterip, nagiossync, nagioscfg, logfile, failfile)
       

    logfile.close()
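
The decision at the end of the script boils down to three inputs. The sketch below shows it as a pure function so it can be checked without any Nagios host. decide is a hypothetical name and not part of the script above, and the 30 second re-check done in failover() is left out on purpose.

```python
def decide(proc_critical, ping_critical, failfile_exists):
    # proc_critical / ping_critical are True when the check reports CRITICAL,
    # same meaning as procRun / pingReply in the script
    master_down = proc_critical or ping_critical
    if master_down and not failfile_exists:
        return 'failover'   # master has a problem and we did not take over yet
    if not master_down and failfile_exists:
        return 'failback'   # master is healthy again after a failover
    return 'none'           # nothing to do

for args in [(False, False, False), (True, False, False), (False, False, True)]:
    print('%s -> %s' % (args, decide(*args)))
```

The failfile is what keeps the script from doing a failover again on every run while the master is still down.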

EDIT: both scripts are working for now. They need a bit of optimization but I will leave them for now.
EDIT: Fixed a little bug in the nagiossync script. If the Nagios Process on NagiosMaster failed we don't want to sync the status information from the master to the backup server anymore.

EDIT: I created a small plugin to check security alerts sent from an IPS to the snmptrapd service on the Nagios server. As I wanted to have different information in the email than what the Nagios web interface shows, I needed to let the plugin generate the email instead of using the standard Nagios notification. This has one drawback: the recipients have to be configured in the plugin itself, so you can't use the standard Nagios configuration file for email recipients (contacts.cfg).

Code: (python) [Select]
#!/usr/bin/env python

import MySQLdb as mdb
import sys
import os
import socket

def getData(dbserver, dbuser, dbpass, db):
    dbcon = None
    try:
        dbcon = mdb.connect(dbserver, dbuser, dbpass, db)
        dbcur = dbcon.cursor()
        #the parentheses are needed because AND binds tighter than OR
        dbcur.execute('SELECT * FROM snmptt WHERE hostname LIKE "ips" AND (severity LIKE "warning" OR severity LIKE "critical") ORDER BY id')
        dbdata = dbcur.fetchall()
        return dbdata
    except mdb.Error, e:
        print "Error %d: %s" % (e.args[0],e.args[1])
        sys.exit(2)    #CRITICAL = 2
    finally:
        if dbcon:
            dbcon.close()

def notify(dbdata, contacts, dbserver, dbuser, dbpass, db ):
    i = 0
    while i < int(len(dbdata)):
        #dbdata[i][0] = id, dbdata[i][6] = hostname, dbdata[i][9] = severity, dbdata[i][12] = traptext, dbdata[i][11] = traptime
        #socket.error covers both socket.herror and socket.gaierror
        try:
            host = socket.gethostbyaddr(dbdata[i][6])
            host = host[0]
        except socket.error:
            host = dbdata[i][6]
        try:
            address = socket.gethostbyname(dbdata[i][6])
        except socket.error:
            address = dbdata[i][6]
        for contact in contacts:
            os.system('echo "*****Nagios*****\n\nNotification Type: %s\n\nService: SNMP Traps\nHost: %s\nAddress: %s\nState: %s\n\nDate/Time: %s\n\nAdditional Info:\n\n%s" | mail -s "** Security Alert %s **" %s' %(dbdata[i][9], host, address, dbdata[i][9], dbdata[i][11], dbdata[i][12], host, contact))
        deleteTrap(dbdata[i][0], dbserver, dbuser, dbpass, db)
        i += 1

def deleteTrap(id, dbserver, dbuser, dbpass, db):
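    dbcon = None
    try:
        dbcon = mdb.connect(dbserver, dbuser, dbpass, db)
        dbcur = dbcon.cursor()
        #the traps are stored in the snmptt table (same table as in getData)
        dbcur.execute('DELETE FROM snmptt WHERE id=%s', (id,))
        #MySQLdb does not autocommit, without commit the delete is lost on transactional tables
        dbcon.commit()
    except mdb.Error, e:
        print "Error %d: %s" % (e.args[0],e.args[1])
        sys.exit(2)    #CRITICAL = 2
    finally:
        if dbcon:
            dbcon.close()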
       

def nagiosStatus(dbdata):
    warningCount = len(dbdata)
   
    if warningCount != 0:
        print "TRAPS WARNING: %s traps in database - Mail will be sent and trap deleted" %warningCount
        sys.exit(1)    #WARNING = 1
    elif warningCount == 0:
        print "TRAPS OK: no warning traps in database"
        sys.exit(0)      #OK = 0

if __name__ == "__main__":
    dbserver = 'localhost'
    dbuser = 'dbuser'
    dbpass = 'dbpass'
    db = 'db'
    contacts = ['recipient1@domain.com', 'recipient2@domain.com']

    dbdata = getData(dbserver, dbuser, dbpass, db)
   
    if dbdata is not None:
        notify(dbdata, contacts, dbserver, dbuser, dbpass, db)     
   
    nagiosStatus(dbdata)
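
The plugin sends the mail through echo and mail in a shell, so a trap text containing quotes or $(...) can break the command. A possible alternative is to build the message with the standard library and hand it to a local MTA. This is only a sketch: it is written for Python 3 (the plugin above is Python 2), and build_alert, send_alert and the sender address are assumptions that are not part of the plugin.

```python
import smtplib
from email.message import EmailMessage

def build_alert(host, address, severity, traptime, traptext, recipient,
                sender='nagios@localhost'):
    # same layout as the mail generated by the plugin, but without a shell
    msg = EmailMessage()
    msg['Subject'] = '** Security Alert %s **' % host
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content(
        '*****Nagios*****\n\n'
        'Notification Type: %s\n\n'
        'Service: SNMP Traps\nHost: %s\nAddress: %s\nState: %s\n\n'
        'Date/Time: %s\n\nAdditional Info:\n\n%s\n'
        % (severity, host, address, severity, traptime, traptext))
    return msg

def send_alert(msg, smtp_host='localhost'):
    # assumes an MTA is listening on smtp_host, port 25
    server = smtplib.SMTP(smtp_host)
    try:
        server.send_message(msg)
    finally:
        server.quit()
```

In notify() this would replace the os.system call, with one build_alert/send_alert pair per contact.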
 
« Last Edit: April 14, 2013, 10:17:15 am by RedBullAddicted »

Offline xciter

  • /dev/null
  • *
  • Posts: 6
  • Cookies: 2
    • View Profile
Re: Nagios monitoring environment
« Reply #4 on: March 20, 2013, 09:50:49 am »
Hello,


I did not have much time these past couple of weeks to help you. Anyway, since Nagios has been out of active development for some time now and you want to do distributed monitoring, I suggest you look into Check_MK.


This is an open-source plugin that sits on top of Nagios and provides better check capabilities than Nagios and a much better web interface. The website includes everything you need to know to get it running, administer and extend it.




Offline RedBullAddicted

  • VIP
  • Sir
  • *
  • Posts: 519
  • Cookies: 189
    • View Profile
Re: Nagios monitoring environment
« Reply #5 on: March 20, 2013, 11:04:51 am »
Hi xciter,

at the moment I am using MK_Livestatus as broker for NagVis but I don't want to do that again. It's too much work to maintain all those network maps in NagVis. In addition to that I use merlin (op5) as broker for the Ninja web interface (op5), which is pretty nice, but I don't want to use that on the new installation either. I didn't know that check_mk provides a possibility to deploy distributed monitoring. Thanks for that, I will have a look at it. During my current research I thought about using mod gearman from ConSol (http://labs.consol.de/lang/de/nagios/mod-gearman/) which seems to be pretty nice. Tbh I haven't had the time to take a closer look at it.

Cheers,
RBA

Offline xciter

  • /dev/null
  • *
  • Posts: 6
  • Cookies: 2
    • View Profile
Re: Nagios monitoring environment
« Reply #6 on: March 21, 2013, 10:05:07 am »
Well, we did a little research beforehand and found that Check_MK probably involves the least configuration of all of them. You can get a demo version from their main website which comes with everything pre-installed. Their desktop/server plugin is quite nice and can be easily extended. Very comprehensive documentation. :)
« Last Edit: March 21, 2013, 10:05:30 am by xciter »

Offline Snayler

  • Baron
  • ****
  • Posts: 812
  • Cookies: 135
    • View Profile
Re: Nagios monitoring environment
« Reply #7 on: March 21, 2013, 11:08:22 am »
I've been trying Zabbix, it has distributed monitoring and high availability as features. You might want to give it a try.

Offline RedBullAddicted

  • VIP
  • Sir
  • *
  • Posts: 519
  • Cookies: 189
    • View Profile
Re: Nagios monitoring environment
« Reply #8 on: March 21, 2013, 01:26:43 pm »
Hi Snayler,

I thought about moving to another monitoring framework, but in the time I have been using Nagios (more than 4 years now) I created a lot of custom checks and scripts and it would take some time to migrate all of them. Throughout all these years I gained a lot of knowledge about the Nagios core and how things work. These are the main reasons why I decided to stay with it :) But thanks for the suggestion anyway.

Cheers,
RBA

Offline Snayler

  • Baron
  • ****
  • Posts: 812
  • Cookies: 135
    • View Profile
Re: Nagios monitoring environment
« Reply #9 on: March 25, 2013, 09:37:42 pm »
No problem. I'm currently doing some work with monitoring frameworks for my internship, so I must install some frameworks, test them, review them, etc... I've been messing around with Nagios XI, PRTG, Zabbix and Cacti. Do you know of any other interesting framework one could try? And do you prefer Nagios XI or a Nagios core install on some distro? Sorry about the hijack.

Offline RedBullAddicted

  • VIP
  • Sir
  • *
  • Posts: 519
  • Cookies: 189
    • View Profile
Re: Nagios monitoring environment
« Reply #10 on: March 26, 2013, 06:21:33 am »
Hi Snayler,

no problem, and I don't consider it a hijack because it is related to the topic :) If you are looking for an open-source solution which is free of charge you should consider Icinga (https://www.icinga.org/).

Quote
Icinga is a fork of Nagios and is backward compatible. So, Nagios configurations, plugins and addons can all be used with Icinga. Though Icinga retains all the existing features of its predecessor, it builds on them to add many long awaited patches and features requested by the user community.

You mentioned Nagios XI, so you are willing to pay for a solution? If that's the case you can have a look at Op5 Monitor (http://www.op5.com/network-monitoring/op5-monitor/). This solution is based on Nagios as well. I have been on their mailing list for quite some time now and I have to say that they really care about their customers and the developers are pretty good. There are a lot of other solutions for different purposes. If you have a lot of HP hardware, the HP System Insight Manager can be very useful. You can query the SIM database with Nagios too and present the status from both systems in one interface.

As far as I know Nagios XI is Nagios with support and you need to pay for it every year. Tbh I don't really think that I need support for my Nagios installation. I always preferred Nagios with snmptt, mrtg and a database backend. I know the Nagios Core pretty well and I know what I need to check if something is not working as expected. If you have any problems with Nagios Core during your evaluation feel free to contact me :)

Hope this helps

Offline Snayler

  • Baron
  • ****
  • Posts: 812
  • Cookies: 135
    • View Profile
Re: Nagios monitoring environment
« Reply #11 on: March 26, 2013, 12:49:51 pm »
Icinga looks nice, gonna definitely try it out. Op5 seems worth it, too. Thanks for suggesting them.
Quote
You mentioned Nagios XI so you are willing to pay for a solution?

Well, I "installed" Nagios XI (more like copied it) because I'm lazy and they offer a trial on which I can test things. I don't mind if the software is commercial or open-source. Of course I prefer open-source, but since this report will serve as a guide to others, I would prefer to include a mix of open-source and commercial products and point out which are the best and most featured on both sides. So, as long as the commercial product has some kind of trial/demo I can test things on, I'm interested.

Again, thanks for your suggestions, they will be very useful.