ConSol* Consulting & Solutions Software GmbH Deutschland

Description

check_logfiles is a plugin for Nagios that scans log files for specific patterns.


Motivation

The conventional log-scanning plugins are not adequate in a mission-critical environment. In particular, they cannot handle logfile rotation or include the rotated archives in the scan, which leaves gaps in the monitoring. check_logfiles was written because these deficiencies would have prevented Nagios from replacing a proprietary monitoring system.

Features

  • Detection of rotations - Logfiles are usually rotated and compressed nightly. Each operating system or company has its own naming scheme. If a rotation happens between two runs of check_logfiles, the rotated archive also has to be scanned to avoid gaps. The most common rotation schemes are predefined, but you can describe any strategy (in short: where and under which name a logfile is archived).
  • More than one pattern can be defined, and each can be classified as a warning pattern or a critical pattern.
  • Triggered actions - Usually Nagios plugins return just an exit code and a line of text describing the result of the check. Sometimes, however, you want to run some code during the scan every time you get a hit. check_logfiles lets you call scripts either after every hit or at the beginning or the end of its runtime.
  • Exceptions - If a pattern matches, the matched line could be a very special case which should not be counted as an error. You can define exception patterns which are more specific versions of your critical/warning patterns. Such a match would then cancel an alert.
  • Thresholds - You can define the number of matching lines which are necessary to activate an alert.
  • Protocol - The matching lines can be written to a protocol file, the name of which is included in the plugin's output.
  • Macros - Pattern definitions and logfile names may contain macros, which are resolved at runtime.
  • Performance data - The number of lines scanned and the numbers of warnings and criticals are output.
  • Windows - The plugin works with Unix as well as with Windows (e.g. with ActiveState Perl).

Introduction

Usually you call the plugin with the -f option which gets the name of a configuration file:


nagios:~> check_logfiles -f <configfile>
OK - no errors or warnings

In its simplest form check_logfiles can get all the essential parameters as command line options. However, not all features can be utilized in this case.


nagios:~> check_logfiles --tag=ssh --logfile=/var/adm/messages \
    --rotation=SOLARIS \
    --criticalpattern="Failed password for root"
OK - no errors or warnings |ssh=1722;0;0;0
nagios:~> check_logfiles --tag=ssh --logfile=/var/adm/messages \
    --rotation=SOLARIS \
    --criticalpattern="Failed password for root"
CRITICAL - (1 errors in check_logfiles.protocol-2007-04-25-20-59-20) - Apr 25 20:59:15 siapp8 sshd[10849]: [ID 800047 auth.info] Failed password for root from 172.16.224.11 port 24206 ssh2 |ssh=2831;0;1;0

In principle check_logfiles scans a log file until the end-of-file is reached. The offset will then be saved in a so-called seekfile. The next time check_logfiles runs, this offset will be used as the starting position inside the log file. In the event that a rotation has occurred in the meantime, the rest of the rotated archive will be scanned also.
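
The seekfile idea can be illustrated with a small shell sketch. This is not the plugin's actual implementation (check_logfiles also follows rotated archives, which the sketch does not); the function name scan_new_lines and the one-number seekfile layout are assumptions made for this example.

```shell
#!/bin/sh
# scan_new_lines LOGFILE SEEKFILE PATTERN
# Hypothetical helper, not part of check_logfiles.
# Prints the lines matching PATTERN that were appended to LOGFILE since the
# previous call, then stores the new end-of-file offset in SEEKFILE.
scan_new_lines() {
  logfile=$1
  seekfile=$2
  pattern=$3

  # Start at the saved offset, or at the beginning on the first run.
  offset=0
  if [ -f "$seekfile" ]; then
    offset=$(cat "$seekfile")
  fi

  # Current size in bytes; tr strips the padding some wc implementations add.
  size=$(wc -c < "$logfile" | tr -d ' ')

  # A file smaller than the saved offset was overwritten: rescan from the start.
  if [ "$size" -lt "$offset" ]; then
    offset=0
  fi

  # tail -c +N starts at byte N (1-based), so this skips the first $offset bytes.
  # grep exits 1 when nothing matches, which is not an error here.
  tail -c +"$((offset + 1))" "$logfile" | grep -e "$pattern" || true

  # Remember where this run stopped.
  echo "$size" > "$seekfile"
}
```

Calling scan_new_lines twice in a row on an unchanged file prints matches only the first time, because the second call starts at the saved offset.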


Documentation

For the simplest applications it is sufficient to call check_logfiles with command line parameters. More complex scan jobs can be described with a config file.

Command line options
  • --tag=<identifier> A short unique descriptor for this search. It will appear in the output of the plugin and is used to separate the different services.
  • --logfile=<filename> This is the name of the log file you want to scan.
  • --rotation=<method> The method by which log files are rotated.
  • --criticalpattern=<regexp> A regular expression which will trigger a critical error.
  • --warningpattern=<regexp> Like --criticalpattern, but a match results in a warning.
  • --criticalexception=<regexp> / --warningexception=<regexp> Exceptions which are not counted as errors.
  • --okpattern=<regexp> A pattern which resets the error counters.
  • --noprotocol Normally all the matched lines are written into a protocol file with this file's name appearing in the plugin's output. This option switches this off.
  • --syslogserver With this option you limit the pattern matching to lines originating from the host check_logfiles is running on.
  • --syslogclient=<clientname> With this option you limit the pattern matching to lines originating from the host named in this option.
  • --sticky[=<lifetime>] Errors are propagated through successive runs.
  • -f <configfile> The name of a configuration file. The syntax of this file is described in the next section.
  • -F <configdir> The name of a configuration directory. Configfiles ending in .cfg or .conf are (recursively) imported.
  • --searches=<tag1,tag2,...> A list of tags of those searches which are to be run. Using this parameter, not all searches listed in the config file are run, but only those selected. (--selectedsearches is also possible)

Format of the configuration file

The definitions in this file are written in Perl syntax. There is a distinction between global variables, which influence check_logfiles as a whole, and variables which relate to a single search. A "search" combines where to search, what to search for, which weight a hit has, which action is triggered in case of a hit, and so on.

$seekfilesdir

A directory where files with status information will be saved after a run of check_logfiles. This status information helps check_logfiles to remember up to which position the log file has been scanned during the last run. This way only newly written lines of log files will be read.

The default is /tmp or the directory which has been specified with the --with-seekfiles-dir option of ./configure.

$protocolsdir

A directory where check_logfiles writes protocol files with the matched lines.

The default is /tmp or the directory which has been specified with the --with-protocols-dir option of ./configure.

$protocolretention

The lifetime of protocol files in days. After this many days the files are deleted automatically.

The default is 7 days.

$scriptpath

A list of directories where the triggered scripts can be found.

The default is /bin:/usr/bin:/sbin:/usr/sbin or the directories which have been specified with the --with-trusted-path option of ./configure.

$MACROS

A hash with user-defined macro definitions.

see below.

$prescript

An external script which will be executed during the startup of check_logfiles. The macro $CL_TAG$ gets the value "startup". $prescriptparams, $prescriptstdin and $prescriptdelay may be used like scriptparams, scriptstdin and scriptdelay.

 

$postscript

An external script which will be executed before the termination of check_logfiles. The macro $CL_TAG$ gets the value "summary". $postscriptparams, $postscriptstdin and $postscriptdelay may be used like scriptparams, scriptstdin and scriptdelay.

 

$options

A list of options which control the influence of pre- and postscript. Allowed options are smartpostscript, supersmartpostscript, smartprescript and supersmartprescript.

 

@searches

An array whose elements (hash references) describe the actual work of check_logfiles. The keys for these hash references can be found in the next table.

 

tag

A unique identifier.

logfile

The name of the log file to scan.

archivedir

The name of the directory where archives will be moved to after a log file rotation. The default is the directory where the logfile resides.

rotation

One of the predefined methods or a regular expression, which helps identify the rotated archives. If this key is missing, check_logfiles assumes that the log file will be simply overwritten instead of rotated.

type

One of "rotating" (default if rotation was given), "simple" (default if no rotation was given), "virtual" (for files which will strictly be scanned from the beginning), "errpt" (if instead of a logfile the output of the AIX errpt command should be scanned), "ipmitool" (if the IPMI System Event Log should be scanned) or "oraclealertlog" (if the alertlog of an Oracle database should be scanned through a database connection).

criticalpatterns

A regular expression or a reference to an array of such expressions. If one of these expressions matches a line in the logfile, this is considered a critical error. If the expression begins with a "!", then the meaning is reversed. It counts as a critical error if no match for this pattern is found.

criticalexceptions

One or more regular expressions which invalidate a preceding match of criticalpatterns.

criticalthreshold

A number which denotes how many lines have to match a pattern until they are considered a critical error.

warningpatterns

Corresponds to criticalpatterns, except that a match creates a warning instead of a critical error.

warningexceptions

Like criticalexceptions, but for warningpatterns.

warningthreshold

Like criticalthreshold, but for warnings.

okpatterns

A regular expression or a reference to an array of such expressions. If one of these expressions matches a line in the logfile, all previously found warnings and criticals are discarded.

script

If a pattern matches, this script will be executed. It must reside under one of the directories specified in $scriptpath. The script gets plenty of information about the hit via environment variables.

scriptparams

You can provide command line parameters for the script here. They may contain macros. If $script is a code reference, $scriptparams must be a reference to an array.

scriptstdin

If the script expects input through stdin, you can describe it here. The string may also contain macros.

scriptdelay

After the script has finished, check_logfiles may sleep for <delay> seconds before continuing its work.

options

This is a string with a comma-separated list of options which let you fine-tune the search. Each option can be switched off by preceding its name with "no". The options are explained in detail in the next table:

template

Instead of a tag, a search can also be identified by a template name. If you call check_logfiles with the --tag option, the corresponding search will be run as if it had been defined with that tag name. See examples.

[no]script

Controls whether a script can be executed.

default: off

[no]smartscript

Controls whether exitcode and output of the script shall be treated like an additional match.

default: off

[no]supersmartscript

Controls whether exitcode and output of the script should replace the triggering match.

default: off

[no]protocol

Controls whether the matching lines are written to a protocol file for later investigation.

default: on

[no]count

Controls whether hits are counted and determine the final exit code. If switched off, check_logfiles can be used solely to execute the triggered scripts.

default: on

[no]syslogserver

If set, only lines originating from the local host are taken into account. This is important if check_logfiles runs on a syslog server to which many other hosts report their events.

default: off

[no]syslogclient=string

A prefilter. Only lines matching the string are further examined.

 

[no]perfdata

Controls whether performance data should be added to the output.

default: on

[no]logfilenocry

Controls how check_logfiles reacts if the log file does not exist. By default a missing log file results in an UNKNOWN error. If nologfilenocry is set, a missing log file is tolerated.

default: on

[no]case

Controls whether regular expressions are case-sensitive.

default: on

[no]sticky[=seconds]

Controls whether an error is propagated through successive runs of check_logfiles. Once an error has been found, the exitcode will be non-zero until an okpattern resets it or until the error expires after <seconds> seconds. Do not use this option unless you know exactly what you are doing.

default: off

[no]savethresholdcount

Controls whether the hit counter is saved between runs. If it is, hit numbers are added up until a threshold (e.g. criticalthreshold) is reached. Otherwise each run begins with the counters reset.

default: on

[no]encoding=string

Declares that the logfile is encoded in Unicode (e.g. ucs-2).

default: off
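
The search keys and options described above can be combined in one search definition. The following sketch writes such a config file from the shell; the helper name write_ssh_search, the patterns and the threshold value are invented for illustration, and only keys documented above are used.

```shell
#!/bin/sh
# write_ssh_search FILE
# Hypothetical helper: writes an example check_logfiles configuration to FILE.
# The heredoc delimiter is quoted so the shell leaves the Perl text untouched.
write_ssh_search() {
  cat > "$1" <<'EOF'
@searches = (
  {
    tag => 'ssh',
    logfile => '/var/adm/messages',
    rotation => 'SOLARIS',
    # two critical patterns, given as an array reference
    criticalpatterns => ['Failed password for root', 'Failed password for admin'],
    # a more specific pattern that cancels hits coming from the local host
    criticalexceptions => 'Failed password for root from 127\.0\.0\.1',
    # number of matching lines needed before a critical error is raised
    criticalthreshold => 5,
    warningpatterns => 'Failed password for invalid user',
    # a match here discards previously found warnings and criticals
    okpatterns => 'Accepted publickey for root',
    # no protocol file, case-insensitive matching
    options => 'noprotocol,nocase',
  },
);
EOF
}
```

After write_ssh_search cfg.d/ssh.cfg, the search would be run with check_logfiles -f cfg.d/ssh.cfg.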


Predefined macros

$CL_USERNAME$

The name of the user executing check_logfiles

$CL_HOSTNAME$

The hostname without domain

$CL_DOMAIN$

The DNS-domain

$CL_FQDN$

Hostname and domain together (the fully qualified domain name)

$CL_IPADDRESS$

The IP address

$CL_DATE_YYYY$

The current year

$CL_DATE_MM$

The current month (1..12)

$CL_DATE_DD$

The day of the month

$CL_DATE_HH$

The current hour (0..23)

$CL_DATE_MI$

The current minute

$CL_DATE_SS$

The current second

$CL_DATE_CW$

The current calendar week (ISO 8601:1988)

$CL_SERVICEDESC$

The name of the config file without extension.

$CL_NSCA_SERVICEDESC$

The same as $CL_SERVICEDESC$.

$CL_NSCA_HOST_ADDRESS$

The local address 127.0.0.1

$CL_NSCA_PORT$

5667

$CL_NSCA_TO_SEC$

10

$CL_NSCA_CONFIG_FILE$

send_nsca.cfg

 

The following macros change their value during the runtime.

$CL_TAG$

The tag of the current search

$CL_TEMPLATE$

The name of the template used (if any).

$CL_LOGFILE$

The file to be scanned next

$CL_SERVICEOUTPUT$

The last matched line.

$CL_SERVICESTATEID$

The error level as a number 0..3

$CL_SERVICESTATE$

The error level as a word (OK, WARNING, CRITICAL, UNKNOWN)

$CL_SERVICEPERFDATA$

The performance data.

$CL_PROTOCOLFILE$

The file where all matching lines are written.

These macros are also available in scripts called from check_logfiles. Their values are stored in environment variables whose names are derived from the macro names: the leading CL_ is replaced by CHECK_LOGFILES_. You can also access user-defined macros. Their names are likewise prefixed with CHECK_LOGFILES_.
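
The naming rule can be written down as a small shell function. The function name macro_to_env is made up for this sketch; the mapping itself follows the rule stated above.

```shell
#!/bin/sh
# macro_to_env MACRO
# Hypothetical helper: prints the name of the environment variable that
# carries the value of MACRO inside a triggered script.
macro_to_env() {
  name=${1#\$}       # drop the leading $ of the macro notation
  name=${name%\$}    # drop the trailing $
  case $name in
    # predefined macros: CL_ is replaced by CHECK_LOGFILES_
    CL_*) printf '%s\n' "CHECK_LOGFILES_${name#CL_}" ;;
    # user-defined macros: CHECK_LOGFILES_ is put in front
    *)    printf '%s\n' "CHECK_LOGFILES_$name" ;;
  esac
}
```

For example, macro_to_env '$CL_DATE_HH$' prints CHECK_LOGFILES_DATE_HH, and macro_to_env '$MY_VOLUME$' prints CHECK_LOGFILES_MY_VOLUME.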



nagios:~> cat check_logfiles.cfg
$scriptpath = '/usr/bin/my_application/bin:/usr/local/nagios/contrib';
$MACROS = {
    MY_FUNNY_MACRO => 'hihihihohoho',
    MY_VOLUME => 'loud'
};

@searches = (
  {
    tag => 'fun',
    logfile => '/var/adm/messages',
    criticalpatterns => 'a funny pattern',
    script => 'laugh.sh',
    scriptparams => '$MY_VOLUME$',
    options => 'noprotocol,script,perfdata'
  },
);



nagios:~> cat /usr/bin/my_application/bin/laugh.sh
#! /bin/sh
# The first argument (from scriptparams) selects the volume.
if [ -n "$1" ]; then
  VOLUME=$1
fi
# %s rather than %d, in case the values are zero-padded (08 is not a valid %d argument).
printf "It is %s:%s and my status is %s\n" \
  "$CHECK_LOGFILES_DATE_HH" \
  "$CHECK_LOGFILES_DATE_MI" \
  "$CHECK_LOGFILES_SERVICESTATE"

printf "I found something funny: %s\n" "$CHECK_LOGFILES_SERVICEOUTPUT"
# POSIX test compares strings with a single "=".
if [ "X$VOLUME" = "Xloud" ]; then
  echo "$CHECK_LOGFILES_MY_FUNNY_MACRO" | tr 'a-z' 'A-Z'
else
  echo "$CHECK_LOGFILES_MY_FUNNY_MACRO"
fi
printf "Thank you, %s. You made me laugh.\n" "$CHECK_LOGFILES_USERNAME"

Performance data

The number of scanned lines as well as the number of pattern matches (critical, warning and unknown) are appended to the plugin's output in performance data format. You can suppress this by using the noperfdata option.


nagios:~> check_logfiles --logfile=/var/adm/messages \
    --criticalpattern="Failed password" --tag=ssh
CRITICAL - (4 errors) - May  9 11:33:12 localhost sshd[29742] Failed password for invalid user8 ... |ssh_lines=27 ssh_warnings=0 ssh_criticals=4 ssh_unknowns=0

nagios:~> check_logfiles --logfile=/var/adm/messages \
    --criticalpattern="Failed password" --tag=ssh --noperfdata
CRITICAL - (2 errors) - May  9 11:58:48 localhost sshd[29813] Failed password for invalid user8 ...
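
A wrapper that needs one of these numbers can cut it out of the plugin's output. The sketch below assumes the output has the shape shown in the examples, that is, space-separated label=value pairs after the | character; the function name perf_value is hypothetical.

```shell
#!/bin/sh
# perf_value OUTPUT LABEL
# Hypothetical helper: prints the value stored under LABEL in the performance
# data part of a plugin output line. Returns 1 if there is no such label.
perf_value() {
  perf=${1#*\|}                  # everything after the first |
  if [ "$perf" = "$1" ]; then    # no | found, so there is no perfdata
    return 1
  fi
  for pair in $perf; do          # pairs are separated by blanks
    case $pair in
      "$2"=*)
        value=${pair#*=}                # text after the first =
        printf '%s\n' "${value%%;*}"    # drop ;warn;crit;min fields if present
        return 0
        ;;
    esac
  done
  return 1
}
```

With the first output line above as OUTPUT, perf_value "$output" ssh_criticals prints 4.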

Scripts

It is possible to execute external scripts out of check_logfiles. This can be at the startup phase ($prescript), before termination ($postscript) or every time a pattern matches a line. See example above.

With the option "smartscript" output and exitcode of the script are treated like a match in the logfile and reflected in the overall result. The option "supersmartscript" makes output and exitcode of the script replace those of the triggering match.

Pre- and Postscript declared as supersmart scripts directly influence the process of check_logfiles. The option "supersmartprescript" causes an immediate abort of check_logfiles if the prescript has a non-zero exit code. In this case output and exitcode of check_logfiles correspond to those of the prescript. With the option "supersmartpostscript" output and exitcode of check_logfiles can be determined by the postscript. Thus a more meaningful output is possible.
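
A script intended for the smartscript or supersmartscript option therefore has to produce a one-line message and a meaningful exit code. The following sketch shows one possible shape; the function name classify_hit and the classification rules are invented, while the exit codes follow the error levels 0..3 listed for $CL_SERVICESTATEID$.

```shell
#!/bin/sh
# classify_hit
# Hypothetical handler body: inspects the matched line, which check_logfiles
# passes in CHECK_LOGFILES_SERVICEOUTPUT, prints a message and returns the
# error level (1 WARNING, 2 CRITICAL, 3 UNKNOWN).
classify_hit() {
  line=${CHECK_LOGFILES_SERVICEOUTPUT:-}
  case $line in
    "")
      echo "UNKNOWN - no matched line available"
      return 3
      ;;
    *"for root"*)
      echo "CRITICAL - root login failed: $line"
      return 2
      ;;
    *)
      echo "WARNING - login failed: $line"
      return 1
      ;;
  esac
}
```

In a real script file, classify_hit would be the last command so that its return value becomes the exit code of the script.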


Using check_logfiles with Nagios

If you have just one service which uses check_logfiles, you can hard-code the config file in your services.cfg/nrpe.cfg:


define service {
  service_description   check_sanlogs
  host_name              oaschgeign.muc
  check_command       check_nrpe!check_logfiles
  is_volatile           1
  check_period          7x24
  max_check_attempts    1
  ...
}

define command {
  command_name          check_nrpe
  command_line          $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

command[check_logfiles]=/opt/nagios/libexec/check_logfiles -f logdefs.cfg

If multiple services are based on check_logfiles, you need multiple config files. I suggest naming them after the service_description. In the following example we would have a directory cfg.d with config files solaris_check_sanlogs and solaris_check_apachelogs.


define service {
  service_description  logfilescan
  register             0
  is_volatile          1
  check_period         7x24
  max_check_attempts   1
  ...
}

define service {
  service_description  solaris_check_sanlogs
  host_name            oaschgeign.muc
  check_command        check_nrpe_arg!20!check_logfiles!cfg.d/$SERVICEDESC$
  contact_group        sanadmin
  use                  logfilescan
}

define service {
  service_description  solaris_check_apachelogs
  host_name            oaschgeign.muc
  check_command        check_nrpe_arg!20!check_logfiles!cfg.d/$SERVICEDESC$
  contact_group        webadmin
  use                  logfilescan
}

define command {
  command_name         check_nrpe_arg
  command_line         $USER1$/check_nrpe -H $HOSTADDRESS$ -t $ARG1$ -c $ARG2$ -a $ARG3$
}


The corresponding line in the host's nrpe.cfg looks like this:


command[check_logfiles]=/opt/nagios/libexec/check_logfiles -f $ARG1$

If you use NSClient++ under Windows, the entry in the NSC.ini looks like this:


check_logfiles=C:\Perl\bin\perl C:\libexec\check_logfiles -f $ARG1$

Installation

  • After unpacking the tar-archive you have to call ./configure. With ./configure --help you can show the options if you want to modify the default settings. However, these settings can later be overridden again by variables in the config file.
  • Linux systems are more restrictive regarding the permission of log files. The /var/log/messages file is not readable for non-root users. If you run check_logfiles as an unprivileged user, follow the link below and look for a trick in the examples.
  • --prefix=BASEDIRECTORY Specify here the directory where you want to install check_logfiles. (default: /usr/local/nagios)
  • --with-nagios-user=SOMEUSER The user which will own the check_logfiles script. (default: nagios)
  • --with-nagios-group=SOMEGROUP The group (default: nagios)
  • --with-perl=PATH_TO_PERL The path to your perl binary. (default: The perl in the current PATH)
  • --with-gzip=PATH_TO_GZIP The path to your gzip binary. (default: The gzip in the current PATH)
  • --with-trusted-path=PATH_YOU_TRUST The path where you expect your triggered scripts. (default: /sbin:/usr/sbin:/bin:/usr/bin)
  • --with-seekfiles-dir=SEEKFILES_DIR The directory where status files will be kept. (default: /tmp)
  • --with-protocols-dir=PROTOCOLS_DIR The directory where protocol files will be written to. (default: /tmp)
  • Under Windows you build the plugin with perl winconfig.pl

Scanning of an Oracle-Alertlog with the operating mode "oraclealertlog"

If you want to scan the alert log of an Oracle database but have no operating-system-level access to the database server (e.g. it is a Windows server, or you are not allowed to log in to a Unix server for security reasons), you cannot read the alert file directly. In that case the file can be mapped to a database table, and its contents become visible through a database connection by executing SQL SELECT statements. If you specify the type "oraclealertlog" in a check_logfiles configuration, this method is used to scan the alert log. It requires some extra parameters in the configuration.


# extra parameters in the configuration file
@searches = ({
  tag => "oratest",
  type => "oraclealertlog",
  oraclealertlog => {
    connect => "db0815",       # connect identifier
    username => "nagios",      # database user
    password => "hirnbrand",   # database password
  },
  criticalpatterns => [
...

Preparations on the part of the database administrator

Mapping external files to database tables has been possible since version 9. Use this script to prepare your database.


Preparations on the part of the Nagios administrator

Install the Perl modules DBI and DBD::Oracle (http://search.cpan.org/~pythian/DBD-Oracle-1.21/Oracle.pm).


Examples

Here you can find example configurations for several scenarios.


Download

check_logfiles-2.4.tar.gz


External Links


Changelog

  • 2008-05-07 2.4 Support for Oracle Alertlogs through a database connection.
  • 2008-05-06 2.3.3 Option -F which is used to search multiple configfiles in a directory.
  • 2008-02-26 2.3.2.1 Bugfix to support Perl 5.10. More encoding tinkering.
  • 2008-02-12 2.3.2 Support for IPMI System Event Log, Errpt Bugfix, ucs-2 encoded files for Windows.
  • 2007-12-27 2.3.1.2 Can now handle very large files, $CL_PROTOCOLFILE$, $CL_SERVICEPERFDATA$, more commandline options.
  • 2007-11-16 2.3.1.1 Bugfix in sticky code. Thanks Marc Richter. New option savethresholdcount. Thanks Hannu Kivimäki.
  • 2007-10-16 2.3.1 Templates, bzip2 archives, scriptparam bugfix, threshold counters are inherited.
  • 2007-09-10 2.3 Bugfixes. Type errpt. Okpatterns. Options sticky and syslogclient. New format for performance data.
  • 2007-06-08 2.2.4.1 Bugfix (--searches)
  • 2007-06-06 2.2.4 Support for "virtual" files like Linux /proc/*
  • 2007-06-05 2.2.3 Bugfixes
  • 2007-06-02 2.2.2 Support for supersmart scripts with empty output.
  • 2007-06-01 2.2.1 Smart scripts. Scripts can be embedded perl code.
  • 2007-05-21 2.1.1 Bugfixes
  • 2007-05-21 2.1 Native Windows now supported. New option --selectedsearches. New rotation method mod_log_rotate.
  • 2007-05-10 2.0 Complete Redesign. Official handling of non-rotating logfiles. Performancedata.

Copyright

2007 Gerhard Laußer

check_logfiles is released under the GNU General Public License (GPL).


Author

Gerhard Laußer (gerhard.lausser@consol.de) will gladly answer your questions.