
Request Tracker Wiki

AutoCloseOnNagiosRecoveryMessages


Problem Description

We use Nagios to check if our machines are up and working. Every time something strange happens (swap use is too high, CPU load is above 10, and so on), it sends an e-mail with a subject like "** PROBLEM boxxor/CPU load is CRITICAL **". As soon as things get back to normal, it sends another message: "** RECOVERY boxxor/CPU load is OK **". Each of these mails creates its own ticket in RT, so every incident leaves two tickets that have to be merged and closed manually. To make things easier, I adapted the script below to merge ALL pending new/open PROBLEM tickets related to a given RECOVERY message and to resolve them automatically.
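
A minimal standalone sketch of the matching rule, using the same two regular expressions as Todd's script below. It assumes subjects shaped the way those patterns expect (`** TYPE word - description STATE **`); the `alert` word, the sample subjects and the helper names are hypothetical and only illustrate the idea, they are not part of RT:

```perl
use strict;
use warnings;

# Description captured from a RECOVERY subject, or undef if it is not one.
# The captured text is everything between " - " and the final "OK".
sub recovery_desc {
    my ($subject) = @_;
    return $subject =~ /\*\* RECOVERY (\w+) - (.*) OK \*\*/ ? $2 : undef;
}

# Description captured from a PROBLEM subject (state word dropped), or undef.
sub problem_desc {
    my ($subject) = @_;
    return $subject =~ /\*\* PROBLEM (\w+) - (.*) (\w+) \*\*/ ? $2 : undef;
}

# Hypothetical helper: given the subjects of all new/open tickets, return
# every PROBLEM subject that belongs to this RECOVERY (there may be several).
sub matching_problems {
    my ($recovery_subject, @open_subjects) = @_;
    my $desc = recovery_desc($recovery_subject);
    return () unless defined $desc;
    return grep {
        my $p = problem_desc($_);
        defined $p && $p eq $desc;
    } @open_subjects;
}

my @open = (
    '** PROBLEM alert - boxxor/CPU load is CRITICAL **',
    '** PROBLEM alert - boxxor/CPU load is WARNING **',
    '** PROBLEM alert - boxxor/Swap usage is CRITICAL **',
);
my @hits = matching_problems('** RECOVERY alert - boxxor/CPU load is OK **', @open);
print "$_\n" for @hits;    # the two CPU load subjects, not the swap one
```

Because the state word is stripped from both subjects, a WARNING and a later CRITICAL notification for the same service both pair with the single RECOVERY.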

History

  • Mar 2004 - original version from Todd Chapman extracted from an email message
  • Nov 2009 - Sunnavy uploads plugin to the CPAN
  • Mar 2010 - Kamil's simplification of Todd's variant (requires Nagios3)

Solutions

Todd's original version

Description: Merge Into Existing Ticket on match
Condition: OnCreate
Action: User Defined

Custom action preparation code:

1;

Custom action cleanup code:

# If the subject of the ticket matches a pattern suggesting
   # that this is a Nagios RECOVERY message  AND there is
   # an existing ticket (open or new) in the "General" queue with a matching
   # "problem description", (that is not this ticket)
   # merge this ticket into that ticket
   #
   # Based on http://marc.free.net.ph/message/20040319.180325.27528377.en.html
   
   my $problem_desc = undef;
   
   my $Transaction = $self->TransactionObj;
   my $subject = $Transaction->Attachments->First->GetHeader('Subject');
   if ($subject =~ /\*\* RECOVERY (\w+) - (.*) OK \*\*/) {
       # This looks like a nagios recovery message
       $problem_desc = $2;
   
       $RT::Logger->debug("Found a recovery msg: $problem_desc");
   } else {
       return 1;
   }
   
   # OK, now let's merge this ticket with its PROBLEM ticket(s).
   my $search = RT::Tickets->new($RT::SystemUser);
   $search->LimitQueue(VALUE => 'General');
   $search->LimitStatus(VALUE => 'new', OPERATOR => '=', ENTRYAGGREGATOR => 'or');
   $search->LimitStatus(VALUE => 'open', OPERATOR => '=');
   
   if ($search->Count == 0) { return 1; }
   my $id = undef;
   while (my $ticket = $search->Next) {
       # Ignore the ticket that opened this transaction (the recovery one...)
       next if $self->TicketObj->Id == $ticket->Id;
       # Look for nagios PROBLEM warning messages...
       if ( $ticket->Subject =~ /\*\* PROBLEM (\w+) - (.*) (\w+) \*\*/ ) {
           if ($2 eq $problem_desc){
               # Aha! Found the Problem TICKET corresponding to this RECOVERY
               # ticket
               $id = $ticket->Id;
               # Nagios may send more than one PROBLEM message, right?
               $RT::Logger->debug("Merging ticket " . $self->TicketObj->Id . " into $id because of problem description match.");
               $self->TicketObj->MergeInto($id);
               # Keep looking for more PROBLEM tickets...
           }
       }
   }
   
   $id || return 1;
   # Auto-close/resolve this whole thing
   $self->TicketObj->SetStatus( "resolved" );
   1;
   

Extension from Sunnavy

RT-Extension-Nagios, available on the CPAN.

Kamil's version for Nagios3 and newer

by Kamil Srot (kamil.srot at nLogy dot com) 26/03/2010

First of all, sorry for my coding; I don't know Perl at all :-( Feel free to improve the script and let me know :-)

I use Nagios3, which comes with some nice macros that make integration with RT much easier. Here is an example of the notification commands, defined in Nagios (commands.cfg):

# 'notify-host-by-rtemail' command definition
   define command{
         command_name    notify-host-by-rtemail
         command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nEventID: $HOSTPROBLEMID$\nLastEventID: $LASTHOSTPROBLEMID$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
         }
   
   # 'notify-service-by-rtemail' command definition
   define command{
         command_name    notify-service-by-rtemail
         command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nEventID: $SERVICEPROBLEMID$\nLastEventID: $LASTSERVICEPROBLEMID$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
         }
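
Kamil's scrip below only reads the `EventID:` and `LastEventID:` lines that these templates put at the start of a line in the message body. A standalone sketch applying the same two patterns to a made-up RECOVERY body (the `event_ids` helper, host address and IDs are hypothetical):

```perl
use strict;
use warnings;

# Pull EventID and LastEventID out of a notification body built by the
# templates above. Returns an empty list if either line is missing, i.e.
# the mail was probably not sent by these commands.
sub event_ids {
    my ($body) = @_;
    my ($event)      = $body =~ m/^\QEventID:\E\s*(\S+)\s*$/m;
    my ($last_event) = $body =~ m/^\QLastEventID:\E\s*(\S+)\s*$/m;
    return unless defined $event && defined $last_event;
    return ($event, $last_event);
}

# Sample RECOVERY body in the layout of notify-service-by-rtemail
# (hypothetical values).
my $body = join "\n",
    '***** Nagios *****',
    '',
    'Notification Type: RECOVERY',
    '',
    'Service: CPU load',
    'Host: boxxor',
    'Address: 192.0.2.10',
    'State: OK',
    'EventID: 0',
    'LastEventID: 4711',
    '';

my ($id, $last) = event_ids($body);
print "EventID=$id LastEventID=$last\n";    # EventID=0 LastEventID=4711
```

The `^` anchor together with `/m` is what keeps the `EventID:` pattern from matching inside the `LastEventID:` line.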
   
   
   

Note the $HOSTPROBLEMID$, $LASTHOSTPROBLEMID$, $SERVICEPROBLEMID$ and $LASTSERVICEPROBLEMID$ macros.

The *PROBLEMID macro gets a new unique ID when a problem first appears and stays constant until the final RECOVERY. A RECOVERY notification always has *PROBLEMID equal to 0, and its LAST*PROBLEMID holds the *PROBLEMID used by all the previous notifications of that problem.
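
As a worked example with invented IDs: a service goes CRITICAL and gets problem ID 4711, is notified again (still 4711), then recovers. The rule "use EventID unless it is 0, otherwise use LastEventID" maps all three notifications to the same key, which is what lets them end up in one ticket. A minimal sketch (the `tracking_id` name is hypothetical):

```perl
use strict;
use warnings;

# Key used to group notifications of one problem: the current problem ID,
# or, for a RECOVERY (problem ID 0), the ID of the problem that just ended.
sub tracking_id {
    my ($event_id, $last_event_id) = @_;
    return $event_id != 0 ? $event_id : $last_event_id;
}

# (type, $SERVICEPROBLEMID$, $LASTSERVICEPROBLEMID$) for one incident.
# LastEventID is ignored for PROBLEM notifications, so 4702 is arbitrary here.
my @notifications = (
    [ PROBLEM  => 4711, 4702 ],
    [ PROBLEM  => 4711, 4702 ],
    [ RECOVERY => 0,    4711 ],
);

for my $n (@notifications) {
    my ($type, $id, $last) = @$n;
    printf "%-8s -> ticket key %d\n", $type, tracking_id($id, $last);
}
# all three lines report ticket key 4711
```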

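I use code like this to process incoming emails: it merges each notification into the ticket that already carries the same problem ID and resolves that ticket on RECOVERY. It expects a queue named 'Monitoring' and a ticket custom field named 'NagiosProblemID':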

# get the body of the incoming mail
   my $AttachObj = $self->TransactionObj->Attachments->First;
   my $content = $AttachObj->Content;
   
   # extract EventID and LastEventID
   my $val = 0;
   my $EventID = undef;
   my $LastEventID = undef;
   if( $content =~ m/^\QEventID:\E\s*(\S+)\s*$/m ) {
    $EventID = $1;
   }
   if( $content =~ m/^\QLastEventID:\E\s*(\S+)\s*$/m ) {
    $LastEventID = $1;
   }
   
   # not a Nagios notification in the expected format: leave the ticket alone
   # (otherwise an undefined EventID would compare equal to 0 below and the
   # ticket would be resolved)
   return 1 unless defined $EventID;
   
   if($EventID == 0) {
    $val = $LastEventID;
   } else {
    $val = $EventID;
   }
   
   # nothing to match on (e.g. a RECOVERY without a LastEventID line)
   return 1 unless $val;
   
   # Look for a ticket with the same EventID
   my $TicketsObj = RT::Tickets->new($RT::SystemUser);
   $TicketsObj->LimitQueue(VALUE => 'Monitoring');
   $TicketsObj->LimitCustomField(CUSTOMFIELD => 'NagiosProblemID', OPERATOR => '=', VALUE => $val);
   
   if ($TicketsObj->Count > 0) {
    # found!
    my $id = undef;
    my $ticket;
    while ($ticket = $TicketsObj->Next) {
     next if $self->TicketObj->Id == $ticket->Id;
     $id = $ticket->Id;
     last;
    }
    if ( $id ) {
     # ...merge into it
     $self->TicketObj->MergeInto($id);
     # when EventID = 0 (RECOVERY), close the parent ticket
     if($EventID == 0) {
      $self->TicketObj->SetStatus('resolved');
     }
     # ...and exit
     return 1;
    }
   }
   
   # hmm, a new ticket.
   # let it fall through into the queue
   
   # and set NagiosProblemID
   $self->TicketObj->AddCustomFieldValue( Field => 'NagiosProblemID', Value => $val, RecordTransaction=>0 );
   # if it is a recovery, set it to resolved
   if($EventID == 0) {
    $self->TicketObj->SetStatus('resolved');
   }
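
To follow the flow of the scrip above without an RT instance, here is a standalone sketch in which a plain hash stands in for the `NagiosProblemID` custom field search and "merge" simply means "continue in the existing ticket". The `route` function, the hashes and the ticket numbers are hypothetical; this is a model of the decisions, not RT code:

```perl
use strict;
use warnings;

my %ticket_for;          # NagiosProblemID -> ticket number (stand-in for the CF search)
my %status;              # ticket number -> status
my $next_ticket = 100;   # arbitrary starting ticket number

# Mimic the scrip for one incoming notification; returns the ticket it ends up in.
sub route {
    my ($event_id, $last_event_id) = @_;
    my $key = $event_id != 0 ? $event_id : $last_event_id;
    my $new = $next_ticket++;    # the incoming mail has already created a ticket

    if (my $existing = $ticket_for{$key}) {
        # Same problem already tracked: the new ticket is merged into it,
        # and a RECOVERY resolves the parent ticket.
        $status{$existing} = 'resolved' if $event_id == 0;
        return $existing;
    }

    # First notification for this problem: keep the ticket and tag it.
    $ticket_for{$key} = $new;
    $status{$new} = $event_id == 0 ? 'resolved' : 'new';
    return $new;
}

# PROBLEM, repeated PROBLEM, RECOVERY for the same incident.
my @seen = map { route(@$_) } ([4711, 4702], [4711, 4702], [0, 4711]);
print "tickets: @seen, status of $seen[0]: $status{$seen[0]}\n";
# tickets: 100 100 100, status of 100: resolved
```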
   
