#!/usr/bin/perl -w

#
# cdispd	Configuration Dispatch Daemon
#
# Copyright (c) 2003 EU DataGrid.
# For license conditions see http://www.eu-datagrid.org/license.html
#
# #
# Current developer(s):
#   Luis Fernando Muñoz Mejías <Luis.Munoz@UGent.be>
#   Michel Jouvin <jouvin@lal.in2p3.fr>
#   Nick Williams <nick.williams@morganstanley.com>
#

# #
# Author(s): Germán Cancio Meliá, Marco Emilio Poleggi
#


=pod

=head1 NAME

cdispd - Configuration Dispatch Daemon

=head1 SYNOPSIS

cdispd [--interval secs] [--ncd-retries n] [--ncd-timeout n]
[--ncd-useprofile profile_id] [--noaction] [--cfgfile <file>]
[--logfile <file>] [--help] [--version] [--verbose]
[--debug <level>] [--quiet | -D] [--pidfile <file>]

=head1 DESCRIPTION

The Configuration Dispatch Daemon waits for new incoming configuration
profiles by monitoring the CCM cache. In case of changes with respect to
the previous profile, cdispd will invoke the affected components via
the 'ncd' program.

A component's configuration declares which subtrees of the node
configuration profile it is interested in (see below). If the
configuration information in or below anyone of these subtrees
changes/appears/disappears, then cdispd will add the component to the
list of components to be invoked (except when the component itself is
removed). The check for changed configurations is done using the
element checksum provided by the CCM.

There is a /software/components/<componentX>/register_change subtree,
where one can set a list of all configuration paths on which the
ncm-cdispd should trigger the execution of <componentX>.

By default a component gets registered in its configuration, and in
its software package.

The dispatch of a component can be skipped by setting the 'dispatch' 
flag to false

 "/software/components/spma/dispatch" = false;

If /software/components is not defined, cdispd logs a error message
and continues to run.

In order to ensure consistency, the following rules apply:

=over 4

=item *

If NCM execution (ncm-ncd) fails, cdispd will not update the reference
configuration. This means that subsequent profile changes will
be compared to the profile as it was when NCM failed. In such a way,
failed components are not "forgotten" about (unless they are
deactivated in the new profile).

=item *

In the particular case that the last NCM run reported errors 
AND no NCM relevant changes between the reference configuration
and a new profile are detected, ncm-ncd is invoked for all active
components.

=back

On startup, ncm-cdispd will make an initial run of all
components where the 'dispatch' flag is set to 'true'.

=head1 EXAMPLE

Take as example the NCM configuration component for the SPMA:

 #
 # run SPMA comp if anything below /software/packages has changed
 # (eg. new packages)
 #
 "/software/components/spma/register_change/1" = "/software/packages";


=head1 OPTIONS

=over 4

=item --state path

Path for a directory in which to record state.  If no state path is
specified (the default), then no state is maintained.

If a state directory is provided, then ncm-cdispd will touch
(i.e. create and/or update modified time) files corresponding to
component names within that directory. These files will be updated
whenever it is determined that a component needs configuration. Note
that the 'dispatch' flag is not referred to when creating these state
files, only the 'active' flag is observed. If a component becomes
inactive, then any state file for that component will be removed. See
the corresponding "state" option within ncm-ncd.

=item --interval secs

Check for new profiles every 'secs' seconds.

=item --cache_root path

Path for the cache root directory. If no path is specified, the
default cache root directory (provided by CCM) will be used.

=item --ncd-retries n

This option will be passed to B<ncd>: try 'n' times if locked
(a value of 0 means infinite).

=item --ncd-timeout n

This option will be passed to B<ncd>: wait 'n' seconds between
retries.

=item --ncd-useprofile profile_id

This option will be passed to B<ncd>: use 'profile_id' as
configuration profile ID (default: latest)

=item --noautoregcomp

Do not automatically register the path of the component.

=item --noautoregpkg

Do not automatically register the path of component package.

=item --noaction

Compute and show the operations, but do not execute them.

=item --cfgfile <file>

Use <file> for storing default options.

=item --logfile <file>

Store and append B<cdispd> logs in <file>. The default is
/var/log/ncm-cdispd.log

=item --D

Becomes a daemon and suppress application output to standard output.

=back

=head2 Other Options

=over

=item --help

Displays a help message with all options and default settings.

=item --version

Displays cdispd version information.

=item --verbose

Print verbose details on operations.

=item --debug <1..5>

Set the debugging level to <1..5>.

=item --facility <f>

Set the syslog facility to <f> (Eg. local1).

=item --quiet

Becomes a daemon, and suppress application output to standard output
(this option is equivalent to --D option).

=back

=head1 CONFIGURATION FILE

A configuration file can keep site-wide configuration settings. The
location of the configuration file is defined in the --cfgfile
option. A default configuration file is found in /etc/ncm-cdispd.conf.

=head1 SIGNAL HANDLING

ncm-cdispd handles four signals: B<INT>, B<TERM>, B<QUIT> and B<HUP>.
All signals except B<HUP> cause ncm-cdispd to exit. B<HUP> causes
ncm-cdispd to reinitialize (configuration reset to values in the configuration
file and command line options ignored, reference profile set to last
profile available).

B<HUP> and B<TERM> signals are not processed immediately if they occur when `ncm-ncd` 
is run but they are remembered and the relevant action is done at the end of
ncm-ncd run. At any other moment, they are processed immediately.

The delayed processing of `TERM` may lead to a suprising situation if you do a  
restart (start + stop) when ncm-ncd is running: the new ncm-cdispd process is started
before the current one is really stopped and you end up with 2 (or even more 
depending on the number of restart you did!) ncm-cdispd process. In fact this is harmless because
ncm-ncd creates a lock that forbid new processes to actually do anything until the original 
one has exited, at the end of `ncm-ncd`.

=head1 MORE INFORMATION

NCM specification document

https://edms.cern.ch/document/372643

=head1 AUTHOR

Rafael A. Garcia Leiva <angel.leiva@uam.es>

Bruno Merin <bruno.merin@uam.es>

Universidad Autonoma de Madrid

=head1 VERSION

$Id: ncm-cdispd.cin,v 1.16 2008/05/29 13:41:33 vero Exp $

=cut

#
# Standard Common Application Framework beginning sequence
#

#
# Beginning sequence for EDG initialization
#
BEGIN {

    # use perl libs in /usr/lib/perl
    unshift( @INC, '/usr/lib/perl' );
    unshift( @INC, '/opt/edg/lib/perl' );

}


#############################################################
# cdispd program and its functions
#
# Note about debug level used:
#    - level 1: main loop and initialization
#    - level 2: compare_profiles() and add_component()
#    - level 3: utility functions used to compare profiles
#               (very verbose!)
#############################################################

package main;

use strict;
use POSIX qw(setsid);
use Readonly;

use LC::Exception qw ( throw_error);
use EDG::WP4::CCM::CacheManager;
use CDISPD::Application;
use CDISPD::Utils qw(COMP_CONFIG_PATH compare_profiles add_component clean_ICList);
use CAF::FileEditor;

use constant CONFIG_ROOT => "/";
use constant NCD_EXECUTABLE => "/usr/sbin/ncm-ncd";

our $this_app;
our %SIG;


################################################
# Functions used by main function (main loop)
################################################


#
# Configure signal handling for delayed processing of some signals
#
# Arguments: none
#
# Return value: none
#

sub delay_signals {

    $SIG{'INT'}  = \&signal_terminate;
    $SIG{'TERM'} = \&register_signal;
    $SIG{'QUIT'} = \&signal_terminate;
    $SIG{'HUP'}  = \&register_signal;

}

#
# Configure signal handling for immediate processing of all signals
#
# Arguments: none
#
# Return value: none
#

sub immediate_signals {

    $SIG{'INT'}  = \&signal_terminate;
    $SIG{'TERM'} = \&signal_terminate;
    $SIG{'QUIT'} = \&signal_terminate;
    $SIG{'HUP'}  = \&signal_reinitialize;

}

#
# Keep track of the signal received but do not process it immediately.
# This is used to postpone signal processing during the exeuction of
# ncm-ncd.
#
# Arguments
#    - Signal to register
#
# Return value: none
#

sub register_signal {

    my $signal = shift;
    unless ( $signal ) {
        $this_app->error("register_signal(): missing argument");
        return;
    }

    $this_app->{WAITING_SIGNAL} = $signal;
    $this_app->info("Signal $signal registered for processing after ncm-ncd completion");

}

#
# Process a delayed signal.
# This function does nothing if called when there is no delayed signaled.
#
# Arguments: none
#
# Return value: none
#

sub process_signal {

    unless ( $this_app->{WAITING_SIGNAL} ) {
        $this_app->debug(1,"process_signal(): no delayed signal to process");
        return;
    }

    $this_app->info("Processing delayed signal ".$this_app->{WAITING_SIGNAL});

    # Reset the delayed signal first
    my $signal = $this_app->{WAITING_SIGNAL};
    $this_app->{WAITING_SIGNAL} = undef;

    if ( $signal eq 'HUP' ) {
        signal_reinitialize($signal);
    } elsif ( $signal =~ /INT|TERM|QUIT/ ) {
        signal_terminate($signal);
    } else {
        $this_app->error("Unsupported delayed signal $signal");
    };

}

#
# Proccess signals INT, TERM and QUIT: terminate cdispd daemon
#
sub signal_terminate {

    my $signal = shift;
    unless ( $signal ) {
        $this_app->error("signal_terminate(): missing argument");
        return;
    }

    $this_app->warn("signal handler: signal $signal received");
    $this_app->warn('terminating ncm-cdispd...');

    exit(-1);

}

#
# Proccess HUP signal: reinitialize cdispd daemon
#
sub signal_reinitialize {

    my $signal = shift;
    unless ( $signal ) {
        $this_app->error("signal_reinitialize(): missing argument");
        return;
    }

    $this_app->warn("signal handler: received signal: $signal");
    $this_app->warn("reinitializing daemon...");

    # re-read config file
    $this_app->{CONFIG}->file($this_app->option('cfgfile'));
    $this_app->warn("... daemon options reset to those in ".$this_app->option('cfgfile'));

    my $cred = 0;
    my $cm   = EDG::WP4::CCM::CacheManager->new( $this_app->option('cache_root') );
    $this_app->{OLD_CFG}   = $cm->getLockedConfiguration($cred);
    $this_app->{OLD_CKSUM} = $this_app->{OLD_CFG}->getElement(CONFIG_ROOT)->getChecksum();
    $this_app->{OLD_CFID}  = $this_app->{OLD_CFG}->getConfigurationId();
    $this_app->warn("... previous profile set to last one");

    return;

}

#
# Exception handling
# Use of LC::Exception with CAF
#
sub exception_handler {

    my ( $ec, $e ) = @_;
    unless ( $ec && $e ) {
        $this_app->error("exception_handler(): missing argument");
        return;
    }

    $this_app->error("fatal exception:");
    $this_app->error( $e->text );
    if ( $this_app->option('debug') ) {
        $e->throw;
    } else {
        $e->has_been_reported(1);
    }
    $this_app->error("exiting ncm-cdispd...");
    exit(-1);

}

#
# daemonize()
#
# Become a daemon. Perform a couple of things to avoid
# potential problems when running as a daemon.
#
sub daemonize {

    my $logfile = $this_app->option('logfile');

    if ( !chdir('/') ) {
        $this_app->error("Can't chdir to /: $! : Exiting");
        exit(-1);
    }

    if ( !open( STDIN, '/dev/null' ) ) {
        $this_app->error("Can't read /dev/null: $! : Exiting");
        exit(-1);
    }

    if ( !open( STDOUT, ">> $logfile" ) ) {
        $this_app->error("Can't write to $logfile: $! : Exiting");
        exit(-1);
    }

    if ( !open( STDERR, ">> $logfile" ) ) {
        $this_app->error("Can't write to $logfile: $! : Exiting");
        exit(-1);
    }

    my $pid = fork();
    if ( !defined($pid) ) {
        $this_app->error("Can't fork: $! : Exiting");
        exit(-1);
    }
    exit if $pid;

    # Save the PID.
    if ( $this_app->option('pidfile') ) {
        if ( !open( PIDFILE, ">" . $this_app->option('pidfile') ) ) {
            $this_app->error( "Cannot write PID to file \""
                  . $this_app->option('pidfile')
                  . "\": $! : Exiting" );
            exit(-1);
        }
        print( PIDFILE "$$" );
    }

    if ( $this_app->option('state') ) {
        my $dir = $this_app->option('state');
        if ( !-d $dir ) {
            mkdir( $dir, 0755 )
              or $this_app->warn("Cannot create state dir $dir: $!");
        }
    }

    if ( !setsid() ) {
        $this_app->error("Can't start a new session: $! : Exitting");
        exit(-1);
    }

    return;

}

#
# init_components()
#
# Perform an initial call of all components with dispatch property = true
#
# Arguments: none
#
# Return value: 0 if success, else a non-zero value.
#
sub init_components {

    # credentials are undefined
    my $cred = 0;

    my $cm = EDG::WP4::CCM::CacheManager->new($this_app->option('cache_root'));
    $this_app->{OLD_CFG} = $cm->getLockedConfiguration($cred);

    #
    # Call the list of components
    #

    my $status = 0;   # Assume success
    if ( $this_app->{OLD_CFG}->elementExists(COMP_CONFIG_PATH) ) {
        my $comp_config = $this_app->{OLD_CFG}->getElement(COMP_CONFIG_PATH)->getTree();

        foreach my $component (keys(%$comp_config)) {
            $this_app->debug(2, "Adding component $component if it is eligible to run");
            add_component($comp_config,$component);
        }
        $status = launch_ncd();
    } else {
        $status = 1;
        $this_app->status("Path ".COMP_CONFIG_PATH." is not defined: no components to configure");
    }

    # current configuration profile is this one
    $this_app->{OLD_CKSUM} = $this_app->{OLD_CFG}->getElement(CONFIG_ROOT)->getChecksum();
    $this_app->{OLD_CFID}  = $this_app->{OLD_CFG}->getConfigurationId();

    return $status;

}


#
# launch_ncd()
#
# Launch the 'ncd' program, with the ncd arguments passed
# to cdispd, and with the contents of @ICList as the component list.
# Not that processing of some signals is delayed during ncm-ncd run.
#
# Return value: ncm-ncd exit status if run else 0 (success) or 1 (ncm-ncd missing)
#
sub launch_ncd {

    my $result = 0;    # Assume success

    my $p = CAF::Process->new([NCD_EXECUTABLE, '--configure'] , log => $this_app );
    if (   defined( $this_app->{ICLIST} ) && scalar(@{$this_app->{ICLIST}}) ) {
        # At this point, ICLIST should contain only components present in the last profile received.
        # The only case where a component may be in the list without being part of the configuration 
        # is the following:
        #   - Profile n is deployed succesfully (ncm-ncd returns a success)
        #   - Profile n+1 add a new component X that fails (reference config to compare next profile with remains n)
        #   - Profile n+2 remove component X but the profile comparison occurs between n and n+2 (because
        #     X failed with profile N+1) and thus X removal is not detected.
        # As a result, X remains on the list of component to run. This should be harmless as ncm-ncd will ignore it.
        # This is probably rare enough to avoid complex processing to handle this in ncm-dispd.
        $p->pushargs(@{$this_app->{ICLIST}});
    } else {
        $this_app->info("no components to be run by NCM - ncm-ncd won't be called");
        return (0);
    }

    # ncd options
    if ( $this_app->option('state') ) {
        $p->pushargs("--state", $this_app->option('state'));
    }
    if ( $this_app->option('ncd-retries') ) {
        $p->pushargs("--retries", $this_app->option('ncd-retries'));
    }
    if ( $this_app->option('ncd-timeout') ) {
        $p->pushargs("--timeout", $this_app->option('ncd-timeout'));
    }
    if ( $this_app->option('ncd-useprofile') ) {
        $p->pushargs("--useprofile", $this_app->option('ncd-useprofile'));
    }

    if ( $this_app->option('noaction') ) {
        $this_app->info( "would run (noaction mode): $p");
    } else {
        $this_app->info( "about to run: $p");
        if ( $p->is_executable() ) {
            # Delay processing of some signals
            delay_signals();

            # Execute ncm-ncd and report exit status
            sub act {
                my ($logger, $message) = @_;
                $logger->verbose($message);
            }
            my $errormsg = $p->stream_output(\&act, mode => 'line', arguments => [$this_app]);
            
            my $ec = $?;
            my $msg = "ncm-ncd finished with status: ". ($ec >> 8) . " (ec $ec";
            my $log_level = 'info';
            if ( $ec ) {
                log_failed_components();
                $log_level = 'warn';
                $msg .= ", some configuration modules failed to run successfully)";
                $result = 1;
            } else {
                $msg .= ", all configuration modules ran successfully)";
            }
            $this_app->$log_level($msg);

            # Process delayed signal if any and reestablish
            # immediate processing of signals
            process_signal();
            immediate_signals();

        } else {
            $this_app->error("Command ". ${$p->get_command()}[0] . " not found or not executable");
            $result = 1;
        }
    }
    return $result;
}


# log_failed_components()
#
# Scan the ncm-ncd component state directory and for each failed component
# (component with an entry in the directory), print a line with the name of the
# component and the failure reason.
#
# Arguments: none
#
# Return value: none
#
sub log_failed_components {

    unless ( $this_app->option('state') ) {
        $this_app->debug(1,"No component state file defined: cannot list failed components");
        return;
    }

    my $comp_state_dir = $this_app->option('state');
    if ( opendir(my $dh, $comp_state_dir) ) {
        my @comps = grep { -f "$comp_state_dir/$_" } readdir($dh);
        $this_app->warn("No failed component found in the component state directory ($comp_state_dir)") if ( @comps == 0 );
        foreach my $component (sort(@comps)) {
            my $fh = CAF::FileEditor->new("$comp_state_dir/$component");
            my $comp_msg = "$fh";
            chomp $comp_msg;
            $comp_msg = "(no message)" unless $comp_msg;
            $this_app->warn("Component $component failed with message: $comp_msg");
            $fh->close();
        }
        closedir $dh;
    } else {
        $this_app->error("Failed to open component state directory ($comp_state_dir)");
    }

}


#------------------------------------------------------------
# main loop
#------------------------------------------------------------

#
# cdispd main()
#

# Minimal path
$ENV{"PATH"} = "/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin";

umask(022);

#
# initialize the CAF::Application
#
unless ( $this_app = CDISPD::Application->new( $0, @ARGV ) ) {
    throw_error("cannot start application");
}

#
# Set Execption handler
#
( LC::Exception::Context->new )->error_handler( \&exception_handler );

# become a daemon if --D
if ( $this_app->option('quiet') ) {
    $this_app->debug( 1, "quiet option enabled, become a daemon" );
    daemonize();    # become a daemon
} else {
    $this_app->debug( 1, "no quiet option, do not become a daemon" );
}

$this_app->debug( 1, "initializing program" );

# Configure signal handling
immediate_signals();

# list of components to be invoked
$this_app->{ICLIST} = ();

# credentials are undefined
my $cred = 0;

my $cm   = EDG::WP4::CCM::CacheManager->new( $this_app->option('cache_root') );

# perform an initial call of all components
$this_app->info( 'ncm-cdispd version '
      . $this_app->version()
      . ' started by '
      . $this_app->username() . ' at: '
      . scalar(localtime)
      . ' pid: '
      . $$ );
$this_app->info('Dry run, no changes will be performed (--noaction flag set)')
  if ( $this_app->option('noaction') );

$this_app->info("initalization of components");
# $last_ncd_status keeps track of the previous execution of ncm-ncd. It is a
# usual exit code, with 0=success.
my $last_ncd_status = init_components();
$this_app->debug( 1, "Initializing \$last_ncd_status to init_components status ($last_ncd_status)");
my $ref_cid = $this_app->{OLD_CFID};
$this_app->debug( 1, "CID of reference configuration set to $ref_cid" );

# wait for a new configuration profile

while (1) {

    # Wait for a new profile
    $this_app->debug( 1, "checking for new profiles ..." );
    $this_app->debug( 3, "CID of last profile processed: " . $this_app->{OLD_CFID} );

    while ( $cm->getCurrentCid() == $this_app->{OLD_CFID} ) {
        $this_app->verbose('no new profile found, sleeping');
        $this_app->debug( 1, "sleep for " . $this_app->option('interval') . " seconds" );
        sleep( $this_app->option('interval') );
    }
    $this_app->info("new profile arrived, examining...");

    $this_app->{NEW_CFG}   = $cm->getLockedConfiguration($cred);
    $this_app->{NEW_CKSUM} = $this_app->{NEW_CFG}->getElement(CONFIG_ROOT)->getChecksum();
    $this_app->{NEW_CFID}  = $this_app->{NEW_CFG}->getConfigurationId();
    $this_app->debug( 3, "new profile cid=" . $this_app->{NEW_CFID} );

    # check if the profile is different
    my $ncd_status = 0;
    $this_app->debug( 3, "old profile checksum: " . $this_app->{OLD_CKSUM} );
    $this_app->debug( 3, "new profile checksum: " . $this_app->{NEW_CKSUM} );
    if ( $this_app->{OLD_CFID} ne $this_app->{NEW_CFID} ) {
        $this_app->verbose( "new profile detected: cid=" . $this_app->{NEW_CFID} );
        if (   ( $this_app->{OLD_CKSUM} ne $this_app->{NEW_CKSUM} )
            || ( $last_ncd_status != 0 ) )
        {
            if ( $this_app->{OLD_CKSUM} ne $this_app->{NEW_CKSUM} ) {
                $this_app->info("new (and changed) profile detected");
                # Clear the list of component to run only if last execution of
                # ncm-ncd was successful
                clean_ICList() if $last_ncd_status == 0;
            } else {
                $this_app->info( "new profile identical but re-running ncm-ncd since last execution reported errors");
            }

            # Not really needed when re-running after a previous run error 
            # (as ICList is not cleared) but harmless
            compare_profiles();

            # Run ncm-ncd and check status.
            # If ncm-ncd returned an error, do not update the reference configuration to be used
            # next time: this will force any component with errors to run again.
            $ncd_status = launch_ncd();
            $this_app->debug( 1, "launch_ncd() exit status = $ncd_status" );
            unless ($ncd_status) {
                $ref_cid = $this_app->{NEW_CFID};
                $this_app->debug( 1, "ncm-ncd executed successfully: update base configuration to CID $ref_cid");
                $this_app->{OLD_CFG}   = $this_app->{NEW_CFG};
                $this_app->{OLD_CKSUM} = $this_app->{NEW_CKSUM};
                $last_ncd_status       = 0;
            } else {
                $this_app->debug( 1, "ncm-ncd reported errors: base configuration kept at CID $ref_cid");
                $last_ncd_status = $ncd_status;
            }

        } else {
            $this_app->info( "new profile has same checksum as old one, no NCM run");
        }
    } else {
        $this_app->verbose("no new profile found");
    }

    $this_app->{OLD_CFID} = $this_app->{NEW_CFID};

}

exit(0);

