.\" Copyright (c) 1994, 1996 Bunyip Information Systems Inc.
.\" All rights reserved.
.\"
.\" Archie 3.5
.\" August 1996
.\"
.\" @(#)arcontrol.n
.\"
.TH ARCONTROL N "August 1996"
.SH NAME
.B arcontrol
\- perform automated updating routines on Archie catalogs
.SH SYNOPSIS
.B arcontrol \-u | \-p | \-r
[
.BI \-M \ <dir>
] [
.BI \-h \ <dir>
] [
.BI \-m \ <maxcount>
] [
.B \-U
] [
.B \-n
] [
.BI \-T \ <timeout>
] [
.B \-Z
] [
.B \-t
.I <dir>
] [
.B \-v
] [
.B \-l
] [
.B \-L
.I <logfile>
]
.SH DESCRIPTION
.LP
The
.B arcontrol
program is normally invoked automatically by the
.BR cron (8)
daemon. The program initiates the processes necessary to acquire, process
and incorporate new data into the various Archie catalogs.
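.PP
As an illustration only, entries such as the following in the Archie
user's
.BR crontab (5)
file would start the retrieval, parse and update phases at staggered times
each night. The schedule, the installation path
.B $HOME/bin/arcontrol
and the option values are assumptions chosen for the example, not
recommended settings.
.RS
.nf
\fC# minute hour day-of-month month day-of-week command
0 1 * * * $HOME/bin/arcontrol \-r \-m 20 \-T 15 \-l
0 3 * * * $HOME/bin/arcontrol \-p \-l
0 5 * * * $HOME/bin/arcontrol \-u \-l\fP
.fi
.RE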
.SH "OPTIONS"
.PP
One of the following options must be supplied:
.RS
.TP
.B \-r
Process data files with the
.B .retr
suffix, deposited in the holding (temporary) directory by the retrieval phase.
.TP
.B \-p
Process data files with the
.B .parse
suffix, created by the data acquisition phase.
.TP
.B \-u
Process data files with the
.B .update
suffix, created by the parse phase.
.RE
.PP
In addition, the following options are available:
.RS
.TP
.BI \-M " <dir>"
The name of the master Archie database directory. If not given,
the program first tries the directory
.B ~archie/db
and, failing that, defaults to
.BR ./db .
.TP
.BI \-h " <dir>"
The name of the Archie host database directory. If not
supplied the program will default first to
.B ~archie/db/host_db
and failing that, to
.BR ./host_db .
.TP
.BI \-t " <dir>"
Sets the name of the directory used for temporary files.
If not given, the program uses
.BR ~archie/db/tmp .
.TP
.BI \-m " <maxcount>"
The maximum number of data files to process in any given invocation. This
is especially useful when there are many data files and a limit on how
many are processed simultaneously is desired. There is an internal
default of 30 data files in retrieval mode, which may be raised or
lowered by this option. By default in update or parse mode, as many files
as are available will be processed. The special value 0 overrides the
internal default maximum: as many files as are available will be processed.
.TP
.B \-n
Do not modify the compression status of the temporary data files. By
default, data stored temporarily on disk throughout the Update Cycle is kept
in a compressed state. However, this data must be uncompressed before
being used. This option tells the system to perform the least amount of
processing in order to use the data. This option requires that there be
more disk space for the uncompressed data.
.TP
.B \-U
Actively uncompress temporary data. Data that is obtained in compressed
form should be uncompressed before writing temporary files. This may
speed processing at certain stages of the update cycle. This option
requires that there be more disk space for the uncompressed data.
.TP
.BI \-T " <timeout>"
Set the timeout on the retrieval phase of the Update Cycle. If the
retrieval connection has been idle for more than the timeout value the
retrieval is terminated and an error generated.
.I <timeout>
is specified in minutes. This value is passed directly to the data acquisition
process. The default is 10 minutes.
.TP
.B \-Z
If in retrieval mode, then the retrieval process will automatically look
for an indexing file (this is defined in the retrieval program's
configuration file).
.TP
.B \-v
Verbose mode. Report what the program is doing.
.TP
.B \-l
Write any user output to the default log file
.BR ~archie/logs/archie.log .
If desired, this can be overridden with the
.B \-L
option. Errors will by default be written to
.IR stderr .
.TP
.BI \-L " <logfile>"
The name of the file to be used for logging information.
Note that debugging information is also written to the
log file. This implies the
.B \-l
option, as well.
.RE
.SH "NAMING CONVENTIONS"
The subprocesses spawned by
.B arcontrol
follow a well-defined naming convention:
.IP
.IR "<phase prefix>" _ <dbname> _ <special>
.PP
where
.I <phase prefix>
is one of
.RS
.TP
.B retrieve
For the data acquisition phase of the cycle
.TP
.B parse
For the parse phase
.TP
.B update
For the update phase
.RE
.PP
and
.I <dbname>
is the name of the catalog associated with the data being processed.
.PP
In certain cases, it is necessary to process data destined for the same
Archie catalog in different ways, depending on their source. For example,
UNIX and VMS anonymous FTP listings are significantly different in form
and are parsed differently. Therefore
.I <special>
could apply to, among other things, operating systems.
.TP
Example:
.RS
.PP
.B parse_anonftp
is responsible for parsing the data for the anonftp
catalog. This program then spawns
.IP
.PD .1v
.B parse_anonftp_unix_bsd
.PP
or
.IP
.PD 1v
.B parse_anonftp_vms_std
.PP
depending on the operating system of the source data host. The
information required to determine which program to use is read from the
header record associated with all data files.
.RE
.PP
The current convention for naming data files during the
update cycle is:
.IP
\fI<site name>\fR\-\fI<dbname>\fR_\fI<cntl num>\fR.\fI<phase suffix>\fR[\fI<tmp suffix>\fR]
.PP
where
.RS
.TP
.I <site name>
is the name of the source host for this data
.TP
.I <dbname>
is the name of the Archie catalog with which this data is
associated
.TP
.I <cntl num>
is a number whose function is to distinguish different
sets of data from the same site and for the same catalog.
Note that this number is arbitrarily determined and may
change after undergoing any given phase of the update
cycle
.TP
.I <phase suffix>
is one of `.retr', `.parse' or `.update' depending on which phase of the cycle
the data is destined for.
.TP
.I <tmp suffix>
is usually `_t'. This is used as a temporary name for data files currently
undergoing processing.
.RE
.PP
Example:
.RS
.PD .1v
.PP
While the retrieval phase is still writing a data file, the file carries the
temporary suffix and may have a name such as
.IP
.PD .1v
.sp
\fCarchie.mcgill.ca-anonftp_23.parse_t\fP
.sp
.PP
Upon completion, the file may be called
.sp
.IP
.PD 1v
\fCarchie.mcgill.ca-anonftp_69.parse\fP
.sp
.RE
.SH "DATA PROCESSING"
.PP
Data acquisition, processing and update provide the basis for the Archie
system model and operate under the direction of
.BR arcontrol .
.PP
The Archie system temporary directory (by default
.B ~archie/db/tmp
unless overridden by the
.B \-t
option) is first scanned for the data files whose
filename suffixes are appropriate for the mode in which the program was
invoked. The header record for each file is then read to determine the
actions which are to be taken. A pre-process pass is taken over each
data file which may modify it to conform to the correct format for the
next processing phase. For example, a compressed data file may be
uncompressed.
.B arcontrol
is also responsible for coordinating the processing operations so that,
for example, no more than one processing program is operating on any
given data file concurrently.
.PP
.B Data Acquisition Phase
.RS
.PP
All retrieval is performed asynchronously. That is, the control process
launches each retrieval process without waiting for it to return.
The retrieval processes are monitored once all have been launched.
.PP
The connection on which the retrieval is taking place is monitored by the
retrieval process responsible. If the connection has been idle for more
than a preset limit, the connection is closed. Since
.B arcontrol
is responsible for running the appropriate retrieval process in normal
operation, this idle interval may be set with the
.B \-T
switch, with units in minutes.
.PP
All programs in the retrieval phase generate data files with the `.parse'
suffix.
.RE
.PP
.B Parse Phase
.RS
.PP
Parsing is performed synchronously, each file in turn. This phase generates
data files with the `.update' suffix.
.RE
.PP
.B Update Phase
.RS
.PP
Updating is performed synchronously.
.B arcontrol
waits for the return of the appropriate update process after launching it.
This mechanism aims to prevent the concurrent updating of any of the Archie
catalogs by more than one process.
.RE
.SH "STOPPING PROCESSING"
If for some reason it is necessary for the Archie administrator to terminate
the program before it has completed processing the current batch of files the
file
.B ~archie/etc/process.stop
should be created. After the completion of processing each file, the
.B arcontrol
program checks for the existence of this file. If it exists, processing
terminates and log and mail entries are generated (if they have been
requested). The existence of this file also prevents any further
update cycles from running, so the file should be removed when it is no
longer needed.
.PP
.B Note:
While this functionality is useful, files that were still awaiting processing
when the program terminated are left with the `_t' suffix. They are not
picked up by subsequent invocations of the
.B arcontrol
program and have to be removed or renamed (without the `_t' suffix) manually
by the administrator.
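.PP
As a minimal sketch of the procedure described above, an administrator
might stop processing and clean up as follows. The commands assume that no
.B \-t
override is in use, and that the leftover files should be renamed rather
than removed (that choice is the administrator's).
.RS
.nf
\fC# ask arcontrol to stop after the file it is working on
touch ~archie/etc/process.stop

# once arcontrol has exited, strip the `_t' suffix from leftover files
for f in ~archie/db/tmp/*_t; do
    [ \-f "$f" ] && mv "$f" "${f%_t}"
done

# allow later update cycles to run again
rm ~archie/etc/process.stop\fP
.fi
.RE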
.SH BUGS
.LP
Files are preprocessed as a batch operation at the start of the program rather
than one at a time as needed. As a result, if the process terminates before
completing its tasks, files with the `_t' suffix will be left in the temporary
directory and have to be removed manually.
.LP
Sites that change their primary host names between updates
are currently not correctly handled.
.SH FILES
There are no configuration files currently associated with this program.
.LP
The only compression format currently implemented is Lempel-Ziv with
.BR compress (1).
.SH "SEE ALSO"
.BR retrieve_* (n),
.BR parse_* (n),
.BR update_* (n)
.SH AUTHOR
Bunyip Information Systems.
.br
Montr\o"\'e"al, Qu\o"\'e"bec, Canada
.sp
Archie is a registered trademark of Bunyip Information Systems Inc., Canada,
1990.