archie/release/base/etc/manpage.ascii

1108 lines
38 KiB
Plaintext
Raw Permalink Normal View History

2024-05-28 17:59:32 +02:00
ARCHIE(1L) MISC. REFERENCE MANUAL PAGES ARCHIE(1L)
NAME
archie(tm) - Internet archive server listing service
SYNOPSIS
archie
DESCRIPTION
This manual page describes Version 3 of the archie system.
This Internet information service allows the user to query a
catalog containing a list of files which are available on
hosts connected to the Internet. Software located through
this service can be obtained by means of ftp(1); for hosts
with access to BITNET/NetNorth/EARN, it can be obtained by
electronic mail through the Princeton bitftp (1L) service.
Send mail to
bitftp@pucc.princeton.edu
Other Internet users who are not directly connected may use
the services of various ftp-by-mail servers including
ftpmail@decwrl.dec.com
Some archie systems track archive sites globally, others
only track the archive sites in their country, region or
continent in order to reduce the load on trans-oceanic
links. There are a number of archie hosts serving different
continental user communities. The servers command will list
the most up-to-date information on archie servers worldwide.
archie.au Australia
archie.edvz.uni-linz.ac.at Austria
archie.univie.ac.at Austria
archie.uqam.ca Canada
archie.cs.mcgill.ca Canada
archie.funet.fi Finland
archie.univ-rennes1.fr France
archie.th-darmstadt.de Germany
archie.ac.il Israel
archie.unipi.it Italy
archie.wide.ad.jp Japan
archie.hana.nm.kr Korea
archie.sogang.ac.kr Korea
archie.uninett.no Norway
archie.rediris.es Spain
archie.luth.se Sweden
archie.switch.ch Switzerland
archie.ncu.edu.tw Taiwan
archie.doc.ic.ac.uk United Kingdom
archie.hensa.ac.uk United Kingdom
archie.unl.edu USA (NE)
archie.internic.net USA (NJ)
archie.rutgers.edu USA (NJ)
archie.ans.net USA (NY)
archie.sura.net USA (MD)
archie can be accessed interactively, via electronic mail or
through archie client programs available widely on the
Internet.
Using the Interactive (telnet) Interface
In order to use the interactive system you should use the
following procedure:
1) telnet to the archie system closest to you. Do not use
ftp for this, it will not work.
2) Login as user archie no capitals, no password is
required. The system should print a banner message and
status report before presenting you with the command
prompt. Some newer operating systems will prompt for a
password. Just hit the return key and continue.
3) Type help for complete information on the system.
For full details, refer to the section entitled ARCHIE COM-
MANDS which appears below.
Using the Electronic Mail Interface
In order to use the email interface, send requests to:
archie@<archie_server>
where <archie_server> is one of the hosts listed above, or
one returned by the servers command. Send the word help in
a message to obtain a list of available commands and
features. This is a completely automated interface, acting
without human intervention.
For full details, refer to the section entitled ARCHIE COM-
MANDS which appears below.
Using the archie clients
The source code as well as machine executables for a variety
of archie client programs can be obtained via anonymous
ftp(1) from many of the archie server hosts listed above.
They are usually stored in the archie/clients or
pub/archie/clients directories. These clients communicate
via the Prospero
distributed file system protocol with archie servers, which
perform the specified queries and return the results to the
user. Currently there are Unix and VMS command line, curses
and X window clients as well as Mac and PC Windows versions.
For more information on Prospero send your queries to info-
prospero-request@isi.edu
Communicating with the Database Administrators
Mail to archie administrators at a particular archie server
should be sent to the address
archie-admin@<archie_server>
where <archie_server> is one of the hosts listed above.
To send mail to the implementors of the archie system,
please send mail to
archie-group@bunyip.com
The archie server system is a product of Bunyip Information
Systems.
Requests for additions to the set of hosts surveyed for the
catalog, modifications to the Software Description Catalog,
or other administrative matters, should be sent to:
archie-admin@bunyip.com
ARCHIE COMMANDS
In the archie system version 3 the telnet and email clients
accept a common set of commands. Additionally, there are
specialized commands specfic to the particular interfaces.
See THE INTERACTIVE INTERFACE and THE EMAIL INTERFACE sec-
tions below for a list of these commands.
Note that some archie server sites may disable some of the
commands for reasons particular to their site. As well some
sites limit the number of concurrent interactive (telnet)
sessions to better utilize limited resources.
Commands
Arguments to commands shown in square brackets '[]' are
optional; all others are mandatory.
find <pattern>
prog <pattern>
This command produces a list of files matching the pat-
tern <pattern>. The <pattern> may be interpreted as a
simple substring, a case sensitive substring, an exact
string or a regular expression, depending on the value
of the search variable. The output normally contains
such information as the file name that was matched, the
directory path leading to it, the site containing it
and the time at which that site was last updated. The
format of the output can be selected through the
output_format variable. The results are sorted accord-
ing to the value of the sortby variable, and are lim-
ited in number by the maxhits variable.
prog is identical to find. It is included for backward
compatibility with older versions of the system.
help [<topic> [<subtopic>] ...]
Invokes the help system and presents help on the speci-
fied topic. A list of words is considered to be one
topic, not a list of individual topics. Thus,
help set maxhits
requests help on the subtopic maxhits of topic set, not
on two separate topics. After help is presented the
user is placed in the help system at the deepest level
containing subtopics.
For example, after typing
help set maxhits
and being shown the information for that topic the user
is placed at the level set in the help hierarchy.
list [<pattern>]
Produce a list of sites whose contents are contained in
the archie catalog. With no argument all the sites are
listed. If given, the <pattern> argument is interpreted
as a regular expression (See "REGULAR EXPRESSIONS"
below) against which to match site names: only those
names matching are printed. The format of the output
can be selected through the output_format variable.
Note that the numerical (IP) address associated with a
site name was valid at the last time the site was
updated in the archie catalog but may have been changed
subsequently. Furthermore, the listed IP address is
the primary address as listed in the Domain Name System
(secondary addresses are not stored).
Example:
list
lists all sites in the catalog, while
list .de$
lists all German sites.
mail <address>
Mail the result of the last command that produced out-
put (eg. find, whatis, list) to <address>. This must be
a vaid email address.
manpage [ roff | ascii ]
Display the archie manual page (this file). The
optional arguments specify the format of the returned
document. roff specifies UNIX troff(1) format while
ascii specifies plain, preformatted ASCII output. With
no arguments it defaults to ascii.
domains
Asks the current server for the list of the archie
pseudo-domains that it supports. See the entry for the
match_domain variable below. This command takes no
arguments.
Example:
domains
requests the list of pseudo-domains from the server.
The result looks (in part) something like this:
africa Africa za
anzac OZ & New Zealand au:nz
asia Asia kr:hk:sg:jp:cn:my:tw:in
centralamerica Central America sv:gt:hn
easteurope Eastern Europe bg:hu:pl:cs:ro:si:hr
mideast Middle East eg:.il:kw:sa
northamerica North America usa:ca:mx
scandinavia Scandinavia no:dk:se:fi:ee:is
southamerica South American ar:bo:br:cl:co:cr:cu:ec:pe
usa United States edu:com:mil:gov:us
westeurope Western Europe westeurope1:westeurope2
world The World world1:world2
The first column gives the names of pseduo-domains sup-
ported by the server. The second gives the "natural
language" description of the pseudo-domain and the
third column is the actual definitions of those
domains. Thus here the "asia" domain is comprised of
the Domain Name System country codes for Korea ("kr"),
Hong Kong ("hk"), Singapore ("sg") etc. Pseudo-domains
may also be constructed from other pseudo-domains: thus
one component of the the "northamerica" domain is
itself constructed from the "usa" pseudo-domain.
motd Re-display the "message of the day", which is normally
printed when the user initially logs on to the client
(in the case of the interactive interface) or at the
start of the returned message (in the email interface).
servers
Display a list of all publicly accessible archie
servers worldwide. The names of the hosts, their IP
addresses and geographical locations are listed.
set <variable-name> [<value>]
Set the specified variable. Variables are used to con-
trol various aspects of the way archie operates; the
interpretation of <pattern> arguments, the format of
output from various commands, etc. See the section
below on variables for a description of each one as
well as the entries for unset and show.
show [<variable-name> ...]
Without any argument, display the status of all the
user-settable variables, including such information as
its type (boolean, numeric, string), whether or not it
is set and its current value (if its type requires a
value). Otherwise show the status of each of the
specified arguments.
Example:
show maxhits
site <sitename>
This command is currently unimplemented under version 3
of the archie system.
unset variable
Remove any value associated with the specified vari-
able. This may cause counter-intuitive behavior in
some cases; for example, if maxhits is not defined by
the user, the find command will print the internal
default number of matches rather than an unlimited
number of matches.
version
Print the current version of the client.
whatis <substring>
Search the Software Description Catalog for the given
substring, ignoring case. This catalog consists of
names and short descriptions of many software packages,
documents (like RFCs and educational material), and
data files stored on the Internet.
Example:
whatis uucp
in part gives as a result:
findpath.sh UUCP Pathfinder
logfile-stats UUCP LOGFILE analyzer
mapstats UUCP map statistics pro-
gram
Variable Types
The behavior of archie can be modified by certain variables,
the values of which may be changed using the set command, or
removed entirely by the unset command. There are three
variable types:
boolean (Set or unset)
numeric (Integer within a defined range)
string (String of characters which may or may not be
restricted).
If the value of a string variable should con-
tain leading or trailing spaces then it
should be quoted. Two ways of quoting text
are to surround it with a pair of double
quotes (`"'), or to precede individual char-
acters with a backslash (`\'). (A double
quote, or a backslash may itself be quoted by
preceding it by a backslash.) The resulting
value is that of the string with the quotes
stripped off.
Numeric Variables
maxhits
Allow the find command to generate at most the speci-
fied number of matches (permissible range: 0-1000;
default: 100).
Example:
set maxhits 100
halts prog after 100 matches have been found in total.
maxhitspm
Across all the anonymous FTP archives on the Internet
(and even on one single anonymous FTP archive) many
files will have the same name. For example, if you
Sun Release 4.1 Last change: 12 Apr 1994 7
ARCHIE(1L) MISC. REFERENCE MANUAL PAGES ARCHIE(1L)
search for a very common filename like "README" you can
get hundreds even thousands of matches. You can limit
the number of files with the same name through this
variable. For example,
set maxhitspm 100
tells the system only 100 files with the same name.
Note that the overall maximum number of files returned
is still controlled with the 'maxhits' variable.
maxmatch
This variable will limit the number filenames returned.
For example, if maxmatch is set to 2 and you perform a
substring search for the string "etc", and the catalog
contains filenames "etca", "betc" and "detc" only the
filenames "etca" and "betc" will be returned. However,
depending on the values of maxhitspm and maxhits you
will get back a number of actual files with those
names. Example:
set maxmatch 20
max_split_size
Approximate maximum size, in bytes, of a file to be
mailed to the user. Any output larger than this will
be split in pieces of about this size. This can be set
by the user in the range 1024 to ~2Gb with a default of
51200 bytes.
String Variables
compress
The kind of data compression the user can specify when
mailing back output. Currently allowed values are none
and compress (standard UNIX compress(1),withadefaultof
encode
The type of post-compression encoding the user can
specify when mailing back output. Currently allowed
values are none and uuencode, with a default of none.
Note that this variable is ignored unless compression
is enabled (via the compress) variable.
language
Allows the user to specify the language in which the
help, etc. is presented. Currently the default value
is english.
mailto
If the mail command is issued with no arguments, mail
the output of the last command to the address specified
by this string variable. Initially this variable is
unset.
Example:
set mailto user@frobozz.com
Conventional Internet addressing styles are understood.
BITNET sites should use the convention:
user@sitename.bitnet
UUCP addresses can be specified as
user@sitename.uucp
match_domain
This variable allows users to restrict the scope of
their search based upon the Fully Qualified Domain
Names (FQDN) of the anonymous FTP sites being searched.
In this way, the user can specify a colon-separated
list of domain names to which all returned sites must
match. Each component in the list is taken as the
rightmost part of the FQDN. For example,
set match_domain ca:internic.net:harvard.edu
means that the names of all returned sites must end in
"ca" (Canada), "internic.net" (sites in the Internet
NIC) or "harvard.edu" (sites at Harvard University).
While these are all real domain names, listing all pos-
sible combinations for say, the USA, would quickly
become tedious (and if you think that is bad, try list-
ing all the countries on the Internet in Europe). To
aid in this problem, the archie system has the concept
of pseudo-domains to allow users to use a shorthand
notation when using this facility. These pseudo-domains
are defined on a server-by-server basis and you can use
the domains command to query your current server for
its list of predefined pseudo-domains.
A pseudo-domain is a list of real DNS domain names
and/or a list of other pseudo-domains. For example, the
archie administrator on the server could define the
pseudo-domain
"usa"
to be
"edu:mil:com:gov:us"
If this definition existed on the server, then you
could
set match_domain usa
which would be the same as saying
set match_domain edu:mil:com:gov:us
In addition, the server administrator may define
"northamerica"
to be
"usa:ca:mx"
meaning that "northamerica" is composed of the pseudo-
domain "usa" and the real domains "ca" (Canada) and
"mx" (Mexico). This process can be repeated for 20 lev-
els (more than sufficient for any naming scheme). By
using the domains command you can determine what
pseudo-domains your current server supports.
match_path
Sometimes you only would like your search (using the
find command) to look for files or directories with a
certain set of names in their full path.
For example, many anonymous FTP site administrators
will put software packages for the MacIntosh in a path
containing the name "mac" or "macintosh". Another exam-
ple is when a document exists in several formats and
you are only looking for the PostScript version. You
can guess that the file may end in ".ps" or it maybe in
a directory called "ps" or "PostScript".
This is usually guesswork, but is is useful to have the
archie system only look for files or directories with
particular components in their path name.
This variable allows you to do this. The arguments are
a colon-separated list of possible path name com-
ponents. In the last example above, saying
set match_path ps:postscript
will restrict the search only to match those files or
directories which have the strings "ps" or "postscript"
in their path.
The comparison is always case-insensitive (regardless
of the value of the match variable) and there is a log-
ical OR connecting the components so that the above
statement says: "find only files which have 'ps' OR
'postscript' in their path". If either component
matches then the condition is satisfied.
output_format
Affects the way the output of find and list is
displayed. User settable, with valid values of machine
(machine readable format), terse and verbose, with a
default of verbose.
search
The type of search done by the find (or prog) command.
User settable with a range of exact, regex, sub, sub-
case, exact_regex, exact_sub and exact_subcase with a
default of sub. (The exact_<x> types cause it to try
exact first, then fall back to type <x> if no matches
are found). The values have the following meanings:
exact
Exact match (the fastest method). A match occurs
if the file (or directory) name in the catalog
corresponds exactly to the user-given substring
(including case).
For example, this type of search could be used to
locate all files called xlock.tar.Z
regex
Allow user-specified (search) strings to take the
form of ed(1) regular expressions.
Note: unless specifically anchored to the begin-
ning (with ^) or end (with $) of a line, ed(1)
regular expressions (effectively) have ``.*''
prepended and appended to them. For example, it
is not necessary to type
find .*xnlock.*
because
find xnlock
suffices. In this instance, the regex match is
equivalent a simple substring match. Those unfam-
iliar with regular expressions should refer to the
section entitled REGULAR EXPRESSIONS which appears
below.
sub Substring (case insensitive). A match occurs if
the file (or directory) name in the catalog con-
tains the user-given substring, without regard to
case.
Example:
The pattern:
is
matches any of the following:
islington
this
poison
subcase
Substring (case sensitive). As above, but taking
case as significant.
Example:
The pattern:
TeX
will match:
LaTeX
but neither of the following:
Latex
TExTroff
server
the Prospero server to which the client connects when
find or list commands are invoked. User settable, with
a default value of localhost.
sortby
Set the method of sorting to be applied to output from
the find command. Typing the keyboard interrupt char-
acter (generally Cntl-C on UNIX hosts) aborts a search.
This will also dequeue the request from the server.
Unlike previous versions of the archie system, version
3 does not allow partial results. The output phase may
be aborted by typing the abort character a second time.
The five permitted methods (and their associated
reverse orders) are:
none Unsorted (default; no reverse order, though rnone
is accepted)
filename
Sort files/directories by name, using lexical
order (reverse order: rfilename)
hostname
Sort on the archive hostname, in lexical order
(reverse order: rhostname)
size Sort by size, largest files/directories first
(reverse order: rsize)
time Sort by modification time, with the most recent
file/directory names first (reverse order: rtime)
THE INTERACTIVE (TELNET) INTERFACE
The interactive interface accepts the following commands and
variables in addtion to those listed above.
Commands
stty [[<option> <character>] ...]
This command allows the user to change the interpreta-
tion of specified characters, in order to match their
particular terminal type. At the moment only erase is
recognized as an <option>. (Typically, <character> is
a control character and may be specified as a pair of
characters (e.g. control-h as the pair '^' followed by
'h'), the character itself (literal), or as a quoted
pair or literal.
Without any arguments the command displays the current
values of the recognized options.
mail [<address>]
The output of the previous successful command (i.e. an
invocation of find, list or whatis that produced out-
put) is mailed to the specified electronic mail
address. If no <address> is given the contents of the
mailto variable are used. If this variable is not set
then an error occurs, and nothing is mailed, although
the output is still available to be mailed.
Example:
mail user1@hello.edu
Conventional Internet addressing styles are understood.
BITNET sites should use the convention:
user@sitename.bitnet
UUCP addresses can be specified as
user@sitename.uucp
pager
This command is included only for backward compatibil-
ity. It has the same effect as set pager. Its use is
discouraged and it will be removed in a future release.
nopager
This command is included only for backward compatibil-
ity. It has the same effect as unset pager. Its use
is discouraged and it will be removed in a future
release.
Variables
autologout
Set the length of idle time (in minutes) allowed before
automatic logout (permissible range: 1-300; default:
60).
Example:
set autologout 45
logs the user out after 45 minutes of idle time.
pager
Filter all output through the default pager (default:
unset). When using the pager you may also want to set
the term variable to your terminal type (see term vari-
able).
Example:
set pager
status
When set this variable will cause the system to report
the position in the queue of your request on the
server. In addition, it will display the estimated time
to completion of your request. This estimate is based
in an average of the amount of times similar queries
have taken in the past several minutes. The variable
also controls the display of a "spinner" during the
catalog search, which indicates that we are awaiting
results from the Prospero server. Set by default.
term Specify the type of terminal in use (and optionally,
its size in rows and columns). This information is
used by the pager.
The usage is:
set term <terminal-type> [<#rows> [<#columns>]]
The terminal type is mandatory, but the number of rows
and columns is optional; specify either rows only, or
both rows and columns (default: 24 rows, 80 columns).
The default value for this variable is dumb. However it
may be set automatically through the telnet protocol
negotiation.
Examples:
set term vt100
set term xterm 60
set term xterm 24 100
THE EMAIL INTERFACE
The archie email interface currently accepts the following
commands in addition to those listed in the COMMANDS section
above.
path <address> is an alias for
set mailto <address>
quit Ignore any further lines past this point in the mail.
This is generally not needed, but can be used to
prevent the system from interpreting signatures etc. as
archie commands.
The Subject: line in incoming mail is processed as if it
were part of the main message body.
A message not containing a valid request will be treated as
a help request.
REGULAR EXPRESSIONS
Regular expressions follow the conventions of the ed(1) com-
mand, allowing sophisticated pattern matching. In the fol-
lowing discussion, the string containing a regular expres-
sion will be called the ``pattern'', and the string against
which it is to be matched is called the ``reference
string''. Regular expressions imbue certain characters with
special meaning, providing a quoting mechanism to remove
this special meaning when required.
The rules governing regular expression are:
c A character c matches itself unless it has been
assigned a special meaning as listed below. A special
character loses its special meaning when preceded by
the character '\'. This does not apply to '{', which
is non-special until it is so treated. Thus, although
'*' normally has special meaning, the string '\*'
matches itself.
Example:
The pattern
acdef
matches any of the following:
s83acdeffff
acdefsecs
acdefsecs
but neither of the following:
accdef
aacde1f
Example:
Normally the characters '*' and '$' are special, but
the pattern
a\*bse\$
acts as above. Any reference string containing:
a*bse$
as a substring will be flagged as a match.
. A period (known as a wildcard character) matches any
character except the newline character.
Example:
The pattern
....
will match any 4 characters in the reference string,
except a newline character.
^ A caret (^) appearing at the beginning of a pattern
requires that the reference string must start with the
specified pattern (an escaped caret, or a caret appear-
ing elsewhere in the pattern, is treated as a non-
special character).
Example:
The pattern
^efghi
The pattern will match only those reference strings
starting with efghi; thus, it will match either of the
following:
efghi
efghijlk
but not:
abcefghi
$ A dollar sign ($) appearing at the end of a pattern
requires that the pattern appear at the end of a refer-
ence string (an escaped dollar sign, or a dollar sign
appearing elsewhere, is treated as a regular charac-
ter).
Example:
The pattern
efghi$
Will match either of the following:
efghi abcdefghi
but not:
efghijkl
[string]
Match any single character within the brackets. The
caret (^) has a special meaning if it is the first
character in the series: the pattern will match any
character other than one in the list.
Example:
The pattern
[^abc]
Will match any character except one of:
a
b
c
To match a right bracket (]) in the list, put it first,
as in:
[]ab01]
A caret appearing anywhere but the in first position is
treated as a regular character.
The minus (-) character is special within square brack-
ets. It is used to define a range of ASCII characters
to be matched. For example, the pattern:
[a-z]
matches any lower case letter. The minus can be made
non-special by placing it first or last within the
square brackets. The characters '$', '*' and '.' are
not special within square brackets.
Example:
The pattern
[ab01]
matches a single occurrence of a character from the
set:
a
b
0
1
Example:
The pattern
[^ab01]
will match any single character other than one from the
set:
a
b
0
1
Example :
The pattern
[a0-9b]
matches one of the characters:
a
b
or a digit between 0 and 9, inclusive.
Example :
The pattern
[^a0-9b.$]
matches any single character which is not in the set:
a
b
.
$
or a digit between 0 and 9, inclusive.
* Match zero or more occurrences of an immediately
preceding regular expression.
Example:
The pattern
a*
matches zero or more occurrences of the character:
a
Example:
The pattern
[A-Z]*
matches zero or more occurrences of the upper case
alphabet.
\{m\}
Match exactly m occurrences of a preceding regular
expression, where m is a non-negative integer between 0
and 255 (inclusive).
Example:
The pattern
ab\{3\}
matches any substring in the reference string consist-
ing of the character `a' followed by exactly three `b'
characters.
\{m,\}
Match at least m occurrences of the preceding regular
expression.
Example:
The pattern
ab\{3,\}
matches any substring in the reference string of the
character `a' followed by at least three `b' charac-
ters.
\{m,n\}
Match between m and n occurrences of the preceding reg-
ular expression (where n is a non-negative integer
between 0 and 255, and n>m).
Example:
The pattern
ab\{3,5\}
matches any substring in the reference string consist-
ing of the character `a' followed by at least three but
at most five `b' characters.
Tips for Using Regular Expressions
1) When matching a substring it is not necessary to use
the wildcard character to match the part of the refer-
ence string preceding and following the substring.
Example:
The pattern
abcd
will match any reference string containing this pat-
tern. It is not necessary to use
.*abcd.*
as the pattern.
2) In order to constrain a pattern to the entire reference
pattern, use the construction:
^pattern$
3) The '[]' operator provides an easy mechanism to obtain
case insensitivity. For example, to match the word:
hello
regardless of case, use the pattern:
[Hh][Ee][Ll][Ll][Oo]
THE ARCHIE DATABASE
The archie catalog subsystem maintains a list of about 1200
Internet anonymous ftp(1) archive sites of approximately 2.5
million unique filenames themselves containing 200 Gigabytes
(that is, 200,000,000,000 bytes) of information. The current
catalog requires about 400 MB of disk storage.
SEE ALSO
bitftp (1L), ftp(1), telnet(1), archie(1), xarchie(1)
AUTHORS
Bunyip Information Systems Inc., Montreal Canada
(info@bunyip.com). Original manual page by R. P. C. Rodgers,
UCSF School of Pharmacy, San Francisco, California 94143
(rodgers@maxwell.mmwb.ucsf.edu), Nelson H. F. Beebe
(beebe@math.utah.edu), and Alan Emtage (bajan@bunyip.com).
Partial funding contributed by Trevor Hales
(hales@mel.dit.cicsiro.au)
archie is a registered trademark of Bunyip Information Systems, Inc.