             TRANSLAT - computer translation of Botanical Latin

                              Peter D. Bostock

  Queensland Herbarium, Department of Environment, Queensland, Australia.

  Notes accompanying Freeware version 4.09, December 1998

BACKGROUND

For some years now, I have been interested in the use of
personal computers in botanical research. The obvious uses are mostly
well-served by "off-the-shelf" packages; these include methods for
taxonomic and ecological data storage and analysis using variously applied
data-bases, including DELTA and DECODA, spreadsheets and other statistical
tools for the analysis and graphical display of uni- and multi-variate
data, and mapping software for analysing and displaying geographically
based data.

In contrast, the arena of language translation is generally very poorly
served, although it always seemed to me to be a rich field for the
application of computers. Bilingual dictionaries are available, in printed
form, and increasingly, on computer. These often ignore case, declension
and other real-world problems of translation. They also usually ignore the
idiomatic nature of language. My interest in botanical Latin was fostered
by W.T. Stearn's famous tome, and led me to explore the possibility of
translation of the relatively well-structured language as employed in
descriptions and diagnoses. This interest has culminated in the computer
program, TRANSLAT, described below.

TRANSLAT uses indexed on-disk databases of verbs, adjectives, nouns,
pronouns, phrases and adverbs (including conjunctions and prepositions)
to match stems and terminations (flexions or endings), or the whole word,
if indeclinable, of botanical Latin words to provide both a
literal/figurative English meaning, and an optional associated statement of
the `grammar' i.e. the gender, number and case (together with an indication
of the mood and tense of verbs or the degree of comparison of adjectives
and adverbs).

The translation method is best described as `informed brute force'! The
program employs a three-word buffer, to allow one-word look-behind and
look-ahead; this allows, for example, for the inverted position of versus,
and also facilitates contextual modification of English phraseology
(dropping the implied English prepositions with/ by/in, for example, after
a Latin preposition governing the ablative). However, the actual
translation process is simply one of trying every likely Latin word (and
all of its valid endings) until a match occurs.
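The three-word buffer can be pictured with the following Python sketch. It
is an illustration only, not the QuickBasic source: the function names and
the toy rule for the postposed "versus" are assumptions made for the
example.

```python
def three_word_windows(words):
    """Yield (previous, current, next) for every word; None marks a missing neighbour."""
    padded = [None] + list(words) + [None]
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


def reorder_versus(words):
    """Toy rule: move a postposed 'versus' in front of the word it follows."""
    out = []
    for prev, cur, nxt in three_word_windows(words):
        if nxt == "versus":
            # Look-ahead: the next word is 'versus', so emit it first.
            out.extend(["versus", cur])
        elif cur == "versus" and prev is not None:
            # Look-behind: this 'versus' was already emitted with the previous word.
            continue
        else:
            out.append(cur)
    return out


print(reorder_versus(["apicem", "versus", "angustata"]))
# ['versus', 'apicem', 'angustata']
```

The same window would let a rule inspect the preceding word (for instance
a preposition governing the ablative) before choosing the English wording.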

Each database is interrogated in turn by reference to sets of indexed keys,
loaded into memory when the program starts; initial matching consists of
only the first letter, or the first and second letters, of the unknown
Latin word. The subset of matching database entries is then cycled in an
extended matching process on the stem and/or nominative singular (this
description is clearly inadequate to describe the processing of verbs!).
Each `stem' which matches at this level is then sequentially declined,
until a final match occurs on the full unknown word. Shortcuts are
available, of course, where the endings are known to be common to all
numbers or genders etc.
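A minimal Python sketch of this index-then-decline search follows. The
two-entry noun table, its endings and the function names are hypothetical
stand-ins for the real databases; only the overall strategy (narrow by
leading letters, match the stem, then try each ending) is taken from the
description above.

```python
from collections import defaultdict

# Hypothetical miniature noun table: (stem, endings, English meaning).
# Stems are assumed to be at least two letters long.
NOUNS = [
    ("foli", ["um", "i", "o", "a", "orum", "is"], "leaf"),
    ("spin", ["a", "ae", "am", "arum", "is", "as"], "spine"),
]


def build_index(entries):
    """Group entries by the first two letters of the stem (the in-memory key)."""
    index = defaultdict(list)
    for entry in entries:
        index[entry[0][:2]].append(entry)
    return index


def lookup(word, index):
    """Return (stem, ending, meaning) for the first full match, else None."""
    for stem, endings, meaning in index.get(word[:2], []):
        if not word.startswith(stem):
            continue  # extended match on the stem failed
        for ending in endings:  # decline the stem until the whole word matches
            if stem + ending == word:
                return stem, ending, meaning
    return None


index = build_index(NOUNS)
print(lookup("foliis", index))    # ('foli', 'is', 'leaf')
print(lookup("spinarum", index))  # ('spin', 'arum', 'spine')
```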

If no match is found during this process, the program then looks for
prefixes (see list below) and a limited set of suffixes (primarily for
comparative, superlative and diminutive adjectives and comparative and
superlative adverbs) and re-interrogates the adjective database looking for
a match.
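The prefix fallback might look like the Python sketch below. The prefixes
are a small subset of the list given in the Appendix; the set of known
adjectives and the function name are hypothetical, and suffix handling is
left out for brevity.

```python
# A few prefixes taken from the Appendix list.
PREFIXES = ["aequi", "multi", "pauci", "semi", "sub", "tri", "uni"]

# Hypothetical stand-in for the adjective database.
KNOWN_ADJECTIVES = {"globosus", "glaber", "teres"}


def split_prefix(word, known=KNOWN_ADJECTIVES, prefixes=PREFIXES):
    """Return (prefix, base) if the word, or the word minus a prefix, is known."""
    if word in known:
        return "", word  # direct match, no prefix needed
    # Try longer prefixes first so a short prefix cannot shadow a longer one.
    for prefix in sorted(prefixes, key=len, reverse=True):
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in known:
            return prefix, rest
    return None


print(split_prefix("subglobosus"))  # ('sub', 'globosus')
print(split_prefix("semiteres"))    # ('semi', 'teres')
```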

Commonly-used phrases are also pre-programmed (e.g. plus minusve), as are
abbreviations such as diam., cm., dm. etc. Numbers are ignored. Recognised
pronouns include the "doubly-declined" compounds such as quicumque and
utercumque (see Appendix). The trailing -que (and), -ve (or) are
recognised, as is the particle -ne.
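Handling of the trailing -que, -ve and -ne can be sketched in Python as
follows. The word set and the function name are assumptions for the
example. Checking the whole word first keeps a pronoun such as quisque
from being split by mistake.

```python
ENCLITICS = ("que", "ve", "ne")  # and, or, interrogative particle

# Hypothetical stand-in for the words the databases already recognise.
KNOWN_WORDS = {"foliis", "ramis", "quisque"}


def strip_enclitic(word, known=KNOWN_WORDS):
    """Return (base, enclitic); enclitic is None when nothing was stripped."""
    if word in known:
        return word, None  # whole word matches, e.g. the pronoun 'quisque'
    for enclitic in ENCLITICS:
        base = word[:-len(enclitic)]
        if word.endswith(enclitic) and base in known:
            return base, enclitic
    return word, None


print(strip_enclitic("foliisque"))  # ('foliis', 'que')
print(strip_enclitic("quisque"))    # ('quisque', None)
```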

Adjectival prefixes (with a few exceptions) are not recognised if separated
by a hyphen, but compound adjectives must be hyphenated viz. hemisphaericus
vs. flavo-virescens. The program also recognises those nouns and
adjectives which are available only in one number i.e. singular only or
plural only.

The processes involved in translating verbs were among the most onerous
programming tasks I have ever attempted. Most moods are covered, except the
imperative and some parts of the verb infinite (specifically the perfect,
pluperfect and future infinite). Gerunds and gerundives are also
translated. Anomalous verbs including eo (and compounds), fio, fero, and
possum are more or less covered, as are deponents. The impersonal third
person usage of certain verbs is recognised. The actual meanings produced
for the various tenses are a subset of accepted meanings and may require
liberal re-translation on the part of the user of this program! If you
require examples of this, and are feeling particularly adventurous, try
translating at random any discussion from Ferdinand von Mueller's
Fragmenta Phytographiae.

The speed of the program may be increased by forgoing the translation of
verbs (although adjectival forms such as present and past participles are
always included). A run-time parameter is used to invoke this option
(/NOV). I have limited the number of meanings of most verbs to those few
(usually 2, rarely up to 8) which seem to be most applicable (the choice
was entirely mine) and hence some English translations of verbs will appear
very clumsy. An occasional error to which I seem prone (caused by leaving
out a hyphen during compression of the English data, as in "lov-e, -es, -ed,
-ed, -ing" but stored in error as "love-s, es, -ed, -ed, ing") will give
meanings such as "he es" instead of "he loves", or "I was ing" instead of
"I was loving"! Please let me know if you find examples of this.
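The Python sketch below shows how such a compressed entry could be
expanded, and why a missing hyphen yields "es" or "ing" on its own. The
parsing rule is inferred from the two examples quoted above and may differ
from the program's actual storage format; both function names are
hypothetical.

```python
def expand_forms(packed):
    """Expand 'lov-e, -es, -ed, -ed, -ing' into the full English forms."""
    items = [item.strip() for item in packed.split(",")]
    stem, _, first_ending = items[0].partition("-")
    forms = [stem + first_ending]
    for item in items[1:]:
        # A leading hyphen means "attach to the stem"; anything else is used as is.
        forms.append(stem + item[1:] if item.startswith("-") else item)
    return forms


def missing_hyphens(packed):
    """List the later items that lack the leading hyphen (likely data errors)."""
    items = [item.strip() for item in packed.split(",")]
    return [item for item in items[1:] if not item.startswith("-")]


print(expand_forms("lov-e, -es, -ed, -ed, -ing"))
# ['love', 'loves', 'loved', 'loved', 'loving']
print(missing_hyphens("love-s, es, -ed, -ed, ing"))
# ['es', 'ing']
```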

A conservative estimate of the number of distinct words recognised by the
program is in the region of 350,000, but the actual number of words
recorded in the databases is currently about 6,550. The projected figure
above does not cover the multiplying factor, almost impossible to
calculate, of the prefixes, diminutives, comparatives and superlatives
which can be applied to most adjectives.

As far as speed goes, the brute-force method is moderately successful,
returning an average translation time per Latin word of about 0.38-0.48
seconds on a 12MHz AT (80286) with no disk-caching and a 24 ms (average
access) hard disk. On an 80486DX33 with full disk-caching and a hard disk
of about 15 ms access time, the time per word is of the order of 0.07-0.14
seconds, while on a Pentium 133 under Win95, the rate is 20 milliseconds
(0.02 secs) per word. TRANSLAT provides statistics on the number of words
translated and the average time per word at the end of each input file!

TRANSLAT has been tested under NT4, NT3.51 SP5, Win95, Win98 and WINDOWS
3.1x (with DOS 3.x and above); it should also behave under DR DOS 6.x. It
can be run in a graphics window in WINDOWS 3.1, if 386-enhanced mode is
used, because the menu system is text-based, not graphics-based. Some icon
files (e.g. TRANSLAT.ICO) and PIF files (TRANSBIN.PIF, TRANSLAT.PIF etc.)
are included for use under Windows. These will be converted by Win95/98.

Memory Requirements

TRANSLAT requires about 460kb of memory at present. The program must be run
from a hard disk, although the Latin text files could be on floppy disk
(not recommended). The Latin input files must be saved in DOS text format.
Output from the program is also in DOS text format.

TRANSLAT is also quite well-behaved in extended memory machines under DOS
5.0/6.xx when DOS is loaded high i.e. there is no necessity to resort to
the use of LOADFIX. TRANSLAT is written in Microsoft Quick Basic 4.5, with
language extensions from Crescent Software (PDQ version 3.0) and TOOLBOX by
Mark Goodwin (MIS Press, 1989).

NOTE: TRANSLAT is not able to decline Latin words which are not stored in
its databases, although it can make a guess (via option /GUESS). Hence, if
you find that Latin words for your special group of plants are not
recognised, and your words are correctly spelt, compile a list (including
suggested English meanings), or send sample .LAT files, and I will issue
updated databases periodically. In a future release, I intend to provide
the necessary programs bundled with TRANSLAT to create and modify the
databases.

ACKNOWLEDGEMENT: My interest in Botanical Latin, and the stimulus for this
program, both arose after I received a copy of the encyclopaedic Botanical
Latin by Professor William T. Stearn, and I gratefully acknowledge this
fact. Other sources are listed in the program, by pressing the function key
F4.

DISCLAIMER: Neither the Author (Peter D. Bostock) nor the Author's employers
(Queensland Environmental Protection Agency) accept any liability should any
person incur expense or damage resulting from the use of this program.

APPENDIX

1. INSTALLATION

The programs and data files are installed via a self-extracting Zip file
TRANINST.EXE (see below). If a different directory structure is required,
use XTree or similar to rename/relocate the files. The only file which
needs to be modified, if the default directory name and structure is not
followed, is the file TRANSLAT.SET (see below for details). NB: the file
hosted at the Geocities mirror is a simple ZIP file (it costs money to
store .EXE files at this site).

The default structure is:

     C:\
     +---LATIN
          +-----DATA
          +-----TEXT
          +-----WIN31X (used only for storage of Windows 3.1x PIF files)

If a drive letter other than C: is used, edit TRANSLAT.SET (with a text
editor, as it is an ASCII data file) to reflect the new drive letter. If
different subdirectory names are required, similarly edit the relevant
entries in TRANSLAT.SET. The PIF files will also need to be altered
accordingly.

2. RUN-TIME CONSIDERATIONS

Run the program by typing TRANSLAT /BIN at the DOS prompt (or by double-
clicking on the icon in a Windows 3.1 group - see below). If the options
/NOV (no verb translation), /NOA (no adverbial phrase translation) or
/GUESS (guess unknown word grammar) are required, then run the program by
typing TRANSLAT /NOV etc, or set up an alternative .PIF file for Windows
3.1 by use of the PIF editor. Two standard PIF files are provided - one to
invoke /BIN (TRANSBIN.PIF), the other for normal operation (TRANSLAT.PIF).
Note these are set up for drive C, and directories \LATIN etc.

On virtually all computers, subsequent execution of TRANSLAT can be invoked
without the /BIN option. This uses the fast-load of the binary files
created by the /BIN run. I have found only a couple of machines which
always require the /BIN command.

The menu system is fairly straightforward - it is based on defaults (the
double-lined box around one of the choices). The required option on such
menus can be selected by moving the double-lined box with the arrow keys or
by pressing the space bar, and then pressing <ENTER>; alternatively, the
option can be chosen by its highlighted letter (usually the first letter of
the word or phrase). The initial information screens allow any key to be
pressed to move to the next stage of the program.

The difference between Description and Diagnosis methods is not great - the
Diagnosis method attempts to cater for the slightly different usage of the
Ablative by modifying the implied "with/in (the)" to "by/in (the)".

The choice between GRA(mmar) and TRA(nslation) output formats is really
dependent on whether you require full justification for the program's
translation - the GRA output describes in some detail the type of word, its
case, gender, number, tense etc, while the TRA output tries to replace the
Latin word with its English equivalent(s) without additional padding or
punctuation. See 5. below for more information.

Generally, an ESCape keystroke will bring up a box requesting "Quit the
program? Y/N" at most points during the initial questioning; the actual
translation process itself can be interrupted by pressing CTRL+C, i.e.
press and hold the Control key, and press the C key. The same "Quit" box
will be displayed.

3. PREFIXES IMBEDDED IN TRANSLAT

aequi, atro, austro*, bi*, crassi, di, extra, e, ex, hemi, hypo, infra,
intra, in, multi, pachy, palaeo, pauci, per, pinnati, pluri, poly, prae,
pseud, pseudo*, quadri*, quadr, quinque, quinqui, quinqu, semi*, sesqui*,
sub, supra*, tripli, tri*, uni*.

Those prefixes marked with an asterisk will also be recognised if separated
from the associated word by a hyphen.

In addition, the following compounds (acting as prefixes) will be
recognised only if followed by a hyphen: porphyr-, porphyro-.

4. COMPOUND PRONOUNS RECOGNISED BY TRANSLAT

aliqui, aliquis, alteruter, ecquis, quicumque, quicum, quidam, quilibet,
quisnam, quispiam, quisquam, quisque, quisquis, quivis, uterlibet, uterque.

NOTE: compounds between adjectives and pronouns are not covered - in
particular unusquisque - you may get around this by entering the adjective
separately viz. unus quisque or uno quoque etc.

5. ADDITIONAL INFORMATION ABOUT THE FILES REQUIRED

TRANINST.EXE - self-extracting ZIP file (PKUnzip 2.04g) (compatible with
XTGold 3.0 for DOS, ALT+F4 key combination or WinZip 6). Run this `program'
with the mandatory parameter "-d" (for full directory structure), the
optional parameter "-f" (to freshen existing files) or -o (to overwrite all
files) and finally, the destination drive "D:\" or "C:\" (place on drive D
or C respectively) as required.

e.g. to install on D, run as follows: A:\>TRANINST -d -o D:\

OR C:\>A:TRANINST -d [-f] D:\

(this gives subdirectories D:\LATIN\, D:\LATIN\DATA\ etc).

OR copy TRANINST.EXE to the root directory, i.e. C:\, then

C:\>TRANINST -d -o

(this version gives installation on C:)

TRASMALL.EXE - same as TRANINST.EXE but lacking 2 large files: ENGLISH.TXT
and EXAMPLES.EXE

MISCDATA.EXE - the missing files from TRASMALL.EXE. These unzip in the same
way as for TRANINST.EXE.

TRANSLAT.EXE - the program itself!

Run TRANSLAT /? or TRANSLAT /H for a help screen (identical to the initial
welcome screen when running the program normally).

Remember: the first time it is run, use: TRANSLAT /BIN or the .TRS files
will not be created correctly.

TRANSLAT.SET - details the path and name of the data-files and the default
location of *.LAT, *.GRA and *.TRA files (collectively known as TEXT files
to TRANSLAT). This file is user-editable, although the numbers on the last
line must NOT be altered unless you have made changes to ADVERBS.DAT,
PHRASES.DAT or ENDINGS.DAT. Change "C:\" to "D:\" etc if you are using a
different hard disk letter, and alter the second last line if you prefer a
default sub-directory other than C:\LATIN\TEXT for the storage of your .LAT
files. NB: This file must be in the default directory i.e. preferably the
same one as TRANSLAT.EXE. It will not be found if placed in another
directory, even if that directory is in the PATH statement of AUTOEXEC.BAT.

Sample as supplied (ignore the asterisks):

   * "C:\LATIN\DATA\NOUNS.DB"
   * "C:\LATIN\DATA\ADVERBS.DAT"
   * "C:\LATIN\DATA\ADVERBS.DB"
   * "C:\LATIN\DATA\ADJECTIV.DB"
   * "C:\LATIN\DATA\VERBS.DB"
   * "C:\LATIN\DATA\VERBSTEM.DAT"
   * "C:\LATIN\DATA\LATDECLN.DAT"
   * "C:\LATIN\DATA\PHRASES.DAT"
   * "C:\LATIN\DATA\ENDINGS.DAT"
   * "C:\LATIN\TEXT"
   * 1912, 52, 1509, 2577, 375, 1068, 130, 280, 16
   * (any subsequent lines are ignored - used for comments etc)

[sequence of the above numbers is: nouns, prepositions,
adverbs/conjunctions, adjectives, verbs, verb-stems, phrases, endings
(/GUESS command), pronouns.]
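For anyone wishing to check an edited TRANSLAT.SET, the layout shown above
can be read with a short Python sketch such as the following. It assumes
the file has exactly the shape of the sample (nine quoted data-file paths,
one quoted text directory, one line of nine counts, then ignored lines)
and that the asterisks are absent; the function and key names are
hypothetical.

```python
COUNT_NAMES = [
    "nouns", "prepositions", "adverbs_conjunctions", "adjectives",
    "verbs", "verb_stems", "phrases", "endings", "pronouns",
]

SAMPLE_SET = r'''"C:\LATIN\DATA\NOUNS.DB"
"C:\LATIN\DATA\ADVERBS.DAT"
"C:\LATIN\DATA\ADVERBS.DB"
"C:\LATIN\DATA\ADJECTIV.DB"
"C:\LATIN\DATA\VERBS.DB"
"C:\LATIN\DATA\VERBSTEM.DAT"
"C:\LATIN\DATA\LATDECLN.DAT"
"C:\LATIN\DATA\PHRASES.DAT"
"C:\LATIN\DATA\ENDINGS.DAT"
"C:\LATIN\TEXT"
1912, 52, 1509, 2577, 375, 1068, 130, 280, 16
(any subsequent lines are ignored - used for comments etc)
'''


def parse_set(text):
    """Split the settings text into data-file paths, text directory and counts."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    paths = [line.strip('"') for line in lines[:10]]
    counts = [int(number) for number in lines[10].split(",")]
    if len(counts) != len(COUNT_NAMES):
        raise ValueError("expected nine counts on the eleventh line")
    return {
        "data_files": paths[:9],
        "text_dir": paths[9],
        "counts": dict(zip(COUNT_NAMES, counts)),
    }


settings = parse_set(SAMPLE_SET)
print(settings["text_dir"])            # C:\LATIN\TEXT
print(settings["counts"]["pronouns"])  # 16
```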

ADJ.EXE - a utility program to display the contents of the ADJECTIV.DB
database, including full declensions. To pull down a menu (either "Show
declined word" or "Quit Program"), hold the Alt key and press either S or
Q. An escape keypress also exits the various levels of the program. This
program expects to find ADJECTIV.DB in C:\LATIN\DATA subdirectory - if this
is not the case, a command line entry must be used giving the new path:

e.g. >ADJ D:\LATIN

*.LAT - user-entered text files (MUST be in DOS Text format). At least one
Carriage Return/Line Feed combination must be present in such files. See
EXAMPLES.EXE (a self-extracting PKZIP Version 2.04g zip file) for examples.
Extract files from EXAMPLES.EXE simply by typing `EXAMPLES -d C:\' at the
DOS prompt, when logged into subdirectory \LATIN\TEXT. Files can also be
viewed in EXAMPLES.EXE by typing EXAMPLES -v at the DOS prompt.
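Text prepared on a non-DOS system can be brought into this form before it
is saved as a .LAT file. The Python sketch below normalises every line
ending to Carriage Return/Line Feed and guarantees that at least one is
present; the function name is hypothetical, and the choice of character
encoding when writing the file is left to the user.

```python
def to_dos_text(text):
    """Return text with CR/LF line endings and a final CR/LF."""
    # Collapse any existing CR/LF or bare CR to LF first, so nothing is doubled.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    if not unified.endswith("\n"):
        unified += "\n"  # ensures at least one CR/LF in the result
    return unified.replace("\n", "\r\n")


print(repr(to_dos_text("Folia ovata")))  # 'Folia ovata\r\n'
```

When writing the result with Python's open(), pass newline="" so that the
CR/LF pairs are not translated a second time.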

*.GRA - TRANSLAT-produced translation file (DOS text format), with one
Latin word per line, and associated meanings (including abbreviations for
part of speech, case/number/person/tense etc) on the same line, separated
by % symbols. A <TAB> precedes the first % symbol of each new word.
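A .GRA line could be taken apart for further processing along the lines of
the Python sketch below. The sample line is invented for illustration (the
meaning and the grammar abbreviation are not real program output), and the
function name is hypothetical; only the TAB and % conventions come from
the description above.

```python
def parse_gra_line(line):
    """Return (latin_word, fields) from one line of a .GRA file."""
    head, _, rest = line.rstrip("\r\n").partition("%")
    latin_word = head.strip()  # drops the TAB that precedes the first %
    fields = [field.strip() for field in rest.split("%") if field.strip()]
    return latin_word, fields


# Hypothetical example line, not taken from real output.
print(parse_gra_line("folia\t%leaves%n. nom. pl.\r\n"))
# ('folia', ['leaves', 'n. nom. pl.'])
```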

*.TRA - TRANSLAT-produced brief translation file (DOS Text format), lacking
the case/number/person indicators which are present in the .GRA files. This
file contains Line Feeds in accordance with the input .LAT file.

*.TRS files - these are binary images of QuickBasic arrays, mostly involved
in indexing the database (*.DB) files. Do not attempt to change them! They
are created during /BIN runs. Similar files (called *.TBL) are produced by
LATIN.EXE during its /BIN process.

LATDECLN.DAT - endings for nouns, adjectives, verb forms and pronouns. NOT
user-editable! Make Read-only for safety.

PHRASES.DAT - details the 2-word phrases which are recognised by TRANSLAT.
It is user-modifiable, but remember to change relevant entry (count of
items) in TRANSLAT.SET.

ADVERBS.DAT - prepositions (first 52 entries) followed by adverbs,
conjunctions and indeclinable words. Note the first 53 entries must be in
alphabetic sequence, followed by the remaining words in alphabetic
sequence. Only one entry is allowed for each word in each part of the file,
although prepositions (1-52) can be duplicated with alternative meaning in
the adverb/conjunction area (54 onwards). Again, user editable.
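Before saving an edited ADVERBS.DAT, the ordering rule can be checked with
a sketch such as the following Python fragment. It works on a plain list
of headwords, because the record layout of the file is not described here;
the number of preposition entries is passed in as a parameter and should
be taken from TRANSLAT.SET. The function names and sample words are
hypothetical.

```python
def section_ok(words):
    """True if the words are in strict alphabetic order (so no duplicates)."""
    return all(first < second for first, second in zip(words, words[1:]))


def adverbs_layout_ok(words, n_prepositions):
    """Check the preposition part and the remaining part independently."""
    return section_ok(words[:n_prepositions]) and section_ok(words[n_prepositions:])


# 'ad' may appear once in each part, but each part must itself be sorted.
print(adverbs_layout_ok(["ab", "ad", "in", "ac", "ad", "et"], 3))  # True
print(adverbs_layout_ok(["ad", "ab", "in", "ac"], 3))              # False
```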

VERBSTEM.DAT - index file for verbs - NOT user editable.

*.DB files - database files for the storage of nouns, adjectives, adverbs
and verbs. Again, these can only be altered by programs not supplied with
this version of TRANSLAT.

LATIN.EXE - program to aid translation from English to Latin. The program
declines Latin words (using the same databases as TRANSLAT.EXE). Although
the user has to determine the basic Latin word (e.g. nominative singular
for a noun, nom. sing. masc. for adjectives and pronouns, and 1st person
present indicative active for verbs), the program supplies the correct
ending as prompted by the user. It has the advantage of remembering
previous usage, and setting defaults accordingly. Try it - LATIN /BIN on
first run only, subsequently just type LATIN. The program prompts for a
file name to use for the Latin output. Remember that pressing the Escape
key at any stage during the menus will either cancel a current operation,
or prompt a "Do you wish to exit?" question. Requires approx. 400kb free
memory to execute successfully.
