NAME

viat.pl - The Voynich interlinear transcription archive tool


SYNOPSIS

perl viat.pl [OPTIONS]


DESCRIPTION

This program is an extraction tool for the Voynich interlinear transcription archive. It allows you to select interesting parts from that file for your analysis.

The main purpose of the program is extraction, not formatting, so do not expect fancy output. If you want a nicely formatted HTML document, pipe the output into the vhtml.pl tool.

The program is intended to be used in a pipe, which is the most natural way of processing data on Unix-like systems. Its output is therefore written to standard output.


OPTIONS

There are plenty of options. For some easy-to-use examples, take a look at EXAMPLES.

Please note that all options and parameters are case-sensitive. The -t option is different from the -T option, and specifying the transcriber as -t c will silently select nothing instead of the Currier transcription.

You can give the options in two styles: short, one-letter options preceded by a single dash in the Unix style, and long options preceded by a double dash in the GNU style. Both styles are equivalent.

If you use the long options, you can abbreviate them as long as the abbreviation remains unique.
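
The abbreviation rule can be pictured as a unique-prefix lookup. The following Python sketch is not taken from viat.pl; the function name resolve_long_option is hypothetical, the option names are the long options listed on this page, and the rule that an exact name always wins is an assumption.

```python
# Long option names as documented on this page (without the leading "--").
LONG_OPTIONS = [
    "version", "show_illustration_codes", "show_transcriber_codes", "help",
    "file", "transcription", "comment_keep", "space_convert", "weirdo_convert",
    "comment_suppress", "locator_suppress", "meta_comments", "dubious_suppress",
    "folios", "illustration", "language", "hand", "transcriber",
]


def resolve_long_option(prefix, options=LONG_OPTIONS):
    """Return the one option name that starts with prefix, or raise ValueError."""
    # Assumption: a complete option name is accepted even if it is also a prefix.
    if prefix in options:
        return prefix
    matches = [name for name in options if name.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown option: --{prefix}")
    raise ValueError(f"ambiguous option: --{prefix} ({', '.join(sorted(matches))})")
```

Under these assumptions, --loc resolves to --locator_suppress, while --comment is rejected because it could mean --comment_keep or --comment_suppress.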

Getting information

-v or --version
Display the version of the program.

-I or --show_illustration_codes
Show all codes for illustration types and exit.

-T or --show_transcriber_codes
Show all codes for transcribers and exit.

General options

-h or -? or --help
Show an option overview as a short help.

-F path or --file=path or --transcription=path
Specify the path of the interlinear transcription file to use. By default, the program looks for a file named 'text16e6.evt' in the current directory.

Output options

If you do not specify any output options, the extracted text is written to the output in raw format, without any kind of processing.

In most non-trivial cases, you want to do some processing.

-c or --comment_keep
Keep all comment lines in the output. By default, they are suppressed.

-s or --space_convert
Convert all spacing characters (punctuation, line, percent) to real spaces.

-w or --weirdo_convert
Replace the weirdos by EVA codes wherever possible. Not every transcription uses the correct weirdo notation.

-C or --comment_suppress
Remove all comments from the output text. Any text in curly brackets that is not replaced as an explicit weirdo is treated as a comment.

This feature may ease the job of processing the transcription.

-L or --locator_suppress
Do not output the line locator in front of a line of text.

-M or --meta_comments
Output some meta information for every page of the manuscript. Every line of the meta information starts with the sequence #-, to make the job of parsing easy. By convention, the # is handled as a comment line marker, and so the meta information is extracted as a comment.

-S or --dubious_suppress
Remove all dubious word breaks in the output. If two Voynich "words" are separated only by a dubious space, they will be written to the output as one word.
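
Because every meta information line produced by -M starts with #-, a consumer can separate those lines from the rest of the output with a simple prefix test. The following Python sketch assumes nothing about the content of the meta lines; the function name split_meta is hypothetical.

```python
def split_meta(lines):
    """Split output lines into (meta information, everything else)."""
    meta, rest = [], []
    for line in lines:
        if line.startswith("#-"):
            # Drop the "#-" marker and surrounding whitespace.
            meta.append(line[2:].strip())
        else:
            rest.append(line)
    return meta, rest
```

Ordinary comment lines that start with # but not with #- stay in the second list.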

Text selection options

If you do not specify any selection options, you will get the whole text from the archive with only the comments removed. That could be done much more easily by other means, so you will probably want to select some text for the extraction.

The selection options give you a way to extract text by folio, transcription, Currier language and Currier hand.

-f folio-range or --folios=folio-range
Show only the text from the given folio range. The range specification can be given in the following ways (the letters n and m represent folio numbers):
n
If you specify a single folio number, only this folio will be printed to the output.

n:m
You can specify a range by using a colon between the folio numbers. If the first folio number is greater than the second one, the order is corrected silently.

:n
If your specification starts with a colon, all folios from the first up to the given folio number will be printed.

n:
If your specification ends with a colon, all folios beginning from the given folio up to the last one will be printed.
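
The four forms above can be summarised in a short Python sketch. It is an illustration of the documented rules, not the code used by viat.pl: the function name parse_folio_range is hypothetical, folio numbers are assumed to be plain integers, and the first and last folio numbers are passed in by the caller.

```python
def parse_folio_range(spec, first, last):
    """Return (low, high), both inclusive, for the forms n, n:m, :n and n:."""
    if ":" not in spec:
        n = int(spec)           # single folio
        return n, n
    left, right = spec.split(":", 1)
    low = int(left) if left else first     # ":n" starts at the first folio
    high = int(right) if right else last   # "n:" runs to the last folio
    if low > high:
        low, high = high, low   # a reversed range is corrected silently
    return low, high
```

Unlike the program itself, this sketch raises a ValueError for a non-numeric folio number instead of continuing silently.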

-i illustration-codes or --illustration=illustration-codes
Select only the pages with the given illustrations.

If you specify an unknown illustration code, this is not an error. It silently selects nothing.

-l language-code or --language=language-code
Select only the pages with the given Currier language code. The language code is A or B.

Not every page has an associated Currier language. If you use this option, all pages without a known language will be hidden.

If you specify an unknown language code, it silently selects nothing.

-m hand-code or --hand=hand-code
Select only the pages with the given Currier hand code. The hand codes are 1, 2, 3, 4, 5, A or B. You can combine more than one hand code.

Not every page has an associated Currier hand. If you use this option, all pages without a known hand will be hidden.

-t transcriber-codes or --transcriber=transcriber-codes
Select only the text from the given transcribers. You can specify more than one transcriber. Each transcriber is identified by a single character with the following meaning:
C
Currier

D
Currier, second choice

F
First study group

G
First study group, second choice

H
Takeshi Takahashi

I
Jim Reed, second choice

J
Jim Reed

K
Karl Kluge

L
Don Latham

M
Don Latham, second choice

N
Gabriel Landini

P
Father Th. Petersen

Q
Karl Kluge, second choice

R
Mike Roe

T
John Tiltman

U
Jorge Stolfi

V
John Grove

X
Denis V. Mardle

Z
Rene Zandbergen

It is not an error to specify an unknown transcriber code; that code simply does not select anything. If all given transcriber codes are unknown, you get an empty result.

By default, all transcribers are included in the result.


EXAMPLES

Most examples are given in the short option syntax, which is fast to type and hard to read. While doing text analysis on the Voynich manuscript and trying different approaches, this is most probably the way you will type options at the prompt.

But if you write more complex scripts that you want to share with others, please use the far more readable long options. It will help you and other people to understand what happens.

Extract only the Currier transcription.

viat.pl -t C

Extract only the Currier transcription, but do not include the line locators into the extraction. Calling the program this way, you get the plain transcription text.

viat.pl -t C -L

Extract only the Currier transcription for all pages with the Currier language A.

viat.pl -t C -l A

Extract only the Currier transcription, suppress the line locator, replace all spacing characters by real spaces and replace all known weirdos by the EVA character codes.

viat.pl -t C -L -s -w

Do the same thing, but use the long option syntax.

viat.pl --transcriber=C --locator_suppress --space_convert --weirdo_convert

Select all pages with Currier language B from the Takeshi Takahashi transcription.

viat.pl -t H -l B

Do the same thing, but use the long option syntax.

viat.pl --transcriber=H --language=B
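
When the extracted text feeds an analysis script rather than a shell pipeline, the same options can be passed from that script. The following Python sketch is only an illustration: the helper names viat_command and run_viat are hypothetical, and it assumes that perl is on the PATH and that viat.pl and the transcription file are in the current directory.

```python
import subprocess


def viat_command(*options, script="viat.pl"):
    """Build the argument list for one viat.pl call."""
    return ["perl", script, *options]


def run_viat(*options):
    """Run viat.pl and return its standard output as a list of lines."""
    result = subprocess.run(
        viat_command(*options),
        capture_output=True,  # the program writes its result to standard output
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

For example, run_viat("--transcriber=H", "--language=B") corresponds to the last command line above.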


FILES

text16e6.evt
The file containing the interlinear transcription. This file is expected to be in your current directory by default, but you can specify any path and any file name using the -F option.


BUGS

At the moment, there are no known bugs. But there is a strange behaviour, which does not affect the functionality of the program:

If you choose to include meta information comments in the extracted file, you will get the information at the beginning of every page, even if there is no selected Voynich text on that page. It is not a bug, because it only creates comments, not content. But it is strange and should be done better some day.

There is no good error checking in the program. If, for example, the parameter for an option is expected to be numeric but you give it a non-numeric value, the program silently malfunctions. This should not be a real problem, but it can cause confusion, especially if the program is used in a pipe and the error is not immediately visible.
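
Until the program checks its own arguments, a calling script can check them before starting the pipe. The following Python sketch is one possible pre-flight check and is not part of viat.pl: the function names are hypothetical, the code sets are copied from the option descriptions on this page, and it assumes that several hand or transcriber codes are combined by writing them one after another in a single argument.

```python
LANGUAGE_CODES = set("AB")
HAND_CODES = set("12345AB")
TRANSCRIBER_CODES = set("CDFGHIJKLMNPQRTUVXZ")


def check_codes(value, known, label):
    """Return one message for each character of value that is not a known code."""
    return [f"unknown {label} code: {ch!r}" for ch in value if ch not in known]


def check_arguments(language="", hands="", transcribers=""):
    """Collect problems that viat.pl would otherwise accept silently."""
    problems = []
    if len(language) > 1:
        problems.append("only one language code is expected")
    problems += check_codes(language, LANGUAGE_CODES, "language")
    # Assumption: combined codes are given as one string, e.g. "12" or "CH".
    problems += check_codes(hands, HAND_CODES, "hand")
    problems += check_codes(transcribers, TRANSCRIBER_CODES, "transcriber")
    return problems
```

With this check, the lower-case transcriber code c is reported as unknown instead of silently selecting nothing.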


TODO

I wrote this program as a personal replacement for the VTT tool, which is hard to read, hard to fix or change, and hard to compile. While I was writing it, I had no intention of giving it away.

But on the mailing list I found that other people experience problems with the VTT program too. So I wrote some documentation and prepared the program for publication.

I expect that other users will miss some features that I do not need. Feel free to contact me by mail if you want an additional feature. I will implement it if the implementation is easy. But the feature should be a text extraction feature, not a formatting or analysis feature.

Please keep in mind that I am not a native speaker of English, and do not expect me to understand every nuance of the language. I hope that my documentation is understandable to everyone who wants to use the program. If my English is clumsy or wrong, feel free to correct me.


AUTHOR

viat.pl was written by Michael Winkelmann, michael@weltretter.de