viat.pl - The Voynich interlinear transcription archive tool
perl viat.pl [OPTIONS]
This program is an extraction tool for the Voynich interlinear transcription archive. It allows you to select interesting parts from that file for your analysis.
The main purpose of the program is extraction, not formatting, so do not expect it to do fancy formatting. If you want a nicely formatted HTML document, pipe the output into the vhtml.pl tool.
The program is intended to be used in a pipe, which is the most natural way of processing data on unixoid systems, so its output is written to standard output.
There are plenty of options. For some easy-to-use examples, take a look at EXAMPLES.
Please note that all options and parameters are case-sensitive: the -t option is different from the -T option, and specifying a transcriber with -t c will silently select nothing instead of the Currier transcription.
You can give the options in two styles: the short, one-letter options preceded by a single dash in the Un*xoid style, and the long options preceded by a double dash in the GNU style. Both styles are equivalent.
If you use the long options, you can abbreviate them as long as the abbreviation is unique.
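To illustrate what "unique" means here, the following is a minimal sketch of unique-prefix matching over the long option names listed in this document. It is not taken from viat.pl itself; the function name resolve_long_option is hypothetical, and the list assumes that "transcription" is a long option name like the others.

```shell
#!/bin/sh
# Long option names as listed in this document.
# Assumption: "transcription" is a long option like the others.
LONG_OPTIONS="version show_illustration_codes show_transcriber_codes help
file transcription comment_keep space_convert weirdo_convert
comment_suppress locator_suppress meta_comments dubious_suppress
folios illustration language hand transcriber"

# Print the full option name for a prefix. Fail with status 1 when the
# prefix matches no option or more than one option.
resolve_long_option() (
    prefix=$1
    match=
    count=0
    for name in $LONG_OPTIONS; do
        case $name in
            "$prefix")  echo "$name"; exit 0 ;;   # an exact name always wins
            "$prefix"*) match=$name; count=$((count + 1)) ;;
        esac
    done
    [ "$count" -eq 1 ] && echo "$match"
)
```

Under these assumptions, "lang" resolves to "language", while "l" (language, locator_suppress) and "trans" (transcription, transcriber) are ambiguous and are rejected.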
-v or --version
-I or --show_illustration_codes
-T or --show_transcriber_codes
-h or -? or --help
-F path or --file=path or --transcription=path
If you do not specify any output options, the extracted text is written to the output in raw format, without any kind of processing.
In most non-trivial cases, you want to do some processing.
-c or --comment_keep
-s or --space_convert
-w or --weirdo_convert
-C or --comment_suppress
This feature may ease the job of processing the transcription.
-L or --locator_suppress
-M or --meta_comments
-S or --dubious_suppress
If you do not specify any selection options, you get the whole text from the archive with only the comments removed. That could be done much more easily by other means, so you probably want to select some text for the extraction.
The selection options give you a way to extract folios, transcriptions, Currier languages and hands.
-f folio-range or --folios=folio-range
n
n:m
:n
n:
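The four range forms can be checked before they are passed to the program. The sketch below is a hypothetical helper, not part of viat.pl. It assumes that the bounds are plain non-negative integers and reads the forms as: n is a single folio, n:m runs from n to m, :n is open at the start, and n: is open at the end.

```shell
#!/bin/sh
# Split a folio range into a lower and an upper bound.
# Prints "LOW HIGH", with "-" standing for an open end.
# Fails with status 1 on anything that is not one of the four forms.
parse_folio_range() (
    range=$1
    case $range in
        *:*) low=${range%%:*}; high=${range#*:} ;;
        *)   low=$range; high=$range ;;
    esac
    # Assumption: bounds are plain non-negative integers.
    for bound in "$low" "$high"; do
        case $bound in
            *[!0-9]*) echo "not a folio range: $range" >&2; exit 1 ;;
        esac
    done
    # At least one bound must be present.
    [ -n "$low$high" ] || { echo "empty folio range" >&2; exit 1; }
    echo "${low:--} ${high:--}"
)
```

A script could then guard the call, for example: parse_folio_range "$range" >/dev/null && perl viat.pl -f "$range"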
-i illustration-codes or --illustration=illustration-codes
Specifying an unknown illustration code is not an error. It silently selects nothing.
-l language-code or --language=language-code
Not every page has an associated Currier language. If you use this option, all pages without a known language will be hidden.
If you specify an unknown language code, it silently selects nothing.
-m hand-code or --hand=hand-code
Not every page has an associated Currier hand. If you use this option, all pages without a known hand will be hidden.
-t transcriber-codes or --transcriber=transcriber-codes
It is not an error if you specify an unknown transcriber code. It just does not select anything. If all given transcriber codes are unknown, you get an empty result.
By default, all transcribers are included in the result.
Most examples are given in the short option syntax, which is fast to type and hard to read. While doing text analysis on the Voynich manuscript and trying different approaches, this is most probably the way you will type options at the prompt.
But if you write more complex scripts that you want to share with others, please use the far more readable long options. It will help you and other people to understand what happens.
Extract only the Currier transcription.
viat.pl -t C
Extract only the Currier transcription, but do not include the line locators in the extraction. Called this way, the program gives you the plain transcription text.
viat.pl -t C -L
Extract only the Currier transcription for all pages with the Currier language A.
viat.pl -t C -l A
Extract only the Currier transcription, suppress the line locators, replace all spacing characters with real spaces and replace all known weirdos with the EVA character codes.
viat.pl -t C -L -s -w
Do the same thing, but use the long option syntax.
viat.pl --transcriber=C --locator_suppress --space_convert --weirdo_convert
Select all pages with Currier language B from the T. Takahashi transcription.
viat.pl -t H -l B
Do the same thing, but use the long option syntax.
viat.pl --transcriber=H --language=B
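Because the output goes to standard output, the extracted text can be fed straight into ordinary text tools. The following is a minimal sketch of a word count. It assumes that the input has been extracted with -L and -s, so that each line holds only words separated by real spaces. The function name word_frequencies is hypothetical.

```shell
#!/bin/sh
# Count how often each word occurs in plain transcription text on stdin.
# Prints "COUNT WORD" lines, most frequent word first.
word_frequencies() {
    tr -s ' \t' '\n' |   # put every word on its own line
    sed '/^$/d' |        # drop empty lines left by leading blanks
    sort |               # group identical words together
    uniq -c |            # count each group
    sort -rn             # highest count first
}
```

It would be used at the end of a pipe, for example: perl viat.pl -t H -L -s -w | word_frequencies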
At the moment, there are no known bugs. But there is one strange behaviour, which does not affect the functionality of the program:
If you choose to include meta information comments in the extracted file, you get that information at the beginning of every page, even if no Voynich text is selected on that page. It is not a bug, because it only creates comments, not content. But it is strange and should be done better some day.
There is no good error checking in the program. If, for example, the parameter for an option is expected to be numeric but you give it a non-numeric value, the program silently malfunctions. This should not be a real problem, but it can cause confusion, especially if the program is used in a pipe and the error is not visible immediately.
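One way to make a silent empty selection visible in a pipe is to put a small guard behind the program. The following is a minimal sketch, not part of viat.pl. The function name require_output is hypothetical. It passes its input through unchanged and fails with a message on standard error if nothing arrives.

```shell
#!/bin/sh
# Copy stdin to stdout. If no line was read at all, print a message
# tagged with the given label to standard error and return status 1.
require_output() {
    awk -v label="$1" '
        { print; seen = 1 }
        END {
            if (!seen) {
                print label ": empty result" > "/dev/stderr"
                exit 1
            }
        }
    '
}
```

For example: perl viat.pl -t c -L | require_output "viat.pl -t c"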
I wrote this program as a personal replacement for the VTT tool, which is hard to read, hard to fix or change, and hard to compile. I had no thought of giving it away while I was writing it.
But on the mailing list I found that other people experience problems with the VTT program too. So I wrote some documentation and prepared the program for publication.
I expect that other users will miss some features that I do not need. Feel free to contact me by mail if you want an additional feature; I will implement it if the implementation is easy. But the feature should be a text extraction feature, not a formatting or analysis feature.
Please keep in mind that I am not a native speaker of English, and do not expect me to understand every nuance of the language. I hope that my documentation is understandable to everyone who wants to use the program. If my English is clumsy or wrong, feel free to correct me.
viat.pl was written by Michael Winkelmann, michael@weltretter.de