NAME
    viat.pl - The Voynich interlinear transcription archive tool

SYNOPSIS
    "perl viat.pl [OPTIONS]"

DESCRIPTION
    This program is an extraction tool for the Voynich interlinear
    transcription archive. It lets you select the parts of the archive
    file that are of interest for your analysis.

    The main purpose of the program is extraction, not formatting, so do
    not expect any fancy formatting. If you want a nicely formatted HTML
    document, pipe the output into the vhtml.pl tool.

    The program is intended to be used in a pipe, which is the most
    natural way of processing data on Unix-like systems. Therefore, the
    program writes its output to standard output.

OPTIONS
    There are quite a lot of options. If you want to see some
    easy-to-use examples, take a look at the "EXAMPLES" section.

    Please note that all options and parameters are case-sensitive: the
    "-t" option is different from the "-T" option, and specifying the
    transcriber as "-t c" will silently select nothing instead of the
    Currier transcription.

    You can give the options in two styles: short, one-letter options
    preceded by a single dash in the Un*x style, and long options
    preceded by a double dash in the GNU style. Both styles are
    equivalent.

    If you use the long options, you can abbreviate them as long as the
    abbreviations are unique.
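    The following is a minimal sketch, assuming the options are parsed
    with Perl's core Getopt::Long module. The option table is modelled on
    this manual and is not taken from the viat.pl source; the function
    name parse_args is hypothetical. It shows two of the rules above:
    "no_ignore_case" keeps "-t" and "-T" apart, and "auto_abbrev" accepts
    unique prefixes of long option names.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option table built from the option list in this manual.
# A sketch of how such options can be declared, not the viat.pl source.
my @option_specs = (
    'version|v',
    'show_illustration_codes|I',
    'show_transcriber_codes|T',
    'help|h|?',
    'file|F|transcription=s',
    'comment_keep|c',
    'space_convert|s',
    'weirdo_convert|w',
    'comment_suppress|C',
    'locator_suppress|L',
    'meta_comments|M',
    'dubious_suppress|S',
    'folios|f=s',
    'illustration|i=s',
    'language|l=s',
    'hand|m=s',
    'transcriber|t=s',
);

# Parses a list of arguments and returns a hash reference keyed by the
# primary (long) option names.
sub parse_args {
    my @args = @_;
    my %opt  = (file => 'text16e6.evt');    # default transcription file

    # Option names are case-sensitive (-t vs -T, -c vs -C, -s vs -S,
    # -l vs -L); unique abbreviations of long names are accepted.
    Getopt::Long::Configure('no_ignore_case', 'auto_abbrev');
    GetOptionsFromArray(\@args, \%opt, @option_specs)
        or die "Invalid options, try --help\n";
    return \%opt;
}

my $opt = parse_args('--transcrib=C', '--loc', '-l', 'B');
print "$_=$opt->{$_}\n" for sort keys %$opt;
```

    With this table, "--transcrib=C" is accepted, while "--trans=C" would
    be rejected as ambiguous, because it is a prefix of both
    "--transcriber" and "--transcription".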

  Getting information

    "-v" or "--version"
        Display the version of the program.

    "-I" or "--show_illustration_codes"
        Show all codes for illustration types and exit.

    "-T" or "--show_transcriber_codes"
        Show all codes for transcribers and exit.

  General options

    "-h" or "-?" or "--help"
        Show a short overview of all options.

    "-F path" or "--file=path" or "--transcription=path"
        Specify the path of the interlinear transcription file to use.
        By default, the program looks for a file named 'text16e6.evt' in
        the current directory.

  Output options

    If you do not specify any output options, the extracted text is
    written to the output in raw format, without any kind of processing.

    In most non-trivial cases, you will want some processing.

    "-c" or "--comment_keep"
        Keep all comment lines in the output. By default, they are
        suppressed.

    "-s" or "--space_convert"
        Convert all spacing characters (interpunctuation, line and
        percent characters) to real spaces.

    "-w" or "--weirdo_convert"
        Replace the weirdos with EVA codes wherever possible. Note that
        not every transcription uses the correct weirdo notation.

    "-C" or "--comment_suppress"
        Remove all comments from the output text. Every text in curly
        brackets that is not replaced as an explicit weirdo is treated
        as a comment.

        This makes further processing of the transcription easier.
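        A minimal sketch of this kind of comment removal with a regular
        expression, assuming that comments never nest, never span more
        than one line, and that explicit weirdos have already been
        replaced. The helper strip_comments and the sample line are made
        up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes every {...} group from one line of
# transcription text. Assumes comments do not nest or span lines, and
# that explicit weirdos were replaced before this step.
sub strip_comments {
    my ($line) = @_;
    $line =~ s/\{[^{}]*\}//g;
    return $line;
}

# Made-up sample line, for illustration only.
print strip_comments("qokeedy.{plant}dal.{?}chol"), "\n";
```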

    "-L" or "--locator_suppress"
        Do not output the line locator in front of a line of text.
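        A minimal sketch of what removing the locator can look like,
        assuming each text line starts with a locator in angle brackets
        such as "<f1r.P.1;H>" followed by whitespace. The exact locator
        format is an assumption here, and strip_locator is a
        hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes a line starts with a locator in angle
# brackets (for example "<f1r.P.1;H>") followed by whitespace; lines
# without a leading locator are returned unchanged.
sub strip_locator {
    my ($line) = @_;
    $line =~ s/^<[^>]*>\s*//;
    return $line;
}

print strip_locator("<f1r.P.1;H>      fachys.ykal.ar.ataiin"), "\n";
```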

    "-M" or "--meta_comments"
        Output some meta information for every page of the manuscript.
        Every line of meta information starts with the sequence "#-" to
        make parsing easy. By convention, "#" marks a comment line, so
        the meta information appears in the output as comments.
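        Because of this prefix convention, a downstream script in the
        pipe can tell the kinds of lines apart by their first
        characters. A minimal sketch; the helper classify_line is
        hypothetical, and the sample meta line is only a placeholder,
        since the content of the meta information is not described here.

```perl
use strict;
use warnings;

# Hypothetical downstream helper for output produced with -M.
# "#-" marks meta information, any other "#" line is a comment,
# everything else is transcription text.
sub classify_line {
    my ($line) = @_;
    return 'meta'    if $line =~ /^#-/;
    return 'comment' if $line =~ /^#/;
    return 'text';
}

for my $line ('#- (meta information)', '# a comment', 'daiin.chol') {
    printf "%-8s %s\n", classify_line($line), $line;
}
```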

    "-S" or "--dubious_suppress"
        Remove all dubious word breaks from the output. If two Voynich
        "words" are separated only by a dubious space, they will be
        written to the output as one word.
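        When "-S" and "-s" are combined, the order of the two steps
        matters: once a dubious space has been turned into a real space,
        it can no longer be told apart from a certain one. A minimal
        sketch, assuming the usual EVA conventions of "." for a certain
        word space and "," for a dubious one, and assuming "-", "=" and
        "%" as the other spacing characters; these character
        assignments are not taken from this manual, and both helper
        names are hypothetical.

```perl
use strict;
use warnings;

# Assumed EVA spacing characters (not taken from this manual):
#   "."  certain word space      ","  dubious word space
#   "-", "=" and "%"  further spacing characters
sub suppress_dubious {              # sketch of -S
    my ($text) = @_;
    $text =~ s/,//g;                # join words split only by a dubious space
    return $text;
}

sub convert_spacing {               # sketch of -s
    my ($text) = @_;
    $text =~ tr/.,=%-/ /;           # every spacing character becomes a blank
    return $text;
}

# -S has to run before -s; in the opposite order, "," would already be
# a blank and the dubious break would survive.
my $line = "qokal,dy.chedy-";
print convert_spacing(suppress_dubious($line)), "|\n";
print suppress_dubious(convert_spacing($line)), "|\n";
```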

  Text selection options

    If you do not specify any selection options, you get the whole text
    from the archive with only the comments removed. That could be done
    much more easily without this tool, so you probably want to select
    some part of the text for extraction.

    The selection options give you a way to extract folios,
    transcriptions, illustration types, Currier languages and hands.

    "-f folio-range" or "--folios=folio-range"
        Show only the text from the given folio range. The range can be
        specified in the following ways (the letters n and m represent
        folio numbers):

        "n" If you specify a single folio number, only this folio will
            be printed to the output.

        "n:m"
            You can specify a range by putting a colon between the folio
            numbers. If the first folio number is greater than the
            second, the two are swapped silently.

        ":n"
            If your specification starts with a colon, all folios from
            the first up to the given folio number will be printed.

        "n:"
            If your specification ends with a colon, all folios
            beginning from the given folio up to the last one will be
            printed.
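        The four forms of the range can be sketched as a small parser.
        This is a hypothetical helper, not the viat.pl code: $first and
        $last stand for the lowest and highest folio numbers in the
        file, and 116 is used below only as an example upper bound.
        Unlike viat.pl itself (see "BUGS"), the sketch rejects
        non-numeric input instead of silently malfunctioning.

```perl
use strict;
use warnings;

# Hypothetical parser for the -f range syntax. Returns ($from, $to).
sub parse_folio_range {
    my ($spec, $first, $last) = @_;
    return ($spec, $spec) if $spec =~ /^\d+$/;         # "n"
    my ($from, $to) = $spec =~ /^(\d*):(\d*)$/
        or die "Invalid folio range '$spec'\n";
    $from = $first if $from eq '';                      # ":n"
    $to   = $last  if $to   eq '';                      # "n:"
    ($from, $to) = ($to, $from) if $from > $to;         # swap "m:n", m > n
    return ($from, $to);
}

printf "%s => %d..%d\n", $_, parse_folio_range($_, 1, 116)
    for ':4', '10:3', '100:';
```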

    "-i illustration-codes" or "--illustration=illustration-codes"
        Select only the pages with the given illustration types. Use
        "-I" to list the available codes.

        Specifying an unknown illustration code is not an error. It
        silently selects nothing.

    "-l language-code" or "--language=language-code"
        Select only the pages with the given Currier language code. The
        language code is A or B.

        Not every page has an associated Currier language. If you use
        this option, all pages without a known language are hidden.

        If you specify an unknown language code, it silently selects
        nothing.

    "-m hand-code" or "--hand=hand-code"
        Select only the pages with the given Currier hand code. The hand
        codes are 1, 2, 3, 4, 5, A or B. You can combine more than one
        hand code.

        Not every page has an associated Currier hand. If you use this
        option, all pages without a known hand are hidden.
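        The page selection rules of "-i", "-l" and "-m" can be sketched
        as one filter: a page is kept only if every requested property
        is known for that page and its value is among the requested
        codes, so unknown codes select nothing and pages without the
        property are hidden. This is a hypothetical helper; the page
        hash layout and the assumption that several codes are written
        as one string (e.g. "12" for hands 1 and 2) are illustrative
        only.

```perl
use strict;
use warnings;

# Hypothetical page filter for -i, -l and -m. A page is a hash such as
# { illustration => 'H', language => 'A', hand => '1' }; a missing key
# means the property is unknown for that page. Requested codes are
# assumed to be one string of single characters, e.g. "12".
sub page_selected {
    my ($page, $wanted) = @_;
    for my $key (keys %$wanted) {
        my $value = $page->{$key};
        return 0 unless defined $value;                  # unknown: hidden
        return 0 if index($wanted->{$key}, $value) < 0;  # not requested
    }
    return 1;
}

# Made-up pages, for illustration only.
my @pages = (
    { name => 'p1', language => 'A', hand => '1' },
    { name => 'p2', language => 'B', hand => '2' },
    { name => 'p3' },                     # no Currier data
);
my @hits = grep { page_selected($_, { language => 'B' }) } @pages;
print join(',', map { $_->{name} } @hits), "\n";
```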

    "-t transcriber-codes" or "--transcriber=transcriber-codes"
        Select only the text from the given transcribers. You can
        specify more than one transcriber. Transcribers are given as
        single characters with the following meanings:

        C   Currier

        D   Currier, second choice

        F   First study group

        G   First study group, second choice

        H   Takeshi Takahashi

        I   Jim Reed, second choice

        J   Jim Reed

        K   Karl Kluge

        L   Don Latham

        M   Don Latham, second choice

        N   Gabriel Landini

        P   Father Th. Petersen

        Q   Karl Kluge, second choice

        R   Mike Roe

        T   John Tiltman

        U   Jorge Stolfi

        V   John Grove

        X   Denis V. Mardle

        Z   Rene Zandbergen

        Specifying an unknown transcriber code is not an error; it just
        does not select anything. If all given transcriber codes are
        unknown, you get an empty result.

        By default, all transcribers are included in the result.
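        A minimal sketch of transcriber selection, assuming every text
        line starts with a locator such as "<f1r.P.1;H>", where the
        character after ";" is the transcriber code from the table
        above. The locator format is an assumption and
        transcriber_selected is a hypothetical helper. The comparison is
        case-sensitive, which is why "c" selects nothing.

```perl
use strict;
use warnings;

# Hypothetical transcriber filter for -t. Assumes a line starts with a
# locator like "<f1r.P.1;H>"; the character after ";" is the code.
sub transcriber_selected {
    my ($line, $codes) = @_;
    return 1 unless defined $codes;               # default: all transcribers
    my ($who) = $line =~ /^<[^>;]*;(\w)>/ or return 0;
    return index($codes, $who) >= 0 ? 1 : 0;      # case-sensitive match
}

# Illustrative lines only.
my @lines = ("<f1r.P.1;H>      fachys.ykal", "<f1r.P.1;C>      fachys.ykal");
print "$_\n" for grep { transcriber_selected($_, 'C') } @lines;
```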

EXAMPLES
    Most examples are given in the short option syntax, which is fast to
    type but hard to read. While doing text analysis on the Voynich
    manuscript and trying different approaches, this is most probably
    the way you will type options at the prompt.

    But if you write more complex scripts that you want to share with
    others, please use the much more readable long options. They will
    help you and other people understand what is happening.

    Extract only the Currier transcription.

    "viat.pl -t C"

    Extract only the Currier transcription, but do not include the line
    locators in the extraction. Calling the program this way, you get
    the plain transcription text.

    "viat.pl -t C -L"

    Extract only the Currier transcription for all pages with the
    Currier language A.

    "viat.pl -t C -l A"

    Extract only the Currier transcription, suppress the line locator,
    replace all spacing characters by real spaces and replace all known
    weirdos by the EVA character codes.

    "viat.pl -t C -L -s -w"

    Do the same thing, but use the long option syntax.

    "viat.pl --transcriber=C --locator_suppress --space_convert --weirdo_convert"

    Select all pages with Currier language B from Takeshi Takahashi's
    transcription.

    "viat.pl -t H -l B"

    Do the same thing, but use the long option syntax.

    "viat.pl --transcriber=H --language=B"

FILES
    text16e6.evt
        The file containing the interlinear transcription. This file is
        expected to be in your current directory by default, but you can
        specify any path and any file name using the -F option.

BUGS
    At the moment, there are no known bugs. But there is one strange
    behaviour that does not affect the functionality of the program:

    If you choose to include meta information comments in the extracted
    file, you get this information at the beginning of every page, even
    if no Voynich text is selected from that page. This is not a bug,
    because it only creates comments, not content. But it is strange
    and should be done better some day.

    There is no good error checking in the program. If, for example, the
    parameter of an option is expected to be numeric but you give it a
    non-numeric value, the program silently malfunctions. This should
    not be a real problem, but it can cause confusion, especially if the
    program is used in a pipe and the error does not become visible
    immediately.

TODO
    I wrote this program as a personal replacement for the "VTT" tool,
    which is hard to read, hard to fix or change, and hard to compile. I
    had no intention of giving it away while I was writing it.

    But on the mailing list I found that other people experience
    problems with the "VTT" program, too. So I wrote some documentation
    and prepared the program for publication.

    I expect that other users will miss some features that I do not
    need. Feel free to contact me by email if you want an additional
    feature - I will implement it if the implementation is easy. But
    the feature should be a text extraction feature, not a formatting or
    analysis feature.

    Please keep in mind that I am not a native speaker of English, so do
    not expect me to understand every nuance of the language. I hope
    that my documentation is understandable to everyone who wants to use
    the program. If my English is clumsy or wrong, feel free to correct
    me.

AUTHOR
    viat.pl was written by Michael Winkelmann, michael@weltretter.de

