NAME

wordseq.pl - Word sequence analysis


SYNOPSIS

perl wordseq.pl [filespec]


DESCRIPTION

The program expects its input as a list of words, one word per line. It builds a simple word sequence statistic: each word is listed together with the words that follow it and their frequencies. It was written for the study of the Voynich manuscript.
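
The counting step described above can be sketched in a few lines. This is an illustrative Python sketch, not the Perl source of wordseq.pl; the function name successor_counts and the sample words are assumptions made for the example.

```python
from collections import Counter, defaultdict


def successor_counts(words):
    """Map each word to a Counter of the words that directly follow it.

    Hypothetical helper for illustration; not taken from wordseq.pl.
    """
    follows = defaultdict(Counter)
    # Pair every word with its immediate successor in the input order.
    for current, following in zip(words, words[1:]):
        follows[current][following] += 1
    return follows


# Input is one word per line, as the program expects (sample data is made up).
text = "daiin\nchol\ndaiin\nchol\ndaiin\nshol\n"
words = [line.strip() for line in text.splitlines() if line.strip()]
stats = successor_counts(words)

print(dict(stats["daiin"]))  # {'chol': 2, 'shol': 1}
print(dict(stats["chol"]))   # {'daiin': 2}
```

In this sketch the last word of the input gets no entry because nothing follows it; how wordseq.pl itself treats that case is not stated here.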


OPTIONS

--nosingles or -s
Do not include single occurrences of a word in the list.


OUTPUT FORMAT

For each word in the input a line of output will be generated.

The word is the first token; it is followed by a colon.

After the colon there is an entry for each following word, sorted by decreasing frequency of occurrence. Each entry contains

- The word
- The frequency of the word in curly brackets
- A dash as end marker for the word

The line is closed by an equals sign.

The equals sign is followed by the number of displayed words and, if the -s option is used, the number of suppressed words in curly brackets.

The last character of the line is the newline character.
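
As a rough illustration of the line layout described above, the following Python sketch assembles one output line from a word and its successor frequencies. Several details are assumptions, not facts about wordseq.pl: that tokens are joined without spaces, that the displayed count is written bare while only the suppressed count is put in curly brackets, that ties are ordered alphabetically, and that -s drops following words seen only once. The name format_line is hypothetical.

```python
def format_line(word, counts, nosingles=False):
    """Build one output line: word, colon, entries, equals sign, counts.

    Sketch under the assumptions stated in the text; exact spacing and
    tie-breaking in wordseq.pl may differ.
    """
    # Sort by decreasing frequency; alphabetical order for ties is assumed.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # With nosingles, following words seen only once are suppressed.
    shown = [(w, n) for w, n in ranked if not (nosingles and n == 1)]
    suppressed = len(ranked) - len(shown)
    # Each entry: the word, its frequency in curly brackets, then a dash.
    entries = "".join(f"{w}{{{n}}}-" for w, n in shown)
    line = f"{word}:{entries}={len(shown)}"
    if nosingles:
        line += f"{{{suppressed}}}"
    return line + "\n"


print(format_line("daiin", {"chol": 2, "shol": 1}), end="")
# daiin:chol{2}-shol{1}-=2
print(format_line("daiin", {"chol": 2, "shol": 1}, nosingles=True), end="")
# daiin:chol{2}-=1{1}
```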

The output format is designed so that it can be piped to the showeva.pl tool and read in the EVA font. For this purpose the frequencies are formatted as EVA inline comments.


EXAMPLE

perl viat.pl -t H | perl vword.pl | perl wordseq.pl | perl showeva.pl -

Create a word sequence statistic for the VMS transcription by T. Takahashi and display it with the showeva.pl tool.


BUGS

The short source is far from being an example of good software engineering. Do not read it, use it...


AUTHOR

wordseq.pl was written by Michael Winkelmann, michael@weltretter.de