EKT Hypothesis

by Dennis J. Stallings
September 4, 1997

(c) by Dennis J. Stallings, 1997

My hypothesis is that the concealment system for the VMs is a homophonic word game or orthographic system. I have devised a system called EKT that would account for the presence of Voynich A and B, the low variety of digraphs (the low second-order entropy of the text), and the (relative) absence of long repeated phrases.

*Word Games*

Pig Latin is the best-known word game in English, but it is not a good example of what I have in mind, since it involves a transposition as well as the addition of nulls. A better example is Opish. In Opish, you add "op" before each vowel.

"The sunflower is a marvellous plant with powerful virtues that must needs be concealed from the ignorant and uninitiated."

becomes:

"Thope sopunflopowoper opis opa moparvopellopous plopant wopith popowoperfopul vopirtopues thopat mopust nopeeds bope coponcopealed fropom thope opignoporopant opand opunopinopitopiopatoped."

The system that interests me the most is called King Tut. One makes the following substitutions:

    A - a       I - i       R - rur
    B - bub     J - jug     S - sus
    C - cut     K - kam     T - tut
    D - dud     L - lul     U - u
    E - e       M - mum     V - vuv
    F - fuf     N - num     W - wuv
    G - gug     O - o       Y - yec
    H - hush    P - pup     Z - zuz
                Q (as is)   X (as is)

- from *The Cat's Elbow and Other Secret Languages*, collected by Alvin Schwartz, with pictures by Margot Zemach, 1982; a children's book, although it also has good scholarly references. King Tut or Double Dutch, pp. 45-47.

"The sunflower is a marvellous plant with powerful virtues that must needs be concealed from the ignorant and uninitiated."

becomes:

"Tuthushe susunumfuflulowuverur isus a mumarurvuvelullulousus puplulanumtut wuvituthush pupowuverurfufulul vuvirurtutuesus tuthushatut mumusustut numeedudsus bube cutonumcutealuledud fufruromum tuthushe igugnumoruranumtut anumdud unuminumitutiatutedud."
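As a minimal sketch (it is not part of the source for the game), the King Tut substitutions listed above can be written in a few lines of Python. The names KING_TUT and king_tut are illustrative only, and the sketch assumes that the text is lowercased first and that anything without a substitution (vowels, Q, X, spaces, punctuation) passes through unchanged.

```python
# King Tut substitutions as listed in the text.
# Vowels, Q and X are written as they are, so they are not in the table.
KING_TUT = {
    "b": "bub", "c": "cut", "d": "dud", "f": "fuf", "g": "gug",
    "h": "hush", "j": "jug", "k": "kam", "l": "lul", "m": "mum",
    "n": "num", "p": "pup", "r": "rur", "s": "sus", "t": "tut",
    "v": "vuv", "w": "wuv", "y": "yec", "z": "zuz",
}


def king_tut(text: str) -> str:
    """Encode text letter by letter (assumption: input is lowercased first)."""
    return "".join(KING_TUT.get(ch, ch) for ch in text.lower())


if __name__ == "__main__":
    for word in ("the", "sunflower", "plant"):
        print(word, "->", king_tut(word))
```

Because every consonant expands to a consonant-vowel-consonant tag, even a three-letter word comes out looking like a much longer "word".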
Of course, the fact that English represents the single phonemes th and sh with two letters each adds some confusion to this system.

This system interests me because:

1) There's enough variation in the substitutions that it's not easily detectable.

2) The format substitutes C with CVC, preserving the normal CVCV alternation of natural language. This keeps the entropy low, even with greater variety in the substituted syllables compared to other word games.

3) The repetition of short "words" in the VMs is very obvious; it leaps off the page at you. Common trigraphs or short words would be expanded to longer strings that would look like words. For example:

    "for" becomes "fuforur",
    "the" becomes "tuthushe",
    "that" becomes "tuthushatut",
    "are" becomes "arure".

*Extended King Tut (EKT)*

With modifications, the King Tut system can account for other properties of the Voynich text. I shall call this modified system Extended King Tut (EKT).

A homophonic system, one that allows multiple alternatives for the substitution tags (from now on I'll just call them "tags"), could lead to the presence of Voynich A and B. A and B would have different preferences for the alternatives they used, and this would account for the statistical differences between Voynich A and B. This phenomenon is well known from wartime experience with code clerks. Each clerk had his or her personal preferences for which code group to use where there was a choice, and this fact helped enemy codebreakers.

However, a homophonic system leads to another problem. By offering multiple alternatives, it would decrease predictability and therefore *increase* entropy, yet the second-order entropy of the Voynich text is low. A system that did not widen the digraph distribution (that is, the distribution of the digraphs that we see, the Currier digraphs) would not affect our readings of second-order entropy. Suppose the tags contained only a few different base digraphs, far fewer than the total number of tags.
If the tags were constructed from a limited pool of base digraphs, there might not be a net increase in second-order entropy in comparison to a similar non-homophonic system (like unmodified King Tut).

*Sample EKT System*

To illustrate this, I will construct an Extended King Tut that would show roughly the same differences in the Currier letter frequencies (in English!) that Voynich A and B do. Please bear in mind that this system is only an example illustrating the principle involved and is undoubtedly much simpler than what might really be in use in the Voynich Manuscript.

For vowels:

    A is equally frequent in A and B.
    E is more frequent in B.
    O is more frequent in A.
    I, U are rare (< 0.1%).

For consonants:

    Z, X, and B are equally frequent in A and B.
    N, F, and C are more frequent in B.
    P, Q, R, S, M, and J are more frequent in A.
    Y, W, V, U, T, K, L, G, H, and D are rare (< 0.5%).

So, for the base digraphs:

    A only uses:        OP, OR, OS
    B only uses:        EC, EF, EN
    A and B both use:   AB, AX, AZ

I then use only these 9 base digraphs to construct the tags for Extended King Tut. Adding only 9 distinct base digraphs in large numbers will surely result in a low second-order entropy for the text.

            A               B               A               B
    Char    Only    Either  Only    Char    Only    Either  Only
    ----    ----    ------  ----    ----    ----    ------  ----
    A                               N       nor     nax     nef
    B       bop     bab     bec     O
    C       cor     cax     cef     P       pos     paz     pen
    D       dos     daz     den     Q
    E                               R       rop     rab     rec
    F       fop     fab     fec     S       sor     sax     sef
    G       gor     gax     gef     T       tos     taz     ten
    H       hos     haz     hen     U
    I                               V       vop     vab     vec
    J       jop     jab     jec     W       wor     wax     wef
    K       kor     kax     kef     X
    L       los     laz     len     Y       yos     yaz     yen
    M       mop     mab     mec     Z       zop     zab     zec

"The sunflower is a marvellous plant with powerful virtues that must needs be concealed from the ignorant and uninitiated."

A might write this sentence as:

"Toshaze saxunorfablazoworerab isor a moparopvopelazlazousor pazlosanaxtos waxitoshos posoworerabfopulos vopirabtosuesax toshosataz mabusaxtaz naxeedossax bope coronorcaxealazedaz fabropomop toshose igornaxoropanaxtos anaxdaz unorinoritaziatazedaz."
B might write this sentence as:

"Tenhene soruneffeclazoweferec isef a mabarabvabelenlazousef pazlazanaxten wefitazhaz pazowaxerabfabulen vabirabtazuesax tazhenataz mabuseften nefeedensef bece cefonaxcaxealazeden fabrabomab tenhaze igefneforabanaxtaz anefden unefinefitaziatazedaz."

In reality, of course, the tags probably are short words or related to short words to give them mnemonic value. Remember, too, that the art of memory was much more highly developed in those days when books were scarce and writing techniques inconvenient.

The system could easily be more complex than this. A and B could both have used all three alternatives, merely having different relative preferences. There could have been more than three tags for each phoneme.

*Word Divisions*

The word games that increase predictability and decrease entropy also make words longer. Voynich words are rather short. However, the exact meaning of the "word divisions" remains unsolved. Since EKT words are so long, under this hypothesis the divisions in the text could not be word divisions. Some possible explanations:

1) Voynichese could be a monosyllabic language, like Chinese. I know of no monosyllabic languages that have ever been spoken in or around Europe, so this seems unlikely.

2) The word divisions could in fact be syllable divisions. I have never seen examples of a phonemic writing system that did this, but there's always a first time! A language that does not make clear word divisions in speech seems like a more likely candidate for this. Examples are medieval and modern spoken French, spoken Japanese, and many Native American languages. French is the only one of these to have been spoken in or around Europe.

3) Word divisions could be due solely to the orthography. On Thu, 27 Mar 1997, Jacques Guy wrote:

"They [word breaks] congregate near spaces, ends of lines, and ends of paragraphs (or starts). Those are not necessarily word boundaries.
Thus, for instance, if I write "head" in Arabic, I have to write it: r a s. But if I write "river" I write it: nhr (no spaces). This is merely because some letters can connect to the left, some to the right, some left and right, and some not at all. Note that in Voynichese most letters that occur at "word" breaks feature a flourish. Is the flourish a variant of the letter when it occurs word-finally, or is it what prevents the next letter to connect to it? We don't know."

*EKT as a Homophonic Orthographic System.*

EKT could just as well be conceived as an orthographic system as an oral word game. Consider that in English one may see the unvoiced palato-alveolar fricative (the sh sound) written in at least four different ways:

    sh  - normally
    sch - in German words and names
    sz  - in Polish names
    ch  - in French words and names

Thus there are four choices, which share various subunits - s, h, and ch. This is conceptually like the EKT word game.

Finally, the EKT substitution tags are probably not all the same length. My example showed tags that were all three letters long, but they could well vary from 1 to 5 or even more letters.

*Labels*

Under EKT, words from common European languages would become rather long, while the labels in the VMs are rather short. Labels could be abbreviations or numerals.
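*A Sketch of the Sample EKT System in Code*

The sample EKT system described earlier (19 consonants, each with an "A only", an "Either", and a "B only" tag) can be sketched in Python as follows. This is an illustration under stated assumptions, not a reconstruction of any real Voynich system. The names TAGS, ekt_encode, ekt_decode, and second_order_entropy are hypothetical. The sketch assumes that each scribe picks at random between the two tags allowed to him, and it takes second-order entropy to mean the conditional entropy of a letter given the letter before it, counted inside words only, which is one common definition and is assumed here.

```python
import math
import random
from collections import Counter

# Tags from the sample table, in the order (A only, Either, B only).
# Vowels, Q and X have no tags and are written as they are.
TAGS = {
    "b": ("bop", "bab", "bec"), "c": ("cor", "cax", "cef"),
    "d": ("dos", "daz", "den"), "f": ("fop", "fab", "fec"),
    "g": ("gor", "gax", "gef"), "h": ("hos", "haz", "hen"),
    "j": ("jop", "jab", "jec"), "k": ("kor", "kax", "kef"),
    "l": ("los", "laz", "len"), "m": ("mop", "mab", "mec"),
    "n": ("nor", "nax", "nef"), "p": ("pos", "paz", "pen"),
    "r": ("rop", "rab", "rec"), "s": ("sor", "sax", "sef"),
    "t": ("tos", "taz", "ten"), "v": ("vop", "vab", "vec"),
    "w": ("wor", "wax", "wef"), "y": ("yos", "yaz", "yen"),
    "z": ("zop", "zab", "zec"),
}


def ekt_encode(text, scribe, rng):
    """Encode as scribe "A" or "B" (assumption: a random pick among allowed tags)."""
    out = []
    for ch in text.lower():
        if ch in TAGS:
            a_only, either, b_only = TAGS[ch]
            allowed = (a_only, either) if scribe == "A" else (either, b_only)
            out.append(rng.choice(allowed))
        else:
            out.append(ch)  # vowels, q, x, spaces, punctuation
    return "".join(out)


def ekt_decode(cipher):
    """Undo the substitution: each tag is 3 letters and starts with its own consonant."""
    out, i = [], 0
    while i < len(cipher):
        ch = cipher[i]
        out.append(ch)
        i += 3 if ch in TAGS else 1
    return "".join(out)


def second_order_entropy(text):
    """Conditional entropy in bits of a letter given the preceding letter."""
    pairs = Counter()
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        pairs.update(zip(letters, letters[1:]))
    firsts = Counter()
    for (x, _), n in pairs.items():
        firsts[x] += n
    total = sum(pairs.values())
    h = 0.0
    for (x, _), n in pairs.items():
        h -= (n / total) * math.log2(n / firsts[x])
    return h


if __name__ == "__main__":
    rng = random.Random(1997)  # fixed seed only so that runs are repeatable
    plain = "the sunflower is a marvellous plant"
    for scribe in ("A", "B"):
        cipher = ekt_encode(plain, scribe, rng)
        print(scribe, cipher, round(second_order_entropy(cipher), 2))
```

Decoding is trivial in this toy version because every tag begins with the consonant it stands for; a real system with mnemonic tags of varying length need not have that property. To test the entropy argument one would run second_order_entropy on a long plaintext, on its unmodified King Tut encoding, and on its EKT encoding, and compare the three figures; a single sentence is too short to show anything.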