May 26, 2004. Added more Case 2 statistics.
May 25, 2004. Added Case 2.
May 14, 2004. Added comments on base case and added Case 1.
May 12, 2004. Opened statistics page.
Placed statistics for base case.
Here are statistics on the diary transcription, although these do not consider the # mark for uncertain characters.
The Sukhotin/HMM comparison table suggests that the Sukhotin vowel algorithm and the Hidden Markov Model used in Stamp's paper may in fact be saying the same thing. For standard and phonemic English, the HMM placed consonants in state 0 and vowels in state 1. If we assume, however, that the Sukhotin algorithm does just the opposite, the two are in complete agreement, provided one makes a further assumption.
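For readers unfamiliar with the Sukhotin vowel algorithm, here is a minimal sketch of the procedure as it is commonly described. It is not the code used to produce the statistics on this page. The function name `sukhotin_vowels` and the input format (a list of words, each a string or a list of grapheme tokens) are assumptions made for illustration.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Return the symbols classed as vowels, in the order they are picked.

    Illustrative sketch only; names and input format are hypothetical.
    """
    # Symmetric counts of adjacent pairs; identical neighbours are ignored,
    # which corresponds to a zero diagonal in the adjacency matrix.
    adj = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every symbol starts as a consonant with its row sum as score.
    sums = {s: sum(adj[s].values()) for s in symbols}
    remaining = set(symbols)
    vowels = []
    while remaining:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda s: sums[s])
        if sums[best] <= 0:
            break  # no remaining symbol has a positive score
        vowels.append(best)
        remaining.discard(best)
        # Adjacency to a newly found vowel counts against being a vowel.
        for s in remaining:
            sums[s] -= 2 * adj[s][best]
    return vowels
```

The procedure only splits the symbols into two classes by how they alternate, so on some inputs the class it calls "vowels" can be the one a reader would call consonants.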
The assumption is that a model value ratio of 1.6 or more is sufficient to place a result definitely in one of the two states. Stamp stated that a ratio of 10 would be necessary, but these results call that into question. The samples of English were very large (around 6 million characters or phonemes) compared to the Hamptonese sample here (about 29,000 graphemes). That may well make the results less definite. Stamp did state that a sample of 10,000 characters is sufficient to get valid results for English, but under what requirements?
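The effect of the threshold can be sketched as follows. This assumes that the "model value ratio" is the larger of a symbol's two per-state model values divided by the smaller; that reading, the function name `assign_states`, and the input layout are all assumptions, not taken from Stamp's paper.

```python
def assign_states(model_values, threshold=1.6):
    """Assign each symbol to state 0 or 1, or None if the ratio is too low.

    model_values maps a symbol to a pair (value_in_state_0, value_in_state_1).
    Hypothetical helper for illustration only.
    """
    result = {}
    for symbol, (v0, v1) in model_values.items():
        high, low = max(v0, v1), min(v0, v1)
        if low == 0:
            # A zero in one state is decisive unless both values are zero.
            ratio = float("inf") if high > 0 else 1.0
        else:
            ratio = high / low
        if ratio >= threshold:
            result[symbol] = 0 if v0 > v1 else 1
        else:
            result[symbol] = None  # undecided at this threshold
    return result
```

Running the same values with `threshold=1.6` and `threshold=10` shows how many symbols move from a definite state to undecided under the stricter requirement.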
The other notable point is the degree to which the /Y3 vv/ digram dominates the distributions. Further investigation shows that even the /Y3 vv Y3 vv/ string is rather dominant. Also, 90% of the occurrences of /Ki/ are in the digram /Ki Ki/. We shall therefore assume that we need to treat these groups as single graphemes, which leads to Case 1.
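Figures of this kind can be checked with a short script. The sketch below assumes the transcription has already been split into a list of grapheme tokens; the function names are hypothetical and this is not the tool used for the statistics here.

```python
from collections import Counter

def digram_counts(tokens):
    """Count each ordered pair of adjacent graphemes."""
    return Counter(zip(tokens, tokens[1:]))

def share_in_doubled(tokens, grapheme):
    """Fraction of a grapheme's occurrences that touch another copy of itself."""
    total = 0
    paired = 0
    for i, tok in enumerate(tokens):
        if tok != grapheme:
            continue
        total += 1
        prev_same = i > 0 and tokens[i - 1] == grapheme
        next_same = i + 1 < len(tokens) and tokens[i + 1] == grapheme
        if prev_same or next_same:
            paired += 1
    return paired / total if total else 0.0
```

Calling `digram_counts(tokens).most_common(10)` lists the most frequent digrams, and `share_in_doubled(tokens, "Ki")` gives the proportion of /Ki/ tokens that occur next to another /Ki/.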
/Y3 vv/ --> /Y3v/
/Y3 vv Y3 vv/ --> /Y6/
/Ki Ki/ --> /Ki2/ {/KiKi/ is too ambiguous}
However, this excluded the same graphemes as in the Base Case LTCT and VFQ results, as well as 13, qL, and HH.
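The substitutions above can be sketched as a longest-match-first rewrite over a token list. The three rules are the ones listed above. The whitespace-separated token format, the names `RULES` and `merge_graphemes`, and the choice to try the longest pattern first (so that /Y3 vv Y3 vv/ becomes /Y6/ rather than two /Y3v/) are assumptions.

```python
# Case 1 rules as listed above: sequence of graphemes -> merged grapheme.
RULES = {
    ("Y3", "vv", "Y3", "vv"): "Y6",
    ("Y3", "vv"): "Y3v",
    ("Ki", "Ki"): "Ki2",
}

def merge_graphemes(tokens, rules=RULES):
    """Replace listed grapheme sequences by their merged form (hypothetical helper)."""
    # Try longer patterns first so the four-token rule wins over the digram rule.
    ordered = sorted(rules.items(), key=lambda kv: len(kv[0]), reverse=True)
    out = []
    i = 0
    while i < len(tokens):
        for pattern, merged in ordered:
            n = len(pattern)
            if tuple(tokens[i:i + n]) == pattern:
                out.append(merged)
                i += n
                break
        else:
            # No rule matched at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out
```

Further rules can be added to the dictionary without changing the function.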
Here are the resulting statistics:
/qL3 vv/ --> /qLv/
However, this excludes the same graphemes as in Case 1, and also J.
This is the result:
For later.
END