Phonetic speech in SimpleText

Phonemic representations

The following list explains the phonemic symbols you can input after the "[[inpt PHON]]" command.

vowels

AE as in cat
EY as in fate
AO as in ought
AX as in about
IY as in meet
EH as in let
IH as in fit
AY as in site
IX as in roses
AA as in hot
UW as in rude
UH as in look
UX as in mud
OW as in rope
AW as in mountain
OY as in boy

consonants

b as in boy
D as in them
C as in church
f as in fun
g as in gag
h as in hat
J as in jump
k as in cat
l as in lip
m as in mud
n as in nag
N as in sing
p as in pet
r as in rub
s as in sat
S as in shin
t as in tap
T as in thick
v as in vet
w as in wet
y as in yet
z as in zebra
Z as in pleasure

pauses

% silence
@ breath intake

prosody

1 primary lexical stress
2 secondary lexical stress
= syllable break
~ destress
_ normal stress
+ emphatic stress
/ pitch rise
\ pitch fall
> lengthen duration of phoneme
< shorten duration of phoneme

Notes: Stress symbols are placed before the vowel they apply to. The symbols /, \, >, and < go before the phoneme they apply to. They work best on vowels but are not limited to them.
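As a rough illustration of how the symbols above combine, here is a small Python sketch that splits a phoneme string into its individual symbols and rejects anything not in the lists. The function name tokenize and the decision to skip whitespace are assumptions made for this example; the sketch only knows the symbols listed in this text, so the synth itself may accept more.

```python
# Symbols copied from the lists above; nothing else is recognized.
VOWELS = {"AE", "EY", "AO", "AX", "IY", "EH", "IH", "AY",
          "IX", "AA", "UW", "UH", "UX", "OW", "AW", "OY"}
CONSONANTS = set("bDCfghJklmnNprsStTvwyzZ")
MARKS = set("%@12=~_+/\\><")  # pauses and prosody symbols


def tokenize(phonemes):
    """Split a phoneme string into symbols; raise ValueError on unknown input."""
    tokens = []
    i = 0
    while i < len(phonemes):
        pair = phonemes[i:i + 2]
        ch = phonemes[i]
        if pair in VOWELS:
            # Vowels are always two capital letters.
            tokens.append(pair)
            i += 2
        elif ch in CONSONANTS or ch in MARKS:
            tokens.append(ch)
            i += 1
        elif ch.isspace():
            # Assumption: whitespace only separates words, so it is skipped.
            i += 1
        else:
            raise ValueError(f"unknown symbol {ch!r} at position {i}")
    return tokens


# "cat" with primary stress on the vowel: k, stress mark, AE, t
print(tokenize("k1AEt"))  # ['k', '1', 'AE', 't']
```

Because every vowel starts with A, E, I, O, or U and no listed consonant does, trying the two-letter vowel first never misreads a consonant.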

Embedded commands

These commands are normally delimited with "[[" and "]]". The name of the command follows the "[[", followed by a blank and then the parameter(s) to the command. Multiple commands may appear between the same delimiters, separated by semicolons.
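The delimiter rules above can be sketched as a small Python helper that assembles a command group from name and parameter pairs. The helper name embed is made up for this example, and putting a space after each semicolon is an assumption made for readability; the text only says that commands are separated by semicolons.

```python
def embed(*commands, start="[[", end="]]"):
    """Format (name, parameter, ...) tuples as one embedded-command group."""
    # Each command is its name, a blank, then its parameters.
    parts = [" ".join(str(item) for item in command) for command in commands]
    # Assumption: "; " (semicolon plus space) is an acceptable separator.
    return start + "; ".join(parts) + end


text = embed(("rate", 150), ("volm", 0.5)) + " Hello there."
print(text)  # [[rate 150; volm 0.5]] Hello there.
```

The start and end parameters are there only so the same helper still works if the delimiters have been changed.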

pbas <pitch> or pbas <delta>

sets the baseline pitch to a number between 1 and 127, or raises or lowers the current baseline pitch by the given amount. Each number is one half-step in music; middle C is 60. If you insert "[[pbas 60]]" in your text, the central pitch will be middle C. If you later insert "[[pbas -12]]", the central pitch will drop one octave.
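Since each pbas step is a half-step and middle C is 60, the number for any other note is simple arithmetic. The Python sketch below works it out for natural notes; the function name pbas_for and the convention of calling middle C "C4" are assumptions for this example, not part of the command set.

```python
# Half-steps above C for each natural note within one octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pbas_for(note, octave=4):
    """Return the pbas number for a natural note, taking middle C (C4 here) as 60."""
    value = 60 + SEMITONES[note] + 12 * (octave - 4)
    if not 1 <= value <= 127:
        raise ValueError(f"{note}{octave} is outside the 1-127 range")
    return value


print(pbas_for("A"))     # 69: nine half-steps above middle C
print(pbas_for("C", 3))  # 48: one octave (12 half-steps) below middle C
print(f"[[pbas {pbas_for('G')}]]")  # [[pbas 67]]
```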

pmod <depth>

controls the amount of change in pitch. In normal speech we vary our pitch, and the synths mimic this in their prosody routines. "[[pmod 0]]" causes everything following it to be spoken at the baseline pitch (a monotone) with no variation. The range of values for depth is uncertain: some sources say 100 is the maximum, others say 127, and we have experimented with values up to 200. In practice, I mainly use "pmod 0" to make it sing a syllable at a particular pitch.
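The singing trick mentioned above might be put together like this in Python: turn modulation off once, then set an absolute baseline pitch before each syllable. The function name sing and the sample syllables are invented for this sketch, and how musical the result sounds will depend on the voice.

```python
def sing(syllables):
    """Build text that speaks each (text, pitch) pair at a fixed baseline pitch."""
    # pmod 0 removes pitch variation, so each syllable stays on its pbas value.
    pieces = ["[[pmod 0]]"]
    for text, pitch in syllables:
        pieces.append(f"[[pbas {pitch}]] {text}")
    return " ".join(pieces)


print(sing([("do", 60), ("re", 62), ("mi", 64)]))
# [[pmod 0]] [[pbas 60]] do [[pbas 62]] re [[pbas 64]] mi
```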

inpt TEXT or inpt PHON

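selects whether the text that follows is read as ordinary text (TEXT) or as phonemic symbols (PHON). The default mode of the synth is inpt TEXT. It seems that if you start out with inpt PHON, your first character needs to be a "~" or a "_"; the reason is unclear.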

rate <wpm>

alters the average number of words per minute. Normal rate of speech is about 110 words per minute.

volm <volume>

sets the volume between 0.0 (silence) and 1.0 (the maximum volume of the speaker setting).

slnc <msec>

inserts silence for the indicated number of milliseconds.
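Because slnc takes milliseconds, pauses given in seconds need converting first. A minimal Python sketch, with the hypothetical helper name join_with_pauses, that places the same pause between sentences:

```python
def join_with_pauses(sentences, seconds=0.5):
    """Join sentences with an slnc command that lasts the given number of seconds."""
    msec = round(seconds * 1000)  # slnc expects milliseconds
    return f" [[slnc {msec}]] ".join(sentences)


print(join_with_pauses(["First point.", "Second point."]))
# First point. [[slnc 500]] Second point.
```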

rset 0

returns the synth to default settings.

char NORM or char LTRL

selects whether the synth reads words and numbers normally (NORM) or spells out the letters and digits in every word or number (LTRL). Default is NORM.

nmbr NORM or nmbr LTRL

selects whether numbers are read as "one thousand, three hundred and ninety two" (NORM) or "one, three, nine, two" (LTRL). Default is NORM.
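Since both char and nmbr default to NORM, a literal reading usually needs to be switched on for one stretch of text and then switched back. One way to sketch that in Python, using the invented helper name literal:

```python
def literal(text, kind="nmbr"):
    """Wrap text so it is read in LTRL mode, then restore the NORM default."""
    if kind not in ("char", "nmbr"):
        raise ValueError("kind must be 'char' or 'nmbr'")
    return f"[[{kind} LTRL]] {text} [[{kind} NORM]]"


print("The code is " + literal("1392"))
# The code is [[nmbr LTRL]] 1392 [[nmbr NORM]]
print(literal("HyperCard", kind="char"))
# [[char LTRL]] HyperCard [[char NORM]]
```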

dlim <startdelim> <enddelim>

changes the delimiters for embedded commands from their normal "[[" and "]]". You would do this if you needed the synth to actually speak the text "[[".

vers <vers>

allows possible future versions to be backwards compatible, just in case Apple ever changes the format of the embedded commands. Current version is the hex number "$0001".

Note: This text is borrowed from "How to Get More out of Hypercard: Digitized and Synthesized Speech" by Colleen Dick. Many thanks.
