Build a Speech Synthesizer for VIC-20



Introduction:

Speech Synthesizers for legacy systems are getting harder and harder to find. Back in the day, there were the high-end units that featured text-to-speech translation processors. The Cadillac systems were the Votrax "Type-N-Talk" and "Personal Speech System." Then there were the low-end units, requiring manual translation of allophones or phonemes from tables in manuals, combined with PEEKs and POKEs, to form words and sentences. The purpose of this project is to simulate the high-end units of the time.

The biggest challenge today is finding modern parts that are willing to communicate at 1200 baud. For example, the SpeakJet allophone synthesizer, combined with an 8-bit microprocessor programmed with letter-to-sound rules for text-to-speech (such as the TTS256), popular in today's robotics, will only operate at 9600 baud. That is too fast for poor, old VIC!

These days it is actually easier (and cheaper) to dedicate an entire computer and software to the task versus a purely silicon approach. The dedication of a computer to a specific task as part of a larger system is not so different from the intelligent peripherals of the day, like disk drives and printers, where processing was offloaded to the device. Today this is commonplace. We're surrounded by dedicated systems interconnected in highly flexible ways. Even the "walled gardens" of our cell phones, tablets and consumer appliances have full-fledged operating systems underneath their slick user interfaces.

So, this solution exposes you to some really cool things: the Raspberry Pi (University of Cambridge Computer Laboratory); Debian Linux configuration; hardware-level general purpose input/output (GPIO); TTL serial communications; logic level converters; the Festival text-to-speech synthesis system (University of Edinburgh's Centre for Speech Technology Research and Carnegie Mellon University), which has a Scheme-based (SIOD) command interpreter for control and an offline manual in PDF format; basic soldering techniques and more!

This project can easily be completed in a weekend, and done together with a child or friend. Only a Raspberry Pi, simple components and basic soldering are required. What you will have in the end is a unit that operates very much like the high-end Votrax systems of the day: you OPEN a command channel for writing and PRINT the sentences and words you want spoken. Now you're talking!

Theory of Operation:

Communication is only one way: Vic talks and Pi listens. For serial communication out of the User Port only two wires are required: CB2 and GROUND. The CB2 line flutters between +5v and 0v as bits of data flow down the wire one after another. The rate is fixed at 1200 bits per second. We set that rate when we open the channel: OPEN2,2,1,CHR$(8).
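To see what the CHR$(8) in that OPEN statement is asking for, here is a small Python sketch that decodes the control byte. The bit layout (low four bits for the baud rate code, bits 5 and 6 for word length, bit 7 for stop bits) is my reading of the Commodore RS-232 documentation, and the baud table lists only a few codes, so treat it as an illustration rather than a full reference. The names BAUD_RATES and decode_control are made up for this example.

```python
# Sketch: decode the one-byte RS-232 control register that the VIC-20
# receives as the "filename" in OPEN2,2,1,CHR$(8).
# Assumed bit layout: bits 0-3 = baud rate code, bits 5-6 = word length,
# bit 7 = number of stop bits.

# Partial table only; codes not listed here decode to None.
BAUD_RATES = {6: 300, 7: 600, 8: 1200}


def decode_control(byte):
    """Return (baud, data_bits, stop_bits) for a control register byte."""
    baud = BAUD_RATES.get(byte & 0x0F)        # low nibble selects the rate
    data_bits = 8 - ((byte >> 5) & 0x03)      # 0 -> 8 bits, 1 -> 7, 2 -> 6, 3 -> 5
    stop_bits = 2 if byte & 0x80 else 1       # bit 7 set means two stop bits
    return baud, data_bits, stop_bits


print(decode_control(8))  # (1200, 8, 1)
```

So CHR$(8) works out to 1200 baud, 8 data bits and 1 stop bit, which is what the Pi has to be set up to expect.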

For Pi to listen to serial communications on its GPIO port only two wires are required: RXD and GROUND. Pi looks for oscillation between +3.3v and 0v on the RXD pin to identify each bit. The Pi has to know to listen at the same rate as the bits are flowing in, 1200 bits per second, so that it can reconstruct the data. The data is framed as a start bit, 8 data bits and 1 stop bit. So long as the sender and receiver know how the data is packaged, and the rate it is flowing, the communication can happen.
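To make the framing and timing concrete, here is a small Python sketch that lays out the bits for one character and estimates how long a sentence takes to send. It assumes the usual asynchronous serial convention (one start bit of 0, data bits sent least significant first, one stop bit of 1); the function names are hypothetical and the values are logical bits, not measured voltages.

```python
# Sketch: how one character is packaged on the wire at 1200 baud.
# Assumes standard asynchronous framing: start bit, 8 data bits, stop bit.

BAUD = 1200  # bits per second, matching OPEN2,2,1,CHR$(8)


def frame_byte(value):
    """Return the logical bits for one character, in the order they are sent."""
    data_bits = [(value >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]                      # start bit, data, stop bit


def seconds_to_send(text, baud=BAUD):
    """Estimate the time on the wire for a string, ten bits per character."""
    bits_per_char = len(frame_byte(0))
    return len(text) * bits_per_char / baud


print(frame_byte(ord("A")))  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(seconds_to_send("HELLO WORLD."))
```

Each character costs ten bit times, so 1200 baud works out to 120 characters per second, and "HELLO WORLD." needs about a tenth of a second on the wire. Festival takes far longer than that to say it.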

The problem is the voltages, because we don't want to fry Pi by screaming at it with +5v. It is designed for +3.3v instead. That is where the logic level converter comes in. It lowers the line voltage out of Vic from +5v down to +3.3v, stepping the signal down between its RXI pin on the high voltage side and its RXO pin on its low voltage side. (It can also step signals up for data flowing the other direction, but we don't need that.)

That's about it on the hardware. All of the rest is software.

Parts needed:

 
Links to Sources
Raspberry Pi ($35) and The Festival Speech Synthesis System (free)
Offline Festival manual in PDF format.
Online Scheme (SIOD) documentation and information.
User Port Connector ($3.00)
Logic Level Converter ($2.00)
3 x Mini-Clip Jumpers ($9.00)
Wire (trivial cost)

Instructions:

1. Setting up a Raspberry Pi for the first time is outside the scope of this document. So, assuming you have a working system connected to the Internet, open a terminal and type:

sudo apt-get install festival

Try out Festival with:

echo "hello world" | festival --tts

2. Configure the serial port on Raspberry Pi:

3. Wire up the system:

The secret sauce to this whole recipe is actually the cheapest component, the Logic Level Converter. The Logic Level Converter is divided into two halves: a Low Voltage (LV) side and a High Voltage (HV) side. The Raspberry Pi will be connecting to the LV side. The VIC-20 will be connecting to the HV side.

The Logic Level Converter is further divided into three sections: Upper (Chan1), Middle (for power) and Lower (Chan2). You can use either Chan1 or Chan2 for this project. We will only be focusing on the RX line, RXI and RXO, where "I" is for INPUT and "O" is for OUTPUT. When referring to the Logic Level Converter in the table below I will specify both the side of the board and the hole you are targeting. For example, "LV/RXO" means "the RXO hole on the LV side." It is assumed that you will be staying in either Chan1 or Chan2, and not using an RXO hole in one channel and an RXI hole in another channel.

Raspberry Pi GPIO     Level Converter LV    Level Converter HV    VIC-20 User Port
P1-01 (3v3 Power)     LV/LV                 HV/HV                 2
P1-06 (Ground)        LV/GND                HV/GND                1
P1-10 (RXD)           LV/RXO                HV/RXI                M

You can use the 3 x Mini-Clip Jumpers for this project to target your Raspberry Pi GPIO pins. Solder everything else.

4. Power everything up.

5. Set the baud rate to 1200 on the Raspberry Pi, and fire up Festival. Open a terminal and type:

sudo stty -F /dev/ttyAMA0 1200
sudo festival --tts /dev/ttyAMA0

6. On the VIC-20, open the User Port command channel for writing at 1200 baud:

OPEN2,2,1,CHR$(8)

7. Test out your new VIC-20 Speech Synthesizer by printing to the logical file:

PRINT#2,"HELLO WORLD.  I AM THE VIC-20 BY COMMODORE."+CHR$(13)+"."

8. Clean-up task on the VIC-20; close the logical file:

CLOSE2
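If you would rather script the Pi side than point Festival straight at the serial device, something like the following Python sketch could work. It is an alternative to the festival --tts /dev/ttyAMA0 command in step 5, not what this project actually uses, and it assumes the port has already been set to 1200 baud with stty. The function names are made up for the example. It treats both carriage returns and line feeds as line endings, because the Pi's terminal settings may translate one into the other.

```python
# Sketch: read text from the serial device and speak it one line at a time.
# Assumes the baud rate was already set with: sudo stty -F /dev/ttyAMA0 1200
import subprocess


def split_utterances(buffer):
    """Split received bytes into finished lines plus any unfinished tail.

    Returns (list of text lines, leftover bytes still waiting for a line end).
    """
    pieces = buffer.replace(b"\n", b"\r").split(b"\r")
    *complete, remainder = pieces
    lines = [p.decode("ascii", errors="ignore").strip() for p in complete]
    return [line for line in lines if line], remainder


def speak(text):
    """Pipe one line to Festival, like: echo text | festival --tts"""
    subprocess.run(["festival", "--tts"], input=text.encode("ascii"), check=True)


def listen(device="/dev/ttyAMA0"):
    """Read the serial device and speak each completed line."""
    pending = b""
    with open(device, "rb", buffering=0) as port:
        while True:
            chunk = port.read(64)
            if not chunk:          # device reported end of file
                break
            lines, pending = split_utterances(pending + chunk)
            for line in lines:
                speak(line)


# The splitting step can be tried without any hardware attached:
print(split_utterances(b"HELLO WORLD.\rI AM THE VIC"))
```

Calling listen() (with sudo, as in step 5) would start the loop. The print at the end only demonstrates how a partially received sentence is held back until its line ending arrives.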

So what is going on in the Scott Adams adventure games?

The adventure games don't work because Commodore is sending data preceded by an escape code or a phoneme delimiter. Festival doesn't know how to resolve them because they were unique to the Votrax Type-N-Talk. My hope (wish) was that Commodore had chosen to send only text (printable ASCII characters not preceded by an escape code or phoneme delimiter). My belief is that a Votrax Type-N-Talk, and not even the Votrax Personal Speech System, will be the only way to get voice from these adventures. Bummer!

Of course, the Festival speech system is highly extensible. It has an embedded Scheme interpreter, APIs and all kinds of other cool stuff to define new modules and work with language. Check it out to see what I mean.

It is theoretically possible that if someone invested enough time in understanding the Festival framework, the Type-N-Talk could be re-created in software, with lookup tables to translate the escape codes and phonemes.

There is no info on the web to suggest this has been done already, but we could reach out to the Festival developers and see if they know anything. It might be rather trivial to someone well versed in their system. Research and innovation is always about what is next rather than what once was, so maybe nobody has thought about it.

An argument could be made that the effort is worthwhile because it helps to preserve historical software artifacts. The Computer History Museum in Mountain View has a number of volunteers and supporters in Computer Science and research fields, such as Gordon Bell, who might be inclined to take up the challenge.


This page is part of the Commodore VIC-20 Tribute site.
Rick Melick, South San Francisco, California, USA.