WAVE for Telephony continued

(c) Copyright 1996 Bob Edgar, all rights reserved.

Speaking Phrases

Phrases involving variables, such as "your balance is one thousand and five dollars and six cents", are constructed by playing pre-recorded pieces in the appropriate order. This example phrase might be spoken by playing the following segments: "your balance is", "one thousand", "and", "five", "dollars", "and", "six", "cents". A typical system might have one or two hundred pre-recorded segments stored in sound files ready to play this kind of phrase. An application using WAVE might choose to store each segment in a separate WAVE file, but this would have several disadvantages. Continually opening, seeking and closing such a large number of files involves significant overhead. This solution is also more difficult for the application designer and developer to administer, since many more files must be installed and distributed.
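The segment selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the segment names, the 0 to 9999 range, and the choice to treat "one thousand" or "two hundred" as a single recorded segment are all hypothetical, chosen to match the example phrase rather than any particular product.

```python
# Hypothetical segment names; a real system would map these to recordings.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_segments(n):
    """Return the segment names needed to speak n (0 <= n <= 9999)."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return ["zero"]
    segments = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        segments.append(f"{ONES[thousands]} thousand")
    if hundreds:
        segments.append(f"{ONES[hundreds]} hundred")
    if rest:
        if segments:
            # "one thousand AND five", "three hundred AND forty"
            segments.append("and")
        if rest < 20:
            segments.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            segments.append(TENS[tens])
            if ones:
                segments.append(ONES[ones])
    return segments


def balance_phrase(dollars, cents):
    """Return the ordered list of segments for a spoken balance."""
    return (["your balance is"] + number_segments(dollars)
            + ["dollars", "and"] + number_segments(cents) + ["cents"])
```

Calling balance_phrase(1005, 6) yields the eight segments listed in the example above, in the same order.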

The proprietary Dialogic API allows an application to build a phrase by providing an array of file handles and file positions (which might or might not all be in the same file); a single function call to dx_play can then play the whole phrase in a seamless fashion. The Dialogic driver takes care of the required buffering and read-ahead to accomplish this. So-called "Indexed Prompt Files" or "VBASE40" files for Dialogic data formats store many segments inside a single file for optimal generation of phrases. Neither Microsoft nor Dialogic appears to have recognized the need to propose a new WAVE-derived file format standard for this type of file. We believe that a new Windows multimedia file format should be defined for this purpose so that WAVE data, a title ("Main Menu"), the full script for each segment, and other information (language, dialect, male/female voice, name of person recording...) can be stored. Then, applications from different vendors and third-party utilities such as VOX Studio-plus and Voice Information Systems' VFEdit would all support the same file format, with benefits to developers and end-users alike. Current TAPI/WAVE tools (including Parity's) use a crude derivative of the VBASE40 format, which is in our opinion inadequate as a long-term standard.
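The idea of storing many segments in one file behind an index can be sketched as follows. The layout here (a segment count followed by offset and length pairs, then the raw audio) is invented for illustration; it is not the VBASE40 format and not the Segmented WAVE proposal, and the function names are hypothetical.

```python
import io
import struct

# Hypothetical layout: [count][offset,length]*count, then the audio data.
HEADER = struct.Struct("<I")   # number of segments
ENTRY = struct.Struct("<II")   # byte offset and byte length of one segment


def pack_segments(segments):
    """Combine a list of audio byte strings into one indexed blob."""
    offset = HEADER.size + ENTRY.size * len(segments)
    index = bytearray(HEADER.pack(len(segments)))
    for audio in segments:
        index += ENTRY.pack(offset, len(audio))
        offset += len(audio)
    return bytes(index) + b"".join(segments)


def read_index(f):
    """Read the (offset, length) table from the start of an open file."""
    f.seek(0)
    (count,) = HEADER.unpack(f.read(HEADER.size))
    return [ENTRY.unpack(f.read(ENTRY.size)) for _ in range(count)]


def phrase_data(f, index, segment_numbers):
    """Gather the audio for a phrase from a single open file."""
    chunks = []
    for number in segment_numbers:
        offset, length = index[number]
        f.seek(offset)
        chunks.append(f.read(length))
    return b"".join(chunks)
```

With this arrangement the file is opened once and the index is read once; each phrase then costs only one seek and one read per segment, which is the overhead saving the text describes.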

Parity Software has developed a proposal for an enhanced WAVE file format standard to meet these objectives. The proposed standard is called Segmented WAVE (SWV). Computer telephony hardware and software vendors are invited to comment on the standard, which is available for download from Computer Telephony Magazine's Web page http://www.computertelephony.com or from Parity Software at http://www.Parity.com. Following industry feedback, Parity will submit the standard for registration with Microsoft in January 1997.

Speed and Volume Control

TAPI does provide one special hook to the WAVE API: the lineSetMediaControl function. The application specifies a list of digits which, if detected on the line, are to control the speed and volume of the sound data being played (if supported by the WAVE driver). The following features may be assigned to digits: rewind, fast-forward, speed-up, slow-down, volume-up, volume-down, reset speed, reset volume, pause and resume. These features could be implemented by trapping digit messages and using WAVE API functions such as waveOutSetPlaybackRate, but having the TSP implement them directly allows better response times (response might be unacceptably slow in a busy system if the application were responsible for implementing these features). The Dialogic API provides for similar features. If supported by the TSP, the lineSetMediaControl function provides a (not very obvious) means to implement a common feature: touch-tone interruption and type-ahead through menus.
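To make the digit assignments concrete, the following sketch simulates the application-side alternative: trapping each detected digit and adjusting playback state. It does not call TAPI or the WAVE API. The digit assignments, the Playback class and its step-based speed and volume levels are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical digit assignments; a real application would choose its own.
DIGIT_ACTIONS = {
    "1": "slow-down", "2": "reset speed", "3": "speed-up",
    "4": "volume-down", "5": "reset volume", "6": "volume-up",
    "7": "pause", "9": "resume",
}


@dataclass
class Playback:
    """Simulated playback state; 0 means normal speed or volume."""
    speed: int = 0
    volume: int = 0
    paused: bool = False

    def on_digit(self, digit):
        """Apply the action assigned to a detected digit, if any."""
        action = DIGIT_ACTIONS.get(digit)
        if action == "speed-up":
            self.speed += 1
        elif action == "slow-down":
            self.speed -= 1
        elif action == "reset speed":
            self.speed = 0
        elif action == "volume-up":
            self.volume += 1
        elif action == "volume-down":
            self.volume -= 1
        elif action == "reset volume":
            self.volume = 0
        elif action == "pause":
            self.paused = True
        elif action == "resume":
            self.paused = False
        return action  # None for digits with no assignment
```

In a loaded system each call to on_digit would wait behind the application's other message handling, which is the response-time concern that motivates letting the TSP act on the digits itself.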

Summary

TAPI/WAVE application developers will need to bear the following points in mind.

WAVE API makes simple functions (play or record file) difficult to implement.

There are many different WAVE file formats; some are proprietary vendor formats with a WAVE header.

Just because a telephony sound file follows the WAVE specification does not mean that desktop sound cards and WAVE editors understand the file.

WAVE API and WAVE file format lack features for hardware-independent support of highly compressed voice files and for efficient and flexible building of phrases.

WAVE API does not provide full information on the types of WAVE data which a telephony card supports.
