Audio DSP - Terminology

DSP Terminology

A signal can be defined as the output of a function with two or more parameters. For the purposes of this paper, a signal is defined as the relationship of amplitude and time.

DSP has its own terminology for its various concepts, and it is important to understand how the terms relate to one another. Digital signal processing is always discrete processing, even when it is pretending to be continuous. To be fair, it is continuous in the environment in which it exists. The real issue is that we are analog creatures, and that is how we want to view the world around us. Our senses are continuous in our environment as well, but our range is limited. Let's define continuous signals as ones that are not constrained by time and that are infinitely variable. Continuous signals cannot be perfectly modeled because modeling requires quantifying parameters, and our definition disallows quantifying either parameter. In signal processing we are concerned with two parameters: time and amplitude. Conversion of one type of signal into the other requires that we quantify both of them. Strictly speaking, taking readings at fixed instants of time is called sampling, and assigning each reading one of a finite set of amplitude values is called quantization; this paper uses quantization loosely for the whole process, whether we are inputting or outputting. One of these variables is fixed (time, set by the sample rate); the other (amplitude) is independent within a fixed range and at a fixed resolution.

Therefore, begin by understanding that since digital signals are discrete (i.e., they are a finite dataset), they have some issues which you must resolve. Still, DSP is several orders of magnitude better than the continuous analog equivalent in terms of signal decomposition and synthesis. Another huge advantage is that digital systems can perform several operations at the same time. Typical audio operations that are normally performed by separate modules can be applied together. For example, wow and flutter correction, 60 Hz line filtering, and digital noise reduction can all be applied to a signal in the frequency domain, and the signal reconstructed in one step.

The nature of the difference between discrete signals and continuous ones is such that discrete signals are subject to a phenomenon known as aliasing. Like many aspects of discrete signal processing, aliases have multiple sources and show themselves in multiple ways. Aliases can be separated into two general categories: those that are based on the fixed variable and those based upon the quantized independent variable. The aliases appear in the fixed variable's domain. In the case of audio, that means the aliases appear in the frequency domain: some frequencies are shifted, and false signal amplitude appears at others. Aliases can be minimized in both the analog and digital domains. Aliases that remain after digitizing the signal, as well as those created in the digital domain, are referred to as artifacts. Artifacts that appear as a result of improper sampling cannot be completely eliminated because they cannot be quantified, nor necessarily distinguished from the original signal. Processing artifacts are known quantities. There is an additional set of signals that appear in sampled data that represent something in the input signal path. These are often lumped in with artifacts or aliases because they all represent types of noise, but these signals are ultimately the easiest to quantify and eliminate. (Or even ignore. They often appear outside of the band of interest or as an easily ignored noise component.) Some types of data acquisition systems do not recognize one or more of these components and fail as a result. This is not trivial engineering by any stretch. You may not need to understand the math, but that in no way obviates the need to use it. Oversampling is guaranteed to create aliases; the trick is to create them where they do not matter. Oversampling also requires more processing in the frequency domain.
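
The frequency shifting described above can be seen with a little arithmetic. The following Python sketch is only an illustration: the function name alias_frequency and the 8000 Hz sample rate are assumptions chosen for the example, not something taken from this paper. It folds a tone frequency back into the band between 0 and half the sample rate, then checks that the shifted tone really does produce the same sample values.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Return the frequency (0..sample_rate/2) at which a tone appears after sampling."""
    folded = freq_hz % sample_rate_hz          # sampling cannot tell f from f + k*rate
    if folded > sample_rate_hz / 2:
        folded = sample_rate_hz - folded       # upper half mirrors back into lower half
    return folded

SAMPLE_RATE = 8000  # hypothetical rate, samples per second

# A 7000 Hz cosine sampled at 8000 Hz yields the same samples as a 1000 Hz cosine.
high = [math.cos(2 * math.pi * 7000 * n / SAMPLE_RATE) for n in range(16)]
low = [math.cos(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(16)]
samples_match = all(abs(h - l) < 1e-9 for h, l in zip(high, low))
```

Once the samples are taken, nothing in the data distinguishes the two tones, which is why this kind of alias has to be prevented before conversion rather than repaired afterwards.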

Computers are base-2 entities, so all DSP is best undertaken using powers of two. It will prove itself to be faster, more accurate, and easier to understand. The audio system in Windows was not designed as a power of 2 (or any) based sampling system. We will use it as such, and in cases where it is necessary we will pad or rate convert.
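
Padding a block of samples out to a power of two might look something like the Python sketch below. The helper names next_power_of_two and pad_to_power_of_two are hypothetical, and padding with zeros (silence) is an assumption; it is one common choice, not necessarily the one used later in this paper.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be at least 1)."""
    power = 1
    while power < n:
        power *= 2
    return power

def pad_to_power_of_two(samples):
    """Return a copy of samples extended with zeros to a power-of-two length."""
    target = next_power_of_two(len(samples))
    return list(samples) + [0] * (target - len(samples))

block = pad_to_power_of_two([3, -1, 4, 1, -5])   # 5 samples become 8
```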

DSP can be reduced to three types of operations:

Data Conversion
Signal Decomposition
Signal Composition

The sampled data can exist in either the time domain or the frequency domain. The domain that the data is in when an operation is applied is of great import. Many operations are practical in only one domain. Reverb, for example, is a function of the time domain. The discrete Fourier transform (DFT) can be computed directly from its definition, but the fast Fourier transform (FFT) produces the same result hundreds of times faster on blocks of a few thousand samples. Digital filters are easiest to picture in the frequency domain, where it is possible to create very sharp transitions. Our system needs to be able to move data between the two domains if it is to be of any sort of practical use. The first step in decomposing time ordered sampled data into the frequency domain is reordering the samples according to a simple formula. This formula is based on the number of samples in the block, and is implemented as a bit reversal sort on a computer when that number is an integral power of two. Practically speaking, it is an equal division of the samples into two sets (even-numbered and odd-numbered), repeated until the sets cannot be divided further, a sort of turning inside out, if you will. It is commonly referred to as bit reversal sorting, interlace decomposition, butterflying, and any number of other terms depending on the application at hand. It is an important deterministic concept in computer science, and its application is accordingly quite broad.
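
The reordering is easier to see in code than in words. Here is a minimal Python sketch of a bit reversal sort, assuming the block length is a power of two; the function names bit_reverse and bit_reversal_sort are made up for this illustration. Each sample moves to the position whose index is its own index with the bits written backwards.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (e.g. 0b001 -> 0b100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def bit_reversal_sort(samples):
    """Reorder a power-of-two-length list so item i moves to bit_reverse(i)."""
    n = len(samples)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    return [samples[bit_reverse(i, bits)] for i in range(n)]

reordered = bit_reversal_sort([0, 1, 2, 3, 4, 5, 6, 7])
```

For eight samples the result is 0, 4, 2, 6, 1, 5, 3, 7, which is exactly what you get by splitting the block into even and odd halves over and over. Applying the sort a second time puts the samples back in their original order.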

Domains

In DSP there are two ways of looking at data, and two ways to represent it. When you sample audio data using a sound card in Windows, you are collecting time domain data. (It is a set of instantaneous voltage readings taken at fixed and equal intervals of time.) The analog equivalent might be observing the waveform on an oscilloscope. The same data could then be passed through a discrete Fourier transform, which moves it into the frequency domain. The analog equivalent of that would be observing the waveform on a spectrum analyzer. In the time domain, the signal is amplitude vs. time. In the frequency domain it is amplitude vs. frequency. It should be immediately apparent that any sort of processing that involves the frequency response of the system is best done in the frequency domain, and any processing that is based on, or affects, time is best done in the time domain. Once you grasp that concept the rest will follow more naturally.

Operations done in one domain have a different effect in the reciprocal domain. DSP algorithms exist for convolution, summation, differentiation, and integration. Operations done in the time domain can be most accurately categorized by the term amplitude modulation. While it is true that frequency modulation can be done in the time domain, it is much more difficult, slower, and less accurate in practice. The reciprocal is also true: frequency modulation is the term that most accurately describes operations in the frequency domain, and it is best applied there. The two most powerful techniques that exist in the time domain are interpolation and decimation. Both are powerful tools in all aspects of signal processing. Use of these techniques is often called multirate processing. The primary use of multirate processing is reduction of aliases by low-pass filtering. It is most useful in establishing a sharp transition at the upper end of the signal's frequency range, which is 1/2 of the sample rate.
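
To make the two views concrete, the Python sketch below builds a short time domain signal and moves it into the frequency domain with a direct DFT. It is written for clarity rather than speed, and the block length of 64 and the 5-cycle test tone are arbitrary assumptions for the example.

```python
import cmath
import math

def dft(samples):
    """Direct discrete Fourier transform: time-domain samples in, complex bins out."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

N = 64                      # block length (a power of two, as elsewhere in this paper)
CYCLES = 5                  # hypothetical tone: exactly 5 cycles across the block
time_domain = [math.sin(2 * math.pi * CYCLES * t / N) for t in range(N)]

spectrum = dft(time_domain)
magnitudes = [abs(value) for value in spectrum[: N // 2]]   # bins up to half the sample rate
peak_bin = magnitudes.index(max(magnitudes))
```

In the time domain the data is 64 amplitudes that rise and fall; in the frequency domain nearly all of the energy lands in bin 5, the oscilloscope view and the spectrum analyzer view of the same signal.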

Sampling Theory

Remember aliases? They are signals not present in the input that appear as the result of digitization. The two factors that affect the generation of aliased information are the sample rate and the range of the converter. Let's examine the range first, since it is the more straightforward of the two. All input to a digital system must be within a predefined range of values. Values that are outside of that range will be clipped, and amplitude information will be lost. A typical example might be an 8-bit ADC, which can assign 256 values to the input (2⁸ = 256). There are several ways to perform an analog to digital conversion, but all of them involve a comparison to a known value. That value is called a reference. The most straightforward conversion method is one called successive approximation, which involves a series of comparisons. The ADC compares the input to half of the reference value, and turns the most significant bit on or off according to the result of the comparison. Then it halves the step, and repeats the process for each following bit until the LSB is reached. (Now can you see how an alias gets created when the input is between the last comparison's zero and one value?) We will examine a few solutions to this aspect of aliasing later. The resolution of the conversion system is most dependent on the width of the converter in bits (i.e., it is 1/256 of the range on an 8-bit system, 1/65536 on a 16-bit system). As a straightforward example, the resolution of an 8-bit ADC with a reference voltage of 5.0 volts is 5.0 / 256 = 0.01953125 volts.

As a bit of practical advice to the reader, I feel compelled to point out that there are two types of input and output being referred to in DSP, single-ended and double-ended. (Elsewhere these are more often called unipolar and bipolar.) Single-ended I/O is DC; the input is not allowed to go below what is being used as ground. (Usually about 0 VDC.) The more familiar double-ended I/O is AC, and is what we will be using. The ground is in the middle of the range in this case, so the input can swing 1/2 of the maximum in either direction. Excursions beyond that will still be clipped, as they are out of range.

In DSP, the resolution and accuracy are referred to in terms of the least significant bit. The LSB has the potential of creating the most bothersome aliases because it is on the end closest to what I refer to as the "infinite" zero. It is as if the ADC is chasing that value, but cannot ever reach it at the proper time. A great deal of engineering effort has been put into addressing this issue. Most ADCs have an accuracy of 1/2 LSB with a resolution of 1 LSB. It should now be apparent that a wider data path conveys a great deal more information, and more closely approximates the input.
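
The successive approximation process can be modeled in a few lines of Python. This is a sketch under stated assumptions, not a description of any particular chip: it models an ideal single-ended 8-bit converter with the 5.0 volt reference from the example above, and the function name successive_approximation is made up for the illustration.

```python
def successive_approximation(voltage, v_ref=5.0, bits=8):
    """Convert a voltage in [0, v_ref) to an integer code, one bit per comparison."""
    code = 0
    for bit in range(bits - 1, -1, -1):          # MSB first, LSB last
        trial = code | (1 << bit)                # tentatively turn this bit on
        threshold = trial * v_ref / (1 << bits)  # voltage that the trial code represents
        if voltage >= threshold:
            code = trial                         # input is at or above it: keep the bit
    return code

lsb_volts = 5.0 / 2 ** 8        # resolution of the 8-bit, 5.0 V example
code = successive_approximation(3.3)
```

An input of 3.3 volts comes out as code 168, which stands for 3.28125 volts. The difference, a little under one LSB, is the part of the input that the converter simply cannot express.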

...Fine, But What About Nyquist?

Most discussions of DSP start out by mentioning the fact that you need to sample at at least twice the expected maximum frequency of interest to extract valid data. This theory was developed by Harry Nyquist initially, and later expanded by Claude Shannon. I leave examination of that as an independent exercise for the reader. For our purposes here, it suffices to take it as read that we will always follow that rule, since not following it creates an unusable dataset. Still, Nyquist did a lot of pioneering work much like what we will be doing, except he had no computer or calculator. (In fact, let me be very blunt about it, we are only trying to duplicate and understand his efforts; the man is a legend.) While you are at it, examine Bell's Harmonic Telegraph, which was the basis of the telephone. Bell was all ready to make a mechanical telegraphic multiplexer in 1874! Also, it is no coincidence that Bell Labs is at the forefront of all sorts of technological advances, from the first transistor to the C programming language and beyond. It is a shame that we view "Ma Bell" with such disdain; she has been very good to us.

Quantization

The process of digitizing a continuously variable signal is known as quantization. (You are quantifying the two parameters, time and amplitude.) The infamous aliases are often referred to by the term quantization errors. It is often beneficial to view quantization errors as digital noise, and there is a lot of DSP theory dedicated to quantifying and eliminating such noise. As a practical matter, it can be largely ignored as long as you are not stretching the system you are using too much. (In most instances, the analog noise is far more problematic.) We will not ignore it entirely, but let's get a system going first, then we will evaluate it and improve upon that.
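
Treating quantization error as noise means it can be measured like noise. The Python sketch below is one rough way to do that, assuming an ideal converter that rounds to the nearest step; the function names, the 4096-sample block, and the 37-cycle test tone are all arbitrary choices made for the illustration.

```python
import math

def quantize(value, bits):
    """Round a value in [-1.0, 1.0] to the nearest step of a signed `bits`-wide converter."""
    full_scale = 2 ** (bits - 1) - 1            # 127 for 8 bits, 32767 for 16 bits
    return round(value * full_scale) / full_scale

def quantization_snr_db(bits, count=4096):
    """Signal-to-noise ratio, in dB, of a full-scale sine after quantization."""
    signal_power = 0.0
    noise_power = 0.0
    for n in range(count):
        exact = math.sin(2 * math.pi * 37 * n / count)   # 37 cycles: arbitrary test tone
        error = quantize(exact, bits) - exact
        signal_power += exact * exact
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)

snr_8 = quantization_snr_db(8)
snr_16 = quantization_snr_db(16)
```

Comparing snr_8 with snr_16 shows why the noise can usually be ignored on a 16-bit system: each extra bit of width buys roughly 6 dB.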

Math

It should be obvious at this point that this is not a paper for my Math class. Math is not among my high cards, I'm afraid. In fact, my understanding of trigonometry has only increased as my understanding of DSP has. So, if you think that your lack of math savvy is holding you back...let me take that excuse away from you right now. That is a valid way of understanding something: relating it to something you already know. That is perhaps the most important thing you will read in this document because it applies to any type of learning situation.

In DSP, it is more important to understand relationships than the mathematical theory behind them. (At least at first it is.) A sinewave can be defined as a pure fundamental frequency. A squarewave is the fundamental and an infinite number of odd harmonics. Sawtooth and triangle waves contain various ordered harmonic content, as well. Now, we are nearly ready to make our first bit of Eureka! code: a function generator. The four waveforms that I just defined for you are all that there is. Everything else in the universe of DSP is noise.
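
The squarewave definition can be tried out directly. The Python sketch below sums odd harmonics, each one weaker than the fundamental by its harmonic number, which is the standard Fourier series for a squarewave. The function name square_from_harmonics is hypothetical, and of course a program can only add up a finite number of harmonics.

```python
import math

def square_from_harmonics(phase, harmonic_count):
    """Sum the first `harmonic_count` odd harmonics at the given phase (radians)."""
    total = 0.0
    for i in range(harmonic_count):
        k = 2 * i + 1                            # odd harmonics: 1, 3, 5, ...
        total += math.sin(k * phase) / k         # each harmonic is weaker by 1/k
    return 4 / math.pi * total                   # scale so the flat tops approach +/-1

# Value in the middle of the positive half cycle, with more and more harmonics.
quarter = math.pi / 2
rough = square_from_harmonics(quarter, 1)        # the fundamental alone
better = square_from_harmonics(quarter, 50)
```

With only the fundamental the value overshoots 1.0; with fifty harmonics it settles very close to the flat top of a true squarewave.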

DSP operates largely on the shoulders of the trigonometric functions sine and cosine. Therefore, it works in terms of sinewaves, the combinations of sinewaves used to create the four types of waveforms possible, and noise. Remember that the noise is of various types, and also remember to ignore it for now. A nearly perfect sinewave can be created digitally by plotting the sine at fixed intervals. This paper calls that process convolution because it combines two parameters to produce a controlled output. (Most DSP texts reserve the word convolution for a different operation, one that combines two signals.) The first parameter is time, and it is set by the sample rate chosen. The next is amplitude, and it is varied within the range provided by multiplying the maximum allowable value by the sine of the reference angle. The concept of a reference angle is important to DSP algorithms because it reveals the where given the when. A sinewave can be said to travel through time in a circular manner from 0 to 359 degrees. (Its amplitude rotates evenly about 0 in exactly the same manner forever.) Its instantaneous value at any given point in time is directly related to the sine of the rotating angle at that time. Recalling that the value of the sine is restricted to the range -1.0 to +1.0 allows us to use it to set the instantaneous amplitude within the allowable range by multiplying it by the maximum value. Now, let's discuss that value a little.

In our instance, we are using 16-bit wide double-ended I/O. The data is in the two's complement format. This is by far the most prevalent situation in DSP. DC-only signals are usually single-ended, and the data is in the range 0 to 2^{number of data bits} - 1. Therefore, the use of a sign bit is not necessary, unless the signal extends below ground, which makes the signal double-ended anyway. Our function generator will never produce a clipped output because our convolution algorithm scales the sine, which never exceeds 1.0 in magnitude, by the maximum allowable value; in effect it has built-in automatic gain control. The maximum non-clipping value in a two's complement based data acquisition system is 2^{number of data bits - 1} - 1, or 32767 in the case of a 16-bit system.
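
The ranges involved are easy to check. In the Python sketch below, the helper names sample_range and scale_to_sample are hypothetical; it simply works out the lowest and highest codes for a given data width and shows that scaling a level between -1.0 and +1.0 by the maximum value cannot clip.

```python
def sample_range(bits, signed=True):
    """Return (lowest, highest) integer code for a converter `bits` wide."""
    if signed:                                   # double-ended, two's complement
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1                      # single-ended, no sign bit

def scale_to_sample(level, bits=16):
    """Map a level in [-1.0, 1.0] to a signed integer that cannot clip."""
    _, highest = sample_range(bits)
    return round(level * highest)

low16, high16 = sample_range(16)
```

Note that two's complement has one more negative code than positive (-32768 against 32767). Scaling by 32767 leaves that extra negative code unused, which keeps the waveform symmetrical about zero.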

Creating a sinusoidal waveform consists of finding the sin of the reference angle, which changes according to the frequency of the desired output signal, and the sample interval. The angular rotation of the signal per sample interval is given the name delta in our convolution routine. The angle's value is given the name theta. The simplest way to express the algorithm is:

amplitude = sin(theta) * MAX_AMPLITUDE
theta = theta + delta

Delta is found by first multiplying the desired output frequency by 360. This number is the total angular rotation of the signal in one second. Next, to find the rotation per sample interval, divide the total rotation by the sample rate. Now, you have delta. Here is a sort of pseudocode version of a 16-bit sinewave generator:

	delta = output frequency * 360 / sample rate
	theta = 0
	volume = 2^(data width - 1) - 1
	for i = 0 to sample rate - 1
		data word = sin(theta % 360) * volume
		write data word to buffer
		theta = theta + delta
		increment buffer pointer 
	next i
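
For readers who want something they can run, here is one possible Python rendering of the pseudocode above. It is a sketch, not the function generator software itself: the function name generate_sine and the 8192 samples per second default (a power of two) are assumptions made for the example, and the result is returned as a list rather than written through a buffer pointer.

```python
import math

def generate_sine(frequency_hz, sample_rate=8192, data_width=16):
    """Return one second of a sine wave as signed integers (degrees-based, as in the text)."""
    delta = frequency_hz * 360 / sample_rate      # rotation per sample, in degrees
    theta = 0.0
    volume = 2 ** (data_width - 1) - 1            # 32767 for 16 bits
    buffer = []
    for _ in range(sample_rate):
        # math.sin expects radians, so convert the reference angle first
        buffer.append(round(math.sin(math.radians(theta)) * volume))
        theta = (theta + delta) % 360             # keep theta within one rotation
    return buffer

tone = generate_sine(1024)    # 1024 Hz at 8192 samples/s: 8 samples per cycle
```

The one real difference from the pseudocode is the conversion to radians, since Python's sine function does not take degrees. Wrapping theta as it is updated, rather than inside the sine call, also keeps the angle from growing without limit.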

This sort of waveform synthesis is performed in the time domain. (The for/next loop is the changing time, and the amplitude is set within the loop for each sample.) There is plenty you can do in the time domain to synthesize signals, but in order to create complex waveforms easily it is necessary to work in the frequency domain. Also, it is very difficult to decode AC signals in the time domain, and sharp audio filtering is far harder there. Certainly, working in both is advantageous. The next page will cover the audio function generator software.

