
Keywords: MAXQ2000, DSP, Audio Filter, Digital Signal Processing, MAXQ, MAC
Related Parts


Audio Filtering with the MAXQ2000
By: 
Kris Ardis, Executive Director for Energy Products, Maxim Integrated 

Abstract: The combination of a multiplyaccumulate unit (MAC) and a singlecycle core make the MAXQ2000 a versatile microcontroller (µC). The MAXQ2000 has the performance and I/O peripherals ideal for many applications: alarm clocks, handheld medical devices, digital readouts—any application that requires low power, high performance, and numerous I/Os. With the integrated MAC, the MAXQ2000 ventures into the territory of the DSP (µC).
Just how much performance can the MAXQ2000 get out of its MAC? This application note explores that question with an audiofiltering example, and gives some quantitative guidelines on the performance supported by the MAXQ2000.
Software and Hardware Requirements
This application note features a simple demonstration of an audio filter. The audio data is a prerecorded message of the author saying "The pipe began to rust while new." This text was not selected randomly—it provides a decent mix of frequency components, accentuating the audible effects of simple filters (
http://www.cs.columbia.edu/~hgs/audio/harvard.html). The audio recording can be replaced by any 8kHz recording of appropriate length, but it is not required.
The hardware required for this application note includes the MAXQ2000 evaluation kit and a small circuit to interface to a computer speaker.
The
MAXQ2000 evaluation kit which is available, is a good tool for exploring the MAXQ2000's capabilities. It includes an LCD panel, LED bank, and access to all the I/O pins of the MAXQ2000 µC. It also includes a MAX1407 ADC/DAC, which can be used for audio output.
The second piece of hardware required can be easily breadboarded. The circuit that was used for this demonstration is shown in
Figure 1. It calls for a 1 x 8 female header to connect to the MAXQ2000 evaluation kit board at J7, and another connection to any ground (TP1 on the MAXQ2000 evaluation kit is a good choice). The speaker connector can be of any type—shown is a 3.5mm stereo jack, which makes it simple to connect to typical computer speakers. Note that the two input channels are tied together, as our demonstration application is only showing one audio channel (mono sound).
Figure 1. Additional hardware required for audio playback.
The software required to run this demonstration was built and debugged using the IAR embedded workbench. It provides a good debugging environment, making use of the hardware debugging support of the MAXQ2000. You can set breakpoints, set and read registers and memory, and see your call stack while running on the real hardware.
Running the Demo Application
The pushbuttons on the MAXQ2000 evaluation kit board are used to select a filter, and to play the audio sample passed through that filter. Use button SW4 to select a filter—the name of the filter will be displayed on the LCD screen (HI for high pass, LO for low pass, BP for band pass, ALL for all pass). Use the SW5 button to begin playing the audio through the selected filter. The filter can be changed during the middle of playback.
Designing a Simple FIR Filter
I developed a Java™ applet that would let me easily create new filters. Rather than use standard windowing techniques given filter parameters, I chose to 'design' my filter grossly by placing zeroes in a polezero plot, as shown in
Figure 2. The applet allows placing zeroes anywhere in the coordinate plane, and continually updates the FIR filter coefficients required by the demonstration application. Note, however, that the demonstration only supports allzero filters. Supporting IIR filters would not be too difficult—more is explained in the
Supporting IIR Filters section.
Figure 2. Using a polezero plot to generate a simple FIR filter.
A generic filter takes the form of a linear equation:
y(n) + Σb
_{K}y(k) = Σa
_{J}x(j)
where k represents the order of the feedback portion of the filter and j represents the order of the feedforward portion of the filter.
An example IIR filter could be something as simple as:
y(n) = 0.5y(n1) + x(n)  0.8x(n1)
Some filters are classified as FIR filters. They do not contain the feedback portion. In other words, there is no y portion in the characteristic filter equation:
y(n) = Σa
_{J}x(j)
y(n) = x(n)  0.2x(n  1) + 0.035x(n  3)
In either case, a filter boils down to a characteristic equation that is essentially a weighted average of some number of the past input and output values. The job of filter design is to produce those A
_{j} and B
_{k} values. In order to compute the output of a filter efficiently, we need hardware support that can multiply and sum signed numbers quickly. Enter the MAXQ2000's multiplyaccumulate unit.
Implementing a Filter with a MultiplyAccumulate (MAC) Unit
The applet from the previous section works by computing the filter coefficients given the zero coordinates in the plot. However, the coefficients computed are floatingpoint numbers, while our MAC works with purely 16bit integer math. In order to correct the problem, the demo application uses a fixedpoint number system, where between 0 and 15 bits of the coefficients are to the right of the decimal point (the 16th bit represents the sign magnitude). Once the operation is over, the 48bit result in the MAC's accumulator is shifted enough places to remove any fraction.
This solution is a tradeoff of accuracy for speed. In many cases, the error from this approach is negligible. For diagnostic purposes, the applet shows three plots of the calculated filter. The first plot shows the ideal filter behavior, using 64bit floatingpoint numbers. This plot, labeled "Ideal Transform", is shown in Figure 2.
Figure 3 illustrates the remaining plots produced by the applet. The first plot in Figure 3 shows the effective filter using 16bit, fixedpoint numbers. In many cases, the error is not apparent, so the last plot is an error indicator, showing the ideal behavior divided by the actual frequency response. Ideally, this is a straight line at Y = 1.
Figure 3. Actual transform and roundoff error (with virtually no error) for a 16Bit implementation of the filter.
For simplicity, the applet produces the floatingpoint coefficients required by the MAXQ® application, so that new filters can simply be cutandpasted into the source for the filter application (into the file data.asm). The applet also produces two other values—the order of the filter (the number of coefficients) and the shift count, so the application can shift the final result appropriately. This data appears in the text box at the bottom of the applet, and might look like this:
Zeroes:
dc16
dc16 12, 11, 0x1000, 0x26d3, 0x1e42, 0xf9a3, 0xecde, 0xff31, 0xa94,
0x2ae, 0xfd0c, 0xff42, 0xde
Shift amount: 12
Implementing a Filter in MAXQ Assembly Language
In order to get maximum performance and perform an accurate performance analysis, the actual filter will be implemented in assembly language. This will allow us to accurately count the number of cycles required to produce one output value, and thus estimate the performance for other data sets.
The MAX1407 has a 12bit ADC. However, the input data is 16bits wide, and our filter produces a 16bit result. So, while those 4 least significant bits (LSBs) are wasted for this application, we can safely analyze our performance as if working on and producing 16bit values (CDquality audio is 16 bits).
In this example, the filter coefficients are stored in code space in a table. When a filter is selected, the application finds the appropriate filter, reads the shift amount and number of taps, and is then ready to start filtering data. The following code applies the filter coefficients:
move MCNT, #22h ; signed, multaccum, clear regs first
zeroes_filterloop:
move A[0], DP[0] ; let's see if we are out of data
cmp #W:rawaudiodata ; compare to the start of the audio data
lcall UROM_MOVEDP1INC ; get next filter coefficient
move MA, GR ; multiply filter coefficient...
lcall UROM_MOVEDP0DEC ; get next filter data
move MB, GR ; multiply audio sample...
jump e, zeroes_outofdata ; stop if at the start of the audio data
djnz LC[0], zeroes_filterloop
zeroes_outofdata:
move A[2], MC2 ; get MAC result HIGH
move A[1], MC1 ; get MAC result MID
move A[0], MC0 ; get MAC result LOW
Before this code is executed, LC[0] is set up with the number of taps for the filter, DP[0] is set up with the address of the current input byte to the filter, and DP[1] points to the start of the filter coefficients. Therefore, DP[1] works through the filter coefficients in an incrementing manner, and DP[0] works through the input data in a decrementing manner (processing the most recent input first).
Since the MAC works in a single cycle, there is not a lot of code here to handle it. MCNT is set to 22h to denote the use of signed integers. In the main loop, the consecutive writes to MA, then MB triggers the multiplyaccumulate operation—the result is ready in the next clock cycle. Since our accumulator is 48 bits (and our multiply result is 32 bits), we need not worry about any overflow (unless we have 64,000 taps in our filter!).
Performance
The sample application works on mono 16bit audio data being output at 8kHz—not nearly enough to tire out the µC. Because we wrote the filter in assembly language, we can easily count the cycles being used to come up with an expression for the amount of time a FIR filter of length N takes to compute. We can then use this expression to find the maximum filtering rate using the algorithm previously listed.
We can split the function we use to produce an audio sample into three portions: initialization, filter computation loop, and result fixing. In our released example, the initialization takes 38 cycles, the filter computation loop takes 17 cycles per filter coefficient, and the result fixing takes 9 + (6 x S) cycles, where S is the shift amount. Normally, the shift amount is around 12, so we can estimate result fixing at 81 cycles. Therefore, it takes 119 + (17 x N) cycles to produce one filtered output value. At 20MHz, a MAXQ2000 could run a 100tap filter at close to 11kHz, which is good enough quality for voice data.
Let us go back and reanalyze our application to see where we could tighten it up. We will concentrate in the filter loop, since this is where most of our cycles are happening on any but the most trivial of filters.
There are a few critical improvements we can make to the loop code to improve efficiency. Remember that we are using prerecorded audio samples that are stored in code space. Lookups to code space require more time than lookups in data space because of the MAXQ's Harvard architecture. The functions called UROM_MOVEDP1INC and UROM_MOVEDP0DEC take 5 cycles each (2 cycles for the LCALL, then 3 cycles inside the function). These could each be replaced by two cycles each if we stored our filter in the RAM (one cycle to select the pointer, one to read from it), and if we were serving realtime input data that was stored in the RAM. If we were willing to donate 256 words of RAM to the filter, we could implement a circular buffer using BP[Offs] to store the input data. These changes reduce the loop time to 11 cycles instead of 17. Our filter loop now looks like this (cycle counts listed first in the comments):
zeroes_filterloop:
move A[0], DP[0] ; 1, let's see if we are out of data
cmp #W:rawaudiodata ; 2, compare to the start of the audio data
move DP[1], DP[1] ; 1, select DP[1] as our active pointer
move GR, @DP[1]++ ; 1, get next filter coefficient
move MA, GR ; 1, multiply filter coefficient...
move BP, BP ; 1, select BP[Offs] as our active pointer
move GR, @BP[Offs] ; 1, get next filter data
move MB, GR ; 1, multiply audio sample...
jump e, zeroes_outofdata ; 1, stop if at the start of the audio data
djnz LC[0], zeroes_filterloop ; 1
Once we have our filter and input data in the RAM, we can use another trick of the MAXQ architecture. The MAXQ instruction set is highly orthogonal—there are very few restrictions on what can be used as a source in any operation. Therefore, rather than reading the filter data and the input data into GR, we can write it directly to the MAC registers. This brings the loop down to 9 cycles.
zeroes_filterloop:
move A[0], DP[0] ; 1, let's see if we are out of data
cmp #W:rawaudiodata ; 2, compare to the start of the audio data
move DP[1], DP[1] ; 1, select DP[1] as our active pointer
move MA, @DP[1]++ ; 1, multiply next filter coefficient
move BP, BP ; 1, select BP[Offs] as our active pointer
move MB, @BP[Offs] ; 1, multiply next filter data
jump e, zeroes_outofdata ; 1, stop if at the start of the audio data
djnz LC[0], zeroes_filterloop ; 1
One final improvement can make this code really fly. Every time through the loop, we compare our current data pointer to the start of the audioinput data to see if we are going out of bounds (the MOVE A[0], DP[0] statement, the CMP compare statement, and the JUMP E statement). If we set the initial audio data (which we are now reading with the circular buffer pointed to by BP[Offs]) to all zeroes, we can simply remove these checks. The cost of initializing the RAM to 0 is negligible, as compared to the 4cycle savings over the next several thousand samples. Our new loop code is a slim 5 cycles.
zeroes_filterloop:
move DP[1], DP[1] ; 1, select DP[1] as our active pointer
move MA, @DP[1]++ ; 1, multiply next filter coefficient
move BP, BP ; 1, select BP[Offs] as our active pointer
move MB, @BP[Offs] ; 1, multiply next filter data
djnz LC[0], zeroes_filterloop ; 1
Before going back to our performance equation, let us look at the result computation. Our current way of shifting the 48bit result down seems wasteful.
move A[2], MC2 ; get MAC result HIGH
move A[1], MC1 ; get MAC result MID
move A[0], MC0 ; get MAC result LOW
move APC, #0C2h ; clear AP, roll modulo 4, autodec AP
shift_loop:
;
; Because we use fixed point precision, we need to shift to get a real
; sample value. This is not as efficient as it could be. If we had a
; dedicated filter, we might make use of the shiftby2 and shiftby4
; instructions available on MAXQ.
;
move AP, #2 ; select HIGH MAC result
move c, #0 ; clear carry
rrc ; shift HIGH MAC result
rrc ; shift MID MAC result
rrc ; shift LOW MAC result
djnz LC[1], shift_loop ; shift to get result in A[0]
move APC, #0 ; restore accumulator normalcy
move AP, #0 ; use accumulator 0
One possible solution is to use our MAC once again. Rather than shifting right by 12 (or any value between 0 and 16), we can shift left by 16 minus that amount (i.e. shift left by 4). This will put our result in the middle 16bit word of the MAC's registers. Note that our shift left will actually be accomplished with a multiplyby2 to some power (2
^{4}, in the case where our original shift right was supposed to be 12).
;
; don't care about high word, since we shift left and take the
; middle word.
;
move A[1], MC1 ; 1, get MAC result MID
move A[0], MC0 ; 1, get MAC result LOW
move MCNT, #20h ; 1, clear the MAC, multiply mode only
move AP, #0 ; 1, use accumulator 0
and #0F000h ; 2, only want the top 4 bits
move MA, A[0] ; 1, lower word first
move MB, #10h ; 1, multiply by 2^4
move A[0], MC1R ; 1, get the high word, only lowest 4 bits significant
move MA, A[1] ; 1, now the upper word, we want lowest 12 bits
move MB, #10h ; 1, multiply by 2^4
or MC1R ; 1, combine the previous result and this one
;
; result is in A[0]
;
This will let us take our result computation to 12 cycles, rather than 9 + (6 x S) cycles.
Now let us plug back into our earlier equation. Our new equation uses a conservative estimate of 40 cycles of overhead and 5 cycles per loop iteration. Using the same example as earlier of a 100tap filter, the MAXQ2000 could process 16bit, mono audio data at 37kHz, as seen in
Table 1.
Table 1. Maximum FIR Filter Sample Rates (20MHz MAXQ2000, Looping)
Filter Length (Taps) 
Max Rate (Hz) 
50 
68965.51724 
100 
37037.03704 
150 
25316.4557 
200 
19230.76923 
250 
15503.87597 
300 
12987.01299 
350 
11173.18436 
There is one more performance improvement that we can implement for applications in which a higher sampling rate is required and code space can be sacrificed. We can 'inline' the filter coefficients, which eliminates both the need to select active pointers and the need to loop (this technique is also known as loop unrolling). The price of this change is increased code space—previously, our 100point filter took 100 words to store; now it will take 300 words to store (2 words for each coefficient move, 1 word for each data value move). In a 16 kiloword device, this may be an insignificant price for the performance benefit. The new code might look something like this:
move BP, BP ; select BP[Offs] as our active pointer
zeroes_filtertop:
move MA, #FILTERCOEFF_0 ; 2, multiply next filter coefficient
move MB, @BP[Offs] ; 1, multiply next filter data
move MA, #FILTERCOEFF_1 ; 2, multiply next filter coefficient
move MB, @BP[Offs] ; 1, multiply next filter data
move MA, #FILTERCOEFF_2 ; 2, multiply next filter coefficient
move MB, @BP[Offs] ; 1, multiply next filter data
. . .
move MA, #FILTERCOEFF_N ; 2, multiply next filter coefficient
move MB, @BP[Offs] ; 1, multiply next filter data
;
; filter calculation complete
;
To calculate the performance benefit of this change, we again assume 40 cycles of overhead, but now there are 3 cycles per loop iteration, though we have truly eliminated the loop. The 100tap performance limit now sits at 58kHz (see
Table 2).
Table 2. Maximum FIR Filter Sample Rates (20MHz MAXQ2000, Unrolled Loop)
Filter Length (Taps) 
Max Rate (Hz) 
50 
105263.1579 
100 
58823.52941 
150 
40816.32653 
200 
31250 
250 
25316.4557 
300 
31250 
350 
27027.02703 
Supporting IIR Filters
Though this application note does not demonstrate the use of IIR filters, there is no reason the MAXQ2000 cannot support them. The changes involved would be to:
 Dedicate a segment of RAM to store the latest output samples (this would be most efficiently implemented as a circular buffer, using the BP[Offs] register in a manner similar to that described previously)
 Include the characteristic filter coefficients for the feedback (the 'y' portion) of the filter
 Add another loop that continues to accumulate products that are the result of the feedback portion of the filter
While adding another loop may sound like a performance hit, it might not necessarily be. Although it will require more time to compute one output of the filter, IIR filters often require fewer taps (a smaller value for N) to compute an output value.
Conclusion
The MAXQ2000's performance and peripherals make it a great, generalpurpose µC. It can be used anywhere a fast, versatile µC is required, especially in applications requiring user interaction. The effective use of the MAC gives the MAXQ2000 some digitalfiltering capability, making the MAXQ2000 one of the most versatile µCs around.
Relevant Links
Related Parts 
MAX1407 
LowPower, 16Bit Multichannel DAS with Internal Reference, 10Bit DACs, and RTC 
Samples

MAXQ2000 
LowPower LCD Microcontroller 

MAXQ2000KIT 
Evaluation Kit for the MAXQ2000 

© Mar 24, 2005, Maxim Integrated Products, Inc.

The content on this webpage is protected by copyright laws of the United States and of foreign countries. For requests to copy this content, contact us.
APP 3483: Mar 24, 2005
APPLICATION NOTE 3483,
AN3483,
AN 3483,
APP3483,
Appnote3483,
Appnote 3483
