Table Operations for the MAXQ Architecture
IntroductionThe MAXQ architecture describes a powerful, single-cycle RISC microcontroller based on the classic Harvard machine. A Harvard machine differs from the more commonly seen Von Neumann machine in an important design element: the Harvard machine's instructions and data are carried on separate busses. Because there is no contention for a single data bus, MAXQ instructions can execute in only a single cycle. The same operations would require multiple cycles on a traditional Von Neumann architecture.
The rigid separation between data and code in Harvard machines, however, presents its own set of challenges . Storing data tables in code space, a technique common in Von Neumann machines, is problematic for classical Harvard machines. Since only one operation can occur in a single cycle on a given bus, the CPU core cannot fetch an instruction on the code memory bus and fetch a memory operand from a data table in code space in the same cycle.
As a Harvard machine, one might assume that MAXQ microcontrollers also prohibit storage of data elements in code space. But because of ROM tools built into every MAXQ device, such table lookups are actually easy.
Table Lookup in Code SpaceReading a value from a MAXQ table in code space appears to be simple. However, programmers unfamiliar with the MAXQ architecture can become frustrated on their first attempt to do so.
IncorrectTableLookup: move dp, #w:StartOfTable move acc, @dp . . . ret . . . StartOfTable: dc16 01234h dc16 05678h dc16 098abh dc16 0cdefhThis above code will assemble perfectly, but after the second instruction the accumulator will almost certainly not contain 0x1234. The reason is simple. Unlike a Von Neumann machine in which there is only a single memory space and an instruction will take as many cycles as needed to complete, the MAXQ's move <reg>, @dp instruction implicitly accesses data space and completes in a single cycle with an instruction fetch from code space and a read from data space. The value loaded into the accumulator is whatever value was in data space at the same offset as StartOfTable in code space.
At first this problem seems intractable. After all, accesses to code space require a certain amount of time; the CPU core cannot squeeze two memory accesses into one clock cycle, even if the architecture permitted it. In the MAXQ architecture, however, small details about how the microcontroller maps physical memory blocks into the various memory spaces—and a few routines in the utility ROM—solve the problem.
First, in the MAXQ architecture the mapping of physical memory blocks into code and data spaces is not fixed, but changes depending on what physical memory block is being accessed. In general, programmers writing code to run in the flash memory space of most MAXQ microcontrollers will link their software to start at location 0 in code space. They will assume that RAM starts at location 0 in data space, and so it does.
But the MAXQ microcontroller has another piece of physical memory as well, the utility ROM. The utility ROM resides at location 0x8000 in code space in all MAXQ microcontrollers. User code can call routines in the 0x8000 page of the utility ROM to perform certain functions. Furthermore, as long as execution is being performed in the utility ROM, user code memory is remapped to a new location in data space.
When executing out of utility ROM, data RAM continues to be accessible at location 0x0000 in data space, but code memory is remapped to location 0x8000 in data space. Since code flash now appears in data space, the code running from the utility ROM can access information stored in user code as if it were data. Utility ROM functions are provided that simply read a value indirectly through the pointer registers and return.
Consequently, the routine given above would change somewhat:
BetterTableLookup: move dp, #w:StartOfTable + 08000h call UtilityROMGetDP0 . . . ret . . . StartOfTable: dc16 01234h dc16 05678h dc16 098abh dc16 0cdefhIn this example, the address to be read is adjusted to reflect the location to which flash is mapped during utility ROM execution, and is then loaded into DP. Rather than trying to read the data directly, a call is made to the utility ROM routine. Of course, instead of one cycle used in the direct data read, this operation takes four cycles: two cycles for a long call, one cycle to perform the read, and one cycle for the return operation.
A greater problem with this code example is that it will not assemble! The label UtilityROMGetDP0 is not defined, and for a good reason: the utility routines are not in the same place from one MAXQ microcontroller to another. The utility routines are, in fact, not guaranteed to remain in the same place from one version of a particular MAXQ device to another!
To resolve this issue, every MAXQ microcontroller's utility ROM contains an address table for the utility functions, and a pointer to the table at a well-known location: 0x800D. Specifically, the utility ROM contains the following code:
org 0800Dh dw UtilityFunctionTable . . . UtilityFunctionTable: . . . dw GetDP0 dw GetDP0Inc dw GetDP0Dec dw GetDP1 dw GetDP1Inc dw GetDP1Dec dw GetBP dw GetBPInc dw GetBPDecNote first, that the utility function table consists of addresses, not instructions. That is, the application programmer must retrieve the address and call it, not simply jump into the table. Second, note that the first memory reference function may not be the first entry in the table. Since each MAXQ microcontroller can contain different types and amounts of memory and different peripherals, each device will likely contain a different list of functions, at different relative offsets within the table.
Each of the three pointer registers has three associated functions, for a total of nine table lookup functions. The first function for each pointer register simply retrieves the value at the given address, while the latter two use the post-increment and post-decrement forms of the indirect load, respectively. In each case, the retrieved data is loaded into the GR register.
Now, our code looks like this:
CorrectTableLookup: move dp, #0800Dh ; Point to pointer to function table move acc, @dp ; acc now has pointer to ftable add #3 ; For 2000, GetDP0 move dp, acc ; Load ptr + offset to dp0 move a, @dp ; Get address of GetDP0 into A1 move dp, #StartOfTable + 08000h call a ; This will call GetDP0, finally! . . . ret . . . StartOfTable: dc16 01234h dc16 05678h dc16 098abh dc16 0cdefhNotice that once the address of the GetDP0 routine has been found, it can be stored away and reused. The first five instructions above only need to be performed once; after that, each table fetch takes only three cycles: the call, the read (executing from utility ROM), and the return (also running from utility ROM.)
Copying Tables from Flash to RAMA method for moving an entire table from flash to RAM can now be constructed from the table read functions. If the destination address is given in BP, for example, one method would be:
SlowTableMove: move dp, #0800Dh ; Point to pointer to function table move acc, @dp ; acc now has pointer to ftable add #4 ; For 2000, GetDP0Inc move dp, acc ; Load ptr + offset to dp0 move a, @dp ; Get address of GetDP0 into A1 move dp, #StartOfTable + 08000h move bp, #RAMDest ; Set this label to desired dest move offs, #0ffh ; Pre-decremented offset move lc, #4 ; Move four words TableMoveLoop: move dp, dp ; Set source pointer call a ; This will call GetDP0inc move @bp[++offs], gr ; Store retrieved word to dest djnz lc, TableMoveLoop . . . ret . . . StartOfTable: dc16 01234h dc16 05678h dc16 098abh dc16 0cdefhAs before, the first five instructions need only be executed once. After that, the table move can be performed as many times as needed, and the GetDP0inc subroutine address will remain in A1. The table move will require six cycles per iteration, plus setup overhead.
The move dp, dp instruction is required due to a quirk in the MAXQ architecture. As there is only a single address bus associated with data space, it must be set up at least one cycle ahead of any read performed on data space. In the table movement loop, a read is performed at the address given by DP, and then the write address is placed on the bus. But without the move dp, dp instruction, the write address would still be on the bus when it is the time to read the next location in the table. By inserting this apparently null instruction, the source operand address bus is refreshed in anticipation of the next read event.
There is, however, a better way to accomplish this task. The utility ROM includes a copyBuffer routine that performs this same function, but in fewer cycles. The copyBuffer routine is listed immediately below the table lookup routines in the utility ROM.
FasterTableMove: move dp, #0800Dh ; Point to pointer to function table move acc, @dp ; acc now has pointer to ftable add #12 ; For 2000, copyBuffer move dp, acc ; Load ptr + offset to dp0 move a, @dp ; Get address of GetDP0 into A1 move dp, #StartOfTable + 08000h move bp, #RAMDest ; Set this label to desired dest move offs, #0 ; No need to pre-decrement offset move lc, #4 ; Move four words call a ; This will call copyBuffer . . . ret . . . StartOfTable: dc16 01234h dc16 05678h dc16 098abh dc16 0cdefhThis copyBuffer routine reduces the cycle count per iteration to three, saving approximately half the time over the previous method. When the copyBuffer routine returns, LC will be zero, and the OFFS register will point to the next location after the last destination location written. Because OFFS is an 8-bit register, tables up to 256 words can be copied in this way.
An Example: String OutputA common task in many microcontroller-based projects is to output one of a number of canned messages to a console. Often, each message is given a number and a common routine must convert the number to the text of the message.
A commonly-used technique for this task is to zero-terminate each message string, . and provide a table that converts message numbers into the addresses at which each individual string resides. This technique is reliable and quick, but it means that two data structures must be established: the table of addresses and the strings themselves. A second technique simply places the zero-terminated strings into one large, contiguous memory space and searches linearly. While economical, this method is time-consuming at execution time because every character up to the intended string must be searched before the output begins.
A useful compromise uses length-delimiting instead of zero-delimiting. In this technique, the length of each string is given first, followed by the actual bytes of the message. In this way, unused messages can be quickly skipped, and the table itself is no longer than in the zero-delimited case. The only limitation to this compromise technique is that each string in the table is restricted to no more than 255 characters.
; ; Output String ; ; Enter with ACC=an index value (one based) indicating which ; string to output. ; ; On exit, LC0=0, DPC=0, ACC, A1, A2, DP0 used. ; output_string: move lc, acc ;Set LC0 to index of string move dpc, #4 ;Set DP0 to word mode move dp, #800dh ;Point to table of pointers move acc, @dp ;Get address of table add #3 ;Offset to GETDP0 routine move dp, acc ;Load pointer to table move a, @dp++ ;Get GETDP0 move a, @dp ;Get GETDP0INC move dpc, #0 ;Set DP0 to byte mode move dp, #string_table + 8000h str_search_loop: call a ;Get a string length djnz lc, next_str ;If not this string, go to next move lc, gr ;Otherwise, put len in LC0 move acc, @dp++ ;...and point past length out_loop: call a ;Get a char and bump pointer call char_out ;Output the character djnz lc, out_loop ;If more characters, loop ret ;Otherwise, we're done. next_str: move acc, gr ;GR contains len of this string add dp ;Add current ptr to current len... move dp, acc ;...to create a new pointer jump str_search_loop ;Jump back and test index again ; ; Each entry in the string table begins with the string length ; followed by the string characters. ; string_table: dc8 string1 - string_table dc8 "This is the first string." string1: dc8 string2 - string1 dc8 "This is a second example of a string" string2: dc8 string3 - string2 dc8 "A third string." string3: dc8 string4 - string3 dc8 "Finally, a fourth string in the array!!!" string4: