Welcome to this little bit of documentation about the Motorola DSP 56001 that can be found in the Falcon 030. This text contains : 1. The Registers. 2. The Memory organization. 3. The Instruction set. 4. The XBios commands. This text is for the Dnt-Paper and other interested people. It is a pre-release version and may not be distributed yet. The text is written by AvIvA of DFS (Dead Fish Society). I wish you a lot of fun coding this little beastie. Mention me in your demos! :-) Additionnal remarks, debugging, and updates: NulloS/DNT-Crew. 1. Registers : 23 0 23 0 ------------------------------------ | X0 | X1 | ------------------------------------ 47 X 0 ------------------------------------ (Y register looks the same) 23 8 7 0 23 0 23 0 ----------------------------------------------------------- XXXXXXXXXXXXXXXX| A2 | A1 | A0 | ----------------------------------------------------------- XXXXXXXXXXXXXXXX55 A 0 ----------------------------------------------------------- (B register looks the same) XXXXX : Means nothing when writing, but when reading, this contains the extension! All memory address(ing) registers (Rn, Nn, Mn), Loop Counters (LA and LC), Program Counter (PC) and on-chip system stack (SSH and SSL) look like this: 23 16 15 0 ---------------------------------- | EEEEEEEE | Register n | ---------------------------------- EEEEE : Means nothing when writing, zero when reading. The following registers are available for memory addressing : Address Register (Rn : n = 0 to 7) Offset Regsiter (Nn : n = 0 to 7) Modifier Register (Mn : n = 0 to 7) Hardware loop registers : Loop Address (LA) Loop Counter (LC) On-chip system stack memory : High : SSH 0 to 15 Low : SSL 0 to 15 SSH8 : Writes PC on JSR SSL8 : Writes SR on JSR SSH9 : Reads only PC on RTS SSH12 : Reads/Writes to/from LA on REP or DO_LOOP SSL12 : Reads/Writes to/from LC on REP or DO_LOOP Operating Mode Register (OMR): 23 8 7 6 5 4 3 2 1 0 ----------------------------------------- | EEEEEEEEEEEEE |EA|SD|EE|EE|EE|DE| M | ----------------------------------------- EA : External Memory Access Mode - 0 : None - 1 : Waitstates SD : Stop delay - 0 : 64k - 1 : 8 clock cycles delay on STOP DE : Data ROM enable - 0 : RAM - 1 : ROM M : Operating Mode - 00 : Single Chip non expanded - 01 : Special Bootstrap (56001 only) - 10 : Normal Expanded - 11 : Development Expanded Status Register (SR) : ------------------------------------------------- | MR | CCR | ------------------------------------------------- 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 ------------------------------------------------- |LF|EE|T |EE| S | I |EE|L |E |U |N |Z |V |C | ------------------------------------------------- LF : Loop Flag T : Trace Mode S : Scaling Mode - 00 : No Scaling - 01 : Scale Down (Right Shift) - 10 : Scale Up (Left Shift) - 11 : Reserved I : Interrupt Mask (IPL0 - IPL3) L : Limit (latching overflow bit) E : Extension - 0 : A1/B1 is 0 or $FF U : Unnormalized N : Negative Z : Zero V : Overflow C : Carry Stack Pointer Register (SP) 23 6 5 4 3 2 1 0 ------------------------------ | EEEEEE |UF|SE| P0 - 4 | ------------------------------ UF : Underflow Flag SE : Stack Error Flag P0 - 4 : Stack Location (1 - 15) Notes : 111110 : Stack underflow condition after double PULL 111111 : Stack underflow condition 000000 : Stack empty after Reset (PULL causes underflow) 000001 : Stack location 1 001111 : Stack location 15 010000 : Stack overflow condition 010001 : Stack overflow condition after double PUSH 2. MEMORY ORGANISATION : 2.1. X Memory (16k ): $0000 - $00FF : Internal memory. $0100 - $01FF : ROM tangens table or external RAM $0200 - $3FFF : External RAM $FFC0 - $FFFF : On-Chip periphery 2.2. Y Memory (16k ): $0000 - $00FF : Internal memory. $0100 - $01FF : Rom sinus-tables or external RA< $0200 - $3BFF : External RAM $3C00 - $3FFF : TOS routines (for loading etc. ) 2.3. P Memory (32k ): $0000 - $01FF : Internal Memory or 32 * 24bit bootstrap ROM and 64 * 24 bit Interrupt commands. $0200 - $7FFF : External program RAM (overlaps X and Y memory) 3. The DSP56001 instruction set: All arithmetical instructions operates on the entire 56 bits precision, with an accumulator as destination operand. The following abbreviations have been used in the descriptions of the commands : // move : parallel move MSP : Most Strong Part (A1 or B1, bits 47-24) LSP : Less Strong Part (A0 or B0, bits 23-0) MS ; Most Strong bits (e.g, 8 MS bits of X0 are 23-16) LS : Less Strong bits (e.g, 8 LS bits of A are 7-0) #n : 6-bit value (immediate). #x : 8-bit value (sometimes also 24-bit, eg. with MOVE) #X : 12-bit Value. X : 48-bit X and y register. X0 : 24-bit X0/X1/Y0 and Y1 register (if X1 or Y0 are specified, this means X0 register only, of course!). ea : register pointed memory (Rn)-Nn;(Rn)+Nn;(Rn)-;(Rn)+;(Rn);(Rn+Nn);-(Rn) pp : 6-bit-absolute peripherical address aa : 6-bit-absolute RAM address xxx : 12 bits short adress xxxx : 16 bits long adress ar : All registers (though for a couple of commands some registers may not be used. Try! :-) ) X0;X1;Y0;Y1;A0;A1;A2;B0;B1;B2;A;B;Rn;Nn;Mn;SR;OMR;SP;SSH;SSL ;LA;LC + : Add or substract(-) for MAC/MACR/MPY/MPYR Condition Codes : CC/HS : Carry Clear / Higher Same : C = 0 CC/LS : Carry Set / Lower : C = 1 EC : Extension CLear : E = 0 EQ : Equal : Z = 1 ES : Extension Set : E = 1 GE : Greater Equal : N XOR V = 0 GT : Greater Than : Z + (N XOR V) = 0 LC : Limit Clear : L = 0 LE : Less Equal : Z + (N XOR V) = 1 LS : Limit Set : L = 1 LT : Less Than : N XOR V = 1 MI : Minus : N = 1 NE : Not Equal : Z = 0 NR : Normalized : Z + ((NOT U) AND (NOT E))= 1 PL : Plus : N = 0 NN : Not Normalized : Z + ((NOT U) AND (NOT E))= 0 Command : ABS D // move : Yes D : Accumulator. Description : This command turns the value currently in the Accumulator into an absolute value using 2's complement. Command : ADC S,D // move : Yes S : 48bit source value (X or Y). D : Accumulator. Description : The value S is added with the Carry and put into the Acc. This is for 96-bit-addition (quad' precision). Command : ADD S,D // move : Yes S : Acc., X or X0, for 56bit, 48bit and 24bit adding respectively. D : Accumulator. Description : S is added to the accumulator. Command : ADDL S,D // move : Yes S : Accumulator D : Accumulator Description : The accumulator is shifted left and added to the accumulator. D=D*2+S Command : ADDR S,D // move : Yes S : Accumulator D : Accumulator Description : The accumulator is shifted right and added to the accumulator. D=D/2+S Command : AND S,D // move : Yes S : X0 D : Accumulator Description : Logical AND on 24 bits. Changes only A1/B1 Command : ANDI S,D // move : No S : #x D : MR, CCR, OMR Description : Logical AND with direct data for Control Registers. Command : ASL D // move : Yes D : Accumulator Description : Left shift accumulator (entires 56 bits) Command : ASR D // move : Yes D : Accumulator Description : Right shift accumulator (entires 56 bits) Command : BCHG S,D // move : No S : #n D : ea, pp, aa, ar Description : Test and change n'th bit in ea,pp,aa or ar. Command : BCLR S,D // move : No S : #n D : ea, pp, aa, ar Description : Test and clear n'th bit in ea, pp, aa or ar. Command : BSET S,D // move : No S : #n D : ea, pp, aa, ar Description : Test and set n'th bit in ea, pp, aa or ar. Command : BTST S,D // move : No S : #n D : ea, pp, aa, ar Description : Test n'th bit in ea, pp, aa or ar. Command : CLR D // move : Yes D : Accumulator Description : D : Clear accumulator. Command : CMP S1,S2 // move : Yes S1 : A, X0 S2 : A Description : Compare S1 with S2 and set condition codes. Command : CMPM S1,S2 // move : Yes S1 : A,X0 S2 : A Description : Compare absolute value. Command : DIV S,D // move : Yes S : X0 D : A Description : Divide iteration... To get a complete division, this must be repeated 24 times. Here are three examples: ;Signed division, A is the 56 bits dividend value, X1 the 24 bits divisor. ;Result: 24 bits quotient in A abs a a,b and #$fe,ccr rep 24 div x0,a eor x0,b a0,a jpl ok neg a ok ;Unsigned division, A is the 56 bits dividend value, X1 the 24 bits divisor. ;Result: 24 bits quotient in A and #$fe,ccr rep 24 div x0,a move a0,a ;Xtended signed division. A is the 56 bits dividend, X1 the 24 bits divisor. ;Result: 48 bits quotient in A ;(Note: this process can be iterated to obtain various precision, but ; the following routine is very usefull for dx/dy algorithms, if you ; see what I mean...) abs a a,x1 and #$fe,ccr rep #24 div x0,a tfr a,b #<0,a0 rep #24 div x0,a tfr x1,b b0,a1 eor x0,b jpl ok neg a ok Command : DO S,D // move : No S : #X<,ea,aa,ar D : Label Description : Repeat S-times the commands between this one and label. Jumps or REP are not allowed to be the last command in the loop. The last command has restrictions and NOP's may be needed. Command : ENDDO // move : No Description : End hardware loop. The current hardware loop is killed, like a 'break' in C. Command : EOR S,D // move : Yes S : X0 D : Accumulator Description : Exclusive OR (24 bits, MSP of accumulator) Command : ILLEGAL // move : No Description : Illegal command, causes interrupt. Command : Jcc D // move : No D : xxx, xxxx, ea Description : Conditional jump based on flag registers. Command : JCLR #n,S,D // move : No S : ea, aa, ar, pp D : xxxx Description : Jump to label when the n'th bit is cleared. Command : JMP D // move : No D : ea, xxx, xxxx Description : Unconditional jump to label Command : JScc D // move : No D : ea, xxx, xxxx Description : Conditional jump to subroutine. Command : JSCLR #n,S,D // move : No S : ea, aa, pp, ar D : xxxx Description : Jump to subroutine if n'th bit is cleared. Command : JSET #n.S,D // move : No S : ea, aa, ar, pp D : xxxx Description : Jump if n'th bit is set. Command : JSR D // move : No D : ea, xxx, xxxx Description : Jump to subroutine. Command : JSSET #n,S,D // move : No S : ea, aa, ar, pp D : xxxx Description : Jump to subroutine if n'th bit is set. Command : LSL D // move : Yes D : Accumulator Description : Logical shift left accumulator (MSP, A1/B1). Fills with zeros, bit 47 in carry bit. Command : LSR D // move : Yes D : Accumulator Description : Logical shift right accumulator (MSP). Fills with zeros, bit 24 in carry-bit. Command : LUA ea,D // move : No D : Rn, Nn Description : Calculate and adapt address. Command : MAC +S1,S2,D // move : Yes S1,S2 : X0 D : Accumulator. Description : Signed multiply, result added/substracted to/from accumulator. X0*X0 and Y0*Y0 are not allowed. Command : MACR +S1,S2,D // move : Yes S1,S2 : X0 D : Accumulator Description : Rounded signed multiplay, result added/substracted to/from Accumulator. Same remark with X0 and Y0. Command : MOVE S,D // move : No!! Description : Copy. Rn and Nn aren't updated until next cycle. This command is equal to a parallel move with NOP. Here follows the different types of parallel move: . Immediate short data move: S : #x D : X0, A, Rn, Nn If D is A0, A1, A2, B0, B1, B2, Rn or Nn, the 8 bits short data is an integer, loaded in the 8 LS bits of the destination register (the remaining bits are zeroed). If D is X0, X1, Y0, Y1, A or B, the 8 bits short data is a signed fraction, loaded in the 8 MS bits of the destination register (the remaining bits are zeroed and / or sign extended). . Register to register S : ar D : ar Control registers are not allowed to be moved this way (i.e Mn, OMR, SR, CCR, SP, SSH, SSL, etc..). If S is an accumulator (56 bits value, A or B), its value is shifted/limited. When moving a 16 bits register in a 24 bits register, the 8 MS bits are zeroed. When moving a 24 bits register in a 16 bits register, the 8 MS bits are truncated. . Adress register update S : (Rn)-Nn, (Rn)+Nn, (Rn)-, (Rn)+ The specified adress register is updated according the ea used. This is a parallel move, and as always the new contents of Rn will not be available for use during the following instruction. The same remarks goes for all other parallel moves with Rn, Mn or Nn as destination register. . X memory data move S : X:xxx, X:xxxx, X:ea, X:aa, ar , ar , ar , ar , #X  D : ar , ar , ar , ar , X:ea , X:aa , X:xxx, X:xxxx, ar ar may be all but control registers. If S is an acumulator, its contents is shifted/limited according the scaling bits. . X memory and register data move These are combinations between previous // moves. Here are the valid ones: Memory move part / Register move part S: X:ea  D: A, B, X0, X1 / S: A, B  D: Y0, Y1 S: X:xxx  D: A, B, X0, X1 / same S: X:xxxx  D: A, B, X0, X1 / same S: A, B, X0, X1  D: X:ea / same S: A, B, X0, X1  D: X:xxx / same S: A, B, X0, X1  D: X:xxxx / same S: A, B  D: X:ea / S: X0  D: B, A S: A, B  D: X:xxx / same S: A, B  D: X:xxxx / same . Y memory data move Same as X memory data move, except Y: instead of X: !! . Y memory and register data move Same as X memory and register data move, but: Memory move part / Register move part S: Y:ea  D: A, B, Y0, Y1 / S: A, B  D: X0, X1 S: Y:xxx  D: A, B, Y0, Y1 / same S: Y:xxxx  D: A, B, Y0, Y1 / same S: A, B, Y0, Y1  D: Y:ea / same S: A, B, Y0, Y1  D: Y:xxx / same S: A, B, Y0, Y1  D: Y:xxxx / same S: A, B  D: Y:ea / S: X0  D: B, A S: A, B  D: Y:xxx / same S: A, B  D: Y:xxxx / same . Long memory data move The contents of the 48 bits register is moved from/to L: memory. The MSP is in X: memory, the LSP in Y: memory. If X, Y, AB or BA is the register(s) used, these are interpreted as two independant 24 bits fractions, shifted/ limited for AB or BA. If A10, B10, A or B is the register used, the value is a 48 bits quantity, shifted/limited for A or B. Note: A10, B10, Ab and BA may NOT be used in any other type of instruction or parallel move!!! . X and Y memory data move These are combinations of X and Y memory data move, with lotsa restrictions. X memory move part / Y memory move part S: X:eax  D: X0, X1, A, B / S: Y:eay  D: Y0, Y1, A, B S: X0, X1, A, B  D: X:eax / S: Y:eay  D: Y0, Y1, A, B S: X0, X1, A, B  D: X:eax / S: Y0, Y1, A, B  D: Y:eay S: X:eax  D: X0, X1, A, B / S: Y0, Y1, A, B  D: Y:eay eax and eay means (Rn)+Nn, (Rn)-, (Rn)+ or (Rn). But the two independant adress registers for X: and Y: memory must be (R0..R3) and (R4..R7) (e.g: eax=(R0) and eay=(R5) is ok, eax=(R6) and eay=(R2) is ok, but eax=(R0) and eay=(R1) is wrong, eax=(R5) and eax=(R7) is wrong). Examples: move x: