Lecture 8

Andreas Moshovos

Spring 2007


Loops and Arrays


In this section we will be implementing in assembly the following C pseudo-code:


short arr[5] = { 1, 2, 3, 4, 5 }; // an array of word values (16 bit)

short n = 5;            // the number of elements in the array

short sum = 0;


for (i = 0; i  < n; i++)

   sum = sum + arr[i];


How are arrays are implemented at the machine level? They are typically implemented by allocating each element one after the other in memory.

So, in our example, if arr[0] is at memory location $30000 then arr[1] will be at $30002, arr[2] at $30004 and in general arr[i] will be at $30000 + i x 2. Generally, if we have a unidimensional array A of elements TYPE, then element a[i] is at address &A[0] + sizeof(TYPE) x i.

Note that the C expression &A[0] is equivalent to the address of the first array element. Instead of &A[0], in C we can typically write A (note: in C if you write &A[0] + i, it will be automatically converted into &A[0] + I x sizeof(a[0]));


Let us first declare our variables:


            org $30000

arr        dc.w    1, 2, 3, 4, 5 ; note assembler doesn’t like spaces in-between the numbers and commas

n          dc.w    5

sum      dc.w    0


Focusing now on the code we should review what is the execution semantics of the C for statement. In general, a C statement comprises four parts:





and execution is done as follows:




if (NOT COND) exit the loop



                      go back to “start”


where start is label. In words: The INIT part is always executed once at the beginning. We then test the condition (COND). If the condition is not TRUE then this is the end of it we skip the for. Otherwise, if the condition is TRUE, we then execute the BODY portion, followed by the POST portion. We then return back to testing the condition and repeat the aforementioned steps until the condition stops being TRUE.


In our loop we have:

INIT à i = 0

COND à i < n

BODY à sum = sum + arr[i]

POST à i = i + 1


Let’s write each of them in turn.


For starters let us use a register for holding variable i. Let’s use D0 for this.

Then INIT becomes:


clr.l D0  à D0 = 0


Our condition requires comparing the current value of i and n. Assuming that n does not change in value while our loop executes (and it shouldn’t) then we have the code:


move.w n, D1 à keep n’s value in D1


so now we can test for the reverse condition using a cmp instruction and jump out of the loop as needed.


cmp.w d1, d0

bge endloop


Finally, the POST section becomes:


addq.w #1, d0 à d0 = d0 + 1


Notice that we use a variation of add called addq (q = quick). It too adds, however, it can only be used for small integer values. The advantage is that the whole instruction can be represented with just two bytes.


The almost complete part of our code is then as follows:


     org $20000


     clr.l     d0

     move.w    n, d1

loop cmp.w     d1, d0

     bge       endloop




     addq.w    #1, d0

     bra       loop



Let’s use register d3 to keep the running sum and at the end we will write this value into the sum variable in memory. So, we should initialize d3 to 0 in the beginning and at the end of the loop write its value into the sum variable.


Now we can focus on implementing the BODY part. In 68k the A registers are generally used for calculations on quantities that will be used as addresses and for accessing memory in ways other the ones we have seen.


We can rewrite sum=sum + a[i] as:


            tmp = arr[i];

            sum = sum + tmp;


Assuming that tmp will be held into a register now we have to devise a way of reading arr[i] into that register.

To implement the statement tmp=  arr[i] we need to be able to access all array elements one after the other. We have seen instructions that access memory but the used a fixed address (e.g., move.w arr, d0). We could use five of these instructions if we knew that the array will always have five elements, but what if the array can have 16K elements? Or what if the number of elements was not known upfront? To deal with we need a mechanism that will allow an instruction to access a program calculated memory location. One such way is as follows:


move.w (a0), d2


The “(a0)” expression signifies a memory access. The memory address is not fixed in the instruction itself. Instead it is taken from the value of register inside the parentheses. Thus, the aforementioned instruction does “d2 = Mem[a0]”.


To clarify things:


movea.l a0, d2             copies the value of a0 into d2

movea.l (a0), d2          first does a memory read using a0’s value as the address and then copies the four bytes read from memory into d2


Here’s one more example:


            org $20000

     dc.l 1, 2, 3

     org $10000

movea.l #$20000, a0     à a0’s contents become $20000

adda.l $4, a0           à a0 becomes $20004

move.l a0, d0           à d0 becomes $20004

move.l (a0), d0         à d0 becomes $2

suba.l #4, a0           à a0 becomes $20000

move.l (a0), d0         à d0 becomes $1


Back now to our for loop and the code for accessing array arr[i]


To access arr[i] we need to access the word at memory location “arr + i  x 2”. The code for that is:


movea.l #arr, a0   à   a0 = &arr[0], notes: we have to use movea when the destination register is an a register.

Since this is an address we must use long-word arithmetic

adda.l    d0, a0    à        a0 = &arr[0] + i

adda.l    d0, a0    à   a0 = &arr[0] + i + i = &arr[0] + 2 x i

add.w     (a0), d3 à   d3 = d3 + arr[i]


The complete code for the for loop is as follows:


     org $20000


     clr.l     d0        ; i = 0, we use the .l since i takes part in .l calculation

     clr.l     d3        ; sum = 0 (temporarily sum lives in d3)

     move.w    n, d1     ; d1 = n


loop cmp.w     d1, d0

     bge       endloop   ; if (i >= n) goto endloop


     movea.l   #arr, a0  ; a0 = &arr[0]

     adda.l    d0, a0   

     adda.l    d0, a0    ; a0 = &arr[0] + 2 x i

     add.w     (a0), d3  ; d3 = d3 + arr[i]


     addq.w    #1, d0    ; i = i + 1

     bra       loop      ; repeat loop



     move.w    d3, sum   ; write the sum into memory


Machine language support for loops


Because loops are so frequent, the 68k designers decided to provide an instruction that supports them directly. Note that this instruction does not provide functionality that cannot be synthesized otherwise. At the time it provided an advantage in processing speed and code density thus it was viewed as a good way of utilizing resources (transistors). Some modern architectures (e.g., PowerPC) include such instructions.


The instruction is the Decrement and Branch and takes the form:


DBcc Dn, destination


cc is any branch condition, Dn is a D register and destination in assembly will be label.


The semantics of the instruction are non-trivial:


            1. if (not cc) then Dn = Dn – 1 (word calculation only – only the lower 16 bits of Dn participate)

            2. if (Dn != -1) then PC = PC + displacement (essentially execution resumes at the destination)


In the first step, the instruction tests the condition and if the condition is FALSE then it decrements the corresponding D register by 1.

In the second step, the D register is compared to -1. Only if the D register is not -1 this instruction changes control flow. Otherwise, execution falls through. The destination, as in branches is specified as PC relative displacement.


If we always want to decrement the D register then we can use the “dbf” variation, where cc = f = false.


This instruction is useful for implementing do-while loops. It can also be used to implement other loops with the addition of a few instructions to handle the first iteration.


Let’s see an example similar to the one with the for loop:


            i = n - 1;

            do {

                        sum = sum + arr[i];

                        i = i – 1;

            } while (i >= 0);


One assembly implementation is:


     org $20000


     clr.l     d3        ; sum = 0 (temporarily sum lives in d3)

     clr.l     d1

     move.w    n, d0     ; d0 = n

     subq.w    #1, d0    ; d0 = n - 1




     movea.l   #arr, a0  ; a0 = &arr[0]

     adda.l    d0, a0   

     adda.l    d0, a0    ; a0 = &arr[0] + 2 x i

     add.w     (a0), d3  ; d3 = d3 + arr[i]


     dbf       d0, loop  ; cc = f = FALSE à always decrement



     move.w    d3, sum   ; write the sum into memory


Note that this code is not equivalent to the for code. The for code will not execute its BODY if n is 0.


For your reference, we could implement the same code without the DBF as follows:


     org $20000


     clr.l     d3        ; sum = 0 (temporarily sum lives in d3)

     clr.l     d1

     move.w    n, d0     ; d0 = n

     subq.w    #1, d0




     movea.l   #arr, a0  ; a0 = &arr[0]

     adda.l    d0, a0   

     adda.l    d0, a0    ; a0 = &arr[0] + 2 x i

     add.w     (a0), d3  ; d3 = d3 + arr[i]


     subq.w    #1, d0    ; d0 = n - 1

     bge       loop



     move.w    d3, sum   ; write the sum into memory