Lecture 2

Lecture 11

Andreas Moshovos

Spring 2007

Subroutines

Structured programming relies on subroutines. Restricting our attention to C, we could write a subroutine that accepts three numerical arguments and returns their sum. This subroutine can then be used (or called) from many different places in our program. That is we could for example write the following in C:

int add3 (int a, int b, int c)

{

return a + b +c;

}

int sum = 0;

main ()

{

sum += add3 (1, 2, 3);

sum += 10;

sum += add3 (10, 20, 30);

}

For C subroutines to work as intended we need the following functionality:

1. We should be able to call a subroutine from anywhere in our program. By “call” we mean being able to change control flow so that the routine is executed.

2. We should be able to pass parameters that may take different values across different calls.

3. A subroutine must be able to return a value.

4. A subroutine must be able to change control flow so that execution continues immediately after the point where it was called. Since a subroutine can be called from many different places this suggests that the routine should be able to differentiate between them and “return” to the right spot depending on where it was called from.

We can see these requirements at work in our example:

We can see that the add3 subroutine is called from two different places. The place a routine is called from is also referred to as a call site. The three parameters that add3 accepts take different values at each call site ((1,2,3) and (10,20,30)). Add3 “returns” a value which is the sum of its three arguments. So, every time add3 is called, it calculates a result and that result is returned as the subroutine’s value. This value can then be used in an expression (e.g., in add3(10,20,30) evaluates to 60). The first time add3 is called we expect execution to resume immediately after the statement that called it. So, once add3(1,2,3) returns execution will resume by adding the return value to sum. Then, the “sum += 10” statement will get executed. The second time add3 is called execution resumes immediately after the particular call.

Let’s review the way control flows in time with subroutines with aid of the add3 example:

This diagram also shows a call site, the caller and the callee. The call site is the place a subroutine is called. The caller is the subroutine that is making the call. The callee is the subroutine that is called.

We can now look at the list of required functionalities and since we are interested in how to implement subroutines in assembly we can add concerns/requirements that related to machine level constructs. Thus we need to provide answers for the following questions:

1. How does the subroutine returns immediately after the call site?

2. Where and how does it return a value?

3. Where and how are we passing arguments to a subroutine?

4. Where and how are we allocating storage for any local variables (i.e., variable that belong to the subroutine)

5. What happens to register values once a subroutine is called. Do we require that the subroutine preserves their values or is it OK to overwrite some registers.

We will address each of these issues in turn. For most machines there is a set of rules that all valid subroutines must follow. This set of rules is called the calling convention. This set of rules is not the only viable option for implementing subroutines. However, at some point someone decided on a particular solution. If we want a subroutine to interoperate correctly with other subroutines (possibly written by others) we have to follow these set of rules. This way someone else could also use our subroutines. We will be describing the calling convention used by gcc for the 68k family of processors. There are other conventions in use in 68k. At any given point of time one of them can be in use, or special code must be devised to translate from one form to the other (and we do not want that).

There are a few different options for providing the aforementioned functionality. We will present the solution used in 68k and once this is understood we will discuss some of the other options. Key to supporting subroutines in 68k is the use of a stack. This stack is used to provide the functionality explained in points 1 through 5 above. We first explain how a stack can be implemented in 68k machine code and then explain how the stack is used to support subroutines.

STACK:

Let’s review what stack is. Stack is a last-in first-out (LIFO) queue. In more detail, the stack is a data structure for which three operations are defined:

Push value
Pop
Top (distance)

The first operation adds a new element onto the stack. The order in which elements are added onto the stack is important. Internally, the elements are placed in a queue following exactly the order in which they were inserted. Push takes a single argument which is the value we will insert onto the stack. After a push, the number of stack elements increases by one. The pop operation removes the most recently inserted element within the stack. After a pop, the number of items in the stack is reduced by one. If the stack is empty then pop is not a valid operation.

Top returns the value of a stack element without removing it from the stack. Top accepts a single argument which specifies the relative to the most recently inserted element of the element we are interested in. So, top(0) returns the value of the most recently inserted element (this is also called the top of the stack , which corresponds to viewing the stack as a vertical queue with elements being placed on top of each other). Top(3) returns the value of the 4^th element in the stack (as measured starting from the top).

For example, assuming that initially the stack is empty here’s an example of how the stack operates:

Push 3 à (3)
Push 4 à (4,3) , where 4 is the top element
Push 10 à (10, 4, 3)
Top(2) à returns 3
Top(1) à returns 4
Pop à returns 10 and then the stack becomes (4, 3)

The 68k STACK:

By convention the 68k uses a stack to support subroutines (among other things such as interrupts and OS calls). The stack is implemented in memory with register a7 pointing to the top of the stack. That is, A7 contains a value which corresponds to the address of the top element of the stack in memory. The 68k stack (very much like most stacks for other CPU families) grows downwards. That is the stack starts at a high address and grows towards smaller addresses. The exact value the stack takes typically depends on the OS. We will assume that the stack starts at address $70000 in our examples. In most machines and OSes today a program is laid out in memory so that the program instructions appear first. Then follows the statically defined data (global variables), then comes the heap* which grows towards higher addresses. At the very end of memory starts the stack and grows downwards:

$20000	Instructions
	Statically defined data (global variables in C)
	Heap: Dynamically allocated data

$70000	Stack:

In 68k there is no way to represent an empty stack. The a7 always has a value and this always corresponds to a valid address (they are both 32 bit numbers). If you do use A7 to read from memory you will read the element at the top of the stack. If you did not push an element you are just reading the value that happens to be there where a7 points to. It would be garbage for all we care but you will still be able to read a value. So, whether the stack is empty and how elements it holds is a concept that only the programmer can understand and its all a matter of convention and of using the appropriate operations in sequence to achieve the effect we desire.

On another related note, normally, an element in memory is within the stack if its address is equal to or higher than the value of a7.

(*) the heap is used for dynamic memory allocation in several programming languages. For example, in C when you use malloc this is where the memory comes from. Similarly, in C++ you use “new” to get memory in the heap. In Java, objects are allocated within the heap. There is no need for hardware support for the heap. It’s all done with appropriate manipulation of register or memory variables. That is, the heap is a software construct. We may discuss more about the heap in subsequent lectures. But for our purposes we can safely ignore this topic for the time being.

PUSH value: Here’s how push can be implemented in 68k assembly. Let’s assumed that the value we want to push onto the stack is in register D0:

subq.l #4, a7 à grow the stack by four bytes (a long word)

move.l d0, (a7) à save the value of d0 onto the top of the stack

Because pop is used frequently, 68k incorporates an addressing mode that implements the above sequence of instructions using a single instruction:

move.l d0, -(a7)

This instruction does what the preceding two instructions did. The notation –(ax) where ax an address register (a0-a7) indication a pre-decrement addressing mode. The semantics of this addressing mode are:

1. Decrement the ax register by the size of the datatype (4 for longwords, 2 for words and 1 for bytes)

2. Access the memory location pointed to by the register ax

POP: Let’s assume that in addition to removing an element from the top of the stack pop also returns its value into register D0.

1. move.l (a7), d0 à read top value

2. addq.l #4, a7 à increment the stack point thus removing the top element

68k supports this directly as follows:

move.l (a7)+, d0

The (ax)+ is called the post-increment addressing mode. Its semantics are:

1. Access the memory location pointed to by ax (that is use ax’s value as an address to read or write memory)

2. Increment ax by the size of the datatype.

Top (index): Assuming that all elements in the stack are long words then we could access the ith element using a sequence of instructions. For example, if we assume that i (our index) is in register d0, then we can use:

add.l d0, d0 à assume d0 holds i

add.l d0, d0 à d0 = 4xi

movea.l a7, a6 à copy the top of the stack address into a temporary register (e.g., a6)

adda.l d0, a7 à address of the ith long word in the stack

move.l (a7), d1 à return value of the ith element into d1

Rather than going through this long sequence of instructions, 68k provides a way of accessing elements at fixed displacements from an address pointed to by an A register. For example:

move.l 4(a7), d0

returns the value of the second long word within the stack. Notice that we used a new way of specifying operands: 4(a7). This is similar to just “(a7)” but notice that we also have specified a constant (the “4”). The aforementioned instruction does the following:

d0 = Mem[a7 + 4]

Let’s assume that the stack has the following values:

A7 à $6fff0	01	02	03	04
$6fff4	10	20	30	40
$6fff8	11	22	33	44
$6fffc	55	66	77	88

Then here are a few examples:

1. move.l 8(a7), d0 à d0 = mem[a7 + 8] = mem[$6fff0 + 8] = mem[$6fff8] = $11223344

2. move.b $d(a7), d1 à d1 = mem[a7 + $d] = $66 (only a single byte is moved and only the lower least significant byte of d1 changes).

In general the Address Register Indirect with Displacement addressing mode takes the form:

Imm16(Ax) where Imm16 is a 16-bit immediate and Ax an A register.

The imm16 is interpreted as a two’s complement signed number and is extended to 32 bits prior to adding it to Ax. Thus, this addressing mode can be used to access any memory location that is at distance -32k to +32k-1 from that pointed to by the value of register Ax.

Requirement 1: Calling a subroutine and having it return to the caller

Having explained the stack we are now ready to explain how the first requirement can be satisfied using the stack. Let’s focus only on control flow for the time being and thus use subroutines that do not accept arguments and do not return values. Let’s use the following C code as our example:

boo_calls ()

{

coo ();

doo ();

return;

}

void coo ()

{

doo ();

return;

}

void doo()

{

return;

}

Notice that boo calls coo which then calls doo. After doo returns to coo and it returns to boo, boo calls doo directly. Focusing thus on doo, we see an example where a function is called from two different places and is supposed to return at different spots for each of those calls.

The “trick” is to apply the following convention: Prior to calling a subroutine the caller pushes onto the stack a value which represents the address of the instruction immediately after the call. The callee returns by popping that value and then making PC point to that instruction, i.e., PC = POP. These points are made clear in the following 68k code:

The 68k code is as follows:

org $20000

boo

move.l #boo_return, a6

move.l a6, -(a7) ; push the return address onto the stack

jmp coo ; resume execution at coo

boo_return

move.l #boo_return2, a6

move.l a6, -(a7)

jmp doo

boo_return2 ; code will not run on ultragizmo since the assembler uses only the first 8 letters of labels

… other code goes here

coo

move.l #coo_return, a6 ; push our return address onto the stack

move.l a6, -(a7)

jmp doo ; resume execution at doo

coo_return

move.l (a7)+, a6 ; pop return address of boo from the stack

jmp (a6) ; resume execution there

doo

move.l (a7)+, a6 ; pop return address of coo

jmp (a6)

The first three instructions push on the stack the return address for the call to coo. The return address is the address of the instruction that follows the call in the calling function.

move.l #boo_return, a6 à move into a6 the address of the instruction where coo should return at

move.l a6, -(a7) à push the return address onto the stack

jmp coo à resume execution at coo

after these instructions have been executed, the PC points to “coo” and the stack contains a single long-word element whose value is “boo_return”. In coo now, the first three instructions push onto the stack the return address in coo where execution should resume once doo returns. This sequence of three instructions is similar to the one we have just seen for boo. So, at this stage, the stack contains two long words. At the top of the stack is the return address in coo. After it is the return address in boo.

Since doo does not call any other routines it returns to whoever called it. To do this, “doo” pops the return address from the stack and then uses a jmp to resume execution at that address. These are done by the following instructions:

move.l (a7)+, a6 à pop return address from the stack

jmp (a6) à return, that is PC now points to the instruction after the call to doo (this is coo the first time)

Once these instructions are executed, the PC will point to coo_return. The instruction sequence starting at coo-return is the same. It reads a long word from the stack and then uses this long word as an address where execution should continue at. Once these two instructions get executed, we return to boo_return.

In boo_return we see the implementation of the second call to doo. It is similar to the code for the call to coo. Again we push a return address onto the stack and then resume execution in doo using a jmp. Doo is oblivious to who called it. It always executes the two instructions it contains. They will read a return address from the stack and resume execution at that location. Notice that while the first time doo was called the return address was pointing in coo, during the second call to doo the return is in boo. The stack thus allowed us to call a routine from two different places and the routine was able to return to the appropriate spot which was different in the two cases.

In the above discussion we have seen that when a routine wants to call another one it pushes onto the stack the return address and then makes the PC point to the beginning of the function being called. Because this is done often, 68k offers an instruction that performs both actions:

BSR label à move.l “address after the BSR”, -(a7)

PC = label

BSR = Branch to SubRoutine

JSR label à equivalent to BSR. BSR uses a displacement encoding of the destination address and thus has similar limitations as other branch instructions. JSR does not have these limitations however it requires more bytes for its encoding.

68k also provides an instruction that performs the actions necessary for returning from a subroutine. In our example we have seen that to return from a function we need to pop the return address from the stack and then make PC point to that location. In 68k we do all these using the RTS instruction (return from subroutine). RTS does not take any arguments:

RTS à PC = Mem[a7] (top of the stack)

a7 = a7 + 4

Using these instructions we can rewrite our example:

boo

bsr coo

bsr doo

…. Other code goes here and at the end we will have an rts

coo

bsr doo

rts

doo

rts