Lecture 2

Lecture 11

Andreas Moshovos

Spring 2005

Subroutines

Structured programming relies on subroutines. Restricting our attention to C, we could write for example a subroutine that accepts three numeric arguments and returns their sum. This subroutine can then be used (or called) from many different places in our program. That is we could for example write the following in C:

int add3 (int a, int b, int c)

{

return a + b +c;

}

int sum = 0;

main ()

{

sum += add3 (1, 2, 3);

sum += 10;

sum += add3 (10, 20, 30);

}

For C subroutines to work as intended we need the following functionality:

1. We should be able to call a subroutine from anywhere in our program. By call we mean being able to change control flow so that the routine is executed.

2. We should be able to pass parameters that may take different values across different calls.

3. A subroutine must be able to return a value.

4. A subroutine must be able to change control flow so that execution continues immediately after the point where it was called. And since a subroutine can be called from many different places this suggests that the routine should be able to differentiate between them and “return” to the right spot depending on where it was called from.

We can see these requirements at work in our example:

We can see that the add3 subroutine is called from two different places. The place a routine is called is also referred to as a call site. The three parameters that add3 accepts take different values at each call site ((1,2,3) and (10,20,30)). Add3 “returns” a value which is the sum of its three arguments. So, every time add3 is called, it calculates a result and that result is returned as the value of the subroutine. This value can then be used in an expression (e.g., in add3(10,20,30) evaluates to 60). The first time add3 is called we expect execution to resume immediately after the statement that called it. So, once add3(1,2,3) execution will resume by adding the return value to sum. Then, the “sum += 10” statement will get executed. The second time add3 is called execution resumes immediately after the particular call.

Let’s review the way control flows in with subroutines with aid of add3 example:

This diagram also shows a call site, the caller and the callee. The call site is the place a subroutine is called. The caller is the subroutine that is making the call. The callee is the subroutine that is called.

We can now look at the list of required functionalities and since we are interested in how to implement subroutines in assembly we can add concerns/requirements that related to machine level constructs. Thus we need to provide answers for the following questions:

1. How does the subroutine returns immediately after the call site?

2. Where and how does it return a value?

3. Where and how are we passing arguments to a subroutine?

4. Where and how are we allocating storage for any local variables (i.e., variable that belong to the subroutine)

5. What happens to register values once a subroutine is called. Do we require that the subroutine preserves their values or is it OK to overwrite some registers.

We will address each of these issues in turn. For most machines there is a set of rules that all valid subroutines must follow. This set of rules is called the calling convention. This set of rules is not the only viable option for implementing subroutines. However, at some point someone decided on a particular solution. If we want a subroutine to interoperate correctly with other subroutines written for the same machine from different people having all follow the same set of rules is necessary. We will be describing the calling convention used by gcc for the 68k family of processors.

There are a few different options for providing the aforementioned functionality. We will present the solution used in 68k and once this is understood we will discuss the other options. Key to supporting subroutines in 68k is the use of a stack. This stack is used to provide the functionality needed in points 1 through 5 above. We first explain how a stack can be implemented in 68k machine code and then explain how the stack is used to support subroutines.

STACK:

Let’s review what stack is. Stack is a last-in first-out queue. In more detail, the stack is a data structure for which three operations are defined:

Push value
Pop
Top (distance)

The first operation adds a new element into the stack. The order in which elements are added into the stack is important. Internally, the elements are placed in a queue following exactly the order in which they were inserted. Push takes a single argument which is the value we need inserted into the stack. After a push, the number of stack element has increased by one. The pop operation removes the most recently inserted element that is currently in the stack. After a pop, number of items in the stack is reduced by one. If the stack is empty then pop is not a valid operation.

Top returns the value of a stack element without removing that element from the stack. Top accepts a single argument which specifies the relative to the most recently instated element of the element we are interested in. So, top(0) returns the value of the most recently inserted element (this is also called the top of the stack). Top(3) returns the value of the 4^th element in the stack (as measured starting from the top).

For example, assuming that initially the stack is empty here’s an example of how the stack operates:

Push 3 à (3)
Push 4 à (4,3) , where 4 is the top element
Push 10 à (10, 4, 3)
Top(2) à returns 3
Top(1) à returns 4
Pop à returns 10 and then the stack becomes (4, 3)

STACK in 68k:

By convention the 68k uses a stack to support subroutines (among other things such as interrupts and OS calls). The stack is implemented in memory with register a7 pointing to the top of the stack That is A7 contains a values which is the address of the top element of the stack in memory. The 68k stack (very much like most stacks for other CPU families) grows downwards. That is the stack starts at a high address and grows towards smaller addresses. The exact value the stack takes typically depends on the OS. We will assume that the stack starts at address $70000 in our examples. In most machines and OSes today a program is laid out in memory so that the program instructions appear first. Then follows the statically defined data (global variables), the comes the heap which grows towards higher addresses. At the very end of memory starts the stack and grows downwards:

$20000	Instructions
	Statically defined data (global variables in C)
	Heap: Dynamically allocated data

$70000	Stack:

PUSH value: Here’s how push can be implemented in 68k assembly. Let’s assumed that the value we want to push onto the stack is in register D0:

subq.l #4, a7 à grow the stack by four bytes (a long word)

move.l d0, (a7) à save the value of d0 onto the top of the stack

Because pop is used frequently, 68k incorporates an addressing mode that implements the above sequence of instructions using a single instruction:

move.l d0, -(a7)

This instruction does what the two instructions did. The notation –(ax) where ax an address register (a0-a7) indication a pre-decrement addressing mode. The semantics of this addressing mode are:

1. Decrement the ax register by the size of the datatype (4 for longwords, 2 for words and 1 for bytes)

2. Access the memory location pointed to by the register ax

POP: Let’s assume that in addition to removing an element from the top of the stack pop also returns its value into register D0.

1. move.l (a7), d0 à read top value

2. addq.l #4, a7 à increment the stack point thus removing the top element

68k supports this directly as follows:

move.l (a7)+, d0

The (ax)+ is called the post-increment addressing mode. Its semantics are:

1. Access the memory location pointed to by ax (that is use ax’s value as an address to read or write memory)

2. Increment ax by the size of the datatype.

Top (index): Assuming that all elements in the stack are long words then we could access the ith element using a sequence of instructions. For example, if we assume that i (our index) is in register d0, then we can use:

add.l d0, d0 à assume d0 holds i

add.l d0, d0 à d0 = 4xi

movea.l a7, a6 à copy the top of the stack address into a temporary register (e.g., a6)

adda.l d0, a7 à address of the ith long word in the stack

move.l (a7), d1 à return value of the ith element into d1

Rather than going through this long sequence of instructions, 68k provides a way of accessing elements at fixed displacements from an address pointed to by an A register. For example:

move.l 4(a7), d0

returns the value of the second long word within the stack. Notice that we used a new way of specifying operands: 4(a7). This is similar to just “(a7)” but notice that we also have specified a constant (the “4”). The aforementioned instruction does the following:

[d0] = Mem[[a7] + 4]

Let’s assume that the stack has the following values:

A7 à $6fff0	01	02	03	04
$6fff4	10	20	30	40
$6fff8	11	22	33	44
$6fffc	55	66	77	88

Then here are a few examples:

1. move.l 8(a7), d0 à [d0] = mem[[a7] + 8] = mem[$6fff0 + 4] = mem[$6fff4] = $11223344

2. move.w $d(a7), d1 à [d1] = mem[[a7] + $d] = $66

In general the Address Register Indirect with Displacement addressing mode takes the form:

Imm16(Ax) where Imm16 is a 16-bit immediate and Ax an A register.

The imm16 is interpreted as a two’s complement signed number and is extended to 32 bits prior to adding it to Ax. Thus, this addressing mode can be used to access any memory location that is at distance -32k to +32k-1 from that pointed by the value of register Ax.

Requirement 1: Calling a subroutine and having it return to the caller

Having explained the stack we are now ready to explain how the first requirement can be satisfied using the stack. Let’s focus only on control flow for the time being and thus use subroutines that do not accept arguments and do not return values. Let’s use the following C code as our example:

boo_calls ()

{

coo ();

doo ();

return;

}

void coo ()

{

doo ();

return;

}

void doo()

{

return;

}

Notice that boo calls coo which then calls doo. After doo returns to coo and it returns to boo, boo calls doo directly. Focusing thus on doo, we see an example where a function is called from two different places and is supposed to return at different spots for each of those calls.

The 68k code is as follows:

org $20000

boo

move.l #boo_return, a6

move.l a6, -(a7) ; push the return address onto the stack

jmp coo ; resume execution at coo

boo_return

move.l #boo_return2, a6

move.l a6, -(a7)

jmp doo

boo_return2 ; code will not run on ultragizmo since the assembler uses only the first 8 letters of labels

… other code goes here

coo

move.l #coo_return, a6 ; push our return address onto the stack

move.l a6, -(a7)

jmp doo ; resume execution at doo

coo_return

move.l (a7)+, a6 ; pop return address of boo from the stack

jmp (a6) ; resume execution there

doo

move.l (a7)+, a6 ; pop return address of coo

jmp (a6)

The first three instructions push on the stack the return address for the call to coo. The return address is the address of the instruction that follows the call in the calling function.

move.l #boo_return, a6 à move into a6 the address of the instruction where coo should return at

move.l a6, -(a7) à push the return address onto the stack

jmp coo à resume execution at coo

after these instructions have been executed, the PC points to “coo” and the stack contains a single long-word element whose value is “boo_return”. In coo now, the first three instructions push onto the stack the return address in coo where execution should resume once doo returns. This sequence of three instructions is similar to the one we have just seen for boo. So, at this stage, the stack contains two long words. At the top of the stack is the return address in coo. After it is the return address in boo.

Since doo does not call any other routines it returns to whoever called it. To do this, “doo” pops the return address from the stack and then uses a jmp to resume execution at that address. These are done by the following instructions:

move.l (a7)+, a6 à pop return address from the stack

jmp (a6) à return, that is PC now points to the instruction after the call to doo (this is coo the first time)

Once these instructions are executed, the PC will point to coo_return. The instruction sequence starting at coo-return is the same. It reads a long word from the stack and then uses this long word as an address where execution should continue at. Once these two instructions get executed, we return to boo_return.

In boo_return we see the implementation of the second call to doo. It is similar to the code for the call to coo. Again we push a return address onto the stack and then resume execution in doo using a jmp. Doo is oblivious to who called it. It always executes the two instructions it contains. They will read a return address from the stack and resume execution at that location. Notice that while the first time doo was called the return address was pointing in coo, during the second call to doo the return is in boo. The stack thus allowed us to call a routine from two different places and the routine was able to return to the appropriate spot which was different in the two cases.

In the above discussion we have seen that when a routine wants to call another one it pushes onto the stack the return address and then makes the PC point to the beginning of the function being called. Because this is done often, 68k offers an instruction that performs both actions:

BSR label à move.l “address after the BSR”, -(a7)

[PC] = label

BSR = Branch to SubRoutine

JSR label à equivalent to BSR. BSR uses a displacement encoding of the destination address and thus has similar limitations as other branch instructions. JSR does not have these limitations however it requires more bytes for its encoding.

68k also provides an instruction that performs the actions necessary for returning from a subroutine. In our example we have seen that to return from a function we need to pop the return address from the stack and then make PC point to that location. In 68k we do all these using the RTS instruction (return from subroutine). RTS does not take any arguments:

RTS à [PC] = Mem[[a7]] (top of the stack)

[a7] = [a7] + 4

Using these instructions we can rewrite our example:

boo

bsr coo

bsr doo

….

Coo

bsr doo

rts

doo

rts