Lecture 2

Lecture 13

Andreas Moshovos

Spring 2005

Subroutines Continued: Allocating Frames, the Frame Pointer and General Discussion About Subroutines

Allocating Frames: Using single instruction to allocate space on the stack rather than a series of push instructions

As we have seen upon being called a subroutine saves on the stack a number of registers that it may change and then it allocates space for its local variables. Thus far we have being adjusting the stack pointer at every push and at every allocation. Thus if we wanted to save 4 registers and allocation space for four long words in the stack we would use four move.l .., -(a7) for saving the registers and an subq.l #16, a7 to allocate the space for the four local long words: Assuming that we wanted to save d2 through d5 the prologue of the subroutine would look as follows:

somefunction move.l d2, -(a7) ; save d2

move.l d3, -(a7)

move.l d4, -(a7)

move.l d5, -(a7)

subq.l #16, a7

and the epilogue will look as follows:

addq.l #16, a7 ; deallocate the space for the four local long words

move.l (a7)+, d5

move.l (a7)+, d4

move.l (a7)+, d3

move.l (a7)+, d2

rts

Rather than having a7 being adjusted at every single instruction we could use a single subtract in the prologue and a single addition at the epilogue. The true importance of this optimization will become clear once you attend the Computer Architecture course and learn about instruction-level parallelism and out-of-order execution, here we just note that we simply replace a chain of additions/subtractions with a single addition/subtraction. The code will look as follows:

Somefunction subq.l #32, a7 ; space for four saved registers and four long word locals

move.l d2, 28(a7) ; save the registers d2 through d5 – now we have to use displacements since we pre-allocated all of the space needed

move.l d3, 24(a7)

move.l d4, 20(a7)

move.l d5, 16(a7)

and the epilogue:

move.l 16(a7), d5

move.l 20(a7), d4

move.l 24(a7), d3

move.l 28(a7), d2

addq.l #32, a7 ; deallocate all of the stack space

rts

If you look at machine code for other processors (e.g., SPARC or MIPS) subroutine prologues and epilogues will look very similar to those we just described.

The Frame Pointer (a convenience most of the time not a necessity)

In this section we will be discussing about the “frame pointer” or FP for short. The FP is primarily a convenience mechanism hence in most cases we do not really needed it. There are scenarios where it is really useful such as during debugging or for implementing high level programming languages that support nested subroutine definitions. We will discuss it because 68K provides instruction level support for the FP. Moreover, the gcc compiler does use an FP (however, you can avoid having an FP by specifying the option –fomit-frame-pointer. This will probably result in faster code.).

Here’s the primary motivation for having a frame pointer. Consider the layout of a typical frame:

After the callee has made allocations A7à	Local variables	Allocated by callee	Fixed Distance to locals
	Saved registers	Allocated by callee	Fixed Distance to locals
When the callee is called A7 à	Return address		Fixed Distance to parameters
	First Parameter	Allocated by caller
	Second Parameter
	…

We noted the values A7 (the stack pointer) takes when the subroutine is called and after the subroutine has started executing.

Let’s use the terms entry-SP and local-SP to refer to those two values of A7. Note that with respect to entry-SP, the parameters are at fixed distances that are always the same independently of how many local variables the subroutine has or of how many registers it saves. The first parameter is at distance +4, the second at distance +8 and so on. Now consider what happens if the callee wants to call another subroutine callee2. Then it will have to allocate additional space on the stack. Hence all distances to locals and parameters will change with respect to A7. However the distance of all parameters and local variables with respect to entry_SP remain the same. Moreover these distances are known and are independent of each call that the callee may make. So, if we were to use the A7 to refer to the callee locals and parameters we have to be careful to adjust the distances used depending on the current value of A7. This is shown in the figure that follows:

Callee is about to call callee2 A7à	Parameters to callee2		Variable distance to locals and parameters depends on the call
After the callee has made allocations A7à	Local variables	Allocated by callee	Fixed Distance to locals
	Saved registers	Allocated by callee	Fixed Distance to locals
When the callee is called A7 à	Return address		Fixed Distance to parameters
	First Parameter	Allocated by caller
	Second Parameter
	…

We have seen an example of such adjustments in the distances used for accessing parameters in the implementation of Ackerman (see previous lecture), we provide the relevant code excerpt below:

Ynot0

; call Ackerman (x, y-1)

move.l 8(a7), d0 ; read y into d0

subq.l #1, d0 ; d0 = y - 1

movel.l -(a7), d0 ; push y-1 as second parameter

move.l 8(a7), d0 ; d0 = x (at distance 8 now since we just pushed a longword on the stack) it used to be at distance +4 before the previous instruction

move.l -(a7), d0 ; push x as first parameter

jsr Ackerman

addq.l #8, a7 ; deallocate space for the two parameters

If we somehow held in a known place the value entry-SP then we could use that value to index all parameters and locals using fixed distances and we could avoid having to worry about adjusting the distances used to access locals and parameters. This is exactly what the frame pointer is. So, the frame pointer in 68k is a register (in particular a6 is typically used for this purpose) that in subroutine it holds the value the stack pointer had (a7) at the time the routine was called. Because the FP changes for each subroutine we need to save and restore it on the stack as if it was a callee-saved register. This is the first thing that a subroutine must do. Let’s assume that we had a function foo() that has a number of local variables:

int foo (int a, int b)

{

unsigned int la[10];

unsigned int i;

SOME COMPUTATION

}

The stack frame for foo without a frame pointer should look as follows (assuming that foo needs to save/restore two registers):

		Distance from the top of the stack
A7à	la[0] – la[9]	+0
	Local i	+40
	Saved register
	Saved register
	Return address	+52
	a	+56
	b	+60
	…

The stack frame for foo with a frame pointer looks as follows:

		Distance from the FP
A7à	la[0] – la[9]	-52
	Local i	-12
	Saved register
	Saved register
A6à	Previous A6
	Return address	+4
	a	+8
	b	+9
	…

Notice that even if A7 changes during the execution of foo we can use the fp to refer to the local variables and the parameters using fixed distances.

The code for foo will be as follows (prologue and epilogue shown):

foo move.l a6,-(a7) ; save old fp value

move.l a7, a6 ; a6 now points to the top of the stack

add.l #-52, a7 ; allocate space for the local variables and for saving registers

; rest of prologue

SOME COMPUTATION

; restore registers

move.l a6,a7 ; make a7 point to the same place as FP (that is a6)

move.l (a7)+, a6 ; restore a6

rts ; return to caller

Because the two sequences of three instructions shown in the previous example are used whenever we want to have a frame pointer 68k provides single instruction equivalents:

First we have LINK Ax, #Immediate which can be used to replace the first three instructions. It does the following actions:

1. Push Ax onto the stack

a. which translates to:

i. (a7) = (a7) – 4

ii. Mem((a7)) = (a6))

2. (Ax) = (a7)

3. (a7) = (a7) + Immediate

In our example Immediate was 52 and Ax was a6.

Second we have UNLNK Ax which can be used to replace the two instructions before the rts. It performs the following actions:

1. (a7) = (Ax)

2. POP Ax \

a. which translates to

i. (Ax) = Mem((a7))

ii. (a7) = (a7) + 4)

With these instructions we can rewrite foo as follows:

foo link a6, #52

…

unlnk a6

rts

Other ways of implementing subroutine necessities

Passing Arguments: We have described a convention were parameters are always passed through the stack. We could have used registers for this purpose. However, since there few registers this cannot be a general solution. The advantage of using registers is that we avoid accessing memory. Today that memory is a lot slower than registers this is very important. The downside is that once we assign a register to be used for passing parameters then this register may not always be available to use for other purposes. Typically a compromise is used where few registers are used to pass the first say four arguments and if more arguments are needed these have to be allocated on the stack. This hybrid policy in some ways provides the best of both worlds. We can use the fast registers for the first few parameters and use memory only if our subroutine has a lot of parameters.

For example, let us assume that registers d3 and d4 were assigned by convention for passing the first two arguments to a function. Then add3 could be written as:

add3 add.l d3, d0 ; add first parameter to d0

add.l d4, d0 ; add second parameter to d0

add.l 4(a7), d0 ; add third parameter (this is the first parameter that is passed through the stack hence it is at distance +8 – at +4 is the return address)

rts

A call to add3(1,2,3) in this case would be as follows:

move.l #1, d3

move.l #2, d4

move.l #3, -(a7)

jsr add3

addq.l #4, a7 ; deallocate third parameter off the stack

Caller vs. Callee Registers: There is no clear advantage to either choice. However, depending on the specific call it may be advantageous to use one of them as opposed to the other (provide that we have available both options). In such cases it may be possible to completely avoid saving and restoring some registers. For example, consider the case where a function boo needs to call another one called foo. Prior to calling foo, boo produces a value which it places into a register and which it will also use after the call. If the register boo uses is a caller saved register then it will definitely insert a save/restore pair of instructions around the call to foo. This is shown in the following example were we assume that D1 is a caller saved register.

CALLER SAVED D1

boo

D1 = some value

PUSH D1

jsr foo

POP D1

However, the PUSH and POP are truly only needed if foo changes D1. If boo had that information then it could have avoided placing the PUSH/POP pair and we would have saved two memory accesses. However, boo typically has no idea what foo does.

In such cases it *may* be advantageous to use a callee-saved register. If D1 was a callee-saved register then boo wouldn’t have to insert the PUSH/POP pair. Foo would have to do this in its prologue and epilogue:

CALLEE SAVED D1

Foo PUSH D1

….

POP D1

Rts

However, foo will need to do this *only* if it changes D1. Hence, from the perspective of boo it makes sense to use a callee saved register, avoid placing the PUSH/POP pair and hope that the callee function does not touch this register.

Following a similar argument once can see that if foo wants to use a register internally then it is advantageous to use a caller saved register because in this case foo will not have to insert a PUSH/POP pair. Only if the boo uses that register it would have used a PUSH/POP pair to preserve its value across the call to foo.

However, because often subroutines are both called and call others it is not always easy to decide which register convention to use.

As an example consider the MIPS processor family that has 32 general purpose registers (all registers can be used for all purposes). The calling convention of this processor family defines some of the 32 registers to be caller saved and some to be callee saved thus allowing some flexibility to programmers and compilers.