Lecture 2

Andreas Moshovos

Fall 2007

Subroutines Continued: Passing Arguments, Returning Values and Allocating Local Variables

Thus far we have seen the mechanisms via which a subroutine can be called and return to its caller. In this lecture we will be looking at the mechanisms that are used for passing arguments and for returning a value. Please keep in mind that what we present is the calling convention used by the popular GNU gcc compiler for the NIOS II processor family.

For clarity, we will assume that all arguments and values returned are words, i.e., 32 bits. You can use the gcc compiler to see how other datatypes are passed including aggregate ones (structures).

Let us first ignore local variables and instead focus on parameter passing and return values.

In this calling convention the return value is returned in register r2. The first four parameters are passed in registers r4, r5, r6, and r7, respectively. Additional parameters are passed on the stack in order, so that the fifth parameter is on the top of the stack prior to calling the function. Here are a few examples to clarify things; for each function we show where it expects to find its arguments and where it returns its value. The last column shows how to call the function. We will clarify this later in the lecture, so you may ignore it if you find it confusing at first.

C Code

Explanation

Assembly code

How to call, passing 1, 2, 3, … as arguments. The code assumes that space has already been allocated on the stack.

int add0(void)

{

return 0;

}

This function takes no arguments. It returns its value in register r2.

add0:

add r2, r0, r0

ret

call add0

int add1(int a)

{

return a + 10;

}

This function takes a single argument “a”, which it expects to find in register r4. It will return the value in r2.

add1:

addi r2, r4, 10

ret

movi r4, 1

call add1

int add2(int a, int b)

{

return a + b;

}

It expects “a” to be in r4, and “b” in r5. It returns the sum in r2.

add2:

add r2, r4, r5

ret

movi r4, 1

movi r5, 2

call add2

int add3(int a, int b, int c)

{

return a + b + c;

}

It expects “a” to be in r4, “b” in r5, and “c” in r6. It returns the sum in r2.

add3:

add r2, r4, r5

add r2, r2, r6

ret

movi r4, 1

movi r5, 2

movi r6, 3

call add3

int add4(int a, int b, int c, int d)

{

return a + b + c + d;

}

It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It returns the sum in r2.

add4:

add r2, r4, r5

add r2, r2, r6

add r2, r2, r7

ret

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

call add4

int add5(int a, int b, int c, int d, int e)

{

return a + b + c + d + e;

}

It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It expects “e” to be on top of the stack. It can access “e” via a “ldw __, 0(sp)” where “__” is a register (for example, r9). It returns the sum in r2.

add5:

add r2, r4, r5

add r2, r2, r6

add r2, r2, r7

ldw r7, 0(sp)

add r2, r2, r7

ret

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

movi r2, 5

stw r2, 0(sp)

call add5

int add6(int a, int b, int c, int d, int e, int f)

{

return a + b + c + d + e + f;

}

It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It expects “e” to be on top of the stack. It can access “e” via a “ldw __, 0(sp)” where “__” is a register (for example, r9). It expects “f” to be the second element of the stack. It can access “f” via a “ldw __, 4(sp)”. It returns the sum in r2.

add6:

add r2, r4, r5

add r2, r2, r6

add r2, r2, r7

ldw r7, 0(sp)

add r2, r2, r7

ldw r7, 4(sp)

add r2, r2, r7

ret

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

movi r2, 5

stw r2, 0(sp)

movi r2, 6

stw r2, 4(sp)

call add6

int add7(int a, int b, int c, int d, int e, int f, int g)

{

return a + b + c + d + e + f + g;

}

add7:

add r2, r4, r5

add r2, r2, r6

add r2, r2, r7

ldw r7, 0(sp)

add r2, r2, r7

ldw r7, 4(sp)

add r2, r2, r7

ldw r7, 8(sp)

add r2, r2, r7

ret

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

movi r2, 5

stw r2, 0(sp)

movi r2, 6

stw r2, 4(sp)

movi r2, 7

stw r2, 8(sp)

call add7
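The placement rule illustrated by the examples above can be summarized in a few lines of C. This is only a sketch for word-sized arguments; the names ArgLocation and arg_location are hypothetical helpers introduced here for illustration and are not part of any NIOS II toolchain.

```c
#include <stdbool.h>

/* Hypothetical description of where the n-th word-sized argument lives
 * at the moment the callee starts executing (before any prologue runs). */
typedef struct {
    bool in_register;  /* true: passed in a register, false: passed on the stack */
    int  reg;          /* register number (4..7) when in_register is true, else -1 */
    int  sp_offset;    /* byte offset from sp when in_register is false, else -1 */
} ArgLocation;

/* n is the 1-based position of the argument; this sketch assumes n >= 1. */
ArgLocation arg_location(int n)
{
    ArgLocation loc;
    if (n <= 4) {
        /* arguments 1..4 travel in r4..r7 */
        loc.in_register = true;
        loc.reg = 3 + n;
        loc.sp_offset = -1;
    } else {
        /* argument 5 sits on top of the stack, each later one 4 bytes higher */
        loc.in_register = false;
        loc.reg = -1;
        loc.sp_offset = 4 * (n - 5);
    }
    return loc;
}
```

For instance, arg_location(3) reports register r6, and arg_location(7) reports stack offset 8, which matches the "ldw r7, 8(sp)" used by add7.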

The code for performing a call generally consists of the following four sections: pre-call and post-call in the caller, and prologue and epilogue in the callee.

Caller	Callee
…	prologue
pre-call	main body
call callee	epilogue
post-call	ret
…

Pre-Call: Prior to making the call, the caller must take some actions. In particular, it must write the appropriate values into the registers for the first four parameters, if any. It then needs to push any other parameters onto the stack. Parameters five and higher are pushed in reverse order: we first push the last parameter, then the next-to-last, and so on, until we finally push the fifth parameter, which therefore ends up on the top of the stack.

Post-Call: After the call, the caller must de-allocate the stack space it allocated in the pre-call section for passing arguments, if any.

Prologue: In this section the callee will be allocating space for local variables and will be taking appropriate actions for preserving those register values that it should not change such as the return address register. We will revisit this later on.

Epilogue: In this section, the callee will be reversing all actions that took place in the prologue. We will expand on this later on.

Let’s see an example:

int add3 (int a, int b, int c)

{ return a + b + c;

}

int sum = 0;

main()

{

sum += add3 (1, 2, 3);

}

Here’s the code for main:

.data

sum: .word 0

.text

main:

# push ra on the stack, main will be calling add3

addi sp, sp, -4

stw ra, 0(sp)

# Pre-call section

# pass the arguments

movi r4, 1

movi r5, 2

movi r6, 3

call add3 # call the subroutine

# post-call section

retadd:

# return value is in r2

# nothing to do

# load sum from memory add r2 to it and save back to memory

movia r9, sum

ldw r10, 0(r9)

add r10, r10, r2

stw r10, 0(r9)

# restore return address from the stack

ldw ra, 0(sp)

addi sp, sp, 4

ret

Here’s the code for add3:

add3:

add r2, r4, r5 # add the first two parameters and store the sum in r2

add r2, r2, r6 # add the third parameter to r2

ret # return to the caller

There are no prologue and epilogue sections in add3 since it has no local variables and does not change any registers other than r2.

A More Elaborate Example:

Now let us look at a more elaborate example where main calls add7:

int add7 (int a, int b, int c, int d, int e, int f, int g)

{ return a + b + c + d + e + f + g;

}

int sum = 0;

main()

{

sum += add7 (1, 2, 3, 4, 5, 6, 7);

}

Here’s the code for main:

.data

sum: .word 0

.text

main:

# push ra on the stack, main will be calling add7

addi sp, sp, -4

stw ra, 0(sp)

# Pre-call section

# pass the arguments

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

# allocate space for arguments five through seven

# Need space for 3 words, or 12 bytes

addi sp, sp, -12

# pass the fifth argument, it goes on the top of the stack

# we use r2 as a temporary

movi r2, 5

stw r2, 0(sp)

# pass the sixth argument, it should be the second argument on the stack

# that is at distance plus four from the top of the stack

# we use r2 as a temporary

movi r2, 6

stw r2, 4(sp)

# pass the seventh argument, it should be the third argument on the stack

# that is at distance plus eight from the top of the stack

# we use r2 as a temporary

movi r2, 7

stw r2, 8(sp)

# call the subroutine

call add7

# post-call section

retadd:

# return value is in r2

# adjust the stack by popping the three values

addi sp, sp, 12

# load sum from memory add r2 to it and save back to memory

movia r9, sum

ldw r10, 0(r9)

add r10, r10, r2

stw r10, 0(r9)

# restore and pop the return address from the stack

ldw ra, 0(sp)

addi sp, sp, 4

ret

Here’s the code for add7:

add7:

add r2, r4, r5 # add the first two arguments and place the sum into r2

add r2, r2, r6 # add the third argument to r2

add r2, r2, r7 # add the fourth argument to r2

ldw r7, 0(sp) # read the fifth argument from the stack

add r2, r2, r7 # add to r2

ldw r7, 4(sp) # read the sixth argument from the stack

add r2, r2, r7 # add to r2

ldw r7, 8(sp) # read the seventh argument from the stack

add r2, r2, r7 # add to r2

ret

Here is what the stack looks like when add7 is called:

sp →	+0	5th argument
	+4	6th argument
	+8	7th argument
	+12	Main’s saved return address
	+16

Aggregating Stack Changes

Going back to the code for main(), please notice the adjustments to the stack pointer (the "addi sp, sp, …" instructions). Rather than adjusting the stack before and after each call, the code generated by GCC does the adjustments once at the beginning and once at the end of main. Specifically, the compiler figures out the maximum space that the function will need and pre-allocates it at the beginning. Focusing on parameter passing, the maximum space needed is determined by the callee that has the maximum number of arguments.

In the modified main() that follows please notice that there are only two adjustments made to the stack, one at the prologue and one at the epilogue. As a result, the relative index for the saved “ra” value was changed to +12. The stack frame looks as follows:

After main is called but before any instruction in main executes:

sp →

+16

After main executes its prologue:

sp →	+0	Used to pass the 5th argument
	+4	Used to pass the 6th argument
	+8	Used to pass the 7th argument
	+12	Saved return address
	+16

.data

sum: .word 0

.text

main:

# PROLOGUE

# create space on the stack for the return address and the three parameters that need to be

# pushed on the stack. We need 16 bytes since we will be saving four words.

addi sp, sp, -16

# save the return address. This is the first word that should be pushed, so it ends up at

# the bottom of the stack. Ra now occupies the bytes at offsets +12 through +15.

stw ra, 12(sp)

# Pre-call section

# pass the arguments

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

# pass the fifth argument, it goes on the top of the stack

# we use r2 as a temporary

movi r2, 5

stw r2, 0(sp)

# pass the sixth argument, it should be the second argument on the stack

# that is at distance plus four from the top of the stack

# we use r2 as a temporary

movi r2, 6

stw r2, 4(sp)

# pass the seventh argument, it should be the third argument on the stack

# that is at distance plus eight from the top of the stack

# we use r2 as a temporary

movi r2, 7

stw r2, 8(sp)

# call the subroutine

call add7

# post-call section

retadd:

# return value is in r2

# load sum from memory add r2 to it and save back to memory

movia r9, sum

ldw r10, 0(r9)

add r10, r10, r2

stw r10, 0(r9)

# EPILOGUE

# restore and pop the return address from the stack

ldw ra, 12(sp)

# pop all values from the stack

addi sp, sp, 16

ret

The portion of the stack the function allocates and uses is called the stack frame for the function. We have seen that the stack frame contains:

1. The input parameters which were pushed by the calling function.

2. The saved return address if this function calls another one.

3. Space for input parameters for calling other functions.

We’ll complete the description of the stack frame soon.

As we noted earlier, we need to figure out the maximum space that will be needed on the stack. This is determined by the callee that has the maximum number of arguments among all functions called from this one. Even if a callee is a variable-argument function, every call to it has a specific number of arguments. See the following examples. Notice the constants used to adjust the stack and to save and restore the return address:

Example code

Stack Allocation Explanation

Prologue

Epilogue

main()

{

…

foo (1, 2, 3)

…

boo (1, 2, 3, 4, 5, 6, 7, 8)

}

Boo has the maximum number of arguments

We’ll need to allocate space for 8-4=4 words on the stack for arguments 5 through 8, and the return address. In total that’s five words, or 20 bytes.

addi sp, sp, -20

stw ra, 16(sp)

ldw ra, 16(sp)

addi sp, sp, 20

main()

{

…

foo (1, 2, 3)

…

printf ("%d %d %d %d %d", 2, 3, 4, 5, 6)

}

Printf() takes 6 arguments (we will see that a string is passed by passing its starting address as the parameter).

We need to allocate space for 6-4=2 arguments plus the return address. That’s three words, or 4*3 = 12 bytes.

addi sp, sp, -12

stw ra, 8(sp)

ldw ra, 8(sp)

addi sp, sp, 12
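The arithmetic in the two examples above can be written out as a small C sketch. It assumes, as in this part of the lecture, that the frame holds only the saved return address and the word-sized outgoing arguments beyond the fourth; frame_bytes and ra_offset are hypothetical names used only for illustration.

```c
/* Bytes that a function which calls other functions must allocate.
 * max_args is the largest argument count among all the calls it makes.
 * Assumption: the frame holds only ra plus outgoing stack arguments. */
int frame_bytes(int max_args)
{
    /* only arguments five and higher need stack space */
    int stack_args = max_args > 4 ? max_args - 4 : 0;
    /* one extra word for the saved return address */
    return 4 * (stack_args + 1);
}

/* The return address is saved in the highest word of the frame. */
int ra_offset(int max_args)
{
    return frame_bytes(max_args) - 4;
}
```

With max_args = 8 this gives a 20-byte frame with ra at offset 16, and with max_args = 6 a 12-byte frame with ra at offset 8, matching the prologues and epilogues in the table.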

What happens to registers across calls? Callee- vs. Caller-Saved registers.

In the previous example, the callee (add7) did not change any registers other than r2, and r4 through r7. The caller expects these registers to change as r2 is used to return a value and the others to pass arguments. What if add7 was using other registers? The convention says that registers r16 through r23, and registers r26, r27, and r28 should be preserved across a call. That is, the caller expects that when the callee returns, these registers will have the same values they had before the callee was called. If we read through this statement carefully we can see that it does not say that the registers should not change value while the callee executes. All we have to guarantee is that before returning to the caller the registers must be loaded with the original values. There are two ways of achieving this: (1) Do not touch a register at all, (2) Allow a register to change its value but remember what value it had prior to the call and restore that value prior to returning to the caller. For (2) we can do the following. In the subroutine prologue save on the stack the values of all those registers that the routine will change. In the epilogue restore the registers to their original values using those stored onto the stack.

Registers that must be preserved across a call by the callee are called callee-saved registers.

While this is a contrived example, let us assume that add7 uses register r16 to calculate the return value prior to writing it to r2. The modified code follows. Note that add7 first saves the value of r16 on the stack in the prologue and restores it in the epilogue. For this reason, the offsets of the parameters need to change: the fifth argument is now at offset +4 and not +0, the sixth at +8 and not +4, and the seventh at +12 and not +8:

Original code that uses r2 for the partial sum

Code that uses r16, a callee-saved register

add7:

add r2, r4, r5

add r2, r2, r6

add r2, r2, r7

ldw r7, 0(sp)

add r2, r2, r7

ldw r7, 4(sp)

add r2, r2, r7

ldw r7, 8(sp)

add r2, r2, r7

ret

add7:

# Prologue

# push r16’s value on the stack

addi sp, sp, -4

stw r16, 0(sp)

# add the first four arguments

add r16, r4, r5

add r16, r16, r6

add r16, r16, r7

# add arguments five through seven

ldw r7, 4(sp)

add r16, r16, r7

ldw r7, 8(sp)

add r16, r16, r7

ldw r7, 12(sp)

add r2, r16, r7

# Epilogue

# restore r16’s original value

ldw r16, 0(sp)

addi sp, sp, 4

ret
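The offset change follows directly from the prologue: every byte the callee allocates moves the stack-passed arguments that much further away from sp. A minimal C sketch of this rule, using the hypothetical helper name stack_arg_offset:

```c
/* Offset from sp at which the callee finds its n-th argument (n >= 5)
 * after its prologue has allocated callee_frame_bytes on the stack.
 * Assumes word-sized (4-byte) arguments. */
int stack_arg_offset(int n, int callee_frame_bytes)
{
    /* offset on entry is 4 * (n - 5); the prologue shifts it by the frame size */
    return callee_frame_bytes + 4 * (n - 5);
}
```

With no prologue (a frame of 0 bytes) arguments five through seven are at +0, +4 and +8; once add7 pushes r16 (a frame of 4 bytes) they move to +4, +8 and +12.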

Notice that the prologue and epilogue sections are symmetric. One saves register values onto the stack and the other restores them. Restoring is typically done in the reverse order. For example, if we needed to save and restore registers r16, r17 and r18 we would use the following prologue and epilogue sections:

prologue: addi sp, sp, -12

stw r16, 0(sp)

stw r17, 4(sp)

stw r18, 8(sp)

…

epilogue: ldw r18, 8(sp)

ldw r17, 4(sp)

ldw r16, 0(sp)

addi sp, sp, 12

There is a second class of registers in the calling convention: the caller-saved registers. These registers are not guaranteed to maintain their values across a call. If the calling function needs these values to be preserved, then it has to explicitly save them on the stack prior to the call (i.e., in the pre-call section) and restore them from the stack after the call (i.e., in the post-call section). In NIOS II, r8 through r15 and r2 through r7 are caller-saved registers. Notice that these include the return value and the argument passing registers, since the caller must expect these to change whenever a function is called.

In summary, here are the register save/restore conventions for NIOS II:

CALLER-SAVED	R8 through R15, R2 through R7	Save in pre-call / Restore in post-call
CALLEE-SAVED	R16 through R23, R26 through R28	Save in prologue / Restore in epilogue
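The table above can also be expressed as a pair of predicates on the register number. This is only an illustrative sketch covering the registers listed in the table; is_caller_saved and is_callee_saved are hypothetical names, and registers not listed in the table are reported as belonging to neither class.

```c
#include <stdbool.h>

/* r2 through r7 and r8 through r15: the caller saves these in the
 * pre-call section if it still needs their values after the call. */
bool is_caller_saved(int r)
{
    return r >= 2 && r <= 15;
}

/* r16 through r23 and r26 through r28: the callee saves these in its
 * prologue if it is going to change them. */
bool is_callee_saved(int r)
{
    return (r >= 16 && r <= 23) || (r >= 26 && r <= 28);
}
```

For example, is_caller_saved(8) is true, which is why a function that keeps a value in r8 must save it before making a call, while is_callee_saved(16) is true, which is why a function that uses r16 must save and restore it itself.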

An example follows where we implement add6() by calling add2() three times.

The equivalent C code for add6() is:

int add6(int a, int b, int c, int d, int e, int f)

{

int t;

t = add2 (a, b);

t += add2 (c, d);

t += add2 (e, f);

return t;
}

We will be using register r8 for “t”. R8 is a caller-saved register. Note that add6() does not know whether add2() changes r8, hence it must preserve its value across the call. It must also save registers r6 and r7, which contain arguments “c” and “d”, since these may change during the first call to add2(). Below we first show the stack frame and then the assembly code for add6().

Here’s how the stack frame looks before any instruction of add6() executes and after space is allocated in the prologue:

When add6() is called but before any instruction executes in add6():

sp →	+0	5th argument
	+4	6th argument
	+8	Caller’s saved return address

After add6() allocates space on the stack:

sp →	+0	to preserve r8
	+4	to preserve r6
	+8	to preserve r7
	+12	add6’s return address
	+16	5th argument
	+20	6th argument
	+24	Caller’s saved return address

.text

add6:

# Prologue

# allocate space for 4 words: we will be saving ra, r6, r7, and r8

addi sp, sp, -16

# save ra

stw ra, 12(sp)

# pre-call section

# save r7 and r6 prior to calling add2 for the first time

stw r7, 8(sp)

stw r6, 4(sp)

# note that a and b are already in r4 and r5 so

# that they can be used directly as the first and second arguments by add2

call add2

# post-call section

ldw r6, 4(sp)

ldw r7, 8(sp)

# t = return value of add2

add r8, r2, r0

# pre-call section

# save r8 on the stack

stw r8, 0(sp)

# pass c and d as the two parameters to add2

add r4, r6, r0

add r5, r7, r0

# we don’t need to preserve r6 and r7 since we don’t care for these values any more

call add2

# post-call

# restore r8’s value

ldw r8, 0(sp)

# t += return value
add r8, r8, r2

# pre-call section

# save r8 on the stack

stw r8, 0(sp)

# pass e and f as the two parameters to add2

# e is now at offset +16; it was at offset +0 before add6 allocated four words on the stack

ldw r4, 16(sp)

ldw r5, 20(sp)

call add2

# post-call

# restore r8’s value

ldw r8, 0(sp)

# add6’s return value = t + add2’s return value
add r2, r8, r2

# epilogue

ldw ra, 12(sp)

addi sp, sp, 16

# done

ret

Local Variables? Local variables can either be allocated in registers or on the stack immediately after the space allocated for preserving register values.

Stack Frame: This term is used to refer to the stack space allocated per subroutine invocation. Based on our discussion the layout of a stack frame is as follows:

sp →	Local variables	Allocated by callee
	Saved registers
	Return address
	Fifth parameter	Allocated by caller
	Sixth parameter
	…