Andreas Moshovos
Fall 2007
Subroutines
Continued: Passing Arguments, Returning Values and Allocating Local Variables
Thus far we have seen the mechanisms via which a subroutine can be called and return to its caller. In this lecture we will be looking at the mechanisms that are used for passing arguments and for returning a value. Please keep in mind that what we present is the calling convention used by the popular GNU gcc compiler for the NIOS II processor family.
For clarity, we will assume that all arguments and values returned are words, i.e., 32 bits. At the end we will discuss the conventions for passing longer arguments and returning longer values and aggregate datatypes (structures).
Let us first ignore local variables and instead focus on parameter passing and return values.
In this calling convention the return value is returned in register r2. The first four parameters are passed in registers r4, r5, r6, and r7, respectively. Additional parameters are passed through the stack, in order, so that just before the call the fifth parameter is at the top of the stack (at 0(sp)), the sixth is right below it (at 4(sp)), and so on. The following examples show, for several functions, where each function expects to find its arguments and where it returns its value. The last part of each example shows how to call the function. We will clarify this later in the lecture, so you may ignore it if you find it confusing at first.
Each example below gives the C code, an explanation, the assembly code, and how to call the function passing 1, 2, 3, ... as arguments. The calling code assumes that space has already been allocated on the stack.

int add0(void) { return 0; }
This function takes no arguments. It returns its value in register r2.
Assembly code:
add0:
    add     r2, r0, r0
    ret
How to call:
    call    add0

int add1(int a) { return a + 10; }
This function takes a single argument a, which it expects to find in register r4. It returns its value in r2.
Assembly code:
add1:
    addi    r2, r4, 10
    ret
How to call:
    movi    r4, 1
    call    add1

int add2(int a, int b) { return a + b; }
It expects a in r4 and b in r5. It returns the sum in r2.
Assembly code:
add2:
    add     r2, r4, r5
    ret
How to call:
    movi    r4, 1
    movi    r5, 2
    call    add2

int add3(int a, int b, int c) { return a + b + c; }
It expects a in r4, b in r5, and c in r6. It returns the sum in r2.
Assembly code:
add3:
    add     r2, r4, r5
    add     r2, r2, r6
    ret
How to call:
    movi    r4, 1
    movi    r5, 2
    movi    r6, 3
    call    add3

int add4(int a, int b, int c, int d) { return a + b + c + d; }
It expects a in r4, b in r5, c in r6, and d in r7. It returns the sum in r2.
Assembly code:
add4:
    add     r2, r4, r5
    add     r2, r2, r6
    add     r2, r2, r7
    ret
How to call:
    movi    r4, 1
    movi    r5, 2
    movi    r6, 3
    movi    r7, 4
    call    add4

int add5(int a, int b, int c, int d, int e) { return a + b + c + d + e; }
It expects a in r4, b in r5, c in r6, and d in r7. It expects e to be on the top of the stack; it can access e via ldwio __, 0(sp), where __ is a register (for example, r9). It returns the sum in r2.
Assembly code:
add5:
    add     r2, r4, r5
    add     r2, r2, r6
    add     r2, r2, r7
    ldwio   r7, 0(sp)
    add     r2, r2, r7
    ret
How to call:
    movi    r4, 1
    movi    r5, 2
    movi    r6, 3
    movi    r7, 4
    movi    r2, 5
    stwio   r2, 0(sp)
    call    add5

int add6(int a, int b, int c, int d, int e, int f) { return a + b + c + d + e + f; }
It expects a in r4, b in r5, c in r6, and d in r7. It expects e to be on the top of the stack; it can access e via ldwio __, 0(sp), where __ is a register (for example, r9). It expects f to be the second element of the stack; it can access f via ldwio __, 4(sp). It returns the sum in r2.
Assembly code:
add6:
    add     r2, r4, r5
    add     r2, r2, r6
    add     r2, r2, r7
    ldwio   r7, 0(sp)
    add     r2, r2, r7
    ldwio   r7, 4(sp)
    add     r2, r2, r7
    ret
How to call:
    movi    r4, 1
    movi    r5, 2
    movi    r6, 3
    movi    r7, 4
    movi    r2, 5
    stwio   r2, 0(sp)
    movi    r2, 6
    stwio   r2, 4(sp)
    call    add6

int add7(int a, int b, int c, int d, int e, int f, int g) { return a + b + c + d + e + f + g; }
It expects a in r4, b in r5, c in r6, and d in r7. It expects e to be on the top of the stack; it can access e via ldwio __, 0(sp), where __ is a register (for example, r9). It expects f to be the second element of the stack; it can access f via ldwio __, 4(sp). It expects g to be the third element of the stack; it can access g via ldwio __, 8(sp). It returns the sum in r2.
Assembly code:
add7:
    add     r2, r4, r5
    add     r2, r2, r6
    add     r2, r2, r7
    ldwio   r7, 0(sp)
    add     r2, r2, r7
    ldwio   r7, 4(sp)
    add     r2, r2, r7
    ldwio   r7, 8(sp)
    add     r2, r2, r7
    ret
How to call:
    movi    r4, 1
    movi    r5, 2
    movi    r6, 3
    movi    r7, 4
    movi    r2, 5
    stwio   r2, 0(sp)
    movi    r2, 6
    stwio   r2, 4(sp)
    movi    r2, 7
    stwio   r2, 8(sp)
    call    add7
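The placement rule illustrated by these examples can be captured in a few lines of C. This is only a sketch that assumes, as the lecture does, that every argument is a 32-bit word. The names arg_loc and arg_location are hypothetical and do not belong to any toolchain.

```c
#include <assert.h>

/* Hypothetical helper: where the n-th word-sized argument (n counts from 1)
   is found when the callee starts executing, under the convention above. */
struct arg_loc {
    int in_register; /* 1: passed in a register, 0: passed on the stack */
    int reg;         /* register number (4..7) when in_register is 1    */
    int sp_offset;   /* byte offset from sp when in_register is 0       */
};

struct arg_loc arg_location(int n)
{
    struct arg_loc loc = {0, -1, -1};

    assert(n >= 1);
    if (n <= 4) {
        /* arguments 1..4 travel in r4..r7 */
        loc.in_register = 1;
        loc.reg = 3 + n;
    } else {
        /* argument 5 is at 0(sp), argument 6 at 4(sp), and so on */
        loc.sp_offset = 4 * (n - 5);
    }
    return loc;
}
```

For instance, arg_location(6) reports a stack offset of 4, which matches the ldwio __, 4(sp) that add6 uses to read f.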
The code for performing a call generally consists of the following parts. The four sections we discuss below are pre-call and post-call (in the caller) and prologue and epilogue (in the callee):

    Caller              Callee
    pre-call            prologue
    call callee         main body
    post-call           epilogue
                        ret
Pre-Call: Prior to making the call, the caller has to take some actions. In particular, it must write the values of the first four parameters, if any, into registers r4 through r7. It must then place any remaining parameters on the stack. Parameters 5 and higher are placed on the stack in order, so that the fifth parameter is at the top of the stack (0(sp)), the sixth is at 4(sp), and so on. In the examples in this lecture, the caller allocates all of this space with a single addi and then stores the fifth parameter at 0(sp), the sixth at 4(sp), and so on.
Post-Call: After the call, the caller must de-allocate the stack space it allocated in the pre-call section for passing arguments, if any.
Prologue: In this section the callee will be allocating space for local variables and will be taking appropriate actions for preserving those register values that it should not change such as the return address register. We will revisit this later on.
Epilogue: In this section, the callee will be reversing all actions that took place in the prologue. We will expand on this later on.
Let's see an example:
int add3(int a, int b, int c)
{
    return a + b + c;
}

int sum = 0;

int main(void)
{
    sum += add3(1, 2, 3);
}
Here's the code for main:

    .data
sum:    .word   0

    .text
main:
    # push ra on the stack; main will be calling add3
    addi    sp, sp, -4
    stwio   ra, 0(sp)

    # pre-call section
    # pass the three arguments in r4, r5, and r6
    # (add3 has only three arguments, so no stack space is needed for them)
    movi    r4, 1
    movi    r5, 2
    movi    r6, 3
    call    add3            # call the subroutine

    # post-call section
retadd:
    # the return value is in r2
    # no stack space was allocated for arguments, so there is nothing to release
    # load sum from memory, add r2 to it, and save it back to memory
    movia   r9, sum
    ldwio   r10, 0(r9)
    add     r10, r10, r2
    stwio   r10, 0(r9)

    # restore the return address from the stack
    ldwio   ra, 0(sp)
    addi    sp, sp, 4
    ret
Here's the code for add3:

add3:
    add     r2, r4, r5      # add the first two parameters and store the sum in r2
    add     r2, r2, r6      # add the third parameter to r2
    ret                     # return to the caller
There are no prologue and epilogue sections in add3 since it has no local variables and does not change any registers other than r2.
A More Elaborate Example:
Now let us look at a more elaborate example where main calls add7:
int add7(int a, int b, int c, int d, int e, int f, int g)
{
    return a + b + c + d + e + f + g;
}

int sum = 0;

int main(void)
{
    sum += add7(1, 2, 3, 4, 5, 6, 7);
}
Here's the code for main:

    .data
sum:    .word   0

    .text
main:
    # push ra on the stack; main will be calling add7
    addi    sp, sp, -4
    stwio   ra, 0(sp)

    # pre-call section
    # pass the first four arguments in registers
    movi    r4, 1
    movi    r5, 2
    movi    r6, 3
    movi    r7, 4
    # allocate space for arguments five through seven
    # we need space for 3 words, or 12 bytes
    addi    sp, sp, -12
    # pass the fifth argument; it goes on the top of the stack
    # we use r2 as a temporary
    movi    r2, 5
    stwio   r2, 0(sp)
    # pass the sixth argument; it is the second word on the stack,
    # at offset +4 from the top of the stack
    movi    r2, 6
    stwio   r2, 4(sp)
    # pass the seventh argument; it is the third word on the stack,
    # at offset +8 from the top of the stack
    movi    r2, 7
    stwio   r2, 8(sp)

    # call the subroutine
    call    add7

    # post-call section
retadd:
    # the return value is in r2
    # adjust the stack by popping the three arguments
    addi    sp, sp, 12
    # load sum from memory, add r2 to it, and save it back to memory
    movia   r9, sum
    ldwio   r10, 0(r9)
    add     r10, r10, r2
    stwio   r10, 0(r9)
    # restore and pop the return address from the stack
    ldwio   ra, 0(sp)
    addi    sp, sp, 4
    ret
Here's the code for add7:

add7:
    add     r2, r4, r5      # add the first two arguments and place the sum into r2
    add     r2, r2, r6      # add the third argument to r2
    add     r2, r2, r7      # add the fourth argument to r2
    ldwio   r7, 0(sp)       # read the fifth argument from the stack
    add     r2, r2, r7      # add to r2
    ldwio   r7, 4(sp)       # read the sixth argument from the stack
    add     r2, r2, r7      # add to r2
    ldwio   r7, 8(sp)       # read the seventh argument from the stack
    add     r2, r2, r7      # add to r2
    ret
Here's what the stack looks like when add7 is called:

    sp ->  +0    5th argument
           +4    6th argument
           +8    7th argument
           +12   main's saved return address
           +16   ...
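To make the caller/callee handshake concrete, here is a toy C simulation of the add7 call. It is not Nios II code: the register file and the stack are plain C arrays, sp is modeled as a byte address into that array, and the names machine, sim_add7, and demo_add7 are hypothetical. The point is only that the caller's stores at 0(sp), 4(sp), and 8(sp) land exactly where the callee's loads look, and that the post-call addi leaves sp where it started.

```c
#include <assert.h>
#include <stdint.h>

#define SP 27           /* sp is r27 on Nios II */
#define STACK_WORDS 64  /* size of the toy stack, in words */

/* Toy machine: a register file plus a stack addressed in bytes. */
struct machine {
    int32_t r[32];
    int32_t stack[STACK_WORDS];
};

static int32_t load_word(const struct machine *m, int32_t addr)
{
    assert(addr % 4 == 0 && addr >= 0 && addr / 4 < STACK_WORDS);
    return m->stack[addr / 4];
}

static void store_word(struct machine *m, int32_t addr, int32_t value)
{
    assert(addr % 4 == 0 && addr >= 0 && addr / 4 < STACK_WORDS);
    m->stack[addr / 4] = value;
}

/* Callee: follows the add7 assembly step by step. */
static void sim_add7(struct machine *m)
{
    int32_t sp = m->r[SP];

    m->r[2] = m->r[4] + m->r[5] + m->r[6] + m->r[7];
    m->r[2] += load_word(m, sp + 0);  /* 5th argument */
    m->r[2] += load_word(m, sp + 4);  /* 6th argument */
    m->r[2] += load_word(m, sp + 8);  /* 7th argument */
}

/* Caller: pre-call, call, and post-call for add7(1, 2, 3, 4, 5, 6, 7).
   Returns the value left in r2 and checks that sp is balanced. */
int32_t demo_add7(void)
{
    struct machine m = {{0}, {0}};
    int32_t sp_on_entry;

    m.r[SP] = 4 * STACK_WORDS;  /* empty stack: sp just past the last word */
    sp_on_entry = m.r[SP];

    /* pre-call: first four arguments in registers */
    m.r[4] = 1; m.r[5] = 2; m.r[6] = 3; m.r[7] = 4;
    /* pre-call: allocate 12 bytes and store arguments 5..7 */
    m.r[SP] -= 12;
    store_word(&m, m.r[SP] + 0, 5);
    store_word(&m, m.r[SP] + 4, 6);
    store_word(&m, m.r[SP] + 8, 7);

    sim_add7(&m);

    /* post-call: release the argument space */
    m.r[SP] += 12;
    assert(m.r[SP] == sp_on_entry);
    return m.r[2];
}
```

demo_add7() should return 28, the value that add7(1, 2, 3, 4, 5, 6, 7) leaves in r2.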
Aggregating Stack Changes
Going back to the code for main(), notice the instructions that adjust the stack pointer (the addi sp, sp, ... instructions). Rather than adjusting the stack before and after each call, the code generated by GCC does the adjustments once, at the beginning of main (and undoes them at the end). Specifically, the compiler figures out the maximum stack space that the function will need and pre-allocates it at the beginning. As far as parameter passing is concerned, the maximum space needed is determined by the callee that has the maximum number of arguments.
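The rule for sizing the argument area can be expressed as a small C function. This sketch assumes word-sized arguments and that only arguments beyond the fourth need stack space; outgoing_arg_bytes is a hypothetical name, not something GCC exposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes a function must reserve at the top of its
   frame for stack arguments, given the argument count of each call it
   makes. Arguments 1..4 go in registers, so only the excess needs space. */
int outgoing_arg_bytes(const int *arg_counts, size_t ncalls)
{
    int max_args = 0;

    for (size_t i = 0; i < ncalls; i++) {
        assert(arg_counts[i] >= 0);
        if (arg_counts[i] > max_args)
            max_args = arg_counts[i];
    }
    return max_args > 4 ? 4 * (max_args - 4) : 0;
}
```

For the main() above, the only call is to add7 with 7 arguments, so the sketch yields 12 bytes, the three words used for arguments five through seven.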
In the modified main() that follows, notice that there are only two adjustments made to the stack: one in the prologue and one in the epilogue. As a result, the saved ra value is now at offset +12 from sp. The stack frame looks as follows:
After main is called, but before any instruction in main executes (offsets are relative to the value sp will have after the prologue):

    sp ->  +16

After main executes its prologue:

    sp ->  +0    used to pass the 5th argument
           +4    used to pass the 6th argument
           +8    used to pass the 7th argument
           +12   saved return address
           +16   (sp on entry to main)
    .data
sum:    .word   0

    .text
main:
    # PROLOGUE
    # create space on the stack for the return address and the three
    # parameters that need to be passed on the stack; we need 16 bytes
    # since we will be saving four words
    addi    sp, sp, -16
    # save the return address; this is the first word that would be pushed,
    # so it ends up at the bottom of the frame; ra now occupies the bytes
    # at offsets +12 through +15
    stwio   ra, 12(sp)

    # pre-call section
    # pass the first four arguments in registers
    movi    r4, 1
    movi    r5, 2
    movi    r6, 3
    movi    r7, 4
    # pass the fifth argument; it goes on the top of the stack
    # we use r2 as a temporary
    movi    r2, 5
    stwio   r2, 0(sp)
    # pass the sixth argument; it is the second word on the stack,
    # at offset +4 from the top of the stack
    movi    r2, 6
    stwio   r2, 4(sp)
    # pass the seventh argument; it is the third word on the stack,
    # at offset +8 from the top of the stack
    movi    r2, 7
    stwio   r2, 8(sp)

    # call the subroutine
    call    add7

    # post-call section
retadd:
    # the return value is in r2
    # load sum from memory, add r2 to it, and save it back to memory
    movia   r9, sum
    ldwio   r10, 0(r9)
    add     r10, r10, r2
    stwio   r10, 0(r9)

    # EPILOGUE
    # restore the return address from the stack
    ldwio   ra, 12(sp)
    # pop all values from the stack
    addi    sp, sp, 16
    ret
The portion of the stack the function allocates and uses is called the stack frame for the function. We have seen that the stack frame contains:
1. The input parameters which were pushed by the calling function.
2. The saved return address if this function calls another one.
3. Space for input parameters for calling other functions.
We'll complete the description of the stack frame soon.
As we noted in the beginning, we need to figure out the maximum space that will be needed on the stack. For argument passing, this is determined by the callee that has the maximum number of arguments among all functions called from this one. Even if the callee is a variable-argument function, every individual call to it has a specific number of arguments. See the following examples, and notice the constants used to adjust the stack and to save and restore the return address:
Example code:
    main() {
        foo(1, 2, 3);
        boo(1, 2, 3, 4, 5, 6, 7, 8);
    }
Stack allocation: boo has the maximum number of arguments. We'll need to allocate space for 8 - 4 = 4 words on the stack for arguments 5 through 8, plus one word for the return address. In total that's five words, or 20 bytes.
Prologue:
    addi    sp, sp, -20
    stwio   ra, 16(sp)
Epilogue:
    ldwio   ra, 16(sp)
    addi    sp, sp, 20

Example code:
    main() {
        foo(1, 2, 3);
        printf("%d %d %d %d %d", 2, 3, 4, 5, 6);
    }
Stack allocation: printf() takes 6 arguments (we will see that a string is passed by passing its starting address as the parameter). We need to allocate space for 6 - 4 = 2 arguments plus the return address. That's 3 words, or 3 * 4 = 12 bytes.
Prologue:
    addi    sp, sp, -12
    stwio   ra, 8(sp)
Epilogue:
    ldwio   ra, 8(sp)
    addi    sp, sp, 12
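Both examples follow one formula: reserve a word for each stack argument of the largest call, add a word for ra, and save ra in the last word of that space. The C sketch below captures that arithmetic. The name frame_for is hypothetical, and the sketch only covers functions whose frame holds outgoing arguments and ra, like the two main() examples.

```c
#include <assert.h>

/* Prologue/epilogue constants for a function that saves only ra. */
struct frame_consts {
    int frame_bytes;  /* used as: addi sp, sp, -frame_bytes */
    int ra_offset;    /* used as: stwio ra, ra_offset(sp)   */
};

struct frame_consts frame_for(int max_callee_args)
{
    struct frame_consts f;
    int stack_args;

    assert(max_callee_args >= 0);
    stack_args = max_callee_args > 4 ? max_callee_args - 4 : 0;
    f.frame_bytes = 4 * stack_args + 4;  /* argument words plus ra */
    f.ra_offset = f.frame_bytes - 4;     /* ra in the bottom word of the frame */
    return f;
}
```

frame_for(8) gives 20 and 16, and frame_for(6) gives 12 and 8, matching the two prologues above; frame_for(7) gives the 16 and 12 used by the earlier main() that calls add7.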
What happens to registers across calls? Callee- vs. Caller-Saved registers.
In the previous example, the callee (add7) did not change any registers other than r2, and r4 through r7. The caller expects these registers to change as r2 is used to return a value and the others to pass arguments. What if add7 was using other registers? The convention says that registers r16 through r23, and registers r26, r27, and r28 should be preserved across a call. That is, the caller expects that when the callee returns, these registers will have the same values they had before the callee was called. If we read through this statement carefully we can see that it does not say that the registers should not change value while the callee executes. All we have to guarantee is that before returning to the caller the registers must be loaded with the original values. There are two ways of achieving this: (1) Do not touch a register at all, (2) Allow a register to change its value but remember what value it had prior to the call and restore that value prior to returning to the caller. For (2) we can do the following. In the subroutine prologue save on the stack the values of all those registers that the routine will change. In the epilogue restore the registers to their original values using those stored onto the stack.
Registers that the callee must preserve across a call are called callee-saved registers.
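Option (2) can be illustrated with a toy C model in which a callee uses r16 as scratch but saves it in its prologue and restores it in its epilogue. The model is a simplification: the stack is a small word array, sp is a word index rather than a byte address, and the names cpu, callee_using_r16, and r16_preserved are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file plus a tiny word-indexed stack. */
struct cpu {
    int32_t r[32];
    int32_t stack[16];
    int sp;  /* index of the top-of-stack word in stack[] */
};

static void callee_using_r16(struct cpu *c)
{
    /* prologue: push r16 */
    c->sp -= 1;
    c->stack[c->sp] = c->r[16];

    /* main body: r16 may change freely here */
    c->r[16] = c->r[4] + c->r[5];
    c->r[2] = c->r[16];

    /* epilogue: restore r16 and pop */
    c->r[16] = c->stack[c->sp];
    c->sp += 1;
}

/* Returns 1 if r16 and sp survive the call unchanged and r2 holds a + b. */
int r16_preserved(int32_t a, int32_t b, int32_t r16_before)
{
    struct cpu c = {{0}, {0}, 16};

    c.r[4] = a;
    c.r[5] = b;
    c.r[16] = r16_before;
    callee_using_r16(&c);
    return c.r[16] == r16_before && c.sp == 16 && c.r[2] == a + b;
}
```

From the caller's point of view r16 never changed, even though the callee used it as its working register.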
While this is a contrived example, let us assume that add7 uses register r16 to calculate the return value prior to writing it to r2. The modified code follows. Note that add7 first saves the value of r16 on the stack in the prologue and restores it in the epilogue. For this reason, the offsets of the stack arguments change: the fifth argument is now at offset +4 and not +0, the sixth at +8 and not +4, and the last at offset +12 and not +8.
Original code that uses r2 for the partial sum:

add7:
    add     r2, r4, r5
    add     r2, r2, r6
    add     r2, r2, r7
    ldwio   r7, 0(sp)
    add     r2, r2, r7
    ldwio   r7, 4(sp)
    add     r2, r2, r7
    ldwio   r7, 8(sp)
    add     r2, r2, r7
    ret

Code that uses r16, a callee-saved register:

add7:
    # Prologue
    # push r16's value on the stack
    addi    sp, sp, -4
    stwio   r16, 0(sp)
    # add the first four arguments
    add     r16, r4, r5
    add     r16, r16, r6
    add     r16, r16, r7
    # add arguments five through seven
    ldwio   r7, 4(sp)
    add     r16, r16, r7
    ldwio   r7, 8(sp)
    add     r16, r16, r7
    ldwio   r7, 12(sp)
    add     r2, r16, r7
    # Epilogue
    # restore r16's original value
    ldwio   r16, 0(sp)
    addi    sp, sp, 4
    ret
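The offset shift generalizes: every word the callee pushes in its prologue moves each stack argument 4 bytes further from sp. A one-function C sketch of that arithmetic, with the hypothetical name stack_arg_offset:

```c
#include <assert.h>

/* Byte offset from sp at which the callee finds stack argument n (n >= 5)
   after its prologue has pushed saved_words words. With saved_words == 0
   this is the offset on entry, before the prologue runs. */
int stack_arg_offset(int n, int saved_words)
{
    assert(n >= 5 && saved_words >= 0);
    return 4 * saved_words + 4 * (n - 5);
}
```

stack_arg_offset(5, 1) is 4 and stack_arg_offset(7, 1) is 12, as in the r16 version of add7; stack_arg_offset(5, 4) is 16, which is where add6 finds e in the example further on.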
Notice that the prologue and epilogue sections are symmetric: one saves register values onto the stack and the other restores them. Restoring is typically done in the reverse order. For example, if we needed to save and restore registers r16, r17, and r18, we would use the following prologue and epilogue sections:
prologue:
    addi    sp, sp, -12
    stwio   r16, 0(sp)
    stwio   r17, 4(sp)
    stwio   r18, 8(sp)

epilogue:
    ldwio   r18, 8(sp)
    ldwio   r17, 4(sp)
    ldwio   r16, 0(sp)
    addi    sp, sp, 12
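Because the epilogue mirrors the prologue, both can be produced mechanically from the list of registers to save. The C sketch below writes the assembly text into a buffer. emit_save_restore and append are hypothetical helpers that only illustrate the pattern; they do not correspond to any real compiler interface.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Appends formatted text at buf[*used]; returns -1 if it does not fit. */
static int append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*used >= size)
        return -1;
    va_start(ap, fmt);
    n = vsnprintf(buf + *used, size - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Writes a prologue that saves regs[0..nregs-1] at increasing offsets and
   an epilogue that restores them in reverse order. Returns the number of
   characters written, or -1 if buf is too small. */
int emit_save_restore(const int *regs, int nregs, char *buf, size_t size)
{
    size_t used = 0;
    int frame = 4 * nregs;
    int i;

    assert(nregs > 0);
    if (append(buf, size, &used, "prologue:\n\taddi sp, sp, -%d\n", frame))
        return -1;
    for (i = 0; i < nregs; i++)
        if (append(buf, size, &used, "\tstwio r%d, %d(sp)\n", regs[i], 4 * i))
            return -1;
    if (append(buf, size, &used, "epilogue:\n"))
        return -1;
    for (i = nregs - 1; i >= 0; i--)
        if (append(buf, size, &used, "\tldwio r%d, %d(sp)\n", regs[i], 4 * i))
            return -1;
    if (append(buf, size, &used, "\taddi sp, sp, %d\n", frame))
        return -1;
    return (int)used;
}
```

Called with the registers {16, 17, 18}, the sketch produces the same prologue and epilogue instructions listed above.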
There is a second class of registers in the calling convention: the caller-saved registers. These registers are not guaranteed to keep their values across a call. If the calling function needs their values preserved, then it has to explicitly save them on the stack prior to the call (i.e., in the pre-call section) and restore them from the stack after the call (i.e., in the post-call section). In Nios II, r8 through r15 and r2 through r7 are caller-saved registers. Notice that these include the return value and argument-passing registers, since these may change whenever a function is called.
In summary, here are the register save/restore conventions for Nios II:

    Class           Registers                          When to save / restore
    CALLER-SAVED    r8 through r15, r2 through r7      Save in pre-call / Restore in post-call
    CALLEE-SAVED    r16 through r23, r26 through r28   Save in prologue / Restore in epilogue
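The summary table can also be read as a lookup function. In this C sketch, save_class_of is a hypothetical name, and registers the table does not list (r0, r1, r24, r25, r29 through r31) are simply reported as NEITHER; that is a simplification for illustration, not a claim about how those registers are used.

```c
#include <assert.h>

enum save_class { CALLER_SAVED, CALLEE_SAVED, NEITHER };

/* Classifies a Nios II register number according to the summary table. */
enum save_class save_class_of(int reg)
{
    assert(reg >= 0 && reg <= 31);
    if (reg >= 2 && reg <= 15)
        return CALLER_SAVED;   /* r2..r7 and r8..r15 */
    if ((reg >= 16 && reg <= 23) || (reg >= 26 && reg <= 28))
        return CALLEE_SAVED;   /* r16..r23 and r26..r28 */
    return NEITHER;            /* not listed in the table */
}
```

A function would consult such a classification to decide, for each register it uses, whether the save belongs around its calls (caller-saved) or in its own prologue and epilogue (callee-saved).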
An example follows where we implement add6() by calling add2() three times.
The equivalent C code for add6() is:
int add6(int a, int b, int c, int d, int e, int f)
{
    int t;
    t = add2(a, b);
    t += add2(c, d);
    t += add2(e, f);
    return t;
}
We will be using register r8 for t. r8 is a caller-saved register. Since add6() does not know whether add2() changes r8, it must preserve r8's value across each call. It must also save registers r6 and r7, which contain arguments c and d, since these may change during the first call to add2(). The assembly code for add6() follows the stack diagrams below.
Here's how the stack frame looks before any instruction of add6() executes and after space is allocated in the prologue:

When add6() is called, but before any instruction in add6() executes:

    sp ->  +0    5th argument (e)
           +4    6th argument (f)
           +8    caller's saved return address

After add6() allocates space on the stack:

    sp ->  +0    to preserve r8
           +4    to preserve r6
           +8    to preserve r7
           +12   add6's return address
           +16   5th argument (e)
           +20   6th argument (f)
           +24   caller's saved return address
    .text
add6:
    # Prologue
    # allocate space for 4 words: we will be saving ra, r6, r7, and r8
    addi    sp, sp, -16
    # save ra
    stwio   ra, 12(sp)

    # pre-call section
    # save r7 and r6 prior to calling add2 for the first time
    stwio   r7, 8(sp)
    stwio   r6, 4(sp)
    # a and b are already in r4 and r5, so they can be used
    # directly as the first and second arguments of add2
    call    add2
    # post-call section
    ldwio   r6, 4(sp)
    ldwio   r7, 8(sp)
    # t = return value of add2
    add     r8, r2, r0

    # pre-call section
    # save r8 on the stack
    stwio   r8, 0(sp)
    # pass c and d as the two parameters to add2
    add     r4, r6, r0
    add     r5, r7, r0
    # we don't need to preserve r6 and r7 since we no longer need these values
    call    add2
    # post-call section
    # restore r8's value
    ldwio   r8, 0(sp)
    # t += return value
    add     r8, r8, r2

    # pre-call section
    # save r8 on the stack
    stwio   r8, 0(sp)
    # pass e and f as the two parameters to add2
    # e is now at offset +16 and f at offset +20; they were at +0 and +4
    # before add6 allocated four words on the stack
    ldwio   r4, 16(sp)
    ldwio   r5, 20(sp)
    call    add2
    # post-call section
    # restore r8's value
    ldwio   r8, 0(sp)
    # add6's return value = t + add2's return value
    add     r2, r8, r2

    # Epilogue
    ldwio   ra, 12(sp)
    addi    sp, sp, 16
    # done
    ret
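One way to double-check the offsets used in add6 is to describe its frame as a C struct whose first member sits at sp and let offsetof compute the offsets. This sketch assumes uint32_t members with no padding, which holds on common ABIs; the struct, its field names, and load_e_f are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* add6()'s frame after its prologue; the first member is at sp. */
struct add6_frame {
    uint32_t saved_r8;   /* +0  */
    uint32_t saved_r6;   /* +4  */
    uint32_t saved_r7;   /* +8  */
    uint32_t ra;         /* +12: add6's return address */
    uint32_t arg5;       /* +16: e, stored by add6's caller */
    uint32_t arg6;       /* +20: f, stored by add6's caller */
    uint32_t caller_ra;  /* +24: caller's saved return address */
};

/* Mirrors "ldwio r4, 16(sp)" and "ldwio r5, 20(sp)": reads e and f from a
   raw byte image of memory that starts at sp. */
void load_e_f(const unsigned char *sp, uint32_t *e, uint32_t *f)
{
    memcpy(e, sp + offsetof(struct add6_frame, arg5), sizeof *e);
    memcpy(f, sp + offsetof(struct add6_frame, arg6), sizeof *f);
}
```

If the struct matches the diagram, offsetof reports 12 for ra, 16 for e, and 20 for f, the same constants that appear in the add6 code.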
Local Variables? Local variables can either be allocated in registers or on the stack immediately after the space allocated for preserving register values.
Stack Frame: This term is used to refer to the stack space allocated per subroutine invocation. Based on our discussion the layout of a stack frame is as follows:
    sp ->  Local variables      (allocated by the callee)
           Saved registers      (allocated by the callee)
           Return address       (allocated by the callee)
           Fifth parameter      (allocated by the caller)
           Sixth parameter      (allocated by the caller)
           ...
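The general layout can be turned into offset arithmetic. The C sketch below follows the diagram above: locals on top, then saved registers, then the return address, with the caller's stack parameters just past the end of the frame. It assumes word-sized slots, ignores the outgoing-argument area discussed earlier, and uses the hypothetical names frame_layout and layout_for.

```c
#include <assert.h>

/* Byte offsets from sp (after the prologue) of each region of the frame. */
struct frame_layout {
    int locals;       /* first local variable                */
    int saved_regs;   /* first saved register                */
    int ra;           /* saved return address                */
    int param5;       /* fifth parameter, in the caller's area */
    int frame_bytes;  /* amount the prologue subtracts from sp */
};

struct frame_layout layout_for(int nlocals, int nsaved)
{
    struct frame_layout lay;

    assert(nlocals >= 0 && nsaved >= 0);
    lay.locals = 0;
    lay.saved_regs = 4 * nlocals;
    lay.ra = lay.saved_regs + 4 * nsaved;
    lay.frame_bytes = lay.ra + 4;
    lay.param5 = lay.frame_bytes;  /* first word above this frame */
    return lay;
}
```

With no locals and three saved words, layout_for(0, 3) places ra at +12 and the fifth parameter at +16, which matches the add6 frame.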