Andreas Moshovos
Fall 2007
Subroutines
Continued: Passing Arguments, Returning Values and Allocating Local Variables
Thus far we have seen the mechanisms via which a subroutine can be called and return to its caller. In this lecture we will be looking at the mechanisms that are used for passing arguments and for returning a value. Please keep in mind that what we present is the calling convention used by the popular GNU gcc compiler for the NIOS II processor family.
For clarity, we will assume that all arguments and values returned are words, i.e., 32 bits. You can use the gcc compiler to see how other datatypes are passed including aggregate ones (structures).
Let us first ignore local variables and instead focus on parameter passing and return values.
In this calling convention the return value is placed in register r2. The first four parameters are passed in registers r4, r5, r6, and r7, respectively. Additional parameters are passed through the stack in order, so that the fifth parameter is on the top of the stack just before the function is called. The examples below show several functions, where each expects to find its arguments, and where it returns its value. The last column shows how to call the function. We will clarify this later in the lecture, so you may ignore it if you find it confusing at first.
C Code | Explanation | Assembly code | How to call, passing 1, 2, 3, … as arguments (the code assumes that space has been pre-allocated on the stack) |
int add0(void) { return 0; } |
This function takes no arguments. It returns its value in register r2. |
add0:
    add r2, r0, r0
    ret |
call add0 |
int add1(int a) { return a + 10; } |
This function takes a single argument “a”, which it expects to find in register r4. It returns its value in r2. |
add1:
    addi r2, r4, 10
    ret |
movi r4, 1
call add1 |
int add2(int a, int b) { return a + b; } |
It expects “a” to be in r4, and “b” in r5. It returns the sum in r2. |
add2:
    add r2, r4, r5
    ret |
movi r4, 1
movi r5, 2
call add2 |
int add3(int a, int b, int c) { return a + b + c; } |
It expects “a” to be in r4, “b” in r5, and “c” in r6. It returns the sum in r2. |
add3:
    add r2, r4, r5
    add r2, r2, r6
    ret |
movi r4, 1
movi r5, 2
movi r6, 3
call add3 |
int add4(int a, int b, int c, int d) { return a + b + c + d; } |
It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It returns the sum in r2. |
add4:
    add r2, r4, r5
    add r2, r2, r6
    add r2, r2, r7
    ret |
movi r4, 1
movi r5, 2
movi r6, 3
movi r7, 4
call add4 |
int add5(int a, int b, int c, int d, int e) { return a + b + c + d + e; } |
It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It expects “e” to be on top of the stack. It can access “e” via a “ldw __, 0(sp)” where “__” is a register (for example, r9). It returns the sum in r2. |
add5:
    add r2, r4, r5
    add r2, r2, r6
    add r2, r2, r7
    ldw r7, 0(sp)
    add r2, r2, r7
    ret |
movi r4, 1
movi r5, 2
movi r6, 3
movi r7, 4
movi r2, 5
stw r2, 0(sp)
call add5 |
int add6(int a, int b, int c, int d, int e, int f) { return a + b + c + d + e + f; } |
It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It expects “e” to be on top of the stack. It can access “e” via a “ldw __, 0(sp)” where “__” is a register (for example, r9). It expects “f” to be the second element of the stack. It can access “f” via a “ldw __, 4(sp)”. It returns the sum in r2. |
add6:
    add r2, r4, r5
    add r2, r2, r6
    add r2, r2, r7
    ldw r7, 0(sp)
    add r2, r2, r7
    ldw r7, 4(sp)
    add r2, r2, r7
    ret |
movi r4, 1
movi r5, 2
movi r6, 3
movi r7, 4
movi r2, 5
stw r2, 0(sp)
movi r2, 6
stw r2, 4(sp)
call add6 |
int add7(int a, int b, int c, int d, int e, int f, int g) { return a + b + c + d + e + f + g; } |
It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It expects “e” to be on top of the stack. It can access “e” via a “ldw __, 0(sp)” where “__” is a register (for example, r9). It expects “f” to be the second element of the stack. It can access “f” via a “ldw __, 4(sp)”. It expects “g” to be the third element of the stack. It can access “g” via a “ldw __, 8(sp)”. It returns the sum in r2. |
add7:
    add r2, r4, r5
    add r2, r2, r6
    add r2, r2, r7
    ldw r7, 0(sp)
    add r2, r2, r7
    ldw r7, 4(sp)
    add r2, r2, r7
    ldw r7, 8(sp)
    add r2, r2, r7
    ret |
movi r4, 1
movi r5, 2
movi r6, 3
movi r7, 4
movi r2, 5
stw r2, 0(sp)
movi r2, 6
stw r2, 4(sp)
movi r2, 7
stw r2, 8(sp)
call add7 |
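The pattern in the table can be summarized as a small rule. The C sketch below is only an illustration of that rule, assuming word-sized arguments and a callee that has not yet moved sp. The names arg_location and locate_arg are hypothetical and are not part of gcc or of any NIOS II tool.

```c
#include <assert.h>

/* Where the n-th word-sized argument (counting from 1) is found on entry
   to the callee. Hypothetical helper used only for illustration. */
typedef struct {
    int in_register;  /* 1 if the argument is in a register, 0 if on the stack */
    int reg;          /* register number (4..7) when in_register is 1, else -1 */
    int sp_offset;    /* byte offset from sp when in_register is 0, else -1 */
} arg_location;

arg_location locate_arg(int n)
{
    arg_location loc = { 0, -1, -1 };

    assert(n >= 1);                   /* arguments are numbered from 1 */
    if (n <= 4) {
        loc.in_register = 1;
        loc.reg = 3 + n;              /* argument 1 -> r4, ..., argument 4 -> r7 */
    } else {
        loc.sp_offset = 4 * (n - 5);  /* argument 5 -> 0(sp), 6 -> 4(sp), ... */
    }
    return loc;
}
```

For example, locate_arg(3) reports register r6, while locate_arg(7) reports the stack at offset 8, matching the “ldw __, 8(sp)” used by add7.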
The code for performing a call generally consists of the following four sections: pre-call and post-call in the caller, and prologue and epilogue in the callee:
Caller          Callee
…               prologue
pre-call        main body
call callee     epilogue
post-call       ret
…
Pre-Call: Prior to making the call, the caller has to take some actions. In particular, it must write the appropriate values into the registers for the first four parameters, if any. It then needs to push any other parameters on the stack. Parameters five and higher are pushed from last to first. That is, we first push the last parameter, then the next-to-last, and so on until we finally push the fifth parameter, which therefore ends up on the top of the stack.
Post-Call: After the call, the caller must de-allocate the stack space it allocated in the pre-call section for passing arguments, if any.
Prologue: In this section the callee will be allocating space for local variables and will be taking appropriate actions for preserving those register values that it should not change such as the return address register. We will revisit this later on.
Epilogue: In this section, the callee will be reversing all actions that took place in the prologue. We will expand on this later on.
Let’s see an example:
int add3 (int a, int b, int c)
{
    return a + b + c;
}

int sum = 0;

int main()
{
    sum += add3 (1, 2, 3);
}
Here’s the code for main:
    .data
sum: .word 0

    .text
main:
    # push ra on the stack, since main will be calling add3
    addi sp, sp, -4
    stw ra, 0(sp)

    # pre-call section
    # pass the three arguments in r4, r5, and r6
    movi r4, 1
    movi r5, 2
    movi r6, 3
    call add3            # call the subroutine

    # post-call section
retadd:
    # return value is in r2
    # nothing to do: no arguments were passed on the stack

    # load sum from memory, add r2 to it, and save it back to memory
    movia r9, sum
    ldw r10, 0(r9)
    add r10, r10, r2
    stw r10, 0(r9)

    # restore the return address from the stack
    ldw ra, 0(sp)
    addi sp, sp, 4
    ret
Here’s the code for add3:
add3:
    add r2, r4, r5    # add the first two parameters and store the sum in r2
    add r2, r2, r6    # add the third parameter to r2
    ret               # return to the caller
There are no prologue and epilogue sections in add3 since it has no local variables and since it does not change any registers other than r2.
A More Elaborate Example:
Now let us look at a more elaborate example where main calls add7:
int add7 (int a, int b, int c, int d, int e, int f, int g)
{
    return a + b + c + d + e + f + g;
}

int sum = 0;

int main()
{
    sum += add7 (1, 2, 3, 4, 5, 6, 7);
}
Here’s the code for main:
    .data
sum: .word 0

    .text
main:
    # push ra on the stack, since main will be calling add7
    addi sp, sp, -4
    stw ra, 0(sp)

    # pre-call section
    # pass the first four arguments in registers
    movi r4, 1
    movi r5, 2
    movi r6, 3
    movi r7, 4
    # allocate space for arguments five through seven
    # we need space for 3 words, or 12 bytes
    addi sp, sp, -12
    # pass the fifth argument, it goes on the top of the stack
    # we use r2 as a temporary
    movi r2, 5
    stw r2, 0(sp)
    # pass the sixth argument, it should be the second argument on the stack,
    # that is, at distance plus four from the top of the stack
    movi r2, 6
    stw r2, 4(sp)
    # pass the seventh argument, it should be the third argument on the stack,
    # that is, at distance plus eight from the top of the stack
    movi r2, 7
    stw r2, 8(sp)
    # call the subroutine
    call add7

    # post-call section
retadd:
    # return value is in r2
    # adjust the stack by popping the three values
    addi sp, sp, 12

    # load sum from memory, add r2 to it, and save it back to memory
    movia r9, sum
    ldw r10, 0(r9)
    add r10, r10, r2
    stw r10, 0(r9)

    # restore and pop the return address from the stack
    ldw ra, 0(sp)
    addi sp, sp, 4
    ret
Here’s the code for add7:
add7:
    add r2, r4, r5    # add the first two arguments and place the sum into r2
    add r2, r2, r6    # add the third argument to r2
    add r2, r2, r7    # add the fourth argument to r2
    ldw r7, 0(sp)     # read the fifth argument from the stack
    add r2, r2, r7    # add to r2
    ldw r7, 4(sp)     # read the sixth argument from the stack
    add r2, r2, r7    # add to r2
    ldw r7, 8(sp)     # read the seventh argument from the stack
    add r2, r2, r7    # add to r2
    ret
Here’s what the stack looks like when add7 is called:
sp → | +0  | 5th argument |
     | +4  | 6th argument |
     | +8  | 7th argument |
     | +12 | |
     | +16 | |
Aggregating Stack Changes
Going back to the code for main(), please notice the adjustments to the stack, that is, the addi instructions that modify sp. Rather than adjusting the stack before and after each call, the code generated by GCC does the adjustments at the beginning and at the end of main. Specifically, the compiler figures out the maximum space that the function will need and pre-allocates that space at the beginning. Focusing on parameter passing, the maximum space needed is determined by the callee that has the maximum number of arguments.
In the modified main() that follows please notice that there are only two adjustments made to the stack, one at the prologue and one at the epilogue. As a result, the relative index for the saved “ra” value was changed to +12. The stack frame looks as follows:
After main is called but before any instruction in main executes:
sp → | +16 | |
After main executes its prologue:
sp → | +0  | Used to pass the 5th argument |
     | +4  | Used to pass the 6th argument |
     | +8  | Used to pass the 7th argument |
     | +12 | Saved return address |
     | +16 | |
    .data
sum: .word 0

    .text
main:
    # PROLOGUE
    # create space on the stack for the return address and the three
    # parameters that need to be pushed on the stack. We need 16 bytes
    # since we will be saving four words.
    addi sp, sp, -16
    # save the return address. This is the first word that should be pushed,
    # so it ends up at the bottom of the stack. Ra now occupies the bytes
    # at offsets +12 through +15.
    stw ra, 12(sp)

    # pre-call section
    # pass the first four arguments in registers
    movi r4, 1
    movi r5, 2
    movi r6, 3
    movi r7, 4
    # pass the fifth argument, it goes on the top of the stack
    # we use r2 as a temporary
    movi r2, 5
    stw r2, 0(sp)
    # pass the sixth argument, it should be the second argument on the stack,
    # that is, at distance plus four from the top of the stack
    movi r2, 6
    stw r2, 4(sp)
    # pass the seventh argument, it should be the third argument on the stack,
    # that is, at distance plus eight from the top of the stack
    movi r2, 7
    stw r2, 8(sp)
    # call the subroutine
    call add7

    # post-call section
retadd:
    # return value is in r2
    # load sum from memory, add r2 to it, and save it back to memory
    movia r9, sum
    ldw r10, 0(r9)
    add r10, r10, r2
    stw r10, 0(r9)

    # EPILOGUE
    # restore the return address from the stack
    ldw ra, 12(sp)
    # pop all values from the stack
    addi sp, sp, 16
    ret
The portion of the stack the function allocates and uses is called the stack frame for the function. We have seen that the stack frame contains:
1. The input parameters which were pushed by the calling function.
2. The saved return address if this function calls another one.
3. Space for input parameters for calling other functions.
We’ll complete the description of the stack frame soon.
As we noted in the beginning, we need to figure out the maximum space that will be needed on the stack. This is determined by the callee that has the maximum number of arguments amongst all functions that are called from this one. Even if this is a variable argument function, every call to it has a specific number of arguments. See the following examples. Notice the constants used to adjust the stack and to save and restore the return address:
Example code | Stack Allocation Explanation | Prologue | Epilogue |
main() { … foo (1, 2, 3) … boo (1, 2, 3, 4, 5, 6, 7, 8) } |
Boo has the maximum number of arguments. We need to allocate space for 8-4=4 words on the stack for arguments 5 through 8, plus one word for the return address. In total that is five words, or 20 bytes. |
addi sp, sp, -20
stw ra, 16(sp) |
ldw ra, 16(sp)
addi sp, sp, 20 |
main() { … foo (1, 2, 3) … printf ("%d %d %d %d %d", 2, 3, 4, 5, 6) } |
Printf() takes 6 arguments here (we will see that a string is passed by passing its starting address as the parameter). We need to allocate space for 6-4=2 arguments plus the return address. That is 4*3 = 12 bytes. |
addi sp, sp, -12
stw ra, 8(sp) |
ldw ra, 8(sp)
addi sp, sp, 12 |
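The arithmetic behind the two rows above can be written down as a short C sketch. It assumes the simple frames used so far: the function saves only ra and reserves space for the stack-passed arguments of its calls. The function names outgoing_arg_words, frame_bytes, and ra_offset are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Number of words that must be reserved for stack-passed arguments,
   given the largest argument count among all calls the function makes. */
int outgoing_arg_words(int max_args)
{
    assert(max_args >= 0);
    return max_args > 4 ? max_args - 4 : 0;  /* first four go in r4..r7 */
}

/* Total bytes to subtract from sp in the prologue: the outgoing
   argument words plus one word for the saved return address. */
int frame_bytes(int max_args)
{
    return 4 * (outgoing_arg_words(max_args) + 1);
}

/* Offset from sp at which ra is saved: just above the argument words. */
int ra_offset(int max_args)
{
    return 4 * outgoing_arg_words(max_args);
}
```

With max_args equal to 8 this gives a 20-byte frame with ra at offset 16, and with max_args equal to 6 it gives a 12-byte frame with ra at offset 8, as in the table.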
What happens to registers across calls? Callee- vs. Caller-Saved registers.
In the previous example, the callee (add7) did not change any registers other than r2, and r4 through r7. The caller expects these registers to change as r2 is used to return a value and the others to pass arguments. What if add7 was using other registers? The convention says that registers r16 through r23, and registers r26, r27, and r28 should be preserved across a call. That is, the caller expects that when the callee returns, these registers will have the same values they had before the callee was called. If we read through this statement carefully we can see that it does not say that the registers should not change value while the callee executes. All we have to guarantee is that before returning to the caller the registers must be loaded with the original values. There are two ways of achieving this: (1) Do not touch a register at all, (2) Allow a register to change its value but remember what value it had prior to the call and restore that value prior to returning to the caller. For (2) we can do the following. In the subroutine prologue save on the stack the values of all those registers that the routine will change. In the epilogue restore the registers to their original values using those stored onto the stack.
Registers that the callee must preserve across a call are called callee-saved registers.
While this is a contrived example, let us assume that add7 uses register r16 to calculate the return value prior to writing it to r2. The modified code follows. Note that add7 first saves the value of r16 on the stack in the prologue and restores it in the epilogue. Because the prologue moves sp down by four bytes, the offsets of the parameters need to change. Notice that the fifth argument is now at offset +4 and not +0, the sixth at +8 and not +4, and the last at offset +12 and not +8:
Original code that uses r2 for the partial sum | Code that uses r16, a callee-saved register |
add7:
    add r2, r4, r5
    add r2, r2, r6
    add r2, r2, r7
    ldw r7, 0(sp)
    add r2, r2, r7
    ldw r7, 4(sp)
    add r2, r2, r7
    ldw r7, 8(sp)
    add r2, r2, r7
    ret |
add7:
    # Prologue
    # push r16’s value on the stack
    addi sp, sp, -4
    stw r16, 0(sp)
    # add the first four arguments
    add r16, r4, r5
    add r16, r16, r6
    add r16, r16, r7
    # add arguments five through seven
    ldw r7, 4(sp)
    add r16, r16, r7
    ldw r7, 8(sp)
    add r16, r16, r7
    ldw r7, 12(sp)
    add r2, r16, r7
    # Epilogue
    # restore r16’s original value
    ldw r16, 0(sp)
    addi sp, sp, 4
    ret |
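The shift in offsets follows directly from how far the prologue moved sp. A minimal C sketch of the rule, assuming word-sized arguments and a prologue that lowers sp by a fixed number of bytes; stack_arg_offset is a hypothetical name used only here.

```c
#include <assert.h>

/* Offset from the callee's current sp of the n-th argument (n >= 5),
   after the callee's prologue has lowered sp by callee_frame_bytes. */
int stack_arg_offset(int n, int callee_frame_bytes)
{
    assert(n >= 5);                        /* arguments 1..4 are in r4..r7 */
    assert(callee_frame_bytes >= 0);
    assert(callee_frame_bytes % 4 == 0);   /* frames are whole words */
    return callee_frame_bytes + 4 * (n - 5);
}
```

With a 4-byte frame (only r16 saved), arguments five through seven are found at offsets 4, 8, and 12, which are the offsets used in the r16 version of add7. With no frame at all the offsets are 0, 4, and 8, as in the original version.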
Notice that the prologue and epilogue sections are symmetric. One saves register values onto the stack and the other restores them. Restoring is typically done in the reverse order. For example, if we needed to save and restore registers r16, r17 and r18 we would use the following prologue and epilogue sections:
prologue:
    addi sp, sp, -12
    stw r16, 0(sp)
    stw r17, 4(sp)
    stw r18, 8(sp)
    …
epilogue:
    ldw r18, 8(sp)
    ldw r17, 4(sp)
    ldw r16, 0(sp)
    addi sp, sp, 12
There is a second class of registers in the calling convention. These are the caller-saved registers, which are not guaranteed to maintain their values across a call. If the calling function needs these values to be preserved, then it has to explicitly save them on the stack prior to the call (i.e., in the pre-call section), and then restore them from the stack after the call (i.e., in the post-call section). In NIOS II, r8 through r15 and r2 through r7 are caller-saved registers. Notice that these include the return value and the argument passing registers, since these are expected to change whenever a function is called.
In summary, here are the register save/restore conventions for NIOS II:
CALLER-SAVED | R8 through R15, R2 through R7 | Save in pre-call / Restore in post-call |
CALLEE-SAVED | R16 through R23, R26 through R28 | Save in prologue / Restore in epilogue |
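The summary table can also be expressed as a lookup, which is handy when checking hand-written assembly. The C sketch below encodes only the two rows of the table; registers that the table does not list are reported as unclassified. The names save_class and classify_register are hypothetical.

```c
#include <assert.h>

typedef enum {
    CALLER_SAVED,   /* save in pre-call, restore in post-call */
    CALLEE_SAVED,   /* save in prologue, restore in epilogue */
    UNCLASSIFIED    /* not listed in the summary table */
} save_class;

/* Classify register r (0..31) according to the summary table. */
save_class classify_register(int r)
{
    assert(r >= 0 && r <= 31);
    if (r >= 2 && r <= 15)
        return CALLER_SAVED;              /* r2..r7 and r8..r15 */
    if ((r >= 16 && r <= 23) || (r >= 26 && r <= 28))
        return CALLEE_SAVED;              /* r16..r23 and r26..r28 */
    return UNCLASSIFIED;
}
```

For instance, classify_register(8) yields CALLER_SAVED, which is why a function that keeps a value in r8 across a call must save it itself, and classify_register(16) yields CALLEE_SAVED.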
An example follows where we implement add6() by calling add2() three times.
The equivalent C code for add6() is:
int add6(int a, int b, int c, int d, int e, int f)
{
    int t;
    t = add2 (a, b);
    t += add2 (c, d);
    t += add2 (e, f);
    return t;
}
We will be using register r8 for “t”. R8 is a caller-saved register. Note that add6() does not know whether add2() changes r8, hence it must preserve its value across the calls. It must also save registers r6 and r7, which contain arguments “c” and “d”, since these may change during the first call to add2(). The stack layout and the assembly code for add6() follow.
Here’s how the stack frame looks before any instruction of add6() executes and after space is allocated in the prologue:
When add6() is called but before any instruction executes in add6():
sp → | +0  | 5th argument |
     | +4  | 6th argument |
     | +8  | Caller’s saved return address |
After add6() allocates space on the stack:
sp → | +0  | To preserve r8 |
     | +4  | To preserve r6 |
     | +8  | To preserve r7 |
     | +12 | add6’s return address |
     | +16 | 5th argument |
     | +20 | 6th argument |
     | +24 | Caller’s saved return address |
    .text
add6:
    # prologue
    # allocate space for 4 words: we will be saving ra, r6, r7, and r8
    addi sp, sp, -16
    # save ra
    stw ra, 12(sp)

    # pre-call section
    # save r7 and r6 prior to calling add2 for the first time
    stw r7, 8(sp)
    stw r6, 4(sp)
    # note that a and b are already in r4 and r5, so they can be used
    # directly as the first and second arguments by add2
    call add2
    # post-call section
    ldw r6, 4(sp)
    ldw r7, 8(sp)
    # t = return value of add2
    add r8, r2, r0

    # pre-call section
    # save r8 on the stack
    stw r8, 0(sp)
    # pass c and d as the two parameters to add2
    add r4, r6, r0
    add r5, r7, r0
    # we don’t need to preserve r6 and r7 since we don’t care for these
    # values any more
    call add2
    # post-call section
    # restore r8’s value
    ldw r8, 0(sp)
    # t += return value
    add r8, r8, r2

    # pre-call section
    # save r8 on the stack
    stw r8, 0(sp)
    # pass e and f as the two parameters to add2
    # e is now at offset +16 (it was at offset +0 before add6 allocated
    # four words on the stack) and f is at offset +20
    ldw r4, 16(sp)
    ldw r5, 20(sp)
    call add2
    # post-call section
    # restore r8’s value
    ldw r8, 0(sp)
    # add6’s return value = t + add2’s return value
    add r2, r8, r2

    # epilogue
    ldw ra, 12(sp)
    addi sp, sp, 16
    # done
    ret
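To double-check the offsets used by add6(), the frame can be modeled in C with an array standing in for stack memory. This is only a sketch under simplifying assumptions: the array mem, the helpers ldw and stw, and the functions add6_model and call_add6_model are all hypothetical, sp is a byte offset into the array, and the saved return address at offset +12 is reserved but not modeled.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };
static int32_t mem[STACK_WORDS];        /* toy stack memory, one word per slot */

/* Word load/store at byte address sp + off, mirroring ldw/stw. */
static int32_t ldw(int sp, int off)
{
    int addr = sp + off;
    assert(addr >= 0 && addr % 4 == 0 && addr / 4 < STACK_WORDS);
    return mem[addr / 4];
}

static void stw(int sp, int off, int32_t value)
{
    int addr = sp + off;
    assert(addr >= 0 && addr % 4 == 0 && addr / 4 < STACK_WORDS);
    mem[addr / 4] = value;
}

static int32_t add2(int32_t a, int32_t b) { return a + b; }

/* Mirrors the assembly for add6: e and f are at 0(sp) and 4(sp) on entry. */
int32_t add6_model(int32_t r4, int32_t r5, int32_t r6, int32_t r7, int sp)
{
    int32_t r2, r8;

    sp -= 16;                           /* prologue: addi sp, sp, -16 */
    stw(sp, 8, r7);                     /* save d */
    stw(sp, 4, r6);                     /* save c */
    r2 = add2(r4, r5);                  /* add2(a, b) */
    r6 = ldw(sp, 4);
    r7 = ldw(sp, 8);
    r8 = r2;                            /* t = add2(a, b) */

    stw(sp, 0, r8);                     /* save t across the call */
    r2 = add2(r6, r7);                  /* add2(c, d) */
    r8 = ldw(sp, 0) + r2;               /* t += add2(c, d) */

    stw(sp, 0, r8);
    r2 = add2(ldw(sp, 16), ldw(sp, 20)); /* e and f moved to +16 and +20 */
    r2 = ldw(sp, 0) + r2;               /* return t + add2(e, f) */

    sp += 16;                           /* epilogue: addi sp, sp, 16 */
    return r2;
}

/* Caller side: place 5 and 6 on the stack, 1..4 in registers. */
int32_t call_add6_model(void)
{
    int sp = 4 * STACK_WORDS - 8;       /* two words reserved for e and f */
    stw(sp, 0, 5);
    stw(sp, 4, 6);
    return add6_model(1, 2, 3, 4, sp);
}
```

Calling call_add6_model() walks through the same loads and stores as the assembly above and returns 1 + 2 + 3 + 4 + 5 + 6.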
Local Variables? Local variables can either be allocated in registers or on the stack immediately after the space allocated for preserving register values.
Stack Frame: This term is used to refer to the stack space allocated per subroutine invocation. Based on our discussion the layout of a stack frame is as follows:
sp → | Local variables  | Allocated by callee |
     | Saved registers  | |
     | Return address   | |
     | Fifth parameter  | Allocated by caller |
     | Sixth parameter  | |
     | …                | |