Andreas Moshovos

Fall 2007

 

Subroutines Continued: Passing Arguments, Returning Values and Allocating Local Variables

 

Thus far we have seen the mechanisms via which a subroutine can be called and return to its caller. In this lecture we will be looking at the mechanisms that are used for passing arguments and for returning a value. Please keep in mind that what we present is the calling convention used by the popular GNU gcc compiler for the NIOS II processor family.

 

For clarity, we will assume that all arguments and values returned are words, i.e., 32 bits. At the end we will discuss the conventions for passing longer arguments and returning longer values and aggregate datatypes (structures).

 

Let us first ignore local variables and instead focus on parameter passing and return values.

 

In this calling convention the return value is returned in register r2. The first four parameters are passed in registers r4, r5, r6, and r7, respectively. Additional parameters are passed on the stack in order, so that the fifth parameter is on the top of the stack prior to calling the function, the sixth is immediately above it at offset +4, and so on. The examples below show several functions, where each expects to find its arguments, and where it returns its value. The last column shows how to call the function. We will clarify this later in the lecture, so you may ignore it if you find it confusing at first.

 

C Code

Explanation

Assembly code

How to call passing 1, 2, 3, … as arguments. Code assumes that space has been pre-allocated on the stack

int add0(void)

{

   return 0;

}

 

This function takes no arguments. It returns its value in register r2.

 

add0:

      add r2, r0, r0

      ret

call add0

int add1(int a)

{

   return a + 10;

}

This function takes a single argument “a”, which it expects to find in register r4. It will return the value in r2.

 

add1:

      addi r2, r4, 10

      ret

movi r4, 1

call add1

int add2(int a, int b)

{

    return a + b;

}

It expects “a” to be in r4, and “b” in r5. It returns the sum in r2.

 

add2:

      add r2, r4, r5

      ret

movi r4, 1

movi r5, 2

call add2

int add3(int a, int b, int c)

{

   return a + b + c;

}

 

It expects “a” to be in r4, “b” in r5, and “c” in r6. It returns the sum in r2.

 

add3:

      add r2, r4, r5

      add r2, r2, r6

      ret

movi r4, 1

movi r5, 2

movi r6, 3

call add3

int add4(int a, int b, int c, int d)

{

   return a + b + c + d;

}

 

 

It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It returns the sum in r2.

add4:

      add r2, r4, r5

      add r2, r2, r6

      add r2, r2, r7

      ret

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

call add4

int add5(int a, int b, int c, int d, int e)

{

   return a + b + c + d + e;

}

 

It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It expects “e” to be on top of the stack. It can access “e” via a “ldwio __, 0(sp)” where “__” is a register (for example, r9). It returns the sum in r2.

 

add5:

      add r2, r4, r5

      add r2, r2, r6

      add r2, r2, r7

      ldwio r7, 0(sp)

      add r2, r2, r7

      ret

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

movi r2, 5

stwio r2, 0(sp)

call add5

int add6(int a, int b, int c, int d, int e, int f)

{

   return a + b + c + d + e + f;

}

 

It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It expects “e” to be on top of the stack. It can access “e” via a “ldwio __, 0(sp)” where “__” is a register (for example, r9). It expects “f” to be the second element of the stack. It can access “f” via a “ldwio __, 4(sp)”. It returns the sum in r2.

 

add6:

      add r2, r4, r5

      add r2, r2, r6

      add r2, r2, r7

      ldwio r7, 0(sp)

      add r2, r2, r7

      ldwio r7, 4(sp)

      add r2, r2, r7

      ret

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

movi r2, 5

stwio r2, 0(sp)

movi r2, 6

stwio r2, 4(sp)

call add6

int add7(int a, int b, int c, int d, int e, int f, int g)

{

   return a + b + c + d + e + f + g;

}

 

It expects “a” to be in r4, “b” in r5, “c” in r6, and “d” in r7. It expects “e” to be on top of the stack. It can access “e” via a “ldwio __, 0(sp)” where “__” is a register (for example, r9). It expects “f” to be the second element of the stack. It can access “f” via a “ldwio __, 4(sp)”. It expects “g” to be the third element of the stack. It can access “g” via a “ldwio __, 8(sp)”. It returns the sum in r2.

 

add7:

      add r2, r4, r5

      add r2, r2, r6

      add r2, r2, r7

      ldwio r7, 0(sp)

      add r2, r2, r7

      ldwio r7, 4(sp)

      add r2, r2, r7

      ldwio r7, 8(sp)

      add r2, r2, r7

      ret

movi r4, 1

movi r5, 2

movi r6, 3

movi r7, 4

movi r2, 5

stwio r2, 0(sp)

movi r2, 6

stwio r2, 4(sp)

movi r2, 7

stwio r2, 8(sp)

call add7
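The placement rule that the table above illustrates can be summarized in a few lines of C. This is only a sketch for word-sized arguments; the type arg_location and the function locate_argument are names made up for this illustration and are not part of any NIOS II tool or library.

```c
#include <assert.h>

/* Where a word-sized argument is passed under the convention described above. */
typedef struct {
    int in_register;  /* 1 if passed in a register, 0 if passed on the stack   */
    int reg;          /* register number (4..7) when in_register is 1, else -1 */
    int sp_offset;    /* byte offset from sp at the call when on the stack,
                         else -1                                               */
} arg_location;

/* position is 1-based: 1 is the first argument. */
arg_location locate_argument(int position)
{
    arg_location loc;

    assert(position >= 1);
    if (position <= 4) {
        loc.in_register = 1;
        loc.reg = 3 + position;              /* 1 -> r4, 2 -> r5, 3 -> r6, 4 -> r7 */
        loc.sp_offset = -1;
    } else {
        loc.in_register = 0;
        loc.reg = -1;
        loc.sp_offset = 4 * (position - 5);  /* 5 -> 0(sp), 6 -> 4(sp), ... */
    }
    return loc;
}
```

For add7, for example, locate_argument(7) reports a stack argument at offset 8, which matches the "ldwio __, 8(sp)" used to read "g".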

 

The code for performing a call generally consists of the following four sections: pre-call and post-call in the caller, and prologue and epilogue in the callee:

 

Caller                          Callee

 

…                                prologue

pre-call                        main body

call callee                    epilogue

post-call                       ret

…

 

Pre-Call: Prior to making the call, the caller has to take some actions. In particular, it must write the appropriate values into the registers for the first four parameters, if any. It then needs to place any remaining parameters on the stack. Parameters 5 and higher are stored in order at increasing addresses: the fifth parameter goes at the top of the stack (offset +0 from sp), the sixth at offset +4, and so on.

 

Post-Call: After the call, the caller must de-allocate the stack space it allocated in the pre-call section for passing arguments, if any.

 

Prologue: In this section the callee will be allocating space for local variables and will be taking appropriate actions for preserving those register values that it should not change such as the return address register. We will revisit this later on.

 

Epilogue: In this section, the callee will be reversing all actions that took place in the prologue. We will expand on this later on.

 

Let’s see an example:

 

int add3 (int a, int b, int c)

{  return a + b + c;

}

 

int sum = 0;

main()

{

sum += add3 (1, 2, 3);

}

 

Here’s the code for main:

 

     .data

sum: .word 0

 

     .text

main:

     # push ra on the stack, main will be calling add3

     addi sp, sp, -4

     stwio ra, 0(sp)

 

     # Pre-call section

     # pass the three arguments in r4, r5, and r6; no stack space is needed

     movi r4, 1

     movi r5, 2

     movi r6, 3

 

     call add3      # call the subroutine

 

     # post-call section

retadd:

     # return value is in r2

     # nothing to do

 

     # load sum from memory add r2 to it and save back to memory

     movia r9, sum

     ldwio r10, 0(r9)

     add  r10, r10, r2

     stwio  r10, 0(r9)

 

     # restore return address from the stack

     ldwio ra, 0(sp)

     addi sp, sp, 4

     ret

 

Here’s the code for add3:

 

add3:   

    add r2, r4, r5 #  add the first two parameters and store the sum in r2

    add r2, r2, r6 # add the third parameter to r2

    ret           # return to the caller

 

There are no prologue and epilogue sections in add3 since it has no local variables and since it does not change any registers other than r2.

 

A More Elaborate Example:

 

Now let us look at a more elaborate example where main calls add7:

int add7 (int a, int b, int c, int d, int e, int f, int g)

{  return a + b + c + d + e + f + g;

}

 

int sum = 0;

main()

{

sum += add7 (1, 2, 3, 4, 5, 6, 7);

}

 

 

Here’s the code for main:

 

     .data

sum: .word 0

 

     .text

main:

     # push ra on the stack, main will be calling add7

     addi sp, sp, -4

     stwio ra, 0(sp)

 

     # Pre-call section

     # pass the arguments

     movi r4, 1

     movi r5, 2

     movi r6, 3

     movi r7, 4

 

     # allocate space for arguments five through seven

     # Need space for 3 words, or 12 bytes

     addi sp, sp, -12

 

     # pass the fifth argument, it goes on the top of the stack

     # we use r2 as a temporary

     movi r2, 5

     stwio r2, 0(sp)

     # pass the sixth argument, it should be the second argument on the stack

     # that is at distance plus four from the top of the stack

     # we use r2 as a temporary

     movi r2, 6

     stwio r2, 4(sp)

     # pass the seventh argument, it should be the third argument on the stack

     # that is at distance plus eight from the top of the stack

     # we use r2 as a temporary

     movi r2, 7

     stwio r2, 8(sp)

 

     # call the subroutine

     call add7

 

     # post-call section

retadd:

     # return value is in r2

     # adjust the stack by popping the three values

     addi sp, sp, 12

 

     # load sum from memory add r2 to it and save back to memory

     movia r9, sum

     ldwio r10, 0(r9)

     add  r10, r10, r2

     stwio  r10, 0(r9)

 

     # restore and pop the return address from the stack

     ldwio ra, 0(sp)

     addi sp, sp, 4

     ret

 

Here’s the code for add7:

 

add7:

      add r2, r4, r5    # add the first two arguments and place the sum into r2

      add r2, r2, r6    # add the third argument to r2

      add r2, r2, r7    # add the fourth argument to r2

      ldwio r7, 0(sp)   # read the fifth argument from the stack

      add r2, r2, r7    # add to r2

      ldwio r7, 4(sp)   # read the sixth argument from the stack

      add r2, r2, r7    # add to r2

      ldwio r7, 8(sp)   # read the seventh argument from the stack

      add r2, r2, r7    # add to r2

     ret

 

Here’s what the stack looks like when add7 is called:

 

        Offset   Contents
sp →    +0       5th argument
        +4       6th argument
        +8       7th argument
        +12      Main’s saved return address
        +16
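The stack picture for the add7 call can be checked with a small C model. This is a sketch only: stack_mem, sp, stw, ldw, add7_model and call_add7_model are names invented here, the "machine" is just an array of words, and the four register arguments are modelled as ordinary C parameters.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine state: a small stack of words and a stack pointer in bytes. */
static int32_t stack_mem[STACK_WORDS];
static uint32_t sp = STACK_WORDS * 4;   /* empty stack: sp is just past the end */

/* Models "stwio value, offset(sp)". */
static void stw(int32_t value, uint32_t offset)
{
    assert((sp + offset) % 4 == 0 && (sp + offset) / 4 < STACK_WORDS);
    stack_mem[(sp + offset) / 4] = value;
}

/* Models "ldwio reg, offset(sp)". */
static int32_t ldw(uint32_t offset)
{
    assert((sp + offset) % 4 == 0 && (sp + offset) / 4 < STACK_WORDS);
    return stack_mem[(sp + offset) / 4];
}

/* Callee: a, b, c, d arrive in r4..r7; e, f, g are read from the stack. */
static int32_t add7_model(int32_t r4, int32_t r5, int32_t r6, int32_t r7)
{
    int32_t r2 = r4 + r5 + r6 + r7;
    r2 += ldw(0);   /* fifth argument   */
    r2 += ldw(4);   /* sixth argument   */
    r2 += ldw(8);   /* seventh argument */
    return r2;
}

/* Caller: pre-call, call, and post-call for add7(1, 2, 3, 4, 5, 6, 7). */
int32_t call_add7_model(void)
{
    int32_t result;

    sp -= 12;       /* addi sp, sp, -12 : room for arguments five to seven */
    stw(5, 0);
    stw(6, 4);
    stw(7, 8);
    result = add7_model(1, 2, 3, 4);
    sp += 12;       /* addi sp, sp, 12 : post-call clean-up */
    return result;
}

/* Lets a test confirm that the caller left the stack pointer where it found it. */
uint32_t current_sp(void)
{
    return sp;
}
```

The model leaves out the saved return address, since C handles the return for us; it only shows that the caller's stores and the callee's loads agree on the offsets +0, +4 and +8.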

 

 

Aggregating Stack Changes

 

Going back to the code for main(), please notice the two adjustments it makes to the stack pointer around the call (addi sp, sp, -12 before and addi sp, sp, 12 after), in addition to the ones that push and pop ra. Rather than adjusting the stack before and after each call, the code generated by GCC does the adjustments once, at the beginning and end of main. Specifically, the compiler figures out the maximum space that the function will need and pre-allocates it at the beginning. Focusing on parameter passing, the maximum space needed is determined by the callee that has the maximum number of arguments.

 

In the  modified main() that follows please notice that there are only two adjustments made to the stack, one at the prologue and one at the epilogue. As a result, the relative index for the saved “ra” value was changed to +12. The stack frame looks as follows:

 

After main is called but before any instruction in main executes:

 

sp →    +16

 

 

After main executes its prologue:

        Offset   Contents
sp →    +0       Used to pass the 5th argument
        +4       Used to pass the 6th argument
        +8       Used to pass the 7th argument
        +12      Saved return address
        +16

 

 

 

     .data

sum: .word 0

 

     .text

main:

     # PROLOG

     # create space on the stack for the return address and the three parameters that need to be

     # pushed on the stack. We need 16 bytes since we will be saving four words.

     addi sp, sp, -16

     # save the return address. This is the first word that should be pushed, so it ends up at

     # the bottom of the stack. Ra now occupies the bytes at offsets +12 through +15.

     stwio ra, 12(sp)

 

     # Pre-call section

     # pass the arguments

     movi r4, 1

     movi r5, 2

     movi r6, 3

     movi r7, 4

 

     # pass the fifth argument, it goes on the top of the stack

     # we use r2 as a temporary

     movi r2, 5

     stwio r2, 0(sp)

     # pass the sixth argument, it should be the second argument on the stack

     # that is at distance plus four from the top of the stack

     # we use r2 as a temporary

     movi r2, 6

     stwio r2, 4(sp)

     # pass the seventh argument, it should be the third argument on the stack

     # that is at distance plus eight from the top of the stack

     # we use r2 as a temporary

     movi r2, 7

     stwio r2, 8(sp)

 

     # call the subroutine

     call add7

 

     # post-call section

retadd:

     # return value is in r2

 

     # load sum from memory add r2 to it and save back to memory

     movia r9, sum

     ldwio r10, 0(r9)

     add  r10, r10, r2

     stwio  r10, 0(r9)

    

     # EPILOGUE

     # restore and pop the return address from the stack

     ldwio ra, 12(sp)

     # pop all values from the stack

     addi sp, sp, 16

     ret

 

The portion of the stack the function allocates and uses is called the stack frame for the function. We have seen that the stack frame contains:

1.      The input parameters which were pushed by the calling function.

2.      The saved return address if this function calls another one.

3.      Space for input parameters for calling other functions.

 

We’ll complete the description of the stack frame soon.

 

As we noted in the beginning, we need to figure out the maximum space that will be needed on the stack. This is determined by the callee that has the maximum number of arguments amongst all functions that are called from this one. Even if this is a variable argument function, every call to it has a specific number of arguments. See the following examples. Notice the constants used to adjust the stack and to save and restore the return address:

 

Example code

Stack Allocation Explanation

Prologue

Epilogue

main()

{

   …

   foo (1, 2, 3)

   …

   boo (1, 2, 3, 4, 5, 6, 7, 8)

}

Boo has the maximum number of arguments

We’ll need to allocate space for 8-4=4 words on the stack for arguments 5 through 8, and the return address. In total that’s five words, or 20 bytes.

addi sp, sp, -20

stwio ra, 16(sp)

 

 

ldwio ra, 16(sp)

addi sp, sp, 20

main()

{

   …

   foo (1, 2, 3)

   …

   printf ("%d %d %d %d %d", 2, 3, 4, 5, 6)

}

Printf() takes 6 arguments (we will see that a string is passed by passing its starting address as the parameter).

We need to allocate space for 6-4=2 arguments plus the return address. That’s 4*3 = 12.

addi sp, sp, -12

stwio ra, 8(sp)

 

 

ldwio ra, 8(sp)

addi sp, sp, 12
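The arithmetic in the two examples above can be written down as a short C sketch. It assumes a function whose frame holds only the saved return address and the outgoing stack arguments, as in these examples; frame_bytes and ra_offset are hypothetical helper names, and max_callee_args stands for the largest argument count among the functions being called.

```c
#include <assert.h>

#define WORD_BYTES    4
#define REGISTER_ARGS 4   /* the first four arguments travel in r4..r7 */

/* Bytes to allocate in the prologue when the frame holds the outgoing
   stack arguments plus one word for the saved return address. */
int frame_bytes(int max_callee_args)
{
    int stack_words;

    assert(max_callee_args >= 0);
    stack_words = max_callee_args > REGISTER_ARGS
                      ? max_callee_args - REGISTER_ARGS
                      : 0;
    return WORD_BYTES * (stack_words + 1);   /* +1 word for ra */
}

/* ra is stored in the highest word of the frame, above the arguments. */
int ra_offset(int frame_size)
{
    assert(frame_size >= WORD_BYTES && frame_size % WORD_BYTES == 0);
    return frame_size - WORD_BYTES;
}
```

With boo's eight arguments this gives a 20-byte frame with ra at offset 16, and with printf's six arguments a 12-byte frame with ra at offset 8, matching the constants in the prologues and epilogues above.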

 

What happens to registers across calls? Callee- vs. Caller-Saved registers.

 

In the previous example, the callee (add7) did not change any registers other than r2, and r4 through r7. The caller expects these registers to change as r2 is used to return a value and the others to pass arguments. What if add7 was using other registers? The convention says that registers r16 through r23, and registers r26, r27, and r28 should be preserved across a call. That is, the caller expects that when the callee returns, these registers will have the same values they had before the callee was called. If we read through this statement carefully we can see that it does not say that the registers should not change value while the callee executes. All we have to guarantee is that before returning to the caller the registers must be loaded with the original values. There are two ways of achieving this: (1) Do not touch a register at all, (2) Allow a register to change its value but remember what value it had prior to the call and restore that value prior to returning to the caller.  For (2) we can do the following. In the subroutine prologue save on the stack the values of all those registers that the routine will change. In the epilogue restore the registers to their original values using those stored onto the stack.

 

Registers that the callee must preserve across a call are called callee-saved registers.

 

While this is a contrived example, let us assume that add7 uses register r16 to calculate the return value prior to writing it to r2. The modified code follows. Note that add7 first saves the value of r16 on the stack in the prologue and restores it in the epilogue. Because the prologue moves sp down by one word, the offsets of the stack parameters change: the fifth argument is now at offset +4 and not +0, the sixth at +8 and not +4, and the seventh at +12 and not +8:

 

The first listing is the original code that uses r2 for the partial sum. The second listing is the code that uses r16, a callee-saved register.

add7:

      add r2, r4, r5

      add r2, r2, r6

      add r2, r2, r7

 

      ldwio r7, 0(sp)

      add r2, r2, r7

      ldwio r7, 4(sp)

      add r2, r2, r7

      ldwio r7, 8(sp)

      add r2, r2, r7

 

      ret

add7:

      # Prologue

      # push r16’s value on the stack

      addi sp, sp, -4

      stwio r16, 0(sp)

 

      # add the first four arguments

      add r16, r4, r5

      add r16, r16, r6

      add r16, r16, r7

      # add arguments five through seven

      ldwio r7, 4(sp)

      add r16, r16, r7

      ldwio r7, 8(sp)

      add r16, r16, r7

      ldwio r7, 12(sp)

      add r2, r16, r7

 

      # Epilogue

      # restore r16’s original value

      ldwio r16, 0(sp)

      addi sp, sp, 4

     

      ret
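The offset shift seen in the r16 version of add7 follows one simple rule: whatever the callee subtracts from sp in its prologue is added to the offset of every stack argument. A minimal C sketch of that rule follows; stack_arg_offset is a name chosen for this illustration.

```c
#include <assert.h>

/* Byte offset from sp of stack-passed argument number `position` (5, 6, ...)
   as seen inside the callee after its prologue has lowered sp by
   callee_frame_bytes. */
int stack_arg_offset(int position, int callee_frame_bytes)
{
    assert(position >= 5);
    assert(callee_frame_bytes >= 0 && callee_frame_bytes % 4 == 0);
    return 4 * (position - 5) + callee_frame_bytes;
}
```

With no prologue (callee_frame_bytes of 0) arguments five through seven sit at +0, +4 and +8; after a one-word prologue that saves r16 they sit at +4, +8 and +12.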

 

 

Notice that the prologue and epilogue sections are symmetric. One saves register values onto the stack and the other restore them. Restoring is typically done in the reverse order. For example, if we needed to save and restore registers r16, r17 and r18 we will use the following prologue and epilogue sections:

 

            prologue:   addi sp, sp, -12

                        stwio r16, 0(sp)

                        stwio r17, 4(sp)

                        stwio r18, 8(sp)

            …

            epilogue:   ldwio r18, 8(sp)

                        ldwio r17, 4(sp)

                        ldwio r16, 0(sp)

                        addi sp, sp, 12

 

 

There is a second class of registers in the calling convention. These are the caller-saved registers. These are registers that are not guaranteed to maintain their values across a call. If the calling function needs these values to be preserved, then it has to explicitly save them on the stack prior to the call (i.e., in the pre-call section), and then restore them from the stack after the call (i.e., in the post-call section). In NIOS II, r8 through r15 and r2 through r7 are caller-saved registers. Notice that these include the return value and the argument passing registers, since these are expected to change whenever a function is called.

 

In summary, here are the register save/restore conventions for NIOS II:

 

Class          Registers                          When to save / restore
CALLER-SAVED   R8 through R15, R2 through R7      Save in pre-call / Restore in post-call
CALLEE-SAVED   R16 through R23, R26 through R28   Save in prologue / Restore in epilogue
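The summary can also be expressed as a small C lookup. This is a sketch that encodes only the two classes listed in this lecture; registers the lecture does not classify are reported as REG_NOT_COVERED, and the names reg_class and classify_register are made up for the example.

```c
#include <assert.h>

typedef enum {
    REG_CALLER_SAVED,   /* save in pre-call, restore in post-call */
    REG_CALLEE_SAVED,   /* save in prologue, restore in epilogue  */
    REG_NOT_COVERED     /* not classified in this lecture         */
} reg_class;

/* r is the register number, 0 through 31. */
reg_class classify_register(int r)
{
    assert(r >= 0 && r <= 31);
    if (r >= 2 && r <= 15)
        return REG_CALLER_SAVED;
    if ((r >= 16 && r <= 23) || (r >= 26 && r <= 28))
        return REG_CALLEE_SAVED;
    return REG_NOT_COVERED;
}
```

For example, classify_register(8) reports a caller-saved register, which is why a function that keeps a value in r8 across a call has to save and restore it itself.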

 

 

An example follows where we implement add6() by calling add2() three times.

 

The equivalent C code for add6() is:

 

int add6(int a, int b, int c, int d, int e, int f)

{

            int t;

 

            t = add2 (a, b);

            t += add2 (c, d);

            t += add2 (e, f);

            return t;
}

 

We will be using register r8 for “t”. R8 is a caller-saved register. Since add6() does not know whether add2() changes r8, it must preserve r8’s value across each call. It must also save registers r6 and r7, which contain arguments “c” and “d”, since these may change during the first call to add2(). The assembly code for add6() follows the stack frame diagrams.

 

Here’s how the stack frame looks before any instruction of add6() executes and after space is allocated in the prologue:

 

When add6() is called but before any instruction executes in add6():

 

        Offset   Contents
sp →    +0       5th argument
        +4       6th argument
        +8       Caller’s saved return address

 

After add6() allocates space on the stack:

 

        Offset   Contents
sp →    +0       To preserve r8
        +4       To preserve r6
        +8       To preserve r7
        +12      add6’s return address
        +16      5th argument
        +20      6th argument
        +24      Caller’s saved return address
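One way to double-check the offsets in the second diagram is to describe the same memory as a C struct and let the compiler compute the field offsets. This is purely an illustration: add6_stack_view and stack_arg_e are invented names, and the sketch assumes 32-bit fields laid out without padding, which the static_assert confirms on the machine that compiles it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What add6 sees starting at sp after its prologue, lowest address first. */
typedef struct {
    uint32_t saved_r8;   /* +0  : preserved "t"                          */
    uint32_t saved_r6;   /* +4  : preserved "c"                          */
    uint32_t saved_r7;   /* +8  : preserved "d"                          */
    uint32_t saved_ra;   /* +12 : add6's return address                  */
    uint32_t arg5;       /* +16 : "e", placed on the stack by the caller */
    uint32_t arg6;       /* +20 : "f", placed on the stack by the caller */
} add6_stack_view;

/* Six words, no padding: 24 bytes in total. */
static_assert(sizeof(add6_stack_view) == 24, "unexpected padding in view");

/* Reads "e" the same way "ldwio r4, 16(sp)" would. */
uint32_t stack_arg_e(const add6_stack_view *sp)
{
    return sp->arg5;
}
```

The first four fields are the part add6 allocates itself; the last two belong to the caller's frame and are only read by add6.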

 

     .text

add6:

     # Prologue

     # allocate space for 4 words: we will be saving ra, r6, r7, and r8

     addi sp, sp, -16

     # save ra

     stwio ra, 12(sp)

    

     # pre-call section

     # save r7 and r6 prior to calling add2 for the first time

     stwio r7, 8(sp)

     stwio r6, 4(sp)

     # note that a and b are already in r4 and r5 so

     # that they can be used directly as the first and second arguments by add2

     call add2

 

     # post-call section

     ldwio r6, 4(sp)

     ldwio r7, 8(sp)

 

     # t = return value of add2

     add r8, r2, r0

 

     # pre-call section

     # save r8 on the stack

     stwio r8, 0(sp)

     # pass c and d as the two parameters to add2

     add r4, r6, r0

     add r5, r7, r0

     # we don’t need to preserve r6 and r7 since we don’t care for these values any more

     call add2

 

     # post-call

     # restore r8’s value

     ldwio r8, 0(sp)

 

     # t += return value

     add r8, r8, r2

 

     # pre-call section

     # save r8 on the stack

     stwio r8, 0(sp)

     # pass e and f as the two parameters to add2

     # e is now at offset +16 (it was at offset +0 before add6 allocated four words on the stack), and f is at +20

     ldwio r4, 16(sp)

     ldwio r5, 20(sp)

 

     call add2

 

     # post-call

     # restore r8’s value

     ldwio r8, 0(sp)

 

     # add6’s return value = t + add2’s return value

     add r2, r8, r2

 

     # epilogue

     ldwio ra, 12(sp)

     addi sp, sp, 16

 

     # done

     ret

 

 

 

Local Variables? Local variables can either be allocated in registers or on the stack immediately after the space allocated for preserving register values.

 

Stack Frame: This term is used to refer to the stack space allocated per subroutine invocation. Based on our discussion the layout of a stack frame is as follows:

 

sp →    Local variables       Allocated by callee
        Saved registers       Allocated by callee
        Return address        Allocated by callee
        Fifth parameter       Allocated by caller
        Sixth parameter       Allocated by caller