Lecture 5

Andreas Moshovos

Spring 2007

 

Using the Assembly Programming Language to Write Programs

 

As we have discussed, instructions and data are represented internally as binary quantities. To make programming a machine easier, however, we commonly use a symbolic representation of instructions and data. This is the assembly programming language. Here we explain the conventions of the assembly language used for NIOS II and the specific tools that you are going to use with the DE2 or DE1 boards. Different CPUs most likely use different assembly languages. Moreover, there may even be different assembly dialects for the same CPU. The particular language we will describe is based on the popular GNU GCC- and BINUTILS-based tools. This GNU tool-chain can be configured to support several architectures, so some of what we describe applies to other architectures as well. Some of it may seem unnecessarily complex at this stage. Keep in mind that this tool-chain was developed to support the development of complex programs and operating systems for several architectures. That's where most of the apparent complexity comes from.

 

Please go over the lab1 handout, which can be found on the course's website; it gives a more detailed overview of how to program in assembly on the DE2/DE1 system. The current system uses an IDE that hides all of these details, but it's nice to know what happens underneath.

 

The first step in programming is to write the assembly program in a text file. The assembly program is a textual representation of instructions and memory data values. You then pass this file through the assembler, a program that parses your assembly program and translates it into its equivalent binary form. It is this binary representation that is loaded into the computer's memory and executed. There are several different formats for this "binary representation". The tool that communicates with the DE2 board knows how to interpret these formats and loads your program and data into memory before the computer executes it. For the DE2, a simple text representation is used to communicate the memory contents, including instructions and data. This format is called SREC. Briefly, in SREC data values are written in hexadecimal, with each hex digit encoded as an ASCII character. Each line has the form ADDRESS DATA, where ADDRESS is the starting address where the data values encoded in DATA should be stored. (This is a simplified explanation; the actual format is a bit more complicated. You can look at the SREC file that the tools produce and then find a complete description online.)

 

The previous discussion is overly simplistic. In our case, converting an assembly program into its loadable form is done in steps. Here they are:

 

1.      Pass all assembly programs through the assembler, which in our case is called nios2-elf-as. This produces an object file (typically identified by the .o suffix) that contains the code and data you wrote in the assembly program.

2.      Pass all object files to the linker. The linker is called nios2-elf-ld in our case. The linker links together all object files into a single binary to be loaded onto the board. The format for this binary file is ELF in our case. ELF is a commonly used format for binary programs.

3.      Pass the ELF file through a tool to convert the contents into SREC format.

 

We’ll discuss these steps in more detail during the labs.

 

Let’s see how we could express our example program in NIOS II assembly:

 

      .section .data

      .align 2

va:   .long 0x0

vb:   .long 0x11223344

vc:   .long 0x55667788

 

       .section .text

       .global main

 

main:

        movia r11, vb

        ldw   r9, 0(r11)

        movia r11, vc

        ldw   r10, 0(r11)

        add   r8, r9, r10

        movia r11, va

        stw   r8, 0(r11)

 

 

The ".section .data" line is an assembler directive. That is, it *does not* correspond to a NIOS II instruction or data. It just tells the assembler that whatever follows is data. The linker uses this information to pack all quantities declared in .data sections into a contiguous portion of memory.

 

The ".align 2" directive instructs the assembler to advance to the next address that is divisible by 2^2 = 4 before placing what follows. This is necessary because we are going to create word variables in memory, and words must be aligned. If we wanted half-words we would have used ".align 1". There is, of course, no need to align bytes.

 

The next line is:

 

va:     .long 0x0

 

The ".long 0x0" instructs the assembler to allocate four bytes in memory and to initialize them to zero. When these four bytes are viewed as a word, their numeric value is zero. We used a hexadecimal representation for the number, but we could have used decimal. The "va:" is a label. It instructs the assembler to treat the identifier "va" as the number that corresponds to the memory address where the ".long 0x0" is placed. At this stage this address is unknown. However, at link time the linker will bind it to a specific memory location and will update all references accordingly. Unfortunately, with the current infrastructure it is not easy to force va to be at a specific location in memory; we have to rely on the linker to allocate it. So, if the linker eventually allocates "va" at location 0x00100040, then all references to va will be replaced with this number. Anywhere you wrote va, it is as if you had written 0x00100040.

 

The next two lines instruct the assembler to allocate one word each for vb and vc, initialized to 0x11223344 and 0x55667788 respectively.

 

The ".section .text" directive instructs the assembler that whatever comes next should be treated as instructions. As a result, the linker will pack together all ".text" sections into a contiguous portion of memory that precedes all ".data" sections.

 

The ".global main" directive instructs the assembler to pass the identifier "main" on to the linker. The standard libraries linked with your program look for this symbol and treat it as the entry point of your code. Your program should therefore include a label "main:" at some point; this is the next statement.

 

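"movia r11, vb" is the textual representation of the first instruction. This is actually a pseudo-instruction that the assembler provides for your convenience. It gets translated into two instructions:

        movhi   r11, upper 16 bits of vb

        addi    r11, r11, lower 16 bits of vb

Strictly speaking, addi sign-extends its 16-bit immediate, so when bit 15 of vb is 1 the assembler adds one to the upper half to compensate. This is the %hiadj adjustment described at the end of this lecture.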

 

 

The rest of the program should be self-explanatory given the information we presented thus far.

 

You can now compile your program. You have to call the assembler and then the linker. As part of the labs you should have received a preconfigured software package that invokes them with all the necessary arguments.

Create a file called bpc.s, write the assembly program in it, and then run "make SRCS=bpc.s compile". Among other files, this will produce a file called prog.elf. You can inspect its contents as follows:

Run "nios2-elf-objdump -d prog.elf". This will produce a disassembly of the .text section (note that the code below uses ldwio and stwio instead of ldw and stw; we will describe the difference between these at a later time, so for now treat them as equivalent):

 

prog.elf:     file format elf32-littlenios2

 

Disassembly of section .text:

 

01000000 <main>:

 1000000:       02c04034        movhi   r11,256

 1000004:       5ac01804        addi    r11,r11,96

 1000008:       5a400037        ldwio   r9,0(r11)

 100000c:       02c04034        movhi   r11,256

 1000010:       5ac01904        addi    r11,r11,100

 1000014:       5a800037        ldwio   r10,0(r11)

 1000018:       4a91883a        add     r8,r9,r10

 100001c:       02c04034        movhi   r11,256

 1000020:       5ac01704        addi    r11,r11,92

 1000024:       5a000035        stwio   r8,0(r11)

 

The format of each line is: “address in memory”, “binary encoding of the instruction”, “textual representation of the instruction”

You'll notice that there are more instructions after these. That's the default initialization code for C programs; please ignore it for the time being.

 

If you want to see where the variables va, vb, and vc were placed, you can use "nios2-elf-objdump --syms prog.elf". You'll get many symbols, but if you look closely our variables should be there too:

 

01000060 l       .data  00000000 vb

01000064 l       .data  00000000 vc

0100005c l       .data  00000000 va

 

These three lines give the address, the flags (l means the symbol is local, i.e., it was not declared .global), the section (.data), the size (0, since we did not declare one), and the symbolic name used in the program.

 

Now you should be able to figure out why the first two instructions are:

 

1000000:       02c04034        movhi   r11,256

1000004:       5ac01804        addi    r11,r11,96

 

Note that the immediate values in the instructions are shown in decimal, not hexadecimal. To add insult to injury, the first two numbers on each line, that is the address and the instruction encoding, are in hexadecimal with the 0x prefix omitted.

The end result of these two instructions is that r11 gets the value 0x01000060, that is 256 x 0x10000 + 96, which is the address of vb. These two instructions are the same as:

 

1000000:       02c04034        movhi   r11,0x0100

1000004:       5ac01804        addi    r11,r11,0x60

 

Other examples:

 

The following code calculates a = 2 x b + c

 

        .section .text

        .global main

main:

        movia   r11, vb

        ldw     r9, 0(r11)

        movia   r11, vc

        ldw     r10, 0(r11)

        add     r8, r9, r10

        add     r8, r8, r9

        movia   r11, va

        stw     r8, 0(r11)

 

The following code computes the same result:

 

        .section .text

        .global main

main:

        movia   r11, vb

        ldw     r9, 0(r11)

        add     r9, r9, r9

        movia   r11, vc

        ldw     r10, 0(r11)

        add     r8, r9, r10

        movia   r11, va

        stw     r8, 0(r11)

 

This calculates a  = 3 x b + 5 x c

 

        .section .text

        .global main

main:

        movia   r11, vb

        ldw     r9, 0(r11)

        add     r10, r9, r9

        add     r9, r10, r9

        movia   r11, vc

        ldw     r10, 0(r11)

        add     r8, r10, r10

        add     r8, r8, r8

        add     r8, r8, r10

        add     r8, r9, r8

        movia   r11, va

        stw     r8, 0(r11)

 

Other Datatypes

 

The examples we have seen so far use words, i.e., full 32-bit numbers. NIOS II also provides byte and half-word loads and stores. The byte loads and stores are:

 

ldb   rX,  Imm16(rY)

ldbu  rX, Imm16(rY)

stb  rX,  Imm16(rY)

 

The "Imm16(rY)" part is identical to the one in the ldw/ldwio instructions we have seen up to this point. It calculates the address that is accessed as: value of register rY + sign-extended(Imm16).

"ldb" loads a single byte from that memory address and sign-extends it to 32 bits into register rX. So if the value read from memory is 0x7F, rX will get the value 0x0000007F. If the value read is 0x81, rX will get the value 0xFFFFFF81. This instruction treats the byte as a signed 8-bit integer in 2's complement.

 

"ldbu" treats the byte as an unsigned 8-bit integer and zero-extends it to 32 bits in rX. So if the values read are 0x7F or 0x81, rX will be written with the values 0x0000007F and 0x00000081 respectively.

 

"stb" takes the lower 8 bits of register rX and writes them to the memory byte referenced. There is no issue of sign- or zero-extension, since we take an 8-bit quantity and write it to an 8-bit location.

 

The equivalent instructions for half-word are:

 

ldh rX,  Imm16(rY)

ldhu rX, Imm16(rY)

sth rX, Imm16(rY)

 

They read and write two bytes of memory. The two loads treat the half-word either as a signed 16-bit integer (ldh) or as an unsigned 16-bit integer (ldhu) and respectively sign- or zero-extend it to 32 bits.

"sth" takes the lower two bytes of rX and writes them to memory.

 

A Shorter Program

 

We can use the immediate field of the load and store instructions to reduce the number of instructions needed to implement our programs.

Take for example the a = b + c program. Recall that we declared the data as follows:

 

      .section .data

      .align 2

va:   .long 0x0

vb:   .long 0x11223344

vc:   .long 0x55667788

 

We know that vb = va + 4 and vc = va + 8. That is, the address represented by vb is the address represented by va plus four, since each .long occupies four bytes.

 

So, instead of using movia for every address, we can use a single movia to load the address of va and then use distances (or offsets) to access the other variables:

 

      .text

main:

       movia r11, va

       ldw    r8, 4(r11)           # access vb

       ldw    r9, 8(r11)           # access vc

       add    r8, r8, r9

       stw    r8, 0(r11)           # write the result into va

 

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Reducing the Instruction Count Further

 

We will cover this part of the lecture later in the course. Please ignore for the time being.

 

In our program, every load and store to a memory variable is translated into three instructions: two for the movia and one for the actual load or store. We can reduce the instruction count as follows:

 

        .section .text

        .global main

main:

        movhi   r11, %hiadj(vb)

        ldw     r9, %lo(vb)(r11)

        movhi   r11, %hiadj(vc)

        ldw     r10, %lo(vc)(r11)

        add     r8, r9, r10

        movhi   r11, %hiadj(va)

        stw     r8, %lo(va)(r11)

 

        .section .data

        .align 2

va:     .long 0x0

vb:     .long 0x11223344

vc:     .long 0x55667788

 

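Here we use two assembler macros: %hiadj(label) and %lo(label). They are evaluated when the program is built (by the assembler, with the linker filling in the final value once the address of the label is known) and converted into numbers. They are not evaluated at run time.

Given a label, or equivalently a 32-bit number, the two macros split it into the two 16-bit immediates needed to rebuild it with a movhi followed by an addition. %lo(label) is the lower 16 bits, and %hiadj(label) is the upper 16 bits, adjusted by one when bit 15 is set to compensate for the sign extension of the lower half. So, "movhi r11, %hiadj(vb)" followed by "ldw r9, %lo(vb)(r11)" effectively reads from memory location vb, because the load adds the sign-extended %lo(vb) to r11. Similarly, "movhi r11, %hiadj(vb)" followed by "addi r11, r11, %lo(vb)" writes the 32-bit value vb into r11.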

 

An Even Shorter Program

 

Finally, here's how we can use r11 as the base register for all the loads and the store, thus eliminating two of the movhi instructions. In the following code we exploit the fact that va, vb, and vc are allocated consecutively in memory. Thus the distance in memory addresses of vb from va is 4 and the distance of vc from va is 8:

 

        .section .text

        .global main

main:

        movhi   r11, %hiadj(va)

        ldw     r9, %lo(va)+4(r11)

        ldw     r10, %lo(va)+8(r11)

        add     r8, r9, r10

        stw     r8, %lo(va)+0(r11)

 

        .section .data

        .align 2

va:     .long 0x0

vb:     .long 0x11223344

vc:     .long 0x55667788