Lecture 6

Andreas Moshovos

Spring 2005

 

Using the Assembly Programming Language to Write Programs

 

As we have discussed, instructions and data are represented internally as binary quantities. To make programming a machine easier, however, we commonly use a symbolic representation of instructions and data: the assembly programming language. Here we explain the conventions of the assembly language used for the 68k and the specific tools that you are going to use with the Ultragizmo board. Different CPUs most likely use different assembly languages. Moreover, there may even be different assembly dialects for the same CPU.

 

Once you write an assembly program (a text representation of instructions and memory data values), you have to pass it through an assembler. The assembler is a program that parses your assembly program and translates it into its equivalent binary form. It is this binary representation that is loaded into the computer’s memory and executed. Even for that “binary representation” there are specific file formats. The operating system (or “monitor” in the case of the Ultragizmo) running on the computer knows how to interpret these formats and how to load your program and data into memory before the program is executed. The Ultragizmo uses a simple text format, called SREC, to communicate the memory contents, including instructions and data. Briefly, in SREC data values are written in hexadecimal, with each hex digit encoded as an ASCII character. Each line has the form ADDRESS DATA, where ADDRESS is the starting address at which the data values encoded in DATA should be stored. (This is a simplified explanation; the actual format is a bit more complicated. You can look at the srec file that the assembler produces and then find a complete description on-line.)

 

Let’s see how we could express our example program in 68k assembly:

 

        ORG     $10000

 

start   move.l $20004, d6

        add.l $20008, d0

lala    move.l d6, $20000

 

        ORG     $20000

 

VA      dc.l $00000000

VB      dc.l $11223344

VC      dc.l $22334455

 

The “ORG $10000” is an assembler directive. That is, it *does not* correspond to a 68k instruction or data. It just tells the assembler that whatever follows should be placed in memory starting at address $10000.

 

The next line is:

 

start   move.l $20004, d6

 

The “move.l $20004, d6” is a textual representation of our first instruction.

 

The “start” prefix defines a label. Wherever we use the word “start” in the program, the assembler replaces it with the constant $10000, which is the address where this instruction will be stored (since it immediately follows the ORG directive). Similarly, “lala” is later defined to correspond to the constant $1000c: since each of these instructions is 6 bytes, as we have seen, lala is at $10000 + 2 x 6 = $1000c.

Anything that appears in the first column of your assembly text file is interpreted as a label definition. Instructions and directives that have no label must therefore be indented.

 

Each of the next two lines defines one instruction. The binary representation of each instruction is placed in memory immediately after the previous instruction. So, the first move.l will be placed starting at address $10000, the add.l that follows will be placed at address $10000 + sizeof(first move.l) = $10000 + 6 = $10006, and finally, the last move.l will be placed immediately after the add.l, at $10006 + sizeof(add.l) = $10006 + 6 = $1000c. The last move.l also occupies 6 bytes (as we explained in the previous lecture), so it takes up addresses $1000c through $10011.

 

The second ORG directive tells the assembler to place whatever follows starting at address $20000. Without this ORG directive, whatever followed would have been placed at address $10012 (immediately after the last move.l).
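
The address arithmetic above can be summarized by annotating each line of the code section with the addresses the assembler assigns to it. This is only a sketch: it assumes that the assembler accepts ';' to start a comment, which these notes do not confirm (check the Ultragizmo manual).

```asm
; Sketch only. Assumption: ';' starts a comment in this assembler.
        ORG     $10000

start   move.l $20004, d6      ; 6 bytes at $10000-$10005, so start = $10000
        add.l $20008, d0       ; 6 bytes at $10006-$1000b
lala    move.l d6, $20000      ; 6 bytes at $1000c-$10011, so lala = $1000c
; Anything placed here, before another ORG, would start at $10012.
```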

 

The next line is:

 

VA      dc.l $00000000

 

The “VA” prefix defines the label “VA” to be the constant $20000, since this line immediately follows the ORG $20000 directive.

 

The “dc.l” is an assembler directive that instructs the assembler to interpret whatever follows as a long word constant (dc = Define Constant). Hence this directive results in the assembler placing the value $0 in memory as a long word starting at address $20000.

 

The next line is:

VB      dc.l $11223344

 

This defines VB to be the constant $20004, because it directly follows the previous long word, which occupies addresses $20000 through $20003. As with the previous line, it places a long word constant in memory.

 

The last line is similar to the previous two: it defines VC to be the constant $20008 and places the long word $22334455 there.
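
The resulting data layout can be sketched as follows. The byte-by-byte breakdown relies on the 68k storing the most significant byte of a long word at the lowest address (big-endian order). The ';' comment character is an assumption about the assembler.

```asm
; Sketch only. Assumption: ';' starts a comment in this assembler.
        ORG     $20000

VA      dc.l $00000000         ; VA = $20000, bytes $20000-$20003: 00 00 00 00
VB      dc.l $11223344         ; VB = $20004, bytes $20004-$20007: 11 22 33 44
VC      dc.l $22334455         ; VC = $20008, bytes $20008-$2000b: 22 33 44 55
```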

 

We can now rewrite our program by utilizing labels as opposed to direct constants:

 

 

        ORG     $10000

 

start   move.l VB, d6

        add.l VC, d0

lala    move.l d6, VA

 

        ORG     $20000

 

VA      dc.l $00000000

VB      dc.l $11223344

VC      dc.l $22334455

 

Note that now we do not refer to the variables using their absolute address. Instead we use the labels placed in front of them. This way we could use a different starting address for our data (by changing the parameter of the second ORG) without having to go and update all instruction references.
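
For instance, moving the data to a different region only requires editing the second ORG. The address $30000 below is an arbitrary choice made for illustration, not one used elsewhere in these notes, and it assumes that memory exists at that address on your board. The ';' comments are likewise an assumption about the assembler.

```asm
; Sketch only. Assumptions: ';' starts a comment; $30000 is usable memory.
        ORG     $10000

start   move.l VB, d6          ; VB now stands for $30004
        add.l VC, d0           ; VC now stands for $30008
lala    move.l d6, VA          ; VA now stands for $30000

        ORG     $30000         ; the only line that changed

VA      dc.l $00000000
VB      dc.l $11223344
VC      dc.l $22334455
```

None of the three instructions had to be edited; the assembler recomputes the label values on its own.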

 

General Notes:

 

Please read through the corresponding section of the Ultragizmo manual (assembly language) for detailed information about all assembly language directives. Here we discuss only some of them.

 

The DC. directive can take a list of values as in:

 

        dc.l $01, $02, $03, $04

 

Each value will be placed as a long word consecutively in memory. Thus the aforementioned DC will allocate 16 bytes in memory (four long words).

 

The DC directive also takes a size suffix. Besides long words (.l), we can use it for bytes and words (.b and .w respectively).
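
A small sketch of how the three sizes lay out in memory is shown below. The label names are hypothetical, and ';' as a comment character is an assumption. The byte list deliberately has an even number of entries so that the words and long words that follow start at even addresses, which the 68000 requires for word and long word accesses; whether this assembler would insert padding automatically is not something these notes state.

```asm
; Sketch only. Assumption: ';' starts a comment in this assembler.
        ORG     $20000

bvals   dc.b $01, $02          ; 2 bytes at $20000-$20001
wvals   dc.w $0304, $0506      ; 2 words (4 bytes) at $20002-$20005
lvals   dc.l $0708090a         ; 1 long word (4 bytes) at $20006-$20009
after   dc.b $ff               ; after = $2000a
```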

 

Use the $ prefix for hexadecimal numbers, the % prefix for binary numbers, a leading 0 for octal numbers, and no prefix for decimal numbers.

You can also refer to ASCII values using single quotes, as in '0' (this is 48, the ASCII code for the character 0).

You can also use expressions such as lala + 4.
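
The notations above can be tried out with a few DC directives. This is a sketch under assumptions: the label names other than lala are hypothetical, ';' is assumed to start a comment, and the expression is written without spaces (lala+4) in case the assembler treats a space as the end of the operand.

```asm
; Sketch only. Assumption: ';' starts a comment in this assembler.
        ORG     $20000

lala    dc.l $01, $02          ; lala = $20000, second long word at $20004
ptr     dc.l lala+4            ; stores the constant $20004
codes   dc.b $30, 48, %00110000, 060, '0'   ; five bytes, each holding 48
```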

 

The DS directive takes the form DS number and simply allocates memory space. It does not initialize this space. So DS 100 allocates 100 bytes.
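
As a sketch of the effect of DS on the addresses that follow it (the label names are hypothetical, and ';' as a comment character is an assumption):

```asm
; Sketch only. Assumption: ';' starts a comment in this assembler.
        ORG     $20000

buf     ds 100                 ; reserves $20000-$20063 (100 = $64 bytes), contents undefined
count   dc.l $0                ; count = $20000 + $64 = $20064
```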

 

The EQU directive is used to define symbolic constants: Koko EQU $ffff is the equivalent of #define Koko 0xffff in C. Unlike DC and DS, it places nothing in memory.
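
A sketch of how such constants might be used follows. The names DBASE and mask are hypothetical, ';' as a comment character is an assumption, and using a symbol as the operand of ORG is assumed to be accepted by this assembler.

```asm
; Sketch only. Assumption: ';' starts a comment in this assembler.
Koko    EQU $ffff              ; no memory is allocated for Koko
DBASE   EQU $20000             ; hypothetical name for the start of the data area

        ORG     DBASE          ; same effect as ORG $20000

mask    dc.l Koko              ; stores the long word $0000ffff at $20000
```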

 

Other examples:

 

The following code calculates a = 2 x b + c:

 

            org $20000

vb        dc.l $12003200

vc        dc.l $00223311

va        ds        4

 

            org $10000

            move.l  vb, d0

            add.l   d0, d0

            add.l   vc, d0

            move.l  d0, va
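
To see why this computes a = 2 x b + c, it helps to follow the value of d0 after each instruction. The trace below is a sketch that assumes ';' starts a comment; the values follow from the two constants in the data section.

```asm
; Sketch only. Assumption: ';' starts a comment in this assembler.
        org $20000
vb      dc.l $12003200         ; vb = $20000
vc      dc.l $00223311         ; vc = $20004
va      ds 4                   ; va = $20008, 4 uninitialized bytes

        org $10000
        move.l vb, d0          ; d0 = b      = $12003200
        add.l d0, d0           ; d0 = b + b  = $24006400
        add.l vc, d0           ; d0 = 2b + c = $24229711
        move.l d0, va          ; a = $24229711, stored at $20008
```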

What does this calculate?

 

            org $20000

vb        dc.l $12003200

vc        dc.l $00223311

va        ds        4

 

            org $10000

            move.l  vb, d0

            add.l   d0, d0

            add.l   vc, d0

            add.l   d0, d0

            add.l   d0, d0

            move.l  d0, va
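
One way to check your answer is to follow d0 through the code section one instruction at a time. The trace below is a sketch that assumes ';' starts a comment and that the data section is the one given above (vb = $12003200, vc = $00223311, va at $20008).

```asm
; Sketch only. Assumption: ';' starts a comment in this assembler.
        org $10000
        move.l vb, d0          ; d0 = b         = $12003200
        add.l d0, d0           ; d0 = 2b        = $24006400
        add.l vc, d0           ; d0 = 2b + c    = $24229711
        add.l d0, d0           ; d0 = 2(2b + c) = $48452e22
        add.l d0, d0           ; d0 = 4(2b + c) = $908a5c44
        move.l d0, va          ; a = 8b + 4c    = $908a5c44
```

Note that the final value has its most significant bit set, so if the numbers are interpreted as signed 32-bit quantities the last addition has overflowed.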