Lecture 2

Andreas Moshovos

Spring 2008

Binary Manipulation Instructions:

Thus far we have seen instructions that move data (from/to memory or registers), calculate arithmetic functions (add, sub) or that change the control flow. There are other classes of instructions in 68k that are also commonly found in most modern processors. The first class of instructions we will discuss, implements bit-wise boolean logic operations. The second class performs shifts and rotates. Some processors also include instructions that operate on single bits. NIOS II does not. However, we will explain how we can use a sequence of instructions to implement single bit manipulation instructions.

Bitwise Boolean Logic Instructions

As an example let’s consider the AND instruction. All bitwise instructions treat their arguments as a collection or 32 independent bits without assigning an arithmetic value to the collection. For them, a register or a memory value is just an ordered collection of 32 bits. They then operate on each of those 32 bits separately.

So, AND r8, r9, r10 performs 32 boolean AND calculations in parallel. Each of those 32 AND calculations is done using two bits, one from the r9 register and one from the r10 register. The bit 0 of r9 is used with bit 0 of r10, bit 1 of r9 is used with bit 1 of r10 and so on. Each of two 32 bit pairs produces a result which is written into the corresponding bit position in the destination register (r8). That is, the result of (bit 0 of r9 AND bit0 of r10) is written into bit 0 of r8.

As an example assume the following initial values of D0 and D1:

3 2 1

1 3 5 7 0

R9 = 00000101 11111111 11001100 00110011

R10 = 11111100 00000111 00111001 00010010

------------------------------------------------------

R8 00000100 00000111 00001000 00010010 <-- AND R8, R9, R10

Note that this is equivalent to the “&” operation in C. The “&&” operation in C operates on TRUE and FALSE values (non-zero and zero respectively in most C dialects – some define <= 0 as false).

These operations are called bitwise because they treat their operands as a collection of bits that are operated upon individually.

AND with Immediate: There are two variations of AND where the source operand can be an immediate. These are the ANDI and the ANDHI instructions. The immediate is only 16 bits. In the case of ANDI the immediate is zero-extended to 32 prior to the calculation. In the case of of ANDHI 16 zero bits are appended to the immediate prior to the calculation. For example, ADDI r9, r10, 0xFFFF and ANDHI r9, r10, 0xFFFF bitwise and r10 with 0x0000FFFF and 0xFFFF0000 respetively.

Other bitwise logical operations include: OR, ORI, ORHI, NOR, XOR, XORI, and XORHI. Where XOR is the Exclusive OR operation (0 XOR 0 = 0, 0 XOR 1 = 0, 1 XOR 0 = 1, and 1 XOR 1 = 0).

As a side note, here’s a quick way to swap the values of two registers:

XOR r9, r9, r10

XOR r10, r10, r9

XOR r9, r9, r10

If you have not seen this before try to figure out how it works? Try a couple of examples, then try to express the outcome of each instruction as a function of the original values of r9 and r10 and XOR. Keep in mind that A XOR A = 0, A XOR 0 = A and A XOR 1…1 = not(A). Some machines provide a swap instruction.

The C equivalents of the aforementioned bitwise operations are:

1. a AND b à a & b

2. a OR b à a | b

3. a XOR b à a ^ b

4. NOT a à ~a

NIOS II does not have a NOT instruction. NOT inverts all bits of a value. However, this can be synthesized using XORI and XORHI. Here’s how we can invert the value in register r9:

XORI R9, R9, 0xFFFF

XORHI R9, R9, 0xFFFF

This works because given a bit x then x XOR 1 = NOT x (substitute 0 and 1 for x and see).

Shifts:

The second class of bit manipulation instructions treats their operands as series of bits placed one after the other. They displace these bits by a number of positions. Lets us look at each one of these instructions. The shifts correspond to the “>>” and “<<” operations in C. There are no C equivalents for the rotate instructions (although we can synthesize them using shifts and bitwise logical operations).

First are logical shifts. For clarity

Logical shifts can be performed either to the left or to the right. In shifts, the bits of the operands being shifted are treated as 32 bits placed in order one after the other and on a straight line. The instruction shifts those bits either to the left or to the right by a number of bit positions. The positions that are left empty are filled with zeroes and the bits that are shifted out of limits are simply discarded.

The two instructions are:

SLL Rdest, Rsource, Rbits à shift left the bits from Rsource by Rbits bits, write the result to Rdest

SRL.Rdest, Rsource, Rbits à shift right the bits from Rsource by Rbits bits, write the result to Rdest

Consider for example a logical shift right:

srl r8, r9, r10

Assume that initially the value of r9 is 0x12345678 and that r10 contains the value 1. We have to write r9 in binary to understand what is going on: 0001 0010 0011 0100 0101 0110 0111 1000. Now let’s shift this left by one bit: 0000 1001

Original value	0001 0010 0011 0100 0101 0110 0111 1000
Shifted right by 1 bit	0000 1001 0001 1010 0010 1011 0011 1100

Note that the vacant bit position (bit 31 – shown in bold and underlined) was filled with a zero. If r10 was 4 the resulting value would be 0000 0001 0010 0011 0100 0101 0110 0111. If r10 was 32 the resulting value would be 0.

The left shift is similar with the only difference being that the bits move to the left. So:

Sll r8, r9, r10

With the same assumptions gives:

Original value	0001 0010 0011 0100 0101 0110 0111 1000
Shifted left by 1 bit	0010 0100 0110 1000 1010 1100 1111 0000
Shifted left by 4 bits	0010 0011 0100 0101 0110 0111 1000 0000

Besides SLL and SRL there are is also SLLI and SRLI. These take the form SLLI Rdest, Rsource, Imm5 and SRL Rdest, Rsource, Imm5 where Imm5 is a five bit immediate. They shift Rsource by the number of bits given in the 5 bit immediate.

In summary, logical shifts move the bits by a specified number of bit positions to the left or right. Bits positions that are emptied are filled with zeroes and bits that are moved out of the bit positions that are available are discarded.

Relation of Shifts with Division and Multiplication: If the operand being shifted is interpreted as an unsigned integer then shifting left by 1 bit is equivalent to multiplying the operand by 2. Similarly shifting right by 1 bit is equivalent to dividing by 2. It follows that:

1. Shifting left by n bits is equivalent to multiplying by 2^n

2. Shifting right by n bits is equivalent to dividing by 2^n

Arithmetic Shift: In addition to the logical shift right there is also an arithmetic shift right which takes the forms:

SRA Rdest, Rsource, Rbits

SRAI Rdest, Rsource, Imm5

These are similar to the logical shift right. However, the differ in what value they fill in the vacant positions created after the shift. While the logical fills in the emptied bits with zeroes, the arithmetic shift fills them with the most significant bit of the source operand.

For example, assume R9 = 1101 0010 0000 0000 0000 0000 00000 and R8 = 0101 0011 0000 0000 0000 0000 0000 then:

SRA R9, R9, 3 à will produce 1111 1010 0100 0000 0000 0000 0000 0000

SRL R9, R9, 3 à will produce 0001 1010 0100 0000 0000 0000 0000 0000

SRA R8, R8, 2 à produces 0001 0100 1100 0000 0000 0000 0000 0000

SRL R8, R8, 2 à produces 0001 0100 1100 0000 0000 0000 0000 0000

Arithmetic shifts to the right by n bits is equivalent to dividing by a power of 2 with one exception: if the operand is -1 then the arithmetic shift right does not produce 0 as it should. It produces -1 (note -1 is represented as all ones).

ROTATES:

The last class of bit manipulation instructions treats their source/destination operand as a sequence of bits that are placed on a circle. That is the bits are ordered (bit 1 is next to bit 2 which is next to bit 3 and so on) and bits 0 and 7 (for byte instructions) are also adjacent.

The two instructions are:

ROR Rdest, Rsource, Rbits à rotate Rsource to the right by Rbits bits. To the right means towards bit 0. The result is written into register Rdest.

ROL Rdest, Rsource, Rbits à rotate Rsource to the left by Rbits bits. The result is written into register Rdest.

ROLI Rdest, Rsource, Imm5 à rotate Rsource to the left by Imm5 bits. The result is written into register Rdest.

These instructions treat the source value as a ring of 32 bits and rotate by the number of bits specified. Here are a few examples:

Original value	0001 0010 0011 0100 0101 0110 0111 1000
Rotated left by 4 bits	0010 0011 0100 0101 0110 0111 1000 0001
Rotated left by 6 bits	10 0011 0100 0101 0110 0111 1000 0001 00
Rotated right by 8 bits	0111 1000 0001 0010 0011 0100 0101 0110

Synthesizing Multiplication Using Shifts and Additions

Multiplication is a commonly needed calculation. Multipliers, however, are relatively expensive circuits and/or slow. For this reason but also as means of getting familiar with shifts we now explain how multiplication can be implemented using just left shift and addition instructions. At a high level, the key idea is to convert multiplication into a sum of multiplications with just powers of two. This is easier understood using an example. Assume we want to multiply the value in register D0 by 5. One alternative would be to add the value in D0 five consecutive times. This quickly becomes impractical (for example, to multiply by 1000 we would need 1000 additions). An alternative is to analyze the multiplier to a sum of powers of 2. So, 5 becomes (4 + 1). In turn 5 X D0 becomes 4 X d0 + 1 X d0. Similarly, 11 x d0 can be rewritten as 8 x d0 + 2 x d0 + 1 x d0. The nice thing about using powers of two is that the corresponding multiplications can be implemented via simple left shift instructions.

Here’s the code for multiplying d0 by 11 with the result stored in d1:

move.l d0, d1 ; make a copy of d0 in d1, so now d1 = d0 x 1

lsl.l #1, d0 ; d0 = d0 x 2

add.l d0, d1 ; d1 = d0 + d0 x 2

lsl.l #2, d0 ; d0 = d0 x 4 but since we changed d0 to 2 times its original value, now d0 is 8 times its original value

add.l d0, d1 ; d1 = d0 + d0 x 2 + d0 x 8

Here’s the code for multiplying d0 by 17:

move.l d0, d1 ; d1 = d0

lsl.l #4, d0 ; d0 = 16 x d0

add.l d0, d1 ; d1 = d0 + 16 x d0

In general, if we need to calculate A x B then look at the binary representation of B focusing on the bits that are 1. If B’s binary representation has a 1 at bit i, then we need to include A x 2^i into our sum. So, if B = 10101 the sum will include the following terms: A x 2^0, A x 2^2 and A x 2^4.

Synthesizing Multiplication Using Shifts, Additions and Subtractions --- Booth’s Algorithm

In our previous discussion we only used shifts and additions. What if we were also allowed to use subtraction? Can we perform multiplication in fewer steps?

Yes, we can here’s an example why: Assume that the multiplier is 01110011110 in binary. Let’s rewrite this as 01110000000 + 00000011110, where we separated the two streaks of 1’s.

Now notice that A x 01110011110 can be rewritten as A x 01110000000 + A x 00000011110. With the previous method (adds and shifts) we will need three shifts for the first term, four shifts for the second term and so on. In addition we will need several additions to sum all these together.

Before proceeding with the example it is going to be helpful to realize that any binary number of the form 0..01..10..0 (where the 0…0 parts are optional), that is any number where all bits that are 1 form continuous sequence, can be rewritten as a difference of two binary numbers each with a single bit set to 1. For example, 011 = 100 – 001, 0110 = 1000 – 0010, 01111 = 10000 – 00001 and 0011000 = 0100000 – 0001000. In general if we have a number of the form 0…01…10..0 where the leftmost 1 is at position L and the rightmost 1 is at position R (where bit positions are numbered right to left, starting with 0) the number can be rewritten as (A – B) = (0…010…0 – 0…010…0) where the 1 in A appears at bit position L+1 and the 1 in B appears at position R.

So returning to our example, we had A x 01110011110 = A x 01110000000 + A x 00000011110.

But, 01110000000 can be further rewritten as (10000000000 - 00010000000), so A x 01110000000 can be written as A x 10000000000 – A x 00010000000. The last expression requires just two shifts and one subtraction.

Similarly, A x 00000011110 can be rewritten as A x 00000100000 - A x 00000000010. For this calculation we need just two shifts and a subtraction.

In summary A x 01110011110 can be calculated as (A x 10000000000 – A x 00010000000 + A x 00000100000 - A x 00000000010)

In general, if we need to calculate A x B we write B in binary and then starting from the rightmost bit and moving towards the left we do:

1. result = 0

2. If the current bit at position i is 0, move to the next bit (i = i + 1) and repeat this step, else result = result - A x 2^i and move to step 3.

3. If the current bit at position i is 1, move to the next bit (i = i + 1) and repeat this step, else result = result + A x 2^i and move to step 2.

This algorithm is known as Booth’s Algorithm for multiplication. The reason we discuss this algorithm is that it is important to know that further optimizations may be possible when doing arithmetic or other operations. So, when designing circuits or algorithms to perform these operations it is always important to check what is the best algorithm that exists and that better matches your problem (certain things are faster/preferable in hardware than in software and vice versa).

Single-Bit Manipulation

Single-bit manipulation operations are often needed especially when communicating with devices. Other processor implement bit manipulation instructions. NIOS II does not. This does not mean that it is impossible to manipulate single bits. It means that to do so, a sequence of instructions is needed to achieve the desired effect. Generally, a designer must think long and hard before introducing instructions into the processor design and instruction set architecture. Once an instruction is introduced it must be supported in all implementations, even those that will be built 10 years down the line. For this reason, modern processors implement only a few instructions and rely on a sequence of these simple instructions for implementing other operations. Only when one can demonstrate the introducing an instruction improves some characteristic significantly (for example performance or power) additional instructions are introduced.

So, how can we manipulate single bits?

Reading a single bit value:

This can be done with an AND instruction. Specifically,

AND operations are useful among other things for masking out bits out of longer values. For example, say we are interested in whether an integer is even. In this case, the lower bit of the integer must be 0. Accordingly we can use the following sequence:

ANDI r10, r9, 1

BEQ r10, r0, ISEVEN

By “masking out” we mean isolating the value of a bit or a collection of bits.

Setting a bit to zero

AND can also be used to set specific bits to 0. For example, let’s say that we want to make bit 2 of r9 zero. Here’s how:

movia r8, 0xfffffffb # this is hex for 11…1011

andi r9, r9, r8

Note that all but the bit we are interested in are set to 1 in the immediate of movia.

As a parenthetic note, notice that andi r9, r9, 0x3 is equivalent to (r9 MOD 8) and that

movia r8, 0xFFFFFFF8 # this is hex for 1…100

andi r9, r9, r8

is equivalent to r9 = (r9/8)*8 in integer arithmetic.

What we have done is use a constant that had a 1 only at the bit position that we were interested in. By doing a logical AND with this constant we essentially zeroed out all bits except the last one. As a result, the outcome of the AND operation would be zero if the least significant bit was zero otherwise it will be non-zero.

Such operations are very handy when accessing data that uses long datatypes to encapsulate a collection of different information pieces. This we have seen when we discussed I/O devices.

Setting a bit to one

Using OR we can set specific bits to 1. For example, ORI r9, r9, 0x4 sets bit 2 to 1.

Inverting a bit

Using XOR we can “flip” the value of bits. For example XORI r9, r8, 0x8, inverts bit 3 (a 1 becomes a 0 and vice versa).

These operations are quite useful for graphics applications where screen pixels are represented as bits in memory.