When to avoid “mul”.

In my reading about ARM assembler, I have come across the following advice: if you are multiplying by a constant, use shifts and adds, the multiplier is slow.

Just to prove the point, gcc doesn’t use “mul” for multiplication by a constant. Here’s a simple C program:

int c = 5;
   c = c * 10;

I compiled this with the following instruction:
gcc -S times.c -O3

And here is the relevant part of the assembly code that was produced:

ldr r3, .L2
ldr r2, [r3, #0]
add r2, r2, r2, asl #2
mov r2, r2, asl #1
str r2, [r3, #0]

Let’s go through this step by step. Each left shift is equivalent to multiplication by 2.

1. load r2 with initial value of c.
    r2 = 5
2. Add to r2 to the arithmetic left shift (2 places) of r2. 
    r2 = 5 + 5 * 4 = 25
3. Replace r2 with the arithmetic left shift (1 place) of r2
    r2 = r2 * 2 = 50

So, it’s official. If you want your assembly code to be at least as good as that produced by gcc, you should avoid using the “mul” instruction for multiplication by constants.

By the way, there are often several ways to achieve a multiplication by a series of shifts and adds. Here is my preferred way to multiply by 10. In this example, r4 holds the number to be multiplied and I want the result to be in r5.

mov r5, r4, lsl $3            @ r5 now holds r4 * 8
add r5, r5, r4, lsl $1        @ add r4 * 2 to r5

Happy Hacking


Using macros

Sometimes you will need to repeat something several times in your code. If it is a relatively small piece of code, you may be reluctant to turn it into a function since the branch and return commands may be almost as costly as the code you want to run. This is a good time to use a macro.

A macro is an assembler directive. You declare a macro at the start of the program. The assembler then looks through your code for occurences of the macro name and replaces them with the body of the macro. You can use parameters in a macro, so they are pretty flexible.

Here’s an example. One of my first programs was to calculate the gcd. It boils down to these lines:

    cmp	r0, r1
    subgt r0, r0, r1
    sublt r1, r1, r0    
    bne gcd

Here’s a version of my gcd program in which I have made gcd into a macro.

@ macgcd.s

.section    .data
    .long 6099
    .long 2166

.macro gcd a, b
@ the '\@' in the next line causes the assembler to generate a local label.
    cmp	\a, \b
    subgt \a, \a, \b	
    sublt \b, \b, \a    
    bne gcd_\@ 

.globl  _start
    adr r1, m
    ldr r0, [r1]	@ load first number into r0.
    adr r2, n
    ldr r1, [r2]	@ load second number into r1.
    gcd r0, r1          @ use macro

    mov     r7, $1	@ prepare to exit
    swi     $0		@ wake kernel

As you can see, you declare a macro with a line that begins “.macro”. Following this is the name of the macro and any parameters used. The parameter names must be prefixed with a ‘\’ in the body of the macro. At the end of the macro definition we need “.endm”.

This particular example works fine and it will use whichever registers are passed to it to calculate the gcd. It doesn’t use any other registers, so there is no need to worry about other data getting “clobbered”.

Of course, you can write macros that do use named registers. Here’s a short program that uses a macro to print characters.

@ macros.s
@ experimenting with a couple of macros

.section    .data
    .byte 0xA   @ ascii codes
    .byte 0x20
    .byte 0x41

.macro print_ch a
mov r0, $1
ldr r1, =\a
mov r2, $1
mov r7, $4
svc $0

.section  .text
.globl  _start

ldr r3, =char
ldrb r4, [r3]

print_ch nl
    print_ch char
    print_ch spc
    cmp r4, $0x5a
    addle r4, r4, $1
    strltb r4, [r3]
    blt loop

print_ch nl
mov r7, $1
svc $0


If you run this program it should print out the upper-case letters from A to Z. The macro uses printing, so that is going to clobber the registers r0, r1, r2 & r7. It might be wiser to use a function, but there are times when quick and dirty is the way to go and macros are very useful.

Here’s one more example. Here I’ve got a little modulus procedure that’s packaged as a macro.

@ simple program setting up a modulus macro

.section    .data
    .long   2013
    .long   19

.macro mod a, b
mod_\@:   cmp \a, \b
            subge \a, \a, \b
            bge mod_\@

.globl _start

    ldr r2, =x
    ldr r0, [r2]
    ldr r2, =y
    ldr r1, [r2]
    mod r0, r1

    mov r7, $1
    swi 0

As you can see this is another “safe” macro, like the gcd example. It works out the modulus of two numbers using only the registers that were passed to it. I am going to use this in my “Easter” algorithm to make the code a little cleaner.

Happy Hacking!

Enter a number with the keyboard

When we looked at printing out numbers in a recent program, we had to make a conversion between a binary number stored in the computer and a series of ascii characters that could represent a denary number.

The same issue is faced when we want to get a number from the user and do something with it in an asm program. What follows is a simple solution to this problem. A string of characters is taken in from the keyboard and they are converted into a number that is stored in r0.

As you can tell, I am working my way towards having a few basic functions that I can use to create programs which can carry out mathematical operations as well as handling I/O.

Watch this space!

@ num_in.s
@ get user input ready to do maths

@ registers
@ r1 - pointer to the buffer
@ r2 - used to load digits
@ r3 - counts digits
@ r4 - multiplier
@ r6 - counter
@ r0 - return value

.section	.bss
.comm buffer, 48	@ reserve 48 byte buffer

.section	.data
	.ascii	"Enter a number: "
msgLen = . - msg

.section	.text
.globl	_start

mov r0, $1		@ print program's opening message	
ldr r1, =msg
ldr r2, =msgLen
mov r7, $4
svc $0

                        @ get input 
mov r7, $3		@ 3 is the "read" syscall
mov r0, $1		
ldr r1, =buffer
mov r2, $0x30
svc $0

                        @ prepare to convert ascii to number 
ldrb r2, [r1]		@ load first char 
mov r3, $0		@ initialize r3 as a counter

stmfd	sp!, {r2}	@ push digit onto stack
add	r3, r3, $1	@ increment counter
ldrb	r2, [r1, #1]!	@ load next digit (use writeback)
cmp	r2, $0xA	@ check for ascii code 10
beq 	convDigits      @ jump to conversion section
bne 	pushDigits      @ or carry on pushing

mov r4, $1		@ initialize multiplier to 1
mov r0, $0 		@ initialize number
mov r6, $0		@ initialize counter

ldmfd	sp!, {r2}	@ pop a digit
cmp 	r2, $0x30	@ is it a digit?
blt	error
cmp	r2, $0x39       @ sure?
bgt	error
sub	r2, r2, $0x30	@ take away 48, to get the digit value
mul	r2, r4, r2	@ multiply it by r4
add	r0, r0, r2	@ add to r0
add	r6, r6, $1	@ increment counter		
cmp	r6, r3		@ check to see if done
beq 	exit
mov     r5, r4, lsl $3        @ do multiply by ten using adds and shifts
add     r4, r5, r4, lsl $1    @ x * 8 + x * 2 = x * 10
bal	jumpBack

@ Least significant byte available via "echo $?"

mov r0, $-1
bal exit

mov r7, $1	@ exit syscall
svc $0		@ wake kernel

Hopefully this makes sense.
Happy hacking!

A simple greeter in asm

Here’s a short program that gets input from the keyboard and then prints it back to the screen.

@ greet.s - a little asm greeter.

.section	.bss
.comm buffer, 48	     @ reserve 48 byte buffer

.section	.data
	.ascii	"** Greeter **\nPlease enter your name: "
msgLen = . - msg
	.ascii	"Hello "
msg2Len = . - msg2

.section	.text
.globl	_start

mov r0, $1		    @ print program's opening message	
ldr r1, =msg
ldr r2, =msgLen
mov r7, $4
svc $0

mov r7, $3		    @ read syscall
mov r0, $1		
ldr r1, =buffer
mov r2, $0x30
svc $0

mov r0, $1		    @ print msg2
ldr r1, =msg2
ldr r2, =msg2Len
mov r7, $4
svc $0

mov r0, $1		    @ now print the user input
ldr r1, =buffer
mov r2, $0x30
mov r7, $4
svc $0

mov r7, $1	            @ exit syscall
svc $0		            @ wake kernel

It should look something like this when you run it.


Prime Numbers (based on Donald Knuth’s TAOCP)

This program finds and prints the first 500 prime numbers. It is based on the program presented in Knuth’s classic The Art of Computer Programming.

The code itself is pretty heavily commented, so I don’t think much further explanation is needed.

I have used the number printing routine described in my earlier blog post to handle the output in this example.

@ primes.s 
@ find and print first 500 primes
@ based on Knuth TAOCP vol. 1 147ff
@ I've used the labels and variables Knuth uses, up to a point.

@ registers @

@ r1 points to address of "prime"
@ r3 number we are checking 	(N in Knuth)
@ r4 count of primes		(J in Knuth)
@ r5 divisors we are checking	
@ r6 index for the prime divisors
@ r7 will hold remainder	(R in Knuth)
@ r8 will hold quotient		(Q in Knuth)
@ r9 number of primes sought.

.section	.bss
.comm prime, 2000	@ reserve space for the primes

.section	.data
spc:			@ space the primes with 3 spaces
	.ascii "  "
len = . - spc
nl: 			@ just a newline
	.ascii "\n"
limit:			@ index of last prime we need
	.long 500 

.section .text
.globl	_start

P1:			@ setting up
ldr r1, =prime		@ r1 points to "prime"
mov r0, $2		@ First prime is 2
str r0, [r1]		@ Store this number in "prime"
ldr r0, =limit
ldr r9, [r0]		@ r2 holds number of primes to find
mov r3, $3		@ initialize N to 3
mov r4, $1		@ initialize J to 1

P2:			@ we come here when we have found a prime
add r4, r4, $1		@ increment J
str r3, [r1, #4]!	@ store N in "prime" (with writeback)

cmp r4, r9		@ check if we are done
bge P9			@ if so branch to p9

add r3, r3, $2		@ add 2 to N 

P5:			@ start checking divisors
ldr r6, =prime		@ copy pointer to start of "prime"
ldr r5, [r6]		@ load first divisor
mov r7, r3		@ copy N into R
mov r8, $0		@ initialize Q

cmp r7, r5		@ if R >= divisor
subge r7, r7, r5 	@ subtract divisor from R
addge r8, r8, $1	@ increment Q
bge P6			@ repeat
cmp r7, $0		@ if R == 0 ...
beq P4			@ N is not prime so try next N

cmp r8, r5		@ compare Q with divisor
ble P2			@ if Q <= divisor, N is prime

mov r7, r3		@ reset r7 to N
mov r8, $0		@ reset Q
ldr r5, [r6, #4]!	@ get next divisor
bal P6			@ divide again

mov r0, $1		@ choose stdout
mov r4, $0		@ use r4 as a temporary counter
mov r5, $0		@ r5 counts total primes printed
ldr r6, =prime		@ pointer to "prime"
ldr r3, [r6]		@ load first prime

bl print_num		@ function call
add r4, $1		@ add one to temp counter
add r5, $1		@ add one to counter
cmp r5, r9 		@ are we done?
bge exit		@ if so, exit
cmp r4, $9		@ after 10 primes ...
bgt newline		@ print a newline
ble space		@ add spaces

space:                  @ we jump here if
mov r0, $1              @ we are going to 
ldr r1, =spc		@ print spaces
ldr r2, =len	
mov r7, $4
svc $0
ldr r3, [r6, #4]!	@ load next prime
bal printLoop           @ continue printing

newline:                @ we jump here if
mov r0, $1              @ we are going to
ldr r1, =nl             @ print a newline
mov r2, $1
mov r7, $4
svc $0
ldr r3, [r6, #4]!       @ load next prime
mov r4, $0              @ reset temporary counter
bal printLoop		@ continue printing

@ print_num function @

	stmfd sp!, {r0-r9, lr}	@ push regs to stack
	mov r4, $0 		@ set division counter to zero
	mov r5, $1		@ set char counter to one

loop:				@ division routine
	cmp r3, $9		
	ble stackPush		@ if r3 <= 9, call stackPush
	sub r3, r3, $10		@ else, subtract 10 from r3
	add r4, r4, $1		@ add one to div. counter
	bal loop		@ repeat

	add r5, r5, $1		@ increment char counter
	orr r0, r3, $0x30	@ logical OR - add 48 to digit to get ascii code
	stmfd	sp!, {r0}	@ push onto stack
	cmp r4, $0		@ if the div. counter is zero ...
	beq printChars		@ call print function
	mov r3, r4		@ else, load div. count into r3
	mov r4, $0		@ reset div. counter
	bal loop		@ back to top of loop

	mov r1, sp		@ use stack pointer to provide ascii code
	mov r0, $1		@ stdout is file descriptor 1
	mov r2, $1		@ length to print is 1
	mov r7, $4		@ write syscall
	svc $0			@ wake kernel
	subs r5, r5, $1		@ decrement string counter and set flag
	ble return		@ return if done
	ldmfd sp!, {r0}		@ pull next char from from stack 
	bal printChars		@ get next char
	ldmfd sp!, {r0-r9, pc}	@ restore registers

mov r0, $1			@ print a newline
ldr r1, =nl
mov r2, $1
mov r7, $4
svc $0
mov r7, $1			@ exit
svc $0


Using the debugger

The GNU debugger (gdb) is a very useful tool. Here I am going to show you one or two of its features.

You may need to install gdb (on my RPi, running Arch, the command was: “pacman -S gdb”). When you have done so, you can use it to look at what is happening as your program is running. This may help you to correct something that is going wrong. For now, though, we are going to look at a sample program that is working correctly.

If you want to use gdb, you need to invoke the assembler with some additional options. Adding the gdb functionality makes your executable larger, so you won’t want to do this unless you are actually planning a gdb session. The as command will look like this:

as -gstabs -o filename.o filename.s

Okay, here’s the sample program we are going to look at. It is very simple!

@ use_gdb.s
@ demo program
.section	.data
.section	.text
.globl		_start
mov r1, $5	@ load r1 with 5
cmp r1, $4	@ compare r1 with 4
sub r1, r1, $1	@ subtract 1 
cmp r1, $4      @ r1 now DOES equal 4
sub r1, r1, $1
cmp r1, $4

mov r7, $1	@ exit syscall
svc $0		@ wake kernel

Here’s a screen-shot of what to do:

When gdb starts, we need to set a break-point. The execution of the program will stop there and we can step forwards one instruction at a time from that point. Here, I am setting the break-point at the _start label.

(gdb) break *_start
(gdb) run

After you have given the run command, gdb will execute the operations up to the break-point and await instructions. In this example, I typed “next” twice to cause the next two instructions to be executed. Then I typed “info registers” to see the contents of my registers.


As you can see, r1 holds the value 5. The stack pointer has an address in memory and the program counter shows where we are up to in the program.

The other register I am interested in is “cpsr”, this is the program status register which shows which flags have been set. At this point, its contents are not very interesting as we have not done any comparisons.

After one more instruction, the registers look like this:


You can see that the program counter shows we have moved 4 bytes forward (each instruction is 4 bytes).  Also the program status register shows the result of comparing r1 with 4.

If we move on a couple of instructions, we get to this position.


Now r1 holds 4 and the “zero” flag on the cpsr has been set. This is because “cmp” actually does a subtraction and then sets the flags acordingly. Since¬† 4 – 4 = 0, the zero flag is set.

If we move on once more, we come to this position.


We are now almost at the end of the program, r1 no longer equals 4 and so the last “cmp” instruction has caused the zero flag to be unset.

That’s the end of this brief look at gsb.

Use a template

This post ought to have come earlier …

When writing asm, it’s a good idea to have a simple template upon which to build your code. Here’s an example:

@ dummy.s
@ a template for asm programs
.section	.data
.section	.text
.globl		_start
nop		@ no operation
mov r7, $1	@ exit syscall
svc $0		@ wake kernel

This will help you to remember to include the important sections of the program and the exit instructions.

Anything that starts with a full-stop is a directive to the assembler rather than an actual instruction. The final directive (“.end”) is optional, but I think it is useful. For instance, if the last line of your program has a comment on it, the assembler will complain.

Commenting your code is important. We can use ‘@’ to begin a comment in asm programs. I like to use the first line to record the name of the program.