Cornell University
Electrical Engineering 476
Fixed Point mathematical functions
in GCC and assembler


Floating point arithmetic is too slow for small, 8-bit processors to handle, except when human interaction is involved. Scaling a human input in floating point is generally fast enough (compared to the human). However in fast loops, such as IIR filters or animation, you are going to need to use fixed point arithmetic. This page concentrates on 8:8 fixed point in which there are 8 bits of integer and 8 bits of fraction stored for each number. Numbers are stored in 2's complement form.

This page has the following sections:

8:8 Fixed Point representation

This section will concentrate on numbers stored as 16-bit signed ints, with the binary-point between bit 7 and bit 8. There will be 8 bits of integer and 8 bits of fraction, so we will refer to this as 8:8 fixed point. This representation allows a dynamic range of +/- 127, with a resolution of 1/256=0.00396. Sign representation will be standard 2's complement. For instance to get the fixed point representation for -1.5, we can take the representation for +1.5, which is 0x0180, invert the bits and add 1. So inverting, we get 0xfe7f and adding one (to the least significant bit) we get 0xfe80.


Decimal Value
Fixed Representation
~(0x0240)+1 = 0xfdc0

Basic 8:8 Arithmetic

Addition and subtraction can just use the standard C operators, but multiplication and division need to be defined. These arithmetic functions were initially defined as macros rather than functions. While useful for summarizing the technique and for testing, it turned out to be much faster to implement the fixed point multiply as a function (see section below).

Multplication and division:
#define multfixSlow(a,b) ((int)((((long)(a))*((long)(b)))>>8)) //multiply two fixed #don't use this-too slow:
#define divfixSlow(a,b)  ((int)((((long)(a))<<8)/((long)(b)))) //divide two fixed #:don't use this-too slow:

We will also need to convert integers to fixed, float to fixed, fixed to integers,:

Type conversions:

#define int2fix(a)   (((int)(a))<<8)            //Convert char to fix. a is a char
#define fix2int(a)   ((signed char)((a)>>8))    //Convert fix to char. a is an int
#define float2fix(a) ((int)((a)*256.0))         //Convert float to fix. a is a float
#define fix2float(a) ((float)(a)/256.0)         //Convert fix to float. a is an int 

This program and multiply routine contain multiply, divide, and square root for 8:8 numbers. Detailed descriiptions are below.

8:8 multiplication

A fast mulitply operation is essential for digital filtering, control systems, and animated games. The macro-based multiply speed was disappointing so I rewrote it as an assembler function call based on the information in Atmel application note AVR201. The speed increase over the macro version was a factor of 2. The entire multiply takes about 40 machine cycles. The demo program above produces some timing comparisions between operations. I validated the fast multiply routine by running all 2^32 possible 16-bit operand pairs through it and comparing each output to the macro version.

8:8 division

A fast division operation is occasionally nice to have, but is harder to implement than multiplcation. The fastest way I have seen uses Newton-Raphson iteration and consists of several steps:

  1. Strip off the sign of the denominator (D) to make it positive.
  2. Scale D to the range 0.5<=D<=1.0 and record the number of shifts (N) necessary to to this.
  3. Estimate of the reciprocal of the scaled D using a linear appoximation requiring only shifts and adds.
  4. Use Newton's method to iteratatively improve the reciprocal approximation. 8:8 accuracy takes only one iteration to converge.
  5. Shift the scaled approximation by N back to the correct magnitude.
  6. Restore the sign to form the complete reciprocal.
  7. Multiply the numerator by the reciprocal of D.

The program above contains the reciprocal and division routines. The algorithm execution speed depends to the value of the denominator.

D input value
division cycles
0.5 to 1.0
4 or 0.25
10 or 0.1
100 or 0.01


Square Root for 8:8

The sqrtfix version in the above program executes in less than 380 cycles, depending on the value of the input. Shorter times at high input values are caused by limited bits to be computed and some detailed range checking to prevent internal overflow. A zero or negative input yields a zero output. The algorithm first determines one of 12 different possible integer parts of the square root (0 to 11), then uses successive approximation to get the fractional bits.

input value
0.00 to 120.99
< 380
121.00 to 126.56
126.57 to 127.99

A fixed point square root macro which depends on the math.h libary lsqrt function is shown below. This fixed point square root macro is slower, and in some cases, much slower than the floating operation:

#define sqrtfixSlow(a)   (lsqrt(((long)(a))<<8))  //Don't use this square root -- too slow

Copyright Cornell University Dec 2007
Bruce R. Land