Fixed Point Arithmetic
Dec. 2002
Presented by Fore June
Author of Windows Fan, Linux Fan

Copyrighted by Fore June, All rights reserved
    Fixed-point Arithmetic

    Operations basically the same as integer operations. Users have to keep track of the binary point.

  1. 2s Complement Representation

    Leftmost bit is sign-bit

    X = bk . . . b1 b0 . b-1 b-2 .... b-m

    S = bk is the sign bit, 0 => value positive, 1 => value negative
    bi = 0 or 1

    Value of X

    V(X) = -bk x 2k + bk-1 x 2k-1 + ... + b0 x 20 + b-1 x 2-1 + b-2 x 2-2 + ... + b-m x 2-m There's a corresponding unsigned value U(X).
    Most negative V(X)min = -2k    with X     =   1 0 ... 0 . 0 ... 0
    Most positive V(X)max = 2k-1 + ... + 20 + 2-1 + 2-2 + ... + 2-m
          = 2k - 2-m
      with X =   0 1 ... 1 . 1 1 ... 1
    Complement of a bit 1 → 0
    0 → 1
    bibi
    bi + bi = 1

    1s Complement of X X = bk . . . b1 b0 . b-1 b-2 .... b-m
    2s Complement of X ( = -X )

    -X = X + 0 . . . 0 . 0 ... 0 1

    and discarding any carry left to bk

    ( i.e. To obtain the 2s complement of a binary number, we perform a 1s complement and add 1 to the least significant bit )

    So

    -X = bk . . . b1 b0 . b-1 b-2 .... b-m + 0 . . . 0 . 0 ... 0 1
        = bk . . . b1 b0 . b-1 b-2 .... b-(p-1)1 0 ... 0

    where p m and b-p is the first nonzero bit in the X representation

    One can show that V(-X) = -V(X)

    Note also that the 2s complement of the 2s complement of a number is itself.
    i.e. -(-X) = X

    Examples

  2. Integer
    e.g. 8-bit

    A = 0 0 0 0 1 0 0 1
    V(A) = 9

    -A = 1 1 1 1 0 1 1 1
    V(-A) = -V(A) = -9

  3. Fraction
    e.g. 8-bit

    Suppose bit 7 is sign-bit, binary point is between bit 7 and bit 6

    F = 0 . 0 1 0 0 0 0 0
    V(F) = 2-2 = 0.25

    X = -F = 1 . 1 1 0 0 0 0 0
    V(X) = V(-F) = -V(F) = - ( 0 . 0 1 0 0 0 0 ) = - 0.25

    Alternative view :

    V(X) = -20 + 2-1 + 2-2 = -0.25

  4. Real number
    e.g. 8-bit

    Suppose the binary point is between bit 4 and bit 3; leftmost bit ( bit 7 ) is the sign-bit

    R = 0 1 0 1 . 1 1 0 0
    V(R) = 22 + 20 + 2-1 + 2-2 = 3.75

    Y = -R = 1 0 1 0 . 0 1 0 0
    V(Y) = V(-R) = -V(R) = -5.75

    Alternative view :

    V(X) = -23 + 21 + 2-2 = -8 + 2 + 0.25 = -5.75
  5. Typical DSP Operation

  6. 1.15 format
    16-bit number, bit 15 ( left-most bit ) is sign-bit, binary point is between bit 15 and bit 14 (i.e. m = 15 ) X = S . x x x x x x x x x x x x x x x Resolution = 2-15

    V(X)min = -1    ( 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 )    ( 8000 Hex )
    V(X)max = 1 - 2-15    (0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 )    ( 7FFF Hex )

    1.15 x 1.151.30

    In fraction mode, a typical DSP processor will left-shift the product one bit, so the product is saved in 1.31 format in a temporary 32-bit register with the binary point between bit 31 and bit 30.

    Example:
    Consider 0.25 x ( -0.75 )

      0.25   = 0.010 0000 0000 0000    ( 1.15 format ) = 2000H -0.75 = - 0.110 0000 0000 0000 = 1.010 0000 0000 0000
      = A000H

      0.25 x ( -0.75 ) = -( 0.0 1 x 0.11 )
        = - 0. 0 0 1 1
        = 1.110 1000 0000 0000 0000 0000 0000 0000    ( 1.31 format )
        = E800 0000 Hex
      0.25 x ( -0.75 ) = 0.0 1 x ( 1.010 1000 .. )
        = 1.110 1000 0000 0000 0000 0000 0000 0000    ( 1.31 format )
        = E800 0000 Hex ( note two 1s were shifted in )

      	BSET	PSW, 9		;fraction mode
      	MOV	Sx, #2000H
      	MOV	Sy, #A000H
      	MRa = Sx * Sy, (SS)
      	
      MRa1 = E800 H, MRa0 = 0000 H

  7. Divisions

    Calculate a / b

    a

    b
    = a x 1

    b
    We want to find ( 1 / b ).
    Assume b to be 15-bit positive integer ( signs can be handled separately, a fraction can be normalized to an integer by multiplications )
    i.e. 0 b < 215

    Let

    α ≈ 1/b Then α b ≈ 1 Let α = 0. α15 α14 ... α0

    Algorithm to find α :

    	i = 0;
     	α = 0;
    	αi = 15;		//i.e. α = 0.100 0000 0000 0000
    	one = 215 - 1;	//i.e one ~ 0.1111 1111 1111 1111 ~ 1
    	while ( i ≥  0 ) {
    	  if ( α * b  > one )
    	     αi = 0;	//guess too large, reset bit
    			//else fine, take it
    	  --i;
    	  αi = 1;	//try next bit
    	}
      

    e.g. 1/3 ≈ 0.010 1010 1010 1010

  8. Normalize a set of real numbers to 1.15

    A = { a1,, ..., an }
    amax = maximum ( A )
    α = 0.α-1 .... α-15
      i = -1;
      α = 2i;
      while ( i > -14 ) {
        if ( amax * α < 1 ) {	//want to increase a to gain precision
            --i;
            α = α + 2i;
        } else {                    //α too large
            α = α - 2i;		//reset bit i to 0
            --i;			//test next bit
            α = α + 2i;
        }
      }
      


Acknowlegments

Many thanks to Cula.net for allowing me to post materials on its site, Mr. Richard Friedman for giving valuable suggestions and help.

Fore June, 2002.
Author of Windows Fan, Linux Fan