Datatype limits

The maximum representable value is 32767.999985. The minimum value is -32768.0.

The minimum value is also used to represent qFP16.overflow for overflow detection, so for some operations it cannot be determined whether the operation overflowed or the result really was the smallest possible value. In practice, this does not matter much.

The smallest unit (machine precision) of the datatype is 1/65536 ≈ 0.000015259.
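
The limits above follow from storing the value as a signed 32-bit integer scaled by 65536. The following is a minimal, library-independent sketch of that relationship; the names demo_fp16_t, DEMO_ONE and demo_to_double are hypothetical and are not part of the qFP16 API.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the library's definitions. */
typedef int32_t demo_fp16_t;      /* raw 16.16 storage word */
#define DEMO_ONE  65536.0         /* scale factor, 2^16 */

/* Interpret a raw 16.16 word as a real number. */
static double demo_to_double( demo_fp16_t raw )
{
    return (double)raw / DEMO_ONE;
}

/* Largest value:  INT32_MAX / 65536, just below  32768.0
   Smallest value: INT32_MIN / 65536, exactly    -32768.0
   One raw step:   1 / 65536                                */
```

Because every raw step is a power of two, each of these values is exactly representable in a double.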

Fixed-point functions

All the provided functions operate on 32-bit numbers of type qFP16_t, which have a 16-bit integer part and a 16-bit fractional part.

Conversion functions

Conversions to and from integers, floating-point values and strings. These conversions retain the numeric value and perform rounding where necessary.

qFP16_IntToFP() Multiplies by qFP16.one = 65536.
qFP16_FPToInt() Divides by qFP16.one and rounds to the nearest integer.
qFP16_FloatToFP() Multiplies by qFP16.one and rounds to the nearest value.
qFP16_FPToFloat() Divides by qFP16.one. Rounding follows the current floating-point mode.
qFP16_DoubleToFP() Multiplies by qFP16.one and rounds to the nearest value.
qFP16_FPToDouble() Divides by qFP16.one. All qFP16_t values fit into a double, so no rounding happens.
qFP16_FPToA() Converts from qFP16_t to string.
qFP16_AToFP() Converts from string to qFP16_t.
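
The scale-and-round behaviour described for the numeric conversions can be sketched without the library. This is only an illustration under stated assumptions: the demo_ names are hypothetical, halves are rounded away from zero (the library's tie-breaking rule is not stated here), and no range checking is done.

```c
#include <stdint.h>

typedef int32_t demo_fp16_t;   /* hypothetical raw 16.16 type */
#define DEMO_ONE 65536         /* 1.0 in 16.16 format */

/* Integer to fixed-point: scale by 65536.
   The caller must keep |a| <= 32767 in this sketch. */
static demo_fp16_t demo_from_int( int32_t a )
{
    return (demo_fp16_t)( a * DEMO_ONE );
}

/* Fixed-point to integer: add half a unit in the direction of the sign,
   then divide by 65536. A 64-bit intermediate avoids overflow near the limits. */
static int32_t demo_to_int( demo_fp16_t x )
{
    int64_t half = ( x >= 0 ) ? ( DEMO_ONE / 2 ) : -( DEMO_ONE / 2 );
    return (int32_t)( ( (int64_t)x + half ) / DEMO_ONE );
}

/* Double to fixed-point: scale, offset by +-0.5, then truncate,
   which rounds to the nearest raw step. */
static demo_fp16_t demo_from_double( double d )
{
    double scaled = d * (double)DEMO_ONE;
    scaled += ( scaled >= 0.0 ) ? 0.5 : -0.5;
    return (demo_fp16_t)scaled;
}
```

With these helpers, 1.5 maps to the raw word 98304, and converting 2.5 back to an integer yields 3 under the tie-breaking rule assumed above.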

Basic arithmetic

These functions perform rounding and detect overflows. When an overflow is detected, they return qFP16.overflow as a marker value.

qFP16_Add() Addition
qFP16_Sub() Subtraction
qFP16_Mul() Multiplication
qFP16_Div() Division
qFP16_Mod() Modulo
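
To make the rounding and overflow-marker behaviour concrete, here is a minimal sketch of an addition and a multiplication on raw 16.16 words. It is not the library's implementation: the demo_ names are hypothetical, and the use of a 64-bit intermediate is an assumption chosen for clarity.

```c
#include <stdint.h>

typedef int32_t demo_fp16_t;          /* hypothetical raw 16.16 type */
#define DEMO_OVERFLOW  INT32_MIN      /* marker: same bit pattern as -32768.0 */

/* Add two values; return the marker if the sum leaves the 32-bit range. */
static demo_fp16_t demo_add( demo_fp16_t a, demo_fp16_t b )
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if ( ( sum > INT32_MAX ) || ( sum < INT32_MIN ) ) {
        return DEMO_OVERFLOW;
    }
    return (demo_fp16_t)sum;
}

/* Multiply: the 64-bit product carries 32 fractional bits, so add half a
   raw step (rounding to nearest) and divide by 65536 to get back to 16. */
static demo_fp16_t demo_mul( demo_fp16_t a, demo_fp16_t b )
{
    int64_t prod = (int64_t)a * (int64_t)b;
    prod += ( prod >= 0 ) ? 32768 : -32768;
    prod /= 65536;
    if ( ( prod > INT32_MAX ) || ( prod < INT32_MIN ) ) {
        return DEMO_OVERFLOW;
    }
    return (demo_fp16_t)prod;
}
```

A legitimate result of exactly -32768.0 has the same bit pattern as the marker, which is the ambiguity noted under Datatype limits.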

Exponential and transcendental functions

Roots, exponents & similar.

qFP16_Sqrt() Square root. Performs rounding and is accurate to qFP16 limits.
qFP16_Exp() Exponential function using a power series approximation. Accuracy depends on range: worst case ±40 absolute for negative inputs and ±0.003% for positive inputs. Average error is ±1 for negative inputs and ±0.0003% for positive inputs.
qFP16_Log() Natural logarithm using Newton approximation and qFP16_Exp(). Worst case error ±3 absolute, average error less than 1 unit.
qFP16_Log2() Logarithm base 2.
qFP16_IPow() Computes the integer power of a qFP16_t number.
qFP16_Pow() Computes the power of a qFP16_t number.
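
qFP16_Log() is described above as a Newton approximation built on qFP16_Exp(). As a hedged illustration of that general technique (the library's exact iteration, starting guess and stopping rule are not given here, so this is an assumption), applying Newton's method to \( f(y) = e^{y} - x \) gives the update:

\( y_{n+1} = y_{n} - \frac{ e^{y_{n}} - x }{ e^{y_{n}} } \)

Each step needs one exponential evaluation and one division, and the iteration can stop once the correction is smaller than one raw unit.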

Trigonometric functions and helpers

qFP16_Sin() Sine for angle in radians
qFP16_Cos() Cosine for angle in radians
qFP16_Tan() Tangent for angle in radians
qFP16_Asin() Inverse of sine, output in radians
qFP16_Acos() Inverse of cosine, output in radians
qFP16_Atan() Inverse of tangent, output in radians
qFP16_Atan2() Two-argument arc tangent of the coordinates x and y, output in radians
qFP16_Sinh() Hyperbolic sine
qFP16_Cosh() Hyperbolic cosine
qFP16_Tanh() Hyperbolic tangent
qFP16_RadToDeg() Converts angle units from radians to degrees.
qFP16_DegToRad() Converts angle units from degrees to radians.
qFP16_WrapToPi() Wraps the fixed-point angle in radians to [−pi, pi].
qFP16_WrapTo180() Wraps the fixed-point angle in degrees to [−180, 180].
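
Angle wrapping can be done directly on the raw fixed-point words. The sketch below shows one simple way to wrap to [−pi, pi] by adding or subtracting whole turns. It is an assumption about the general approach, not the library's code, and the demo_ names and constants are hypothetical.

```c
#include <stdint.h>

typedef int32_t demo_fp16_t;     /* hypothetical raw 16.16 type */
#define DEMO_PI      205887      /* pi   * 65536, rounded to nearest */
#define DEMO_TWO_PI  411775      /* 2*pi * 65536, rounded to nearest */

/* Bring an angle in radians into [-pi, pi] by removing whole turns. */
static demo_fp16_t demo_wrap_to_pi( demo_fp16_t x )
{
    while ( x > DEMO_PI ) {
        x -= DEMO_TWO_PI;
    }
    while ( x < -DEMO_PI ) {
        x += DEMO_TWO_PI;
    }
    return x;
}
```

The loops only subtract when the angle is above pi and only add when it is below −pi, so the intermediate values cannot leave the 32-bit range.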

Example: Solution of the quadratic equation

This draft example computes one solution of the quadratic equation \( ax^{2} + bx + c = 0 \) using the fixed-point format. The solution is given by:

\( x = \frac{ -b + \sqrt{ b^{2} - 4ac} }{ 2a } \)

#include <stdio.h>
#include <stdlib.h>
#include "qfp16.h"
 
int main( void )
{
    /* Coefficients of a*x^2 + b*x + c = 0 as fixed-point constants */
    qFP16_t a = qFP16_Constant( 1.5f );
    qFP16_t b = qFP16_Constant( 5.2f );
    qFP16_t c = qFP16_Constant( 4.0f );
    qFP16_t tmp;
    char ans[ 10 ];

    /* tmp = 4ac */
    tmp = qFP16_Mul( qFP16_IntToFP( 4 ), qFP16_Mul( a, c ) );
    /* tmp = -b + sqrt( b^2 - 4ac ) */
    tmp = qFP16_Add( -b, qFP16_Sqrt( qFP16_Sub( qFP16_Mul( b, b ), tmp ) ) );
    /* tmp = ( -b + sqrt( b^2 - 4ac ) ) / ( 2a ) */
    tmp = qFP16_Div( tmp, qFP16_Mul( qFP16_IntToFP( 2 ), a ) );
    /* Convert the result to a string and print it */
    printf( " result = %s \r\n" , qFP16_FPToA( tmp, ans, 4 ) );
    return EXIT_SUCCESS;
}
