IEEE 754 Floating-Point Types in C++

June 15, 2018 · 8 min read

If you want to use IEEE 754 32-bit or 64-bit floating-point types in C++, you might think of using float and double. Unfortunately, the C++ standard offers very few guarantees about its built-in floating-point types.

§ 6.7.1.8 There are three floating-point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. ...

So, are we doomed? Not at all. C++ provides std::numeric_limits, which gives us information about floating-point type traits. We can combine this with some neat compile-time tricks to craft a clean type API. Let's see how. The goal is to construct the following IEEE 754 floating-point types:

#include <iostream>
#include "ieee754_types.hpp"

int main() {
  IEEE_754_2008_Binary<32> x = 1.0;
  IEEE_754_2008_Binary<64> y = 2.0;

  std::cout << x + y << std::endl;

  // Compile-time error if the requested type doesn't exist on the system.
  // IEEE_754_2008_Binary<16> z;
}

Here, IEEE_754_2008_Binary<n> is an n-bit IEEE 754 floating-point type. On most systems, IEEE_754_2008_Binary<32> will be float and IEEE_754_2008_Binary<64> will be double. If a requested type like IEEE_754_2008_Binary<16> is not available, we should get a compile-time error with a clear message.

At this point, you might naturally ask, "Do we really need this? Can't we just assume float and double are IEEE 754, since they are on 99.9% of systems?" I used to think so, but I later found that others have submitted related proposals, like N1703 and N3626, to the C and C++ standards committees to address this issue by adding standard types like float16_t, float32_t, float64_t, and float128_t. So, perhaps this effort isn't entirely pointless. With that, let's get started.

First, let's create a way to check if a given type T fulfills the IEEE 754 standard and our other conditions.

// Requires <limits> and <type_traits>. The get_*_bits() helpers are
// defined further below.
template <int storage_bits, int exponent_bits, int mantissa_bits>
struct Is_Ieee754_2008_Binary_Interchange_Format {
  template <typename T>
  static constexpr bool value =
      ::std::is_floating_point<T>()            &&
      ::std::numeric_limits<T>::is_iec559      &&
      ::std::numeric_limits<T>::radix == 2     &&
      get_storage_bits<T>() == storage_bits    &&
      get_exponent_bits<T>() == exponent_bits  &&
      get_mantissa_bits<T>() == mantissa_bits;
};

We use a variable template for the type-dependent boolean check and wrap it in a template struct. This makes it easy to pass around as a type template parameter later.

The checks work as follows. std::is_floating_point<T> rules out anything that is not a built-in floating-point type. std::numeric_limits<T>::is_iec559 tells us whether T complies with IEEE 754 (equivalently, IEC 60559). We also check radix == 2 because IEEE 754 defines both binary and decimal floating-point formats. Finally, we check that T has the requested number of storage (width), exponent, and mantissa bits. Although IEEE 754 fixes the number of exponent and mantissa bits for its standard sizes (16, 32, 64, 128, etc.), it also allows implementations to provide formats with other sizes and bit counts (e.g., the x86 extended precision format). Therefore, is_iec559 alone is not enough, and we need to check that T has the exact format we want.
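To make the checks concrete, here is a minimal sketch that applies the same conditions directly to float, using only std::numeric_limits and without the helper functions introduced next. It assumes a platform where float is the 32-bit binary interchange format, which is common but not guaranteed by the standard. The name float_looks_like_binary32 is made up for this illustration and is not part of the library.

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// Hypothetical name, for illustration only: true when float has the
// binary32 layout (1 sign bit, 8 exponent bits, 23 stored mantissa bits).
constexpr bool float_looks_like_binary32 =
    std::is_floating_point<float>::value &&
    std::numeric_limits<float>::is_iec559 &&
    std::numeric_limits<float>::radix == 2 &&
    sizeof(float) * CHAR_BIT == 32 &&
    // digits counts the implicit leading bit: 23 stored + 1 implicit.
    std::numeric_limits<float>::digits == 24 &&
    // Exponent limits of binary32 as reported by numeric_limits.
    std::numeric_limits<float>::max_exponent == 128 &&
    std::numeric_limits<float>::min_exponent == -125;
```

On a system where any of these conditions fails, the constant is simply false, which is exactly the situation the struct above is designed to detect.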

We can calculate the number of bits for T with the following simple compile-time functions:

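#include <climits>  // CHAR_BIT
#include <limits>   // std::numeric_limits

// Total size of T in bits, including any padding bits.
template <typename T>
constexpr int get_storage_bits() {
  return static_cast<int>(sizeof(T) * CHAR_BIT);
}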

template <typename T>
constexpr int get_exponent_bits() {
  int exponent_range = ::std::numeric_limits<T>::max_exponent -
                       ::std::numeric_limits<T>::min_exponent;
  int bits = 0;
  while ((exponent_range >> bits) > 0) ++bits;
  return bits;
}

template <typename T>
constexpr int get_mantissa_bits() {
  return ::std::numeric_limits<T>::digits - 1;
}

For the mantissa bits, std::numeric_limits<T>::digits includes the implicit leading bit for normalized numbers, so we subtract 1 to get the number of explicit mantissa bits. For the exponent bits, a direct property is not available in std::numeric_limits, so we instead calculate the minimum number of bits required to represent its exponent range.
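As a worked example of the exponent calculation, assume float and double are the 32-bit and 64-bit binary formats. Then float reports max_exponent = 128 and min_exponent = -125, giving a range of 253, which needs 8 bits. For double the range is 1024 - (-1021) = 2045, which needs 11 bits. The sketch below restates the bit-counting loop as a standalone function so the arithmetic can be checked in isolation; bits_needed and float_exponent_range are names invented here.

```cpp
#include <limits>

// Hypothetical helper: smallest bit count b such that (range >> b) == 0.
constexpr int bits_needed(int range) {
  int bits = 0;
  while ((range >> bits) > 0) ++bits;
  return bits;
}

// Pure arithmetic, independent of the platform.
static_assert(bits_needed(253) == 8, "binary32 exponent range fits in 8 bits");
static_assert(bits_needed(2045) == 11, "binary64 exponent range fits in 11 bits");

// Platform-dependent: equals 253 only where float is binary32 (an assumption).
constexpr int float_exponent_range =
    std::numeric_limits<float>::max_exponent -
    std::numeric_limits<float>::min_exponent;
```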

Now we have everything needed to determine if a given T is the type we're looking for. The next step is to automatically select such a type from the built-in floating-point types (float, double, and long double), given a size in bits (e.g., 32, 64). This is where it gets interesting.

The following recursive function, find_type(), selects a type from T and Ts... that satisfies the condition C.

template <typename C, typename T, typename... Ts>
constexpr auto find_type() {
  throw;

  if constexpr (C::template value<T>) {
    return T();
  } else if constexpr (sizeof...(Ts) >= 1) {
    return find_type<C, Ts...>();
  } else {
    return void();
  }
}

In our case, the types to search (T and Ts...) will be float, double, and long double. The condition C is the Is_Ieee754_2008_Binary_Interchange_Format<...> struct we defined previously.

typename... Ts is a parameter pack that can match any number of types. The first if condition, C::template value<T>, checks if T satisfies the condition C; if so, it returns a default-constructed instance of T. The second if condition, sizeof...(Ts) >= 1, checks if there are more types in Ts... to examine; if so, it recursively calls find_type() with Ts... to continue the search. Finally, if Ts... is empty, it returns void().

Since the return type of find_type() is auto, the compiler will deduce the return type at compile time from the return statement in the branch that is taken. Additionally, if constexpr discards the unused conditional paths at compile time, so find_type() compiles successfully even though it has multiple return statements with different types.
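The interaction between auto return type deduction and if constexpr is easier to see in a smaller function. The following sketch is not part of the library; pick is a hypothetical name. With a plain if, both return statements would take part in deduction and the compiler would reject the function because int and double disagree.

```cpp
#include <type_traits>

// Hypothetical example: returns int when Flag is true, otherwise double.
template <bool Flag>
constexpr auto pick() {
  if constexpr (Flag) {
    return 1;    // the only return statement kept when Flag is true
  } else {
    return 2.5;  // the only return statement kept when Flag is false
  }
}

// Each specialization deduces its return type from the surviving branch.
static_assert(std::is_same_v<decltype(pick<true>()), int>, "");
static_assert(std::is_same_v<decltype(pick<false>()), double>, "");
```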

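Since find_type()'s return type is all we need, we can use decltype(find_type<...>()) to get the resulting type. decltype does not evaluate its operand, so find_type() is never actually called. The throw; statement on the first line of find_type() is not strictly necessary, but it's there to indicate that find_type() is not meant to be called at runtime: a bare throw; with no exception being handled calls std::terminate, so an accidental call fails immediately.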

The following code defines a BinaryFloatOrVoid type alias using decltype(find_type<...>()). This new type will be an IEEE 754 floating-point type that matches the given storage, exponent, and mantissa bits, or it will be void if the search fails.

template <int storage_bits,
          int exponent_bits =
              standard_binary_interchange_format_exponent_bits<storage_bits>(),
          int mantissa_bits =
              standard_binary_interchange_format_mantissa_bits<storage_bits>()>
using BinaryFloatOrVoid =
    decltype(find_type<                                                //
             Is_Ieee754_2008_Binary_Interchange_Format<storage_bits,   //
                                                       exponent_bits,  //
                                                       mantissa_bits>,
             float, double, long double>());

The functions standard_binary_interchange_format_exponent_bits() and standard_binary_interchange_format_mantissa_bits() return the standard number of exponent and mantissa bits, respectively. We set them as default values for exponent_bits and mantissa_bits for convenience. I will omit their implementations, as they are straightforward.
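For readers who want something to compile against, here is one possible shape for these two functions. This is a guess, not the code from the repository: it covers only the 16-, 32-, 64-, and 128-bit formats, and returning -1 for any other width is a choice made for this sketch (a negative bit count can never match a real type, so the search would end in void).

```cpp
// Sketch only: exponent field widths of the IEEE 754-2008 binary
// interchange formats binary16, binary32, binary64, and binary128.
template <int storage_bits>
constexpr int standard_binary_interchange_format_exponent_bits() {
  switch (storage_bits) {
    case 16:  return 5;
    case 32:  return 8;
    case 64:  return 11;
    case 128: return 15;
    default:  return -1;  // assumption of this sketch: unknown width
  }
}

// The storage consists of 1 sign bit, the exponent field, and the stored
// (explicit) mantissa bits, so the mantissa width is what remains.
template <int storage_bits>
constexpr int standard_binary_interchange_format_mantissa_bits() {
  int exponent_bits =
      standard_binary_interchange_format_exponent_bits<storage_bits>();
  return exponent_bits < 0 ? -1 : storage_bits - 1 - exponent_bits;
}
```

For example, the 64-bit case yields 11 exponent bits and 64 - 1 - 11 = 52 mantissa bits.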

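Before if constexpr was added in C++17, this kind of compile-time type manipulation was typically implemented with class template metaprogramming: recursive templates, partial specialization, std::conditional, and often SFINAE. The following code shows the same search written with partial specialization and std::conditional_t: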

// Recursion termination: Type not found.
template <typename C, typename... Ts>
struct FindType {
  using type = void;
};

// Recursion
template <typename C, typename T, typename... Ts>
struct FindType<C, T, Ts...> {
  // Set `type = T` if T satisfies condition C; otherwise, keep
  // searching in the remaining types, Ts...
  using type = ::std::conditional_t<  //
      C::template value<T>, T, typename FindType<C, Ts...>::type>;
};

template <int storage_bits,
          int exponent_bits =
              standard_binary_interchange_format_exponent_bits<storage_bits>(),
          int mantissa_bits =
              standard_binary_interchange_format_mantissa_bits<storage_bits>()>
using BinaryFloatOrVoid = typename FindType<                  //
    Is_Ieee754_2008_Binary_Interchange_Format<storage_bits,   //
                                              exponent_bits,  //
                                              mantissa_bits>,
    float, double, long double>::type;

Clearly, the if constexpr version is simpler and much more readable. I expect to see less of this style of template metaprogramming, and less of the "SFINAE mess" in general, thanks to if constexpr and, hopefully, concepts.

Finally, we introduce another type layer to produce a compile-time error with a clear message if the requested type is not available (i.e., BinaryFloatOrVoid is void).

template <typename T>
struct AssertTypeFound {
  static_assert(
      !::std::is_same_v<T, void>,
      "No corresponding IEEE 754-2008 binary interchange format found.");
  using type = T;
};

template <int storage_bits>
using IEEE_754_2008_Binary = typename AssertTypeFound<
    BinaryFloatOrVoid<storage_bits>>::type;

OK, we have finally constructed the type IEEE_754_2008_Binary<n> that guarantees conformance to the IEEE 754 standard binary interchange format. Yay!

So are we done? Not quite. There's one last step that every programmer loves: writing tests. :)

template <int storage_bits, int exponent_bits, int mantissa_bits>
void test_if_type_exists() {
  throw;

  if constexpr (!::std::is_same_v<BinaryFloatOrVoid<storage_bits>, void>) {
    using T = IEEE_754_2008_Binary<storage_bits>;
    static_assert(::std::is_floating_point<T>(), "");
    static_assert(::std::numeric_limits<T>::is_iec559, "");
    static_assert(::std::numeric_limits<T>::radix == 2, "");
    static_assert(get_storage_bits<T>() == storage_bits, "");
    static_assert(get_exponent_bits<T>() == exponent_bits, "");
    static_assert(get_mantissa_bits<T>() == mantissa_bits, "");
  }
}

void tests() {
  throw;

  test_if_type_exists<16, 5, 10>();
  test_if_type_exists<32, 8, 23>();
  test_if_type_exists<64, 11, 52>();
  test_if_type_exists<128, 15, 112>();
}

Again, all the checks are done at compile time with static_assert, so we don't need to call tests(). We just have to ensure that the test_if_type_exists functions are instantiated, and defining tests() is enough for that because the calls inside it instantiate each specialization. If a type doesn't exist (e.g., 16- and 128-bit types on most systems), then if constexpr will simply discard the checks.
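Wrapping the calls in a function is one way to trigger instantiation. Another option, shown here only as an alternative and not as what the code above does, is an explicit instantiation definition. The sketch uses a hypothetical, cut-down check function named check_storage_bits and assumes float is 32 bits wide.

```cpp
#include <climits>

// Hypothetical stand-in for test_if_type_exists(), reduced to one check.
template <typename T, int expected_bits>
void check_storage_bits() {
  static_assert(static_cast<int>(sizeof(T) * CHAR_BIT) == expected_bits,
                "unexpected storage width");
}

// Explicit instantiation definition: the function body is instantiated
// here, so the static_assert is checked even though nothing calls it.
template void check_storage_bits<float, 32>();
```

Changing 32 to another value on a platform with a 32-bit float turns the explicit instantiation into a compile-time error, with no caller required.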

I hope you had as much fun reading this as I did writing it. The full implementation is available in this repository: https://github.com/kkimdev/ieee754-types.