NDArray.ScalarType.float8e4m3fn

An 8-bit floating-point type with 1 sign bit, 4 exponent bits, and 3 mantissa bits.

Declaration

case float8e4m3fn

Discussion

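The FP8 E4M3FN format favors precision over range: compared with the E5M2 format, it uses one more mantissa bit and one fewer exponent bit. The FN suffix stands for "finite number": the format has no infinity encodings and reserves only the all-ones exponent and mantissa pattern for NaN, which extends the largest finite value to 448. This type is used for quantization and in some activation functions.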
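
To make the bit layout concrete, the following sketch decodes a raw E4M3FN byte into a `Float`. It assumes the conventional E4M3FN encoding (exponent bias of 7, a single NaN pattern per sign, no infinities). The function name `decodeE4M3FN` is hypothetical and isn't part of this framework's API.

```swift
/// Decodes one FP8 E4M3FN bit pattern into a Float.
/// Hypothetical helper for illustration; not part of the framework.
/// Assumed layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
func decodeE4M3FN(_ bits: UInt8) -> Float {
    let sign: Float = (bits & 0x80) == 0 ? 1 : -1
    let exponent = Int((bits >> 3) & 0x0F)
    let mantissa = Int(bits & 0x07)

    // All-ones exponent and mantissa is the only NaN encoding.
    // The format has no infinities.
    if exponent == 0x0F && mantissa == 0x07 {
        return Float.nan
    }

    // A zero exponent field encodes zero and subnormals: (mantissa / 8) * 2^-6.
    if exponent == 0 {
        let scale = Float(sign: .plus, exponent: -6, significand: 1)
        return sign * (Float(mantissa) / 8) * scale
    }

    // Normal values: (1 + mantissa / 8) * 2^(exponent - 7).
    let scale = Float(sign: .plus, exponent: exponent - 7, significand: 1)
    return sign * (1 + Float(mantissa) / 8) * scale
}
```

Under these assumptions, `decodeE4M3FN(0x7E)` yields the largest finite value, 448, and `decodeE4M3FN(0x01)` yields the smallest positive subnormal, 2^-9.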

See Also

Defining 8-bit floating-point types