Binary Encoding of Data Types
Explore how databases binary-encode various data types, including primitive numeric types, strings in ASCII and UTF encodings, booleans, enums, and complex structures such as records and arrays. Understanding byte ordering, floating-point representation, and string formats makes it easier to see how databases store and organize data on disk.
Binary encoding represents information in an optimized binary format in memory or on disk. This section covers how different data types are encoded and stored on disk.
Primitive types
This section discusses the encoding format of different primitive types.
Numeric types without a decimal point
Numeric types without a decimal point, such as integer and long, are first converted into their binary (base-2) representation. An integer value occupies 4 bytes (32 bits), and a long value occupies 8 bytes (64 bits). The system's endianness then determines the order in which these bytes are laid out in memory or on disk.
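As a rough illustration of the binary conversion and the sizes involved, the Python sketch below packs a non-negative value with the standard `struct` module. It assumes the format codes `i` (32-bit) and `q` (64-bit) as stand-ins for integer and long; the `>` prefix selects standard sizes, so the byte counts do not depend on the platform. The helper name `encode_fixed_width` is hypothetical.

```python
import struct

def encode_fixed_width(value: int) -> dict:
    """Hypothetical helper: show a non-negative value's bits and encoded sizes.

    Assumption: struct code 'i' (32-bit) stands in for integer and 'q' (64-bit)
    for long. The '>' prefix forces standard sizes regardless of the platform.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    as_int = struct.pack(">i", value)   # 4-byte encoding
    as_long = struct.pack(">q", value)  # 8-byte encoding
    return {
        "bits": format(value, "032b"),  # the value as 32 binary digits (0s and 1s)
        "int_bytes": len(as_int),
        "long_bytes": len(as_long),
    }

print(encode_fixed_width(300))
```

For 300, the bit pattern is `100101100`, padded with leading zeros to 32 bits; the same value takes 4 bytes as an integer and 8 bytes as a long.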
There are two types of endianness:
Little-endian: A little-endian system stores the least significant byte of the value first (at the lowest address) and the most significant byte last.
Big-endian: A big-endian system stores the most significant byte of the value first (at the lowest address) and the least significant byte last.
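To make the difference concrete, the sketch below encodes the same 4-byte unsigned value in both byte orders using Python's `struct` module (`>` for big-endian, `<` for little-endian). The value `0x0A0B0C0D` is chosen only because its bytes are easy to tell apart, and the helper name `byte_layouts` is hypothetical.

```python
import struct

def byte_layouts(value: int) -> tuple:
    """Hypothetical helper: return the big- and little-endian 4-byte encodings."""
    big = struct.pack(">I", value)     # most significant byte first
    little = struct.pack("<I", value)  # least significant byte first
    return big, little

big, little = byte_layouts(0x0A0B0C0D)
print(big.hex(" "))     # 0a 0b 0c 0d
print(little.hex(" "))  # 0d 0c 0b 0a

# Decoding must use the same byte order that was used for encoding.
assert struct.unpack(">I", big)[0] == 0x0A0B0C0D
assert struct.unpack("<I", little)[0] == 0x0A0B0C0D
# Reading little-endian bytes as big-endian yields a different number.
assert struct.unpack(">I", little)[0] == 0x0D0C0B0A
```

This is why an on-disk format has to fix one byte order: a file written on a little-endian machine and read with big-endian assumptions produces different values.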
The illustration below provides example methods for checking the endianness of a machine in different programming languages:
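In Python, one way to check this is to pack the number 1 using the machine's native byte order and inspect which byte holds it. The sketch below assumes the `=` prefix of the `struct` module (native byte order, standard 4-byte size); `machine_byte_order` is a hypothetical helper, and `sys.byteorder` is Python's built-in answer for comparison.

```python
import struct
import sys

def machine_byte_order() -> str:
    """Hypothetical helper: detect host endianness from how 1 is laid out."""
    # '=' uses the machine's native byte order with a standard 4-byte size.
    first_byte = struct.pack("=I", 1)[0]
    # On a little-endian host the least significant byte (0x01) comes first.
    return "little" if first_byte == 1 else "big"

print(machine_byte_order())  # e.g. "little" on x86-64
print(sys.byteorder)         # should agree with the manual check
```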
The approach above works for unsigned numbers. Signed numbers will have a 0 as ...