Computer Science

Computer Systems

I. Data Storage

Endianness

Big-endian systems store the most significant byte of a word at the smallest address and the least significant byte at the largest address. Little-endian systems, in contrast, store the least significant byte at the smallest address and the most significant byte at the largest address.[1]

For example, consider the hexadecimal number 0x01234567, whose most significant byte is 0x01 and whose least significant byte is 0x67, stored in the address range 0x100 through 0x103.[2]

Big-endian:

Address  0x100  0x101  0x102  0x103
Value    0x01   0x23   0x45   0x67

Little-endian:

Address  0x100  0x101  0x102  0x103
Value    0x67   0x45   0x23   0x01
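
The two layouts can be reproduced with a minimal Python sketch. It relies on the built-in int.to_bytes method; the variable names are illustrative only, and index 0 of each byte string stands in for the smallest address (0x100 in the example).

```python
value = 0x01234567

# Serialize the 32-bit value in both byte orders.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

# Index 0 corresponds to the smallest address (0x100 in the example).
print(big.hex(" "))     # 01 23 45 67
print(little.hex(" "))  # 67 45 23 01
```

The little-endian byte string is simply the big-endian one reversed, which matches the two tables.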

Different processors follow different conventions. For example, the Intel x86 and x86-64 processors are little-endian, while the Motorola 6800 and 68k processors are big-endian. Newer ARM processors are bi-endian, meaning they can operate in either mode.[1] What matters most is consistency. Keep the convention in mind when (1) communicating binary data over a network between different machines; (2) inspecting the byte sequences that represent integer data, for example in machine-level code produced by a disassembler; (3) writing programs that circumvent the normal type system, for example with a type cast in C.[2]
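
Case (1) can be illustrated with a short Python sketch, assuming the standard struct module. Network protocols conventionally use big-endian ("network byte order"), which struct selects with the "!" prefix. The variable names are illustrative, and the "misread" value shows what a receiver would get if it decoded the same bytes with the wrong convention.

```python
import struct
import sys

value = 0x01234567

# "!" requests network (big-endian) byte order regardless of the host.
wire = struct.pack("!I", value)

# A receiver that decodes with the matching convention recovers the value.
(decoded,) = struct.unpack("!I", wire)

# Decoding the same bytes as little-endian yields a different integer.
(misread,) = struct.unpack("<I", wire)

print(sys.byteorder)  # host convention: "little" or "big"
print(hex(decoded))   # 0x1234567
print(hex(misread))   # 0x67452301
```

Fixing the byte order explicitly on both sides, rather than relying on the host's native order, is what keeps the exchange consistent between machines.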

II. Data Representation

Integer Representation

Two’s-complement encodings (and others).
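
As a minimal sketch of the two's-complement encoding, the hypothetical helpers below convert between a signed integer and its fixed-width bit pattern. The function names and the default 8-bit width are assumptions chosen for illustration.

```python
def to_twos_complement(x, bits=8):
    """Return the unsigned bit pattern that encodes signed x in `bits` bits."""
    if not -(1 << (bits - 1)) <= x < (1 << (bits - 1)):
        raise ValueError("value out of range for the given width")
    return x & ((1 << bits) - 1)


def from_twos_complement(pattern, bits=8):
    """Interpret an unsigned `bits`-wide pattern as a signed integer."""
    sign_bit = 1 << (bits - 1)
    # The most significant bit carries the negative weight -2^(bits-1).
    return (pattern & (sign_bit - 1)) - (pattern & sign_bit)


print(format(to_twos_complement(-1), "08b"))    # 11111111
print(format(to_twos_complement(-128), "08b"))  # 10000000
print(from_twos_complement(0b11111011))         # -5
```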

Floating-Point Numbers

IEEE floating-point representation
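
To make the IEEE layout concrete, the sketch below splits a single-precision (32-bit) value into its 1-bit sign, 8-bit exponent, and 23-bit fraction fields. The helper name float32_fields is hypothetical, and the standard struct module is assumed for the bit-level conversion.

```python
import struct


def float32_fields(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE single."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-0.75))  # (1, 126, 4194304)
```

The exponent is stored with a bias of 127, so 1.0 (which is 1.0 x 2^0) has the stored exponent 127, and -0.75 (which is -1.5 x 2^-1) has the stored exponent 126.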

Character Encoding

The ASCII and Unicode standards are the most commonly used for text encoding; see [3] for details.
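
The relationship between the two standards can be sketched in a few lines of Python. The sample string is an arbitrary choice, and UTF-8 is used here as one common byte encoding of Unicode, which is an assumption beyond what the text above specifies.

```python
text = "A\u00e9"  # "A" followed by U+00E9 (e with acute accent)

# Code points are the numbers Unicode assigns to characters.
print([hex(ord(ch)) for ch in text])  # ['0x41', '0xe9']

# ASCII covers only code points 0 through 127, one byte each.
print("A".encode("ascii"))            # b'A'

# UTF-8 keeps ASCII characters as single bytes and uses more bytes beyond.
print(text.encode("utf-8"))           # b'A\xc3\xa9'
```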

Instruction Encoding

Instruction encodings usually differ across processors, and even on the same processor running different operating systems.[2]