UTF-8

UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width character encoding that is widely used to represent the characters of the Unicode character set. It was designed as a backward-compatible successor to ASCII and other single-byte character encodings, with the goal of supporting all characters used in the world's writing systems.

How it works

In UTF-8, each character (Unicode code point) is encoded as one to four bytes, depending on its code point value: the larger the value, the more bytes are needed.

ASCII characters (U+0000 to U+007F) are still represented by a single byte. All other characters take two, three, or four bytes, because their code point values do not fit in the seven bits available in a single-byte sequence.
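As a minimal sketch of the byte counts described above, the following Python snippet encodes one sample character per length class using the built-in `str.encode`. The helper name `utf8_length` and the choice of sample characters are assumptions made here for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Illustrative samples, written as escapes so the code points are unambiguous:
# U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the byte count, and the raw bytes in hex.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) -> {encoded.hex(' ')}")
```

Run on Python 3.8 or later, this should report one, two, three, and four bytes respectively, matching the one-to-four-byte range.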

The advantage of UTF-8 is that it is backward compatible with ASCII: any valid ASCII text is also valid UTF-8, byte for byte. It can therefore be used with existing ASCII-based systems, while also supporting a much wider range of characters.
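The ASCII compatibility can be checked directly. The sketch below, which uses only Python's built-in codecs, encodes a pure-ASCII string both ways and compares the results, then shows that in mixed text only the non-ASCII character produces bytes with the high bit set. The sample strings are arbitrary choices for illustration.

```python
# A pure-ASCII string produces identical bytes under both encodings.
text = "Hello, UTF-8!"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes
print("ASCII and UTF-8 bytes identical:", same_bytes)

# Bytes written by an ASCII-only system decode cleanly as UTF-8.
round_trip = ascii_bytes.decode("utf-8")
print("Decoded as UTF-8:", round_trip)

# In mixed text, ASCII characters keep their single-byte values (< 0x80),
# while the non-ASCII character is encoded with bytes >= 0x80.
mixed = "caf\u00e9".encode("utf-8")
high_bytes = [b for b in mixed if b >= 0x80]
print("Mixed bytes:", mixed.hex(" "))
print("Bytes with the high bit set:", [hex(b) for b in high_bytes])
```

Because multi-byte sequences never contain byte values below 0x80, software that only looks for ASCII delimiters can usually process UTF-8 text without misreading part of a longer character as an ASCII one.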

