Encoding Unicode - UTF-8
- Variable number of bytes per character (1 to 4 bytes per code point)
- Code points U+0000 - U+007F use one byte
- (ASCII is a subset of UTF-8)
- First byte gives the sequence length: a leading '0' means one byte; otherwise the number of leading '1' bits is the total number of bytes
- Continuation bytes always start with '10'
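The two rules above (leading '1' bits give the length, continuation bytes start with '10') are enough to decode a byte stream. A minimal Python sketch follows; the function names `sequence_length`, `is_continuation` and `decode_utf8` are hypothetical, and the decoder deliberately skips the overlong-encoding and surrogate checks a production decoder must perform.

```python
def sequence_length(first_byte: int) -> int:
    """Hypothetical helper: byte count of a UTF-8 sequence from its first byte."""
    if first_byte & 0x80 == 0:      # 0xxxxxxx: single-byte (ASCII) character
        return 1
    count = 0
    mask = 0x80
    while first_byte & mask:        # count the leading '1' bits
        count += 1
        mask >>= 1
    if count == 1 or count > 4:     # '10' marks a continuation byte, not a lead byte
        raise ValueError(f"invalid lead byte: {first_byte:#04x}")
    return count


def is_continuation(byte: int) -> bool:
    """Continuation bytes always have the form 10xxxxxx."""
    return byte & 0xC0 == 0x80


def decode_utf8(data: bytes) -> list:
    """Hypothetical sketch: decode bytes into a list of code points.

    Does NOT reject overlong forms or surrogates (U+D800..U+DFFF).
    """
    code_points = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated sequence")
        # Lead-byte payload: all 7 low bits for 1 byte, else 0x7F >> n
        # (5 bits for 110xxxxx, 4 for 1110xxxx, 3 for 11110xxx)
        cp = data[i] if n == 1 else data[i] & (0x7F >> n)
        for b in data[i + 1 : i + n]:
            if not is_continuation(b):
                raise ValueError(f"expected continuation byte, got {b:#04x}")
            cp = (cp << 6) | (b & 0x3F)   # append 6 payload bits
        code_points.append(cp)
        i += n
    return code_points
```

For example, `decode_utf8("A\u00e9\u20ac".encode("utf-8"))` yields the code points 0x41, 0xE9 and 0x20AC.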
Bytes  Code point range     Byte pattern                           Payload
1      U+0000 - U+007F      0xxxxxxx                               7 bits
2      U+0080 - U+07FF      110xxxxx 10xxxxxx                      11 bits
3      U+0800 - U+FFFF      1110xxxx 10xxxxxx 10xxxxxx             16 bits
4      U+10000 - U+10FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx    21 bits
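Encoding is the reverse: choose the row of the table by code point range, then spread the payload bits across the 'x' positions, 6 bits per continuation byte. A minimal Python sketch, with `encode_utf8` as a hypothetical name; unlike Python's built-in `str.encode`, it does not reject surrogate code points.

```python
def encode_utf8(cp: int) -> bytes:
    """Hypothetical sketch: encode one code point using the table's byte patterns."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp <= 0x7F:                      # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(encode_utf8(0x20AC).hex())  # euro sign -> e282ac
```

The result can be cross-checked against Python's own encoder, e.g. `chr(0x20AC).encode("utf-8")`.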