Understanding UTF-8
- Unicode ‘code points’
- £ = U+00A3
- € = U+20AC
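- Variable-length encoding: 1 to 4 bytes per code point
  Bytes  Bit pattern                          Payload  Code point range
  1      0xxxxxxx                             7 bits   U+0000 - U+007F
  2      110xxxxx 10xxxxxx                    11 bits  U+0080 - U+07FF
  3      1110xxxx 10xxxxxx 10xxxxxx           16 bits  U+0800 - U+FFFF
  4      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  21 bits  U+10000 - U+10FFFF
  Continuation bytes always start with 10. The 4-byte pattern has room for values up to U+1FFFFF, but Unicode (and RFC 3629) stops at U+10FFFF, and the surrogates U+D800 - U+DFFF are never encoded.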
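A minimal Python sketch of the table above, encoding one code point at a time by packing its bits into the lead-byte and continuation-byte patterns. The function name `encode_utf8` is hypothetical (Python already provides `str.encode("utf-8")`); it is used here only to cross-check the result.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point using the UTF-8 bit patterns (illustrative sketch)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate and cannot be encoded")
    if cp <= 0x7F:
        # 1 byte: 0xxxxxxx (7-bit payload)
        return bytes([cp])
    if cp <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx (11-bit payload)
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (16-bit payload)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21-bit payload)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "£€":
    cp = ord(ch)
    encoded = encode_utf8(cp)
    assert encoded == ch.encode("utf-8")  # cross-check against Python's built-in codec
    print(f"{ch} U+{cp:04X} -> {encoded.hex(' ').upper()}")
```

Running it prints `£ U+00A3 -> C2 A3` and `€ U+20AC -> E2 82 AC`: £ needs 2 bytes because it is above U+007F, and € needs 3 because it is above U+07FF. Each continuation byte carries 6 payload bits, so the lead byte holds whatever bits remain.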