Understanding UTF-8
- Unicode ‘code points’
- £ = U+00A3
- € = U+20AC
- Variable byte encoding
  1 byte char   0xxxxxxx                     7 bit payload
  2 byte char   110xxxxx 10xxxxxx            11 bit payload
  3 byte char   1110xxxx 10xxxxxx 10xxxxxx   16 bit payload
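
The bit layouts above can be checked by hand with a short Python sketch. The helper name `encode_utf8` is hypothetical and only illustrates the 1, 2 and 3 byte forms listed here; it rejects anything larger, as well as the surrogate range, rather than handling it. In real code, Python's built-in `str.encode("utf-8")` does this work.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one code point using the 1, 2 or 3 byte layouts (illustrative helper)."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point < 0x80:
        # 0xxxxxxx: 7 bit payload
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx: 11 bit payload
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b00111111),
        ])
    if code_point < 0x10000:
        if 0xD800 <= code_point <= 0xDFFF:
            # Surrogates are not valid on their own in UTF-8.
            raise ValueError("surrogate code points cannot be encoded")
        # 1110xxxx 10xxxxxx 10xxxxxx: 16 bit payload
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b00111111),
            0b10000000 | (code_point & 0b00111111),
        ])
    raise ValueError("this sketch only covers the 1 to 3 byte forms")


if __name__ == "__main__":
    print(encode_utf8(0x00A3).hex())  # £ -> c2a3
    print(encode_utf8(0x20AC).hex())  # € -> e282ac
    # Cross-check against Python's built-in encoder.
    print(encode_utf8(0x20AC) == "€".encode("utf-8"))  # True
```

So £ (U+00A3) needs the 2 byte form because it does not fit in 7 bits, and € (U+20AC) needs the 3 byte form because it does not fit in 11 bits.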