The UTF-8 is a variable-length encoding. In the case of UTF-8, it means that storing one code point requires one to four bytes. But, In MySQL’s encoding called “utf8” only stores a maximum of three bytes per code point. In the modern web / mobile applications, we have to support for storing not only language characters but also symbols and emojis, Let me show you below some very weird issues faced using MySQL “utf8” :
mysql> SET NAMES utf8; # just to emphasize that the connection charset is set to `utf8` Query OK, 0 rows affected (0.00 sec) mysql> UPDATE custfeeds.reactions SET reacted = 'super like 👍' WHERE id = 13015; Query OK, 1 row affected, 1 warning (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 1 mysql> SELECT reactions FROM custfeeds.reactions WHERE id = 13015; +-------------+ | reactions | +-------------+ | super liked | +-------------+ 1 row in set (0.00 sec) mysql> SHOW WARNINGS; +---------+------+------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+------------------------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\xF0\x9D\x8C\x86' for column 'reactions' at row 731 | +---------+------+------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
MySQL’s utf8 charset can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported. MySQL 5.5.3 released utf8mb4 encoding to solve this problem (https://dev.mysql.com/doc/relnotes/mysql/5.5/en/news-5-5-3.html) so utf8mb4 charset is not a MySQL 8 new feature (yes, It’s now default charset from MySQL8)
MySQL 8 has by default utf8mb4 character set
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+--------------------+ | Variable_name | Value | +--------------------------+--------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8 | | collation_connection | utf8mb4_0900_ai_ci | | collation_database | utf8mb4_0900_ai_ci | | collation_server | utf8mb4_0900_ai_ci | +--------------------------+--------------------+ 10 rows in set (0.01 sec)
How things changed in MySQL 8 with utf8mb4 character set ?
mysql> UPDATE custfeeds.reactions SET reacted = 'super like 👍' WHERE id = 13015; Query OK, 1 row affected, 0 warning (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0
mysql> SELECT reactions FROM custfeeds.reactions WHERE id = 13015; +-------------+ | reactions | +-------------+ | super liked👍 | +-------------+ 1 row in set (0.00 sec)
Conclusion
Traditionally MySQL is built for scaling web-scale database infrastructure operations, In the modern web applications / mobile apps. , emojis and a multitude of charsets / collation needs to coexist. To address this compelling need, in MySQL 8.0 default character set has been changed from latin-1 to utf8mb4.