MySQL 8 default character set is utf8mb4

The UTF-8 is a variable-length encoding.  In the case of UTF-8, it means that storing one code point requires one to four bytes. But, In MySQL’s encoding called “utf8” only stores a maximum of three bytes per code point. In the modern web / mobile applications, we have to support for storing not only language characters but also symbols and emojis, Let me show you below some very weird issues faced using MySQL “utf8” :

mysql> SET NAMES utf8; # just to emphasize that the connection charset is set to `utf8`
Query OK, 0 rows affected (0.00 sec)

mysql> UPDATE custfeeds.reactions SET reacted = 'super like 👍' WHERE id = 13015;
Query OK, 1 row affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 1

mysql> SELECT reactions FROM custfeeds.reactions WHERE id = 13015;
+-------------+
| reactions   |
+-------------+
| super liked |
+-------------+
1 row in set (0.00 sec)


mysql> SHOW WARNINGS;
+---------+------+------------------------------------------------------------------------------+
| Level   | Code | Message                                                                      |
+---------+------+------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xF0\x9D\x8C\x86' for column 'reactions' at row 731 |
+---------+------+------------------------------------------------------------------------------+
1 row in set (0.00 sec)

MySQL’s utf8 charset can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported. MySQL 5.5.3 released utf8mb4 encoding to solve this problem (https://dev.mysql.com/doc/relnotes/mysql/5.5/en/news-5-5-3.html)  so utf8mb4 charset is not a MySQL 8 new feature (yes, It’s now default charset from MySQL8)

MySQL 8 has by default utf8mb4 character set

mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_0900_ai_ci |
| collation_database       | utf8mb4_0900_ai_ci |
| collation_server         | utf8mb4_0900_ai_ci |
+--------------------------+--------------------+
10 rows in set (0.01 sec)

How things changed in MySQL 8 with utf8mb4 character set ?

mysql> UPDATE custfeeds.reactions SET reacted = 'super like 👍' WHERE id = 13015;
Query OK, 1 row affected, 0 warning (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> SELECT reactions FROM custfeeds.reactions WHERE id = 13015;
+-------------+
| reactions   |
+-------------+
| super liked👍 |
+-------------+
1 row in set (0.00 sec)

Conclusion

Traditionally MySQL is built for scaling web-scale database infrastructure operations, In the modern web applications / mobile apps. , emojis and a multitude of charsets / collation needs to coexist. To address this compelling need, in MySQL 8.0 default character set has been changed from latin-1 to utf8mb4.

About MinervaDB Corporation 88 Articles
Independent and vendor neutral consulting, support, remote DBA services and training for MySQL, MariaDB, Percona Server, PostgreSQL and ClickHouse with core expertize in performance, scalability and high availability . We are an virtual corporation, all of us work from home on multiple timezones and stay connected via Email, Skype, Google Hangouts, Phone and IRC supporting over 250 customers worldwide
UA-155183614-1