utf8mb4 in mysql

🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱

在爬一些比較潮的網頁時難免會抓到emoji字元。
這在寫入mysql時會發生問題,因為mysql預設是使用utf8,必須改成utf8mb4才能避免寫入emoji時的錯誤。

1
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F…

mysql.ini

1
2
3
4
5
6
7
8
9
10
11
[client]
default-character-set=utf8mb4

[mysql]
default-character-set=utf8mb4

[mysqld]
character-set-client-handshake=FALSE
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'

restart mysql後,沒意外可以看到相關的character-set都變成utf8mb4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
mysql> select version();
+------------+
| version() |
+------------+
| 5.6.19-log |
+------------+
1 row in set (0.00 sec)

mysql> show variables where variable_name like 'character%' or variable_name like 'collation%';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MySQL\MySQL Server 5.6\share\charsets\ |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+---------------------------------------------------------+
11 rows in set (0.00 sec)

database/table schema

舊的schema可以繼續保持CHARACTER SET=utf8,而針對需要寫入emoji的庫/表/欄alter成CHARACTER SET=utf8mb4。

1
2
3
ALTER DATABASE database_name CHARACTER SET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE table_name CHANGE column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

java JDBC connector

  1. 使用mysql提供的JDBC driver,要確保mysql connector的版本高於5.1.13才支援utf8mb4。
  2. 以往在jdbc.uri裡加入的characterEncoding=utf8,一定要去除※重要※