🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱
在爬一些比較潮的網頁時難免會抓到emoji字元。
這在寫入mysql時會發生問題,因為mysql預設是使用utf8,必須改成utf8mb4才能避免寫入emoji時的錯誤。1
java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F…
mysql.ini
1 | [client] |
restart mysql後,沒意外可以看到相關的character-set都變成utf8mb41
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25mysql> select version();
+------------+
| version() |
+------------+
| 5.6.19-log |
+------------+
1 row in set (0.00 sec)
mysql> show variables where variable_name like 'character%' or variable_name like 'collation%';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MySQL\MySQL Server 5.6\share\charsets\ |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+---------------------------------------------------------+
11 rows in set (0.00 sec)
database/table schema
舊的schema可以繼續保持CHARACTER SET=utf8,而針對需要寫入emoji的庫/表/欄alter成CHARACTER SET=utf8mb4。1
2
3ALTER DATABASE database_name CHARACTER SET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE table_name CHANGE column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
java JDBC connector
- 使用mysql提供的JDBC driver,要確保mysql connector的版本高於5.1.13才支援utf8mb4。
- 以往在jdbc.uri裡加入的characterEncoding=utf8,一定要去除。※重要※