As you probably already know, MySQL 8.0 supports Unicode 9.0 and our default charset moved from latin1 to utf8mb4.
Recently, Gabi (certainly working on an amazing presentation for PHPWorld) asked me something about how the client deals with the emojis while checking Morgo’s post about 8.0 RC1.
I didn’t know the answer and had to play with it, here is the outcome of these tests.
First let’s try to use the standard MySQL client and paste the dophin emoji from emojipedia:
So we can see that the input is decoded (codepoints) but the ouptut is indeed the 🐬 (dolphin).
Now if you try to just use the codepoints by typing them directly in the string what does happen ?
As you can see, you can’t type it yourself. If you want to insert an emoji, you need to use the emoji by pasting it.
On the new MySQL Shell, emojis are also visible as such in the input:
Voilà ! Now you have more info on how emojis are handled in the MySQL clients.
Also a good demonstration for https://bugs.mysql.com/bug.php?id=7616
But the extra space(s) remain 🙁
It’s great that you can use the native MySQL client but if you try the official ODBC client you’ll see that getting emojis or any other Unicode character residing outside plane 0 (i.e. whose code point value is higher than 65535) with ODBC is a gruesome experience. The only way to make it work, it seems, is to use binary encoding. That is not the expected experience of a modern database client backed (and sold) by a large corporation like Oracle…
Hi Dror,
Thank you for you comment, I’m not familiar with ODBC (which to my sense is not a modern databases client) 😉 but I will transfer your frustration and comment to the developers.
Cheers,
Thanks lefred. I guess that ODBC got a bad name on the Linux/Unix world for lack of a platform selected ODBC driver manager (as exists on Windows, for example) and due to inconsistency in the ABI (with some drivers using UTF8, some using UTF16 or even UTF32 – in many cases regardless of the platform’s wchar_t definition). Still, most database vendors offer an ODBC driver and this makes writing a generic client that can work with many databases feasible. It is here that the MySQL ODBC driver fails and it fails even in Windows-land where most others just get it right (well, not with UTF-8 because Windows does not support it as a common encoding, but rather with UTf-16 or with the wide characters as bound using SQL_C_WCHAR).
Regardless of the above, the ODBC interface is the most modern common database client compatible with C/C++ (just like JDBC is the most modern common database client compatible with Java). Here ‘common’ refers to coverage of databases.
Thx Dror for those details. As I said I’ve sent your comment to the ODBC driver team. But feel free to also submit a bug. This is the only way we will be able to fix your issue. Also don’t forget that using the ANSI version of the driver wouldn’t be appropriate for unicode, in case you do.
Thx
yeah, i have tons of foundness remembering the night i spent learning about emoji support in mysql … as mysql chose utf8mb4 as default while keeping latin1 as default for mysqldump, it was fun and creative to fix a migrated database which lost all its stored emoji, went live and got new emoji without warning, having the database owner telling me after the fact that he used emoji in his base. So so much fun (hint: or not). 💩💩💩
Xidit: I hope you will have better night now 😉
Everything was fixed in the end, luckily the original databases were still intact and we were able to merge the old original with the addition from migrated ones to get a result with all emojis. I learned that day not to assume mysqldump had the same default charset as mysql.
I sleep better now, thanks, but i strangely took a liking toward emoticons. 🙂
[…] Read More (Community content) […]