Invalid string for charset utf8mb4
#8893
Comments
I was finally able to replicate the issue here, and I think I've found a way to fix it. First, we should convert the column to a BLOB:

ALTER TABLE Products MODIFY name BLOB;

From here, we are free to modify the strings, so we can replace the bad string with the appropriate string for utf8mb4:

UPDATE Products SET name = UNHEX('446F6C744C6162C2AE') WHERE name = UNHEX('446F6C744C6162AE');

Once the strings in question have been replaced, we'll convert back to our TEXT column:

ALTER TABLE Products MODIFY name TEXT;

Our standard SELECT then returns the expected result:

SELECT name FROM Products;
/*
+----------+
| name     |
+----------+
| DoltLab® |
+----------+
*/

Let me know if this resolves the issue for you!
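For readers wondering where the two hex strings in the UPDATE come from, here is a minimal Python sketch. It assumes the stored bytes are latin1-encoded text, as described in the issue. The helper `fix_statement` is hypothetical and only formats a statement of the same shape as the one above; it is not part of Dolt.

```python
# Bytes stored in the column: "DoltLab" followed by a lone 0xAE (latin1 "®").
bad = bytes.fromhex("446F6C744C6162AE")

# Interpret the bytes as latin1, then re-encode the resulting text as UTF-8.
text = bad.decode("latin-1")
good = text.encode("utf-8")

print(text)                # DoltLab®
print(good.hex().upper())  # 446F6C744C6162C2AE


def fix_statement(table: str, column: str, bad_hex: str) -> str:
    """Hypothetical helper: build an UPDATE that swaps latin1 bytes for UTF-8."""
    fixed_hex = (
        bytes.fromhex(bad_hex).decode("latin-1").encode("utf-8").hex().upper()
    )
    return (
        f"UPDATE {table} SET {column} = UNHEX('{fixed_hex}') "
        f"WHERE {column} = UNHEX('{bad_hex}');"
    )


print(fix_statement("Products", "name", "446F6C744C6162AE"))
```

In UTF-8 the character ® (U+00AE) is the two-byte sequence C2 AE, which is why the replacement string is one byte longer than the original.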
It's worth mentioning that we don't have to use
Also, if returning the column back to its standard

SELECT CONVERT(name USING utf8mb4), HEX(name) FROM Products;

Any strings with the placeholder
For me, the
I wonder if there's some other invalid state, since I made sure it would work on my repro beforehand. You're specifically referring to this:

ALTER TABLE Products MODIFY name BLOB;

I doubt this will work, but we should still try it for the sake of thoroughness. What if we use the following instead?

ALTER TABLE Products MODIFY name VARBINARY(16000);

If this still does not work, would you mind pushing your repository to DoltHub, or a sanitized subset that removes any private information but still exhibits the problem? It would help with debugging immensely.
Feel free to adjust the title to something more meaningful
I've got a table in the following form:
Into which we forcefully inserted strings with a wrong encoding (latin1) so that we get the following output:
Querying
Regular selection does not work (unsurprisingly)
Conversion does kinda work but not as intended
Binary encoding does roughly as expected:
Forcing binary interpretation before converting to latin1 does not improve results:
Casting plain refuses to do anything:
Casting to BINARY yields the same result as CONVERT(name USING binary).
Converting a binary-casted string also yields the same result as the CONVERT-CONVERT strategy.
Replacing the faulty bytes also does not work.
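The symptoms listed above can be illustrated at the byte level with a small Python sketch. This is only an analogy: Python's UTF-8 decoder stands in for the database's charset handling, the sample bytes are the ones quoted elsewhere in this thread, and `is_valid_utf8` is a hypothetical helper. Dolt's actual behavior may differ.

```python
# Sample value from this thread: "DoltLab" plus a lone latin1 byte 0xAE.
raw = bytes.fromhex("446F6C744C6162AE")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode strictly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Strict decoding fails, comparable to the plain selection being rejected.
print(is_valid_utf8(raw))

# Lossy decoding succeeds but substitutes a placeholder (U+FFFD) for 0xAE.
lossy = raw.decode("utf-8", errors="replace")
print(lossy)

# The hex view is always available, because it never interprets the bytes.
print(raw.hex().upper())
```

A lone 0xAE is a UTF-8 continuation byte with no lead byte before it, so a strict decoder has no valid way to read it.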
Version
Tested using Dolt 1.49.3.