Skip to content
This repository has been archived by the owner on May 13, 2024. It is now read-only.

Fix UTF-8 / wide string conversion functions #221

Closed
wants to merge 11 commits into from

Conversation

grorp
Copy link
Member

@grorp grorp commented Jul 9, 2023

Fixes #216. Depends on minetest/minetest#13650.

The UTF-8 / wide string conversion functions in IrrlichtMt are broken on Windows (at least on my machine). Fortunately, Minetest has working ones. This PR moves the working UTF-8 / wide string conversion functions from Minetest to IrrlichtMt so that they can be used by both Minetest and IrrlichtMt. The broken ones have been removed.

I'd be interested in better ideas on how to fix this.

To do

This PR is a Work in Progress.

  • Move the iconv dependency from Minetest to IrrlichtMt

How to test

Verify that special characters are shown correctly in all contexts and on all platforms, I guess...
+ Verify that #216 is fixed.

@sfan5
Copy link
Member

sfan5 commented Jul 9, 2023

Given that Irrlicht is to be merged eventually I think it's more worthwhile to adjust Irrlicht to fit Minetest and not the other way around.

This PR moves the iconv dependency from Minetest to IrrlichtMt. Do I have to change anything because of that?

definitely

@grorp
Copy link
Member Author

grorp commented Jul 9, 2023

Given that Irrlicht is to be merged eventually I think it's more worthwhile to adjust Irrlicht to fit Minetest and not the other way around.

Yes, you're right, but how?

definitely

OK :)

@grorp grorp marked this pull request as draft July 9, 2023 19:00
@sfan5
Copy link
Member

sfan5 commented Jul 9, 2023

Yes, you're right, but how?

Not sure if that works but maybe we can get away with referencing the MT functions as extern in Irrlicht and then have MT provide them.

@grorp
Copy link
Member Author

grorp commented Jul 9, 2023

Not sure if that works but maybe we can get away with referencing the MT functions as extern in Irrlicht and then have MT provide them.

This works when building MT, but breaks the unit tests... Would putting a stub into the unit tests suffice?

EDIT: No, that doesn't suffice. Actually, MT only seems to compile on Linux. On other platforms, it doesn't even compile... Doing it this way doesn't seem to be possible.

@sfan5
Copy link
Member

sfan5 commented Jul 20, 2023

I guess we have to copy-paste them over then. I'd prefer if this happened in a way that adds the fewest amount of code feasible.

@sfan5
Copy link
Member

sfan5 commented Aug 3, 2023

Different idea: maybe we can use std::wstring_convert for the time being so we don't have to copy so much stuff over.
https://stackoverflow.com/questions/7469296/#answer-7561991
Only issue is that it's deprecated in c++17 but by the time that becomes relevant Irrlicht and MT are hopefully merged.

@grorp grorp force-pushed the unify-utf8-wide-conversion branch from 64b45b6 to 32e0e26 Compare August 30, 2023 18:43
@grorp grorp changed the title Unify/fix conversion between UTF-8 and wide strings across Minetest and IrrlichtMt Fix UTF-8 / wide string conversion functions Aug 30, 2023
@grorp
Copy link
Member Author

grorp commented Aug 30, 2023

All builds work fine, except the no-SDL MSVC builds. They fail with a completely unrelated error message:

D:\a\irrlicht\irrlicht\source\Irrlicht\os.cpp(307,14): error C3861: '_localtime_s': identifier not found [D:\a\irrlicht\irrlicht\source\Irrlicht\IRROTHEROBJ.vcxproj]

I can reproduce this locally, but I have no idea what's going on.

@grorp
Copy link
Member Author

grorp commented Aug 30, 2023

The fun thing is that _localtime_s doesn't seem to exist according to the Internet. All I can find is:

  • localtime
  • _localtime32
  • _localtime64
  • localtime_s
  • _localtime32_s
  • _localtime64_s

os.h uses localtime on all platforms, even on Windows with SDL. The only platform on which os.h uses _localtime_s is Windows without SDL.

I've now simply replaced _localtime_s with localtime. I hope that works, not sure what would be the actual solution.

@grorp
Copy link
Member Author

grorp commented Aug 30, 2023

My unit test (grorp/minetest#3) still fails on Windows and passes on Linux. On windows, keyboard input and window title have no problems with UTF-8 anymore, but copy-paste is completely broken (only tested with SDL, not without). Everything still works on Linux.

@grorp
Copy link
Member Author

grorp commented Aug 30, 2023

I'm closing this pull request as I can't currently get it to work. I will reopen it if I can get it to work.

Btw, I think the problem isn't that some of the conversion functions are broken, but that the Irrlicht functions work with the "locale-defined multibyte encoding" and the Minetest functions always work with UTF-8. We mix and match them in our code, especially at Minetest <-> Irrlicht boundaries.

@grorp grorp closed this Aug 30, 2023
@sfan5
Copy link
Member

sfan5 commented Aug 30, 2023

We mix and match them in our code, especially at Minetest <-> Irrlicht boundaries.

I've removed any use multibyte (as opposed to UTF-8) from Minetest years ago: minetest/minetest@c834d2ab2
So the only issue here is that Irrlicht still uses multibyte somewhere.

My unit test (grorp/minetest#3) still fails on Windows and passes on Linux.

Hmm, why isn't the following test impacted? https://github.com/minetest/irrlicht/blob/master/examples/AutomatedTest/test_string.cpp#L169
Testing with emoji might not be optimal as it's outside of the BMP and I'm not sure how that interacts with surraogate handling on Windows (is it guaranteed to work?)

I've now simply replaced _localtime_s with localtime. I hope that works, not sure what would be the actual solution.

We should be using localetime_s / localtime_r (as we do in MT) anyway, I'll prepare a PR for that tomorrow.

@grorp
Copy link
Member Author

grorp commented Aug 31, 2023

I meant that we mix and match the Minetest and Irrlicht conversion functions. For example:

utf8_to_wide is applied to the input for setWindowCaption (wstrgettext also uses utf8_to_wide)

https://github.com/minetest/minetest/blob/4252f9d4d5041957c06406e691c6cda24f866875/src/client/clientlauncher.cpp#L211-L215

The SDL version of setWindowCaption applies wStringToMultibyte to its input (I think this should be fine, as the wide encoding used by both sets of conversion functions is the same, right?)

However, the multibyte-encoded (as opposed to UTF-8) string is now passed to SDL_SetWindowTitle, which always expects UTF-8 according to https://wiki.libsdl.org/SDL2/SDL_SetWindowTitle:

This string is expected to be in UTF-8 encoding.

//! sets the caption of the window
void CIrrDeviceSDL::setWindowCaption(const wchar_t* text)
{
core::stringc textc;
core::wStringToMultibyte(textc, text);
SDL_SetWindowTitle(Window, textc.c_str());
}

Obviously, this can't work. The "locale-defined multibyte encoding" seems to be Windows-1252 on my machine:

PS> [System.Text.Encoding]::Default
IsSingleByte      : True
BodyName          : iso-8859-1
EncodingName      : Westeuropäisch (Windows)
HeaderName        : Windows-1252
WebName           : Windows-1252
WindowsCodePage   : 1252
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 1252

But I'm doing a lot of guesswork here.

@sfan5
Copy link
Member

sfan5 commented Aug 31, 2023

I meant that we mix and match the Minetest and Irrlicht conversion functions. [...]

Ah, you're correct then.

The "locale-defined multibyte encoding" seems to be Windows-1252 on my machine:

Indeed, Windows still defaults to Latin-1 or other funny encodings depending on region.
There's a way to tell Windows that you want it to use UTF-8, which I've done (minetest/minetest@3fa8232) but I don't know if this applies really everywhere.

@grorp
Copy link
Member Author

grorp commented Oct 4, 2023

I now have a working version of this, it's just waiting for #246 to be fixed: grorp/irrlicht@058bddb...fixnofix

@Zughy
Copy link
Member

Zughy commented Oct 4, 2023

@grorp would that fix minetest/minetest#13646 ?

@grorp
Copy link
Member Author

grorp commented Oct 5, 2023

would that fix minetest/minetest#13646?

yes

@grorp grorp deleted the unify-utf8-wide-conversion branch December 18, 2023 16:37
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SDL 2 on Windows: Problems with Unicode/UTF-8
3 participants