Non-English characters not showing correctly in SMF 2.1 RC4 (but they did in RC2) #21

Open
rcutanda opened this issue Aug 13, 2021 · 10 comments


@rcutanda

Hi,

I have been using Sphinx for SMF v1.1 in my SMF 2.1 RC2 forum for a very long time without any problems. I recently upgraded to RC4, and now non-English characters are not showing correctly in the search results:

[Screenshot: search results with non-English characters displayed incorrectly]

In the screenshot above, the highlighted words should read:

método
¡
recién
edición
quizá
fotografía

I am running sphinx-2.2.11-1 on CentOS 8.

Thank you!

@jdarwood007
Member

Check out issue #16.

There is a charset table in there. Try it out and see if it helps.

@rcutanda
Author

@jdarwood007 Thank you so much for your suggestion. I have tried different "charset_table" options in my conf file, but the search results are always the same. The problem does not seem to be that Sphinx fails to index content with special characters, but simply that it does not display them correctly. As I mentioned in my first post, this is not an issue with RC2, only with RC4, so my guess is that something in the RC4 code is interfering with Sphinx.

Thank you once again.

@Sesquipedalian
Member

@rcutanda, do you see the same issues in the output when you enable the "Show results as messages" search option?
[Screenshot: the "Show results as messages" search option]

@rcutanda
Author

Good question. With that option enabled, I actually get a blank page with no source code at all.

@Sesquipedalian
Member

What happens if you make the following change?

Find:

if (!preg_match_all('~ (-?)("[^"]+"|[^" ]+)~', ' ' . $string , $tokens, PREG_SET_ORDER))

Replace with:

if (!preg_match_all('~ (-?)("[^"]+"|[^" ]+)~u', ' ' . $string , $tokens, PREG_SET_ORDER))
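For context on what the `u` modifier changes, here is a minimal sketch that runs the pattern above with and without `u`. The helper `tokenize_search()` is hypothetical and is not SMF's actual code. Assuming the search string is valid UTF-8, both variants should split it at the same places, because the bytes of a multibyte UTF-8 character never equal the ASCII space or double quote that the pattern splits on. The main practical difference is that with `u`, invalid UTF-8 input makes `preg_match_all()` fail and return `false`. This may be consistent with the change having no visible effect here, though it does not rule out other causes.

```php
<?php
// Hypothetical helper (not part of SMF): tokenizes a search string using the
// pattern quoted above, optionally with the /u (UTF-8) modifier.
function tokenize_search(string $string, bool $unicode): array|false
{
    $pattern = '~ (-?)("[^"]+"|[^" ]+)~' . ($unicode ? 'u' : '');

    // preg_match_all() returns 0 when nothing matches and false on error
    // (for example, invalid UTF-8 input when /u is set).
    if (!preg_match_all($pattern, ' ' . $string, $tokens, PREG_SET_ORDER)) {
        return false;
    }

    // Rebuild each token from its optional "-" prefix and the word or quoted phrase.
    return array_map(fn(array $m): string => $m[1] . $m[2], $tokens);
}

$query = 'método "quizá fotografía" -edición';
print_r(tokenize_search($query, false)); // byte-oriented matching
print_r(tokenize_search($query, true));  // UTF-8-aware matching, same tokens

// A Latin-1 encoded "é" (byte 0xE9) is not valid UTF-8.
var_dump(tokenize_search("m\xE9todo", true)); // bool(false)
```

In short, adding `u` mainly adds UTF-8 validation to this particular pattern rather than changing where tokens are split.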

@rcutanda
Author

if (!preg_match_all('~ (-?)("[^"]+"|[^" ]+)~u', ' ' . $string , $tokens, PREG_SET_ORDER))

Thank you for the suggestion. However, the result is the same: a blank page with the "Show results as messages" option enabled, and incorrect Spanish characters otherwise.

Regards,

@jdarwood007
Member

@rcutanda I still can't reproduce this.
What is the charset you are using?

@jdarwood007
Member

Are you also able to try Sphinx Search 3.5? Newer versions removed the option to specify utf8 and always use UTF-8.

@rcutanda
Author

rcutanda commented Feb 19, 2023

Hello, @jdarwood007:

Installing SMF 2.1 final or 2.1.1 (I can't remember which) solved the issue. I am running Sphinx 2.2.11, as I could never get v3 to work. My forum is in Spanish and uses UTF-8.

Regards.

@jdarwood007
Member

Well, you should upgrade to 2.1.3, then. I have introduced Manticore support, which seems to handle UTF-8 better. For Sphinx v3, I had to rebuild the entire index; I couldn't reuse the old one.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants