Review relevance of search for localized guides #103

Closed
yrodiere opened this issue Dec 20, 2023 · 6 comments
@yrodiere
Member

yrodiere commented Dec 20, 2023

Hey @ynojima , @rdnovell,

A few days ago we merged support for localized guides in search.quarkus.io (see #71). Thanks again for your help on that! And thanks to @marko-bekhta for implementing it :)

I don't expect search to be any worse than the current (substring-based) implementation, so we're probably going to integrate this in quarkus.io very soon.

Assuming we fix #102, after we deploy to quarkus.io and localized sites get updated, you should be able to test the new search on the localized sites. It should look similar to what you'll find on this preview for the English version.

I'm opening this issue to collect your feedback on how relevant search is in localized sites, and to discuss potential improvements (we'll open other issues/PRs when we come up with something). The immediate goal is to be at least as good as the JavaScript search, which as I said above shouldn't be too hard. The longer-term goal is to be substantially better.

Note that search might not use the remote service in some cases, e.g. if it's unreachable or takes too long to answer (bad connection, ...). In that case it falls back to the local JavaScript-based search, like it used to. You can generally spot this in the UI by checking whether there are any highlighted terms in the search results: if so, it's a remote search; otherwise it's probably a JavaScript search (not always, but close enough).

If you want to test search right now, you can, but probably only from the command line. Assuming you have jq installed (sudo dnf install jq), this script should allow you to do it:

QUARKUS_LANGUAGE=ja; HIT_LIMIT=5
# Read one search query per line from stdin and print the top hits for each.
while read -r TEXT; do
  curl -s -G 'https://search.quarkus.io/api/guides/search' \
    -H 'Content-Type: application/json' \
    --data-urlencode "q=$TEXT" \
    --data-urlencode "language=$QUARKUS_LANGUAGE" \
  | jq ".hits | .[0:$HIT_LIMIT]"
done

Change QUARKUS_LANGUAGE to what you want (ja/es/...), then run the script above, then type search terms (as you would in a web form), and press enter to see the results.

Note:

  1. This script is slower than what you'll experience in the browser, for some reason. I didn't have time to investigate why, but this should be good enough for testing.
  2. Where possible, matching terms should be highlighted in the results, i.e. surrounded by <span class=\"highlighted\">...</span>.
  3. The "content" field in the results is obviously not the full content of the guide, but snippets of matching content (if any).
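
A quick way to check from the command line whether a response contains highlighted terms, as a rough sketch only. The `count_highlights` function and the sample response are hypothetical; the sample only mimics the `.hits` array and the "content" field mentioned above, and the real response may carry more fields. It works on the raw JSON text, where the quotes inside the span are backslash-escaped.

```shell
# Count highlighted terms in a raw JSON search response read from stdin.
# In raw JSON the quotes inside the span appear as \" (backslash-escaped).
count_highlights() {
  grep -o '<span class=\\"highlighted\\">' | wc -l | tr -d ' '
}

# Hypothetical sample response, shaped after the ".hits" array used above.
sample='{"hits":[{"content":"El <span class=\"highlighted\">conector</span> AMQP"}]}'

# Prints the number of highlighted spans found in the sample.
printf '%s\n' "$sample" | count_highlights
```

To use it against the live service, pipe the output of the curl command from the script above (without the jq step) into `count_highlights`.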
@ynojima
Member

ynojima commented Dec 20, 2023

As far as I tested with the command-line version, the search works really well with Japanese. It's awesome!
I'm looking forward to seeing it land on the localized sites.

@rdnovell

For the ES site, I tested:

  1. Search for a full Spanish word
  2. Search for a partial word like "aplicacion" and get results for "aplicaciones"
  3. Search for English words
  4. Search for a phrase like "El conector AMQP de mensajería reactiva"

All went well, except point 4 looks like the search was done word by word, is that expected?

@yrodiere
Member Author

Thanks all for checking.

except point 4 looks like the search was done word by word, is that expected?

Yes, that's expected.

For phrase search, we'd need to change the code a bit, but this can be done.

It would require specific syntax though; instead of:

El conector AMQP de mensajería reactiva

You would have to type:

"El conector AMQP de mensajería reactiva"

Not sure how useful that would be 🤔

BTW @marko-bekhta it seems you disabled all such syntax constructs here, but I think we can re-enable all operators except the "whitespace" operator, no?

@marko-bekhta
Collaborator

it seems you disabled all such syntax constructs here, but I think we can re-enable all operators except the "whitespace" operator, no?

Yeah, what you are saying seems to make sense; we'd need to test it to be sure, as that simpleQueryString was having problems with synonyms and stop words, alternatively maybe we could add a match phrase should clause..

@yrodiere
Member Author

Yeah, what you are saying seems to make sense; we'd need to test it to be sure, as that simpleQueryString was having problems with synonyms and stop words

Ok, we'll try that eventually then 👍

alternatively maybe we could add a match phrase should clause..

No no no, please no. I don't want to enter the rabbit hole of parsing query strings.

@yrodiere
Member Author

yrodiere commented Jan 8, 2024

except point 4 looks like the search was done word by word, is that expected?

Yes, that's expected.

For phrase search, we'd need to change the code a bit, but this can be done.

It would require specific syntax though; instead of:

El conector AMQP de mensajería reactiva

You would have to type:

"El conector AMQP de mensajería reactiva"

@marko-bekhta addressed this in #114: phrase search should now work as expected, provided you surround your phrase with double quotes.
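
For anyone testing from the command line, here is a minimal sketch of sending a phrase query. It assumes the same endpoint and `q`/`language` parameters as the script earlier in this thread; `as_phrase` and `search_guides` are hypothetical helper names, not part of the service. The point is that the double quotes must be part of the `q` value itself, so the shell must not strip them.

```shell
# Wrap a phrase in double quotes so it is sent as a phrase query.
as_phrase() {
  printf '"%s"' "$1"
}

# Hypothetical helper: query the endpoint from the script earlier in this thread.
# $1 = search text, $2 = language code (ja, es, ...).
search_guides() {
  curl -s -G 'https://search.quarkus.io/api/guides/search' \
    --data-urlencode "q=$1" \
    --data-urlencode "language=$2"
}

# Show the exact text that would be sent as the "q" parameter.
as_phrase 'El conector AMQP de mensajería reactiva'; echo

# To actually run the search (needs network access and jq):
#   search_guides "$(as_phrase 'El conector AMQP de mensajería reactiva')" es | jq '.hits'
```

When using the interactive script instead, typing the phrase with its double quotes at the prompt should have the same effect, since `read` keeps the quotes as literal characters.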

I'll close this for now as I don't see any other problem to address. Feel free to reopen or open another issue if you see anything!

And thanks again for testing :)

@yrodiere yrodiere closed this as completed Jan 8, 2024