Fix encoding for Elasticsearch count pushdown #23425
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla
The OpenSearch connector probably has the same issue. Can you apply the fix there, too?
...lasticsearch/src/test/java/io/trino/plugin/elasticsearch/BaseElasticsearchConnectorTest.java
119bf26 to 3855a2a
Good call. I just reproduced the issue there with the same set of tests and pushed the same fix. I also applied the text block suggestions and agree that it looks much cleaner. For context, I followed the structure of the surrounding tests, so they could likely get a similar refactoring.
Lastly, I've submitted the CLA, but I guess it might take a couple of days to hear back.
@martint - looks like this is just pending CLA processing before it can merge.
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
Description
We've identified that when a COUNT(*) query was pushed down and contained special characters, the QueryBuilder string was sent as ISO-8859-1, which caused parsing errors in Elasticsearch.
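The mismatch described above can be illustrated with a minimal sketch that uses only the JDK. It is not the code from this PR: the class name, the helper method, and the sample request body are hypothetical, and it assumes the receiving side decodes the body as UTF-8. It encodes the same JSON string once as ISO-8859-1 and once as UTF-8, then decodes both as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Trino code base.
class CharsetMismatchDemo {

    // Encode the body with the given charset, then decode the bytes as UTF-8,
    // which is what a receiver expecting UTF-8 JSON would do (assumption).
    static String roundTrip(String body, Charset sendAs) {
        byte[] wire = body.getBytes(sendAs);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical request body with a non-ASCII character (c with cedilla).
        String body = "{\"term\":{\"country\":\"Cura\u00e7ao\"}}";

        String viaUtf8 = roundTrip(body, StandardCharsets.UTF_8);
        String viaLatin1 = roundTrip(body, StandardCharsets.ISO_8859_1);

        // The UTF-8 round trip preserves the body; the ISO-8859-1 one does not,
        // because the single Latin-1 byte is not a valid UTF-8 sequence.
        System.out.println("UTF-8 intact:      " + viaUtf8.equals(body));
        System.out.println("ISO-8859-1 intact: " + viaLatin1.equals(body));
    }
}
```

In terms of the description, sending the request body with an explicit UTF-8 charset instead of relying on a default is the general way to avoid this class of problem; the exact change made in this PR is not shown here.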
Additional context and related issues
For example, this query:
If the "country" field is a keyword, this would result in:
The source of the problem was new StringEntity(sourceBuilder.toString()), which uses https://github.com/apache/httpcomponents-core/blob/rel/v4.4.16/httpcore/src/main/java/org/apache/http/entity/ContentType.java#L106-L107 and defaults to ISO-8859-1.
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: