Query in the frontend does not find proper indexed document #277
Comments
In the meantime, the app has been updated twice, current version is 26.0.2 -- I have repeated the procedure of fresh setup and rebuilding the index. Search in the frontend (and using |
I've encountered a similar issue whether I'm using Nextcloud 26.0.4 or 27.0.1, both with the latest versions of fulltextsearch. A fresh full indexing shows no results either in the GUI or with the occ command. However, I found that the problem only occurs for LDAP accounts. Local users in Nextcloud, whether admin or not, do not have the problem. |
While experimenting with the new versions, I noticed a somewhat weird behaviour:
But if I look at the document in question, the length (6348384) is the length of the
the command-line search succeeds, and in the web frontend I get a result (interestingly, the 2nd line of the result does not show a content snippet, but rather the raw base64 string). So my question is: can it be that the query issued by occ or the web frontend simply invokes a search on the wrong property ( |
Further experimenting with NC 26.0.5, ES 8.x ... When I change the file
then I get two changes:
Please somebody verify. For me, fulltextsearch with Elasticsearch 8.x has never worked. |
Now that I have changed the above file... it turns out that this is not a real solution: even though it makes the search in file contents work, it breaks Deck and perhaps other content providers... as soon as I change the search and highlight field names from
So my conclusion for now: the content providers deliver their contents differently to the index process, and the searcher can search either one or the other. Either someone knowledgeable (i.e. not me) steps up and changes the query generation such that all content can be found, or the indexing process is changed such that the queries find everything.
PLEASE, is anybody listening? The current state of fulltextsearch in Nextcloud is that it does not work at all (or somebody prove me wrong). If I should provide logs or anything else to diagnose the problem, please let me know. (And YES, I know it's your spare time: it's mine too.) |
Generally fulltextsearch works for me (NC27.0.2). Do you encounter your issue with specific files only or with all files within your cloud? Did you try to completely rebuild your index? |
To summarize:
So now I edited SearchMappingProvider.php and changed all occurrences of 'content' to 'content.attachment'. Now the situation has partly reversed:
From my humble understanding, there must be a basic difference between how documents of different providers are indexed, and how they can be found. And I'm not involved enough to find the right place to correct it. You say it works with Nextcloud 27? I won't upgrade right now, but I can try to compare if the versions of the three fulltextsearch apps carry significant changes between 26.0.2 and 27.0.. Or maybe the Deck app has changed and indexes differently now? |
I don't assume this is a general issue because it seems to work on my instances. What does your search query look like? It would also help if you could provide a sample file and specific steps to reproduce this issue so that others can double-check. |
I'm having -almost- the same problem. The main differences are:
The main part of the error message is:
But I think this error message is misleading. When I change max_analyzed_offset to 99999, I get the same error message, just with the hint that I should set max_analyzed_offset below 99999. My search word consists of 10 letters. The funny thing is that I can enter the first 3 letters and Nextcloud finds the correct documents, but on entering the 4th letter the error appears (while in Elasticsearch the whole word works). This is why I don't expect this to be a problem of just a few installations; I assume it is a general issue. |
I'd think it can be understood this way: the error message asks you to set the query parameter to less than the index setting, but what you did was to decrease the index setting instead. With the curl statement in #277 (comment) I have done it the other way: I have increased the index setting. The problem is that Nextcloud's query does not contain a highlight limit at all, so it asks for highlighting in unlimited lengths, and the index refuses it if the given document has a too-long content property which exceeds its own limit. |
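To illustrate the difference between the two limits, here is a minimal Python sketch of a search body that carries its own highlight limit. Elasticsearch accepts `max_analyzed_offset` inside the `highlight` section of a query, and the index-level counterpart is the setting `index.highlight.max_analyzed_offset` (default 1000000). The helper name `with_highlight_limit`, the sample query, and the chosen limit are assumptions for illustration only; this is not what the Nextcloud app currently sends.

```python
import copy

# Default value of the index setting index.highlight.max_analyzed_offset.
DEFAULT_INDEX_LIMIT = 1_000_000


def with_highlight_limit(body, limit=DEFAULT_INDEX_LIMIT - 1):
    """Return a copy of a search body whose highlight section sets
    the query-level max_analyzed_offset (hypothetical helper)."""
    patched = copy.deepcopy(body)
    highlight = patched.setdefault("highlight", {})
    highlight["max_analyzed_offset"] = limit
    return patched


# Hypothetical search body, similar in shape to a plain content search.
body = {
    "query": {"match_phrase_prefix": {"content": "demonstrates"}},
    "highlight": {"fields": {"content": {}}},
}
patched = with_highlight_limit(body)
```

With the query-level limit set below the index setting, long documents should be highlighted only up to that offset instead of the request being rejected. Raising the index setting, as done with the curl statement mentioned above, is the other way to avoid the error.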
Hey folks, wanted to give some feedback from my end here. The @gpgmailencrypt I think you're right and this app currently does not set the The @it25fg I checked your example from #277 (comment) and what I can see is that in my personal ES index the
{
"_index" : "nextcloud_index",
"_id" : "files:4848622",
"_score" : 10.761789,
"_source" : {
"owner" : "",
"groups" : [ ],
"circles" : [ ],
"metatags" : [
"files_external"
],
"source" : "files_external",
"title" : "path/to/file.txt",
"users" : [
"__all"
],
"content" : "This is a plaintext non base64",
"tags" : [ ],
"attachment" : {
"content_type" : "text/plain; charset=ISO-8859-1",
"language" : "ro",
"content_length" : 174
},
"provider" : "files",
"subtags" : [ ],
"parts" : {
"comments" : ""
},
"links" : [
""
],
"share_names" : [ ],
"hash" : "0d7b2cd93a9ca0ac89e199a5f4ad2208"
}
}
So to me it looks like there's something wrong in the way your documents have been created. As a temporary fix you could try to optionally add
// ...
"should": [
{
"match_phrase_prefix": {
"content": "demonstrates"
}
},
{
"match_phrase_prefix": {
"attachment.content": "demonstrates"
}
},
// ...
But you'd need to patch the PHP code for that. Here I'd say we'd rather need to investigate why your index documents look different from mine. |
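The temporary fix above can be sketched outside the PHP code to show its shape. The following Python snippet builds the two `should` clauses from the fragment, one per field. The function name `build_should_clauses` and the surrounding `bool` wrapper with `minimum_should_match` are assumptions for illustration; the query the app really generates contains further clauses that are not shown in this thread.

```python
def build_should_clauses(term, fields=("content", "attachment.content")):
    """Build one match_phrase_prefix clause per candidate field
    (hypothetical helper mirroring the JSON fragment above)."""
    return [{"match_phrase_prefix": {field: term}} for field in fields]


# Assumed wrapper: a bool query that matches if either field matches.
query = {
    "query": {
        "bool": {
            "should": build_should_clauses("demonstrates"),
            "minimum_should_match": 1,
        }
    }
}
```

Searching both fields only papers over the difference between indexes; it does not explain why some documents carry their text in `attachment.content` instead of `content`.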
@it25fg I double-checked on Nextcloud 27.0.2 with fulltextsearch-elasticsearch 27.0.2 and Elasticsearch 8.9.1. The index looks like the one shown by @R0Wi (content field filled properly). The highlight.max_analyzed_offset / max_analyzed_offset issue seems not to be directly related. Maybe these two issues should be separated so that they are easier to follow up on and fix. |
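To compare index documents between installations, a small check on the `_source` of a hit can tell where the searchable text ended up. The Python sketch below is a heuristic under stated assumptions: the function names are hypothetical, and the base64 test can misclassify short plain text without spaces, so it is only a rough diagnostic and not part of any of the fulltextsearch apps.

```python
import base64
import binascii


def looks_like_base64(text, min_length=16):
    """Heuristic: True if the text decodes cleanly as base64."""
    stripped = "".join(text.split())
    if len(stripped) < min_length or len(stripped) % 4 != 0:
        return False
    try:
        base64.b64decode(stripped, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def content_location(source):
    """Classify where a document's extracted text is stored."""
    content = source.get("content") or ""
    attachment = source.get("attachment") or {}
    if attachment.get("content"):
        return "attachment.content"
    if content and looks_like_base64(content):
        return "content (base64)"
    if content:
        return "content (plain text)"
    return "missing"


# Shape taken from the sample document earlier in this thread.
sample = {
    "content": "This is a plaintext non base64",
    "attachment": {"content_type": "text/plain; charset=ISO-8859-1"},
}
location = content_location(sample)
```

A document like the sample is classified as plain text in `content`, which is the layout the app's queries expect according to the comments above.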
Many thanks @XueSheng-GIT @R0Wi for the enlightenment. So it looks like my index was built wrong? But that's not strictly me who did it: it is the same Fulltextsearch app you all have, and it was the same And there, I have it! When I first tried to update ES to 8.x (not knowing that it could not work with NC 25) I came into the situation that
And with the update of NC to 26, I did not remove these requests (because I simply did not know that they weren't necessary anymore). Now I have removed them from my 'rebuild all' script, and the attachment pipeline now is:
which clearly explains that the extracted text is stored back into the 'content' field. My index is now being rebuilt. I'll soon be back with new results... |
Many thanks for all the insights and help! Now that |
Nextcloud 26.0.4, ElasticSearch 8.6.1
I can manually verify that indexing (rebuilt with the upgrade to ES 8.6.1) and live indexing work. With a web query (a simple one, without further filtering) the document can be found. But occ fulltextsearch:search (or the NC web GUI search) does not find it: neither by the owner nor by any of the sharees.
The only thing that puzzles me is that the owner is not mentioned in the _source/users[] array... is this correct? In the share_names dictionary, the owner is present, and of course in the _source/owner property.
If the above observation is not relevant then there can only be something wrong with the query issued by the Nextcloud app (not sure which of the three components).
I can debug queries and responses: my search daemon is local and I have switched off SSL, so I can simply capture it by tcpdump. How to proceed here?