Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raise indexlimit to given limit from elasticsearch #4277

Merged
merged 1 commit into from
Apr 13, 2021

Conversation

Kathrin-Huber
Copy link
Contributor

@Kathrin-Huber Kathrin-Huber commented Mar 15, 2021

Elasticsearch is configured not to return more than 10'000 results (or to search behind that number).

It is not advised to change this value, but we can nevertheless get amounts up to this limit.

@Kathrin-Huber Kathrin-Huber requested a review from solth March 15, 2021 11:29
@henning-gerhardt
Copy link
Collaborator

I'm unsure, if your change is really fixing this problem or only moving it to an other issue.

Between: If this changed method is called with a value for parameter size then this value is used and this value can be bigger then 10.000 which will cause the next issue.

@Kathrin-Huber
Copy link
Contributor Author

I'm unsure, if your change is really fixing this problem or only moving it to an other issue.

Between: If this changed method is called with a value for parameter size then this value is used and this value can be bigger then 10.000 which will cause the next issue.

But we already had this problem before, and this PR doesn't change anything on it. (a different PR may fix this Problem)
There may be a problem, if an offset is added to the max value, but no method call is adding an offset, to a null size, so currently we are save. (but this could also be catched in a separat PR)

@henning-gerhardt
Copy link
Collaborator

Ok. I thought this PR should fix this issue for all time.

Copy link
Member

@solth solth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will all documents be included in the resulting excel file if the system contains more than 10.000 documents? I obviously do not have enough processes to test this, but the linked issue description does say that all processes should be included in the excel file, not that the limit should just be increased from 1.000 to 10.000.

@matthias-ronge
Copy link
Collaborator

I don't know exactly, but somewhere there is a limit to the maximum number of rows in an Excel spreadsheet, right? Certainly not at 10,000 yet, but...

@Kathrin-Huber
Copy link
Contributor Author

That's right, only 10'000 documents will be included. but it is just not possible to get more than this (even with a loop, or paging ect...)
A filter may help, or sorting, so one can add the first 10'000 and the last 10'000 but this are only hotfixes. (like this PR)

Honestly, I hope that #4208 will fix this.

This PR will at least add significant more documents to the excel sheet than 1000.

@solth
Copy link
Member

solth commented Apr 7, 2021

I understand that the proposed change mitigates the problem a little, but it does not solve the linked issue. Therefore I would suggest to remove the link to the issue - so that it isn't closed when this PR gets merged - if no other changes are made to the pull request.

With the limitations ElasticSearch 5 brings we should perhaps add a warning dialog that pops up if the system contains more than 10.000 processes and that informs the user that the resulting Excel file will not be complete because of technical restrictions.

@andre-hohmann
Copy link
Collaborator

Maybe, i should create a new issue. The main problem ist, that it is not possible to retrieve more then 10.000 processes in Kitodo.Production. In some cases more hits have to be retrieved, for example:

  • Retrieval of all processes in a project to answer the question of the projects size
  • Retrieval of all issues of a newspaper to add metadata like collection, rights information
  • ...

It seems that the creation of the Excel file with more then 10.000 processes would be possible, if the more processes could be retrieved.

@andre-hohmann
Copy link
Collaborator

I have created a issue: #4331.
From my point of view, the issue #4099 should only be closed, when there is no limitation in the result list anymore. Nevertheless i mentioned this problem in #4331.

I strongly agree with @solth:

With the limitations ElasticSearch 5 brings we should perhaps add a warning dialog that pops up if the system contains more than 10.000 processes and that informs the user that the resulting Excel file will not be complete because of technical restrictions.

@Kathrin-Huber Kathrin-Huber merged commit 4f290c0 into kitodo:master Apr 13, 2021
@Kathrin-Huber Kathrin-Huber deleted the fill_excel_file branch April 13, 2021 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants