
Too common to trigger a too many open files error #3038

Open

Kerollmops opened this issue Nov 10, 2022 · 2 comments
Labels: bug (Something isn't working as expected) · milli (Related to the milli workspace)

Comments

@Kerollmops (Member)
When indexing a big dataset, it is too common to trigger a _too many open files_ error. This is likely produced by grenad and the milli extractors, which generate a lot of files. The dataset I was using is 33M lines of JSON, about 14 GiB, sent in a single batch. The error can also be triggered by indexing a lot of documents and then changing the settings, which forces a re-indexation of the full dataset.
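For illustration, a minimal Rust sketch (hypothetical, not taken from this issue or from milli) that reproduces this class of error by holding file descriptors open until the process hits its soft limit:

```rust
use std::fs::File;

fn main() {
    // Hold every handle in a Vec so descriptors are never released,
    // mimicking many temporary files kept open at once.
    let mut handles = Vec::new();
    loop {
        match File::open("/dev/null") {
            Ok(f) => handles.push(f),
            Err(e) => {
                // On Linux this is EMFILE (os error 24): "Too many open files".
                eprintln!("failed after {} opens: {e}", handles.len());
                break;
            }
        }
    }
}
```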

We designed this new indexation system with @ManyTheFish to reduce the amount of RAM the engine was using and therefore reduce the number of crashes (processes killed by the OS) witnessed by our users. We did a good job, even if we can do better (see #3037).

I want to explore a new design of our extractors by @loiclec at meilisearch/milli#656. This refactoring should bring more efficient RAM usage in the extractors, speeding up the indexation process and reducing the number of created files.

@Kerollmops Kerollmops added the bug Something isn't working as expected label Nov 10, 2022
@Kerollmops Kerollmops changed the title from "Too common to trigger a _too many open files_ error" to "Too common to trigger a too many open files error" Nov 10, 2022
@curquiza curquiza added this to the v1.0.0 milestone Nov 10, 2022
@curquiza curquiza added the milli Related to the milli workspace label Nov 10, 2022
@loiclec (Contributor) commented Nov 15, 2022

Regarding meilisearch/milli#656, part of the reason I did not continue developing it (besides time) is that it would increase the amount of memory used during indexing (up to the defined limit).

In theory, this is not a problem, as we still stay within the memory usage limit. But in practice, it reduces our margin of error. So we would need to be very confident about memory management, both inside the data extractors and in any code that can run in parallel with indexing (most importantly, search queries, I guess).

@curquiza curquiza removed this from the v1.0.0 milestone Jan 4, 2023
@ManyTheFish (Member) commented Jul 11, 2023

Pinging from triage.

Severity: as a workaround, the limit can be raised manually using ulimit.

Size: to fix the bug, we need to investigate further (around 1 day):

  • is it possible for Meilisearch to increase the limit itself? (see the sketch below this list)
  • if not, we'll have to reduce the number of files opened by Meilisearch, which is an L to XL size
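On the first point, a minimal sketch of what "increasing the limit itself" could look like, assuming the `libc` crate (`libc = "0.2"`) on a Unix target; this is an illustration, not code from Meilisearch:

```rust
use libc::{getrlimit, rlimit, setrlimit, RLIMIT_NOFILE};

fn raise_nofile_limit() -> std::io::Result<libc::rlim_t> {
    unsafe {
        let mut lim = rlimit { rlim_cur: 0, rlim_max: 0 };
        if getrlimit(RLIMIT_NOFILE, &mut lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        // Raise the soft limit to the hard limit; exceeding the hard
        // limit requires elevated privileges, and macOS additionally
        // caps the value, so platform-specific clamping may be needed.
        lim.rlim_cur = lim.rlim_max;
        if setrlimit(RLIMIT_NOFILE, &lim) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(lim.rlim_cur)
    }
}

fn main() {
    match raise_nofile_limit() {
        Ok(n) => println!("soft RLIMIT_NOFILE is now {n}"),
        Err(e) => eprintln!("could not raise the limit: {e}"),
    }
}
```

From the shell, the equivalent workaround is running `ulimit -n <N>` before starting the process, bounded by the hard limit shown by `ulimit -Hn`.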
