RFC: Replace Elasticsearch by OpenSearch? #4896

Closed
stweil opened this issue Jan 4, 2022 · 11 comments
Labels
dependencies (Pull requests that update a dependency file), development fund 2023 (A candidate for the Kitodo e.V. development fund.)

Comments

@stweil
Member

stweil commented Jan 4, 2022

Elasticsearch changed the license in 2021.

OpenSearch is a fork which continues to use the old license.

We might consider switching to OpenSearch. Switching should be easy from the open-source releases of Elasticsearch, but I expect it to become more and more difficult with newer releases as the differences between the two projects grow.

Links:
https://github.com/opensearch-project/
https://de.wikipedia.org/wiki/OpenSearch_(Software)
https://opensearch.org/blog/technical-posts/2021/10/moving-from-opensource-elasticsearch-to-opensearch/

@henning-gerhardt
Collaborator

Personally, I approve of this request, but the current implementation depends heavily on the specific version of ElasticSearch/OpenSearch in use, so I don't know whether an easy switch to OpenSearch is possible.

As far as I know, there is a process to "migrate" from using ElasticSearch directly to using it through hibernate-search with its ElasticSearch integration. From what I know about hibernate-search, it can work with different versions of ElasticSearch, and maybe OpenSearch, just by changing configuration values. I would prefer this solution, because we would then be only loosely coupled to ElasticSearch or OpenSearch and could switch the search server implementation without changing code inside Kitodo.Production.
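To illustrate the "switch by configuration" idea, here is a minimal sketch that only assembles Hibernate Search backend properties. It assumes Hibernate Search 6.1 or later, where, as far as I understand the reference documentation, the Elasticsearch backend can also talk to OpenSearch and `hibernate.search.backend.version` accepts a distribution-prefixed value such as `opensearch:1.2`; the exact value format should be verified against the documentation of the version in use. The class `SearchBackendConfig` and its methods are hypothetical and not part of Kitodo.Production.

```java
import java.util.Map;

// Hypothetical helper (not part of Kitodo.Production): builds Hibernate Search
// backend properties for either search server. The OpenSearch version format
// ("opensearch:<version>") is an assumption to verify against the docs.
final class SearchBackendConfig {

    private SearchBackendConfig() {
    }

    static Map<String, String> forElasticsearch(String hosts, String version) {
        return build(hosts, version);
    }

    static Map<String, String> forOpenSearch(String hosts, String version) {
        return build(hosts, "opensearch:" + version);
    }

    // Same backend type and property keys for both servers; only the version value differs.
    private static Map<String, String> build(String hosts, String version) {
        return Map.of(
                "hibernate.search.backend.type", "elasticsearch",
                "hibernate.search.backend.hosts", hosts,
                "hibernate.search.backend.version", version);
    }
}
```

The point of the sketch is that the two property maps differ in a single value; code that issues queries through Hibernate Search would not need to change.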

@solth
Member

solth commented Jan 4, 2022

It seems HibernateSearch is indeed compatible with OpenSearch (see https://hibernate.atlassian.net/browse/HSEARCH-4212 for details). Since all direct ElasticSearch libraries and packages will be removed from the Kitodo.Production repository with the switch to HibernateSearch, this should then indeed resolve the licensing issue.

@matthias-ronge added the 3.x and dependencies labels Feb 16, 2022
@solth removed the 3.x label Jul 7, 2022
@solth added the development fund 2023 label Jan 19, 2023
@stweil
Member Author

stweil commented Mar 21, 2023

According to the documentation, recent versions of Hibernate Search support both Elasticsearch and OpenSearch, so supporting both in Kitodo.Production might be an easy task.

@thomaslow
Collaborator

I have some questions regarding the migration to hibernate-search. Since this issue is mentioned in the recent announcement of the development fund 2023, I'll ask them here.

If I remember correctly, migrating to hibernate-search was already experimented with as part of the development fund 2021, see #4208.

@solth Would it be possible for you to summarize what was done and learned in 2021?

  • What problems were revealed?
  • Why did you choose to upgrade to ElasticSearch 6 instead of migrating over to hibernate-search at the time?
  • Were you able to confirm that migrating to hibernate-search would actually improve (or at least have a comparable) indexing performance?

Also, there is a public hibernate-search branch that was started in 2021.

  • What is the current status of this branch?
  • Can this branch be used as the basis for any new work in the development fund 2023?

Thank you and Cheers!

@solth
Copy link
Member

solth commented May 12, 2023

Yes, of course. As you mentioned, the first attempt to replace ElasticSearch with HibernateSearch was made in the context of #4208, where the actual goal was to update ElasticSearch to version 7 (which was successful).

At that time, we hoped the changes required for the migration to HibernateSearch would be manageable and could be made within the same issue with little extra effort. That turned out to be wrong, though: the necessary changes proved to be extensive (as you can see from the number of changes in the branch you mentioned: https://github.com/effective-webwork/kitodo-production/tree/hibernate-search), so we never got around to finishing the transition to HibernateSearch.

In my experience, the main challenge in the transition to HibernateSearch was the incompatibility of ElasticSearch QueryBuilder objects with the HibernateSearch syntax. The latter uses so-called SearchPredicates instead of QueryBuilders, which in turn are created by SearchPredicateFactory instances. AFAIK these factories only support a lambda-style syntax for creating SearchPredicates, and once created, SearchPredicate instances cannot be extended with further clauses or filters.

That, however, is exactly what Kitodo.Production currently does: ES QueryBuilder objects are passed between, and augmented in, many interconnected classes such as SearchService, FilterService and the service classes for the individual object types (most notably ProcessService). Refactoring all of those QueryBuilder-related functions so that the factory variable from within the lambda expression can be passed to other functions was a major hassle.
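To make the mismatch concrete, the following stdlib-only sketch models the two styles with simplified stand-in types (`MutableBoolQuery`, `PredicateStub`, `PredicateFactoryStub`, `QueryStyles`). These are hypothetical and are not the real Elasticsearch or Hibernate Search classes; they only mimic their shape. In the first style, one query object is created and each service appends clauses to it; in the second, predicates can only be obtained from a factory handed to a lambda, so every helper that contributes a clause needs that factory passed in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stand-in for an Elasticsearch BoolQueryBuilder: one mutable object, augmented in place.
final class MutableBoolQuery {
    final List<String> mustClauses = new ArrayList<>();

    MutableBoolQuery must(String clause) {
        mustClauses.add(clause);
        return this;
    }
}

// Stand-in for a Hibernate Search predicate: an immutable value.
record PredicateStub(String description) {
}

// Stand-in for the predicate factory that Hibernate Search hands to a lambda.
final class PredicateFactoryStub {
    PredicateStub match(String field, String value) {
        return new PredicateStub(field + ":" + value);
    }

    PredicateStub and(List<PredicateStub> clauses) {
        return new PredicateStub(clauses.stream()
                .map(PredicateStub::description)
                .collect(Collectors.joining(" AND ", "(", ")")));
    }
}

final class QueryStyles {
    // Style 1: any service holding the shared query object can add a clause later.
    static void addProjectFilter(MutableBoolQuery query, int projectId) {
        query.must("project.id:" + projectId);
    }

    // Style 2: a helper can only contribute a clause if the factory is passed to it.
    static PredicateStub projectFilter(PredicateFactoryStub f, int projectId) {
        return f.match("project.id", String.valueOf(projectId));
    }

    // The complete predicate is defined inside a single lambda that receives the factory.
    static PredicateStub search(Function<PredicateFactoryStub, PredicateStub> definition) {
        return definition.apply(new PredicateFactoryStub());
    }
}
```

The second style is not less expressive, but moving to it means changing the signature of every method that currently receives and augments a query object.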

I recently rebased the HibernateSearch branch to resolve conflicts with the current master branch. It is a WIP, but I think it can be used as a basis for integrating HibernateSearch into Kitodo.Production. It already loads list entries such as processes via HibernateSearch, and indexing on the indexing page is done with the HibernateSearch MassIndexer.
What I cannot say, though, is whether this is the best approach, or whether rewriting the whole filter and query architecture from the ground up to better accommodate the new syntax would be a better way to proceed.

Concerning performance, the version in that branch is currently quite a bit slower than the current master branch, but that may be due to suboptimal construction of queries/search predicates. Rebuilding the whole index with the MassIndexer is much faster, though.

One thing worth noting is that with HibernateSearch we can get rid of DTO objects, because HibernateSearch loads index data directly into bean objects, which should considerably simplify the code in many places.

@henning-gerhardt
Collaborator

henning-gerhardt commented May 12, 2023

> One thing that is worth noting is that using HibernateSearch we can get rid of DTO objects because HibernateSearch does load index data directly into bean objects, which should simplify the code in many places considerably.

These DTOs were introduced to avoid exposing the Hibernate beans, including database credentials, in the UI when an error occurs, or through some manipulation of the UI that gives access to them on the client side.
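The DTO idea can be illustrated with a minimal sketch: an entity class carrying a field that must never reach the UI, and a DTO that copies only the fields the UI needs. `UserEntity`, `UserDTO` and their fields are hypothetical, chosen for illustration, and do not mirror Kitodo.Production's actual beans.

```java
// Hypothetical entity; in Kitodo.Production this role is played by Hibernate beans.
final class UserEntity {
    final int id;
    final String login;
    final String passwordHash; // sensitive: must never be rendered in the UI

    UserEntity(int id, String login, String passwordHash) {
        this.id = id;
        this.login = login;
        this.passwordHash = passwordHash;
    }
}

// Hypothetical DTO: copies only the fields the UI is allowed to show and keeps
// no reference to the entity, so printing or serializing it cannot leak the hash.
record UserDTO(int id, String login) {
    static UserDTO from(UserEntity entity) {
        return new UserDTO(entity.id, entity.login);
    }
}
```

Because the DTO holds no reference to the entity, an error page that prints the object (for example via its `toString()`) cannot reveal the sensitive field.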

@thomaslow
Copy link
Collaborator

@solth
Thank you for your summary. I wasn't aware of the query-building problem. It seems to be possible to logically combine SearchPredicates, but I'm not sure whether that is sufficient to solve the problems you mentioned above.
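As far as I can tell from the Hibernate Search 6 reference documentation, a `SearchPredicate` can be built ahead of time (ending the DSL chain with `toPredicate()`) and then reused as a clause, e.g. `f.bool().must(existingPredicate)`. The general idea, that an immutable predicate is "extended" by wrapping it in a new combined predicate rather than by mutating it, can be sketched with a hypothetical predicate tree (`Pred`, `Match`, `And`, `Or` are illustrative types, not the Hibernate Search API):

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical immutable predicate tree (not the Hibernate Search API).
// Combining two predicates creates a new node; existing nodes are never modified.
interface Pred {
    String render();

    default Pred and(Pred other) {
        return new And(List.of(this, other));
    }

    default Pred or(Pred other) {
        return new Or(List.of(this, other));
    }
}

record Match(String field, String value) implements Pred {
    public String render() {
        return field + ":" + value;
    }
}

record And(List<Pred> clauses) implements Pred {
    public String render() {
        return clauses.stream().map(Pred::render).collect(Collectors.joining(" AND ", "(", ")"));
    }
}

record Or(List<Pred> clauses) implements Pred {
    public String render() {
        return clauses.stream().map(Pred::render).collect(Collectors.joining(" OR ", "(", ")"));
    }
}
```

For example, `new Match("title", "foo").and(new Match("project.id", "3").or(new Match("project.id", "4")))` yields a new predicate while the original `title:foo` predicate stays unchanged.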

@solth
Copy link
Member

solth commented May 16, 2023

I think the main problem I encountered was that the current ElasticSearch classes like QueryBuilder are very deeply integrated and used in many different places in the Kitodo.Production source code, so removing them and replacing them completely with new HibernateSearch classes, which are constructed in a totally different manner, was more difficult than I thought.

Perhaps there are better approaches to replacing ElasticSearch with HibernateSearch than trying to keep Production's current class and method architecture for filters and searching while using HibernateSearch objects directly in all those places. Maybe it is easier not to use HibernateSearch objects in data service classes like TaskService or ProcessService at all, but instead to encapsulate all required data in new custom objects and pass those to the final search/filter services, which then create a HibernateSearch SearchPredicate. That way, no such object would need to be maintained and passed through all layers of the application.
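The encapsulation idea could look roughly like the following sketch. All names (`FilterCriteria`, `ProcessFilters`, `PredicateTranslator`) are hypothetical. Service-level code only adds search-library-independent criteria, and a single final step would translate them into search predicates; to keep the sketch self-contained, that step renders a plain query string instead of calling Hibernate Search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical search-library-independent description of a filter.
// Data service classes only add criteria; they never touch search-library classes.
record FilterCriteria(List<Map.Entry<String, String>> mustMatch) {
    FilterCriteria {
        mustMatch = List.copyOf(mustMatch); // defensive, immutable copy
    }

    static FilterCriteria empty() {
        return new FilterCriteria(List.of());
    }

    // Returns a new instance, so criteria can be handed between layers safely.
    FilterCriteria withMatch(String field, String value) {
        List<Map.Entry<String, String>> extended = new ArrayList<>(mustMatch);
        extended.add(Map.entry(field, value));
        return new FilterCriteria(extended);
    }
}

// Hypothetical counterpart of a data service method, e.g. one in ProcessService.
final class ProcessFilters {
    static FilterCriteria restrictToProject(FilterCriteria criteria, int projectId) {
        return criteria.withMatch("project.id", String.valueOf(projectId));
    }
}

// Hypothetical final step: the only place that would know about the search library.
// A real version would build a Hibernate Search predicate here; this sketch renders text.
final class PredicateTranslator {
    static String toQueryString(FilterCriteria criteria) {
        return criteria.mustMatch().stream()
                .map(entry -> entry.getKey() + ":" + entry.getValue())
                .collect(Collectors.joining(" AND "));
    }
}
```

With this split, replacing the search library would only touch the translator, not every service that contributes filter conditions.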

@stweil
Copy link
Member Author

stweil commented Jul 6, 2024

@matthias-ronge, what is the status of task #5760? When do we expect Hibernate Search (with OpenSearch) to have replaced Elasticsearch?

Would intermediate support for OpenSearch help if this task still takes some time? Yesterday I started my own OpenSearch branch, which now passes mvn install. The CI tests are still failing.

@stweil
Copy link
Member Author

stweil commented Jul 18, 2024

I have now finished a draft pull request #6131 for OpenSearch, which seems to work.

@stweil
Copy link
Member Author

stweil commented Nov 15, 2024

Support for both Elasticsearch and OpenSearch was added in PR #6131, so I think this issue can be closed.


5 participants