add resilience to remote index being down during list files #1418
Comments
@nathanlcarlson is working on a change for these File queries to go to the originating esg-search with distrib=false.
@alaniwi Yes, it also seems this same pattern is used for wget script retrieval. As I understand it, there should never be a need for CoG to query a remote esg-search API.
Hi, the reason why I was suggesting it only as a fallback is as follows.
Here is a possible alternative suggestion that could be used to mitigate this if you prefer never to query a remote search API. In the local For example, where it currently has:
(or in our case something like it could be changed to something like:
then
where the hostname of the master would be determined as follows (in practice the following rules would be used to build a mapping dictionary or similar during service initialisation, which is subsequently used for any lookups):
Then CoG could be changed to add the
If there is a version of esg-search that contains this feature, then based on the above configuration, this would then cause esg-search to query
What do you think? (If you think it would work, then maybe the thing to do would be to use this issue for the CoG changes but paste the esg-search related changes into a separate issue at https://github.com/ESGF/esg-search/issues .)
Hello @alaniwi , I agree with the need to improve this situation.
Making any of the above large-scale changes would be difficult, though, whereas your suggestions are more realistic.
Thanks for your response. In my proposed solution, CoG does not really need to be aware of the implementation, beyond the fact that it should not use The one addition that will need to be made from the point of view of CoG is to add the
@nathanlcarlson I have updated my long comment above as regards the hostname of the master shard. See "where the hostname of the master would be determined..." and the following bulleted list.
Hi @alaniwi, we opted to do the following: #1420. Certainly, there could be a parameter that clients specify to make the query more efficient by not requiring a search across the entire collection (
My hypothesis is that doing a "distrib=true" for file queries would add negligible load to an index server. The shards should very quickly determine that the files aren't present and return 0 results.
When "list files" is clicked, it seems that the value of `index_node` from the dataset metadata is checked, and this is used to send a query to that index, with the following format:
https://....../esg-search/search?type=File&dataset_id=......&format=application%2Fsolr%2Bjson&offset=0&limit=10&distrib=false
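For illustration, a query of this shape could be assembled as in the Python sketch below. This is only a sketch of the URL format quoted above: the function name `build_file_query_url`, its parameters, and the hostname in the usage note are hypothetical and are not taken from the CoG code base.

```python
from urllib.parse import urlencode


def build_file_query_url(index_node, dataset_id, local_only=True,
                         offset=0, limit=10):
    """Build an esg-search File query URL for one dataset.

    index_node: hostname of the index to query (hypothetical argument name).
    local_only: when True, add distrib=false so that only the shards held
                by that index node are searched.
    """
    params = {
        "type": "File",
        "dataset_id": dataset_id,
        "format": "application/solr+json",
        "offset": offset,
        "limit": limit,
    }
    if local_only:
        params["distrib"] = "false"
    # urlencode percent-encodes "/" and "+" in the format value.
    return "https://%s/esg-search/search?%s" % (index_node, urlencode(params))
```

Calling `build_file_query_url("index.example.org", "some.dataset.v1")` would give a URL with the same parameters as the one quoted above, ending in `distrib=false`.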
If a dataset record originates from a remote Solr shard, then this can fail if the remote index is down, even though a local replica for that shard may be available. That is to say, the local replica contains both the `datasets` and `files` cores, but CoG does not attempt to utilise the locally held `files` info.

How about changing it so that in the event of an unsuccessful response from the remote index (e.g. a 500 or a timeout), it falls back to trying the same search on the local index node? This fallback search would need to omit the `distrib=false`, making it more expensive because other unrelated shards are also queried, which is why I am suggesting it only as a fallback.
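The fallback suggested here might look roughly like the following Python sketch. It is a minimal illustration under stated assumptions, not the CoG implementation: `list_files`, `search_url`, `http_get_json`, the 10-second timeout, and the injectable `fetch` argument are all hypothetical names and choices introduced for the example.

```python
import json
import urllib.request
from urllib.parse import urlencode


def search_url(host, dataset_id, local_only):
    """Build an esg-search File query; local_only adds distrib=false."""
    params = {
        "type": "File",
        "dataset_id": dataset_id,
        "format": "application/solr+json",
        "offset": 0,
        "limit": 10,
    }
    if local_only:
        params["distrib"] = "false"
    return "https://%s/esg-search/search?%s" % (host, urlencode(params))


def http_get_json(url, timeout=10):
    """Fetch a URL and decode JSON; raises on HTTP errors and timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def list_files(index_node, local_node, dataset_id, fetch=http_get_json):
    """Query the originating index first, then fall back to the local node.

    The first attempt uses distrib=false against the dataset's index_node.
    If that fails, the same search is retried on the local index node
    without distrib=false, so all of its shards and replicas are searched.
    """
    try:
        return fetch(search_url(index_node, dataset_id, local_only=True))
    except (OSError, ValueError):
        # OSError covers urllib's URLError/HTTPError (e.g. a 500) and
        # socket timeouts; ValueError covers an undecodable JSON body.
        return fetch(search_url(local_node, dataset_id, local_only=False))
```

Passing `fetch` as an argument keeps the sketch testable without network access; the more expensive distributed query is only issued when the cheaper one has already failed.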