Deploy docs and add performance benchmark #42
base: develop
Conversation
📚 Documentation preview will be available at: https://developmentseed.github.io/titiler-cmr/pr-previews/pr-42/ Status: ✅ Preview is ready!
Thanks @hrodmn! I'm still trying to parse everything that is here, but most importantly I want to understand how to interpret the time series benchmarks:
After talking about this with the team a few weeks ago, I think it does make sense to be more surgical in the benchmarking tests rather than throwing massive requests at the API when we know they will fail. The Lambda configuration has a concurrency limit of 998, and since the endpoints start to get unstable beyond 500 time points (due to other resource constraints), I think it makes sense to set a maximum time series length of 500 while we consider how/if we want to make the system more efficient.
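To make the proposed cap concrete, here is a minimal sketch of a request guard. The function name and the validation approach are hypothetical (not the actual titiler-cmr code); only the 500-point limit comes from the discussion above.

```python
from datetime import datetime, timedelta

# Hypothetical guard: cap the number of time points in a time series request.
# MAX_TIME_POINTS mirrors the proposed limit of 500; the Lambda concurrency
# limit (998) is the hard ceiling, but instability appears well before that.
MAX_TIME_POINTS = 500


def validate_time_series_length(start: datetime, end: datetime, step: timedelta) -> int:
    """Return the number of time points, or raise if it exceeds the cap."""
    n_points = int((end - start) / step) + 1
    if n_points > MAX_TIME_POINTS:
        raise ValueError(
            f"Time series has {n_points} points; maximum is {MAX_TIME_POINTS}. "
            "Use a coarser step or a shorter time range."
        )
    return n_points
```

Rejecting over-long requests up front returns a clear error to the user instead of letting the Lambda fan-out fall over partway through.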
I think you are correct that it depends on the chunking scheme of the dataset. All of this happens at the rio-tiler level, where the full dataset gets cropped down to the bounding box. I will do some testing on this, but I hope that if there is some spatial chunking, the full dataset is not loaded into memory in order to do that crop operation.
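A back-of-the-envelope sketch of why chunking matters: how much data a bounding-box crop must read depends on the storage chunk layout, not just the bbox size. The helper and chunk shapes below are illustrative, not taken from rio-tiler.

```python
import math

def bytes_read(bbox_px, chunk_shape, itemsize=4):
    """Bytes read for a pixel-space bbox ((row0, row1), (col0, col1)),
    assuming every storage chunk that intersects the bbox is read whole."""
    (r0, r1), (c0, c1) = bbox_px
    ch_r, ch_c = chunk_shape
    # count chunks intersecting the half-open ranges [r0, r1) and [c0, c1)
    n_row_chunks = math.floor((r1 - 1) / ch_r) - math.floor(r0 / ch_r) + 1
    n_col_chunks = math.floor((c1 - 1) / ch_c) - math.floor(c0 / ch_c) + 1
    return n_row_chunks * n_col_chunks * ch_r * ch_c * itemsize

# A 256x256 crop aligned to 256x256 chunks reads a single chunk (~0.25 MB)...
small = bytes_read(((0, 256), (0, 256)), (256, 256))
# ...but if the whole array is one chunk, the same crop reads everything (~2.6 GB
# for a hypothetical global float32 grid).
big = bytes_read(((0, 256), (0, 256)), (17999, 36000))
```

If the dataset has no spatial chunking, the crop step alone can force the entire array through memory, which would explain much lower limits for high-resolution datasets.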
I am adding some tests that use the MUR-SST dataset. I still need to drill into the details, but the limits are much lower for that dataset, so I will try to update the guidance for users based on the resolution of the input dataset.
Yes, the resolution of the output image has an impact on the reduce step in the map/reduce process. If we are trying to combine many high-resolution images in the original time series request, I think we are likely to hit memory and/or timeout limits. I need to go to the Lambda logs to figure out exactly what is going on, though.
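A rough sketch of the reduce-step memory footprint, assuming the reduce has to hold all N images at once (the function and shapes are illustrative, not the actual implementation):

```python
def reduce_memory_mb(n_images, bands, height, width, itemsize=4):
    """Approximate MB needed to hold n_images arrays of shape
    (bands, height, width) with the given dtype size simultaneously."""
    return n_images * bands * height * width * itemsize / 1024**2

# 500 single-band float32 images at 1024x1024 is about 2000 MB, which already
# rules out smaller Lambda memory configurations before any overhead is counted.
estimate = reduce_memory_mb(500, 1, 1024, 1024)
```

This is consistent with the observation that output resolution and time series length trade off against each other: halving the output dimensions cuts the reduce-step footprint by 4x.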
This PR a) deploys the documentation to GH Pages and b) adds an API benchmarking process (with a report in the docs) for the time series endpoints. After talking about it some more with the team, I think we can take a more surgical approach to benchmarking (i.e. predict limits based on Lambda configuration parameters), but the exercise was still helpful for understanding when things fall apart and which knobs you can turn to get time series requests that return a response before the Lambda breaks down. There are definitely some modifications or alternative approaches worth considering to improve the capabilities of this application!
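As a sketch of what "predict limits from Lambda configuration parameters" could look like: derive a safe time-series cap from the concurrency limit instead of discovering it by brute-force benchmarking. The safety factor is a made-up illustrative value, not a measured one.

```python
# Hypothetical prediction: one Lambda invocation per time point in the fan-out,
# derated by a safety factor because the endpoints get unstable well below the
# hard concurrency ceiling.
RESERVED_CONCURRENCY = 998  # concurrency limit from the Lambda configuration
SAFETY_FACTOR = 0.5         # illustrative derating, to be tuned from benchmarks


def predicted_max_time_points(concurrency=RESERVED_CONCURRENCY, safety=SAFETY_FACTOR):
    return int(concurrency * safety)
```

Benchmarks would then only need to verify the predicted cap near the boundary, rather than sweeping the whole request space.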
A preview of the docs for this PR are available here: https://developmentseed.org/titiler-cmr/pr-previews/pr-42/
I'm not sure if I'll keep the PR preview feature alive since it is so easy to deploy the docs locally (`uv sync && uv run mkdocs serve -o`), but I'll leave it up for now so others can take a look at the rendered site.