Feat/fsspec filecache #45

Closed
wants to merge 78 commits into from
Changes from 27 commits
a2ccfe6
Add fsspec caching
abarciauskas-bgse Oct 20, 2023
4b815d2
Add deploy
abarciauskas-bgse Oct 20, 2023
154fada
Update ci.yml
abarciauskas-bgse Oct 20, 2023
b435c2d
Change VPC name
abarciauskas-bgse Oct 20, 2023
3857b33
Reduce package size
abarciauskas-bgse Oct 20, 2023
8eae5d3
Install cftime
abarciauskas-bgse Oct 20, 2023
ed80d6e
remove cramjam
abarciauskas-bgse Oct 20, 2023
5ec1700
Fix test cache setup
abarciauskas-bgse Oct 20, 2023
170f838
Remove cramjam dependency
abarciauskas-bgse Oct 20, 2023
484fa72
Remove compression middleware
abarciauskas-bgse Oct 20, 2023
b152f25
Fix conditional
abarciauskas-bgse Oct 20, 2023
f0dc349
Fix some errors
abarciauskas-bgse Oct 21, 2023
1b73eef
Add clear cache route
abarciauskas-bgse Oct 22, 2023
01739ae
Fix clear cache route
abarciauskas-bgse Oct 22, 2023
cc1b137
Fix s3fs when caching is disabled
abarciauskas-bgse Oct 23, 2023
d42231a
Cleanup code
abarciauskas-bgse Oct 25, 2023
128466b
Add docstrings
abarciauskas-bgse Oct 25, 2023
b8e123c
Install diskcache
abarciauskas-bgse Oct 25, 2023
7bf8f70
Install h5py
abarciauskas-bgse Oct 25, 2023
4168551
Fix diskcache implementation
abarciauskas-bgse Oct 25, 2023
e45544b
Fix linting
abarciauskas-bgse Oct 25, 2023
7fd9433
Fix conditional
abarciauskas-bgse Oct 25, 2023
c69117a
Fix protocol
abarciauskas-bgse Oct 26, 2023
527a04f
Add transposition to dims and nc4 support
abarciauskas-bgse Oct 26, 2023
f81c006
Fix transpose code
abarciauskas-bgse Oct 26, 2023
7dd884a
Fix merge conflicts
abarciauskas-bgse Nov 2, 2023
e41f95e
Working netcdf + tests
abarciauskas-bgse Nov 3, 2023
e11dceb
Remove VPC
abarciauskas-bgse Nov 3, 2023
acb2fc4
Add back VPC
abarciauskas-bgse Nov 3, 2023
2775f77
Add drop dim to histogram
abarciauskas-bgse Nov 3, 2023
c7a238a
Add drop dim to tilejson reader
abarciauskas-bgse Nov 3, 2023
4a6615e
Add option to open files from s3
abarciauskas-bgse Nov 3, 2023
ff07016
Add chunking
abarciauskas-bgse Nov 4, 2023
276ff0a
Remove h5pyd
abarciauskas-bgse Nov 6, 2023
9e373f4
Fix pre-commit
abarciauskas-bgse Nov 6, 2023
2b4398a
Remove dask
abarciauskas-bgse Nov 6, 2023
8daa7cb
Add dask, remove some of its deps
abarciauskas-bgse Nov 6, 2023
79ab833
Remove some more libs
abarciauskas-bgse Nov 6, 2023
0cbe84c
Add back click
abarciauskas-bgse Nov 6, 2023
314cb81
Add back pytz
abarciauskas-bgse Nov 6, 2023
fc3059c
Use EFS for libs
abarciauskas-bgse Nov 6, 2023
de2a660
Use efs only for numpy
abarciauskas-bgse Nov 6, 2023
14dff77
Run mkdir
abarciauskas-bgse Nov 6, 2023
e200df9
Run mkdir with -p
abarciauskas-bgse Nov 6, 2023
8185b09
Use sys path append
abarciauskas-bgse Nov 6, 2023
7d61aa7
Move path append to main.py
abarciauskas-bgse Nov 6, 2023
b087486
Try another append method
abarciauskas-bgse Nov 6, 2023
b217e94
Try sys path append
abarciauskas-bgse Nov 6, 2023
5994173
print directories
abarciauskas-bgse Nov 6, 2023
5e4a509
Remove copying numpy to EFS
abarciauskas-bgse Nov 7, 2023
1e48f03
Install some packages with no binary
abarciauskas-bgse Nov 7, 2023
35d257f
Use no-binary for all
abarciauskas-bgse Nov 7, 2023
2c95a2d
No binary for only numpy and pydantic
abarciauskas-bgse Nov 7, 2023
cb81889
Revert changes
abarciauskas-bgse Nov 7, 2023
ee2d32f
Try using blockcache
abarciauskas-bgse Nov 7, 2023
83378e1
Remove dask
abarciauskas-bgse Nov 7, 2023
1c44089
Try dask again
abarciauskas-bgse Nov 7, 2023
814f881
Maybe fix caching
abarciauskas-bgse Nov 8, 2023
2f62c7d
Linting
abarciauskas-bgse Nov 8, 2023
f244a34
Remove diskcache library
abarciauskas-bgse Nov 9, 2023
8c13790
Add dask
abarciauskas-bgse Nov 9, 2023
9547a08
Add chunks
abarciauskas-bgse Nov 9, 2023
607cae2
Add file and test to check failure
abarciauskas-bgse Nov 9, 2023
ed08474
Use da.load()
abarciauskas-bgse Nov 9, 2023
07e0d8f
Run linting
abarciauskas-bgse Nov 9, 2023
dc934b4
Different cache directories
abarciauskas-bgse Nov 9, 2023
d121b75
Fix formatting
abarciauskas-bgse Nov 9, 2023
8d66605
Modify fsspec efs directory
abarciauskas-bgse Nov 9, 2023
a30ee64
Some changes
abarciauskas-bgse Nov 10, 2023
e297d39
Don't allow for transposed coordinates
abarciauskas-bgse Nov 11, 2023
0967b83
Fix linting
abarciauskas-bgse Nov 11, 2023
7e11748
Use filecache everywhere
abarciauskas-bgse Nov 11, 2023
8c00dcd
Fix linting
abarciauskas-bgse Nov 11, 2023
928d770
Use blockcache
abarciauskas-bgse Nov 11, 2023
9089e4a
Use filecache for references
abarciauskas-bgse Nov 12, 2023
247f323
Make dir exists ok
abarciauskas-bgse Nov 13, 2023
1ebdfda
Move remote test to another file
abarciauskas-bgse Nov 13, 2023
218ef69
Add remote test
abarciauskas-bgse Nov 13, 2023
22 changes: 20 additions & 2 deletions .github/workflows/ci.yml
@@ -6,6 +6,7 @@ on:
branches:
- main
- dev
- feat/fsspec-filecache
tags:
- 'v*'
paths:
@@ -48,13 +49,20 @@ jobs:
python -m pip install pre-commit
pre-commit run --all-files

- name: run tests without cache
if: ${{ matrix.python-version == env.LATEST_PY_VERSION }}
env:
TITILER_XARRAY_ENABLE_FSSPEC_CACHE: FALSE
run: python -m pytest --cov titiler.xarray --cov-report term-missing -s -vv

- name: Run tests
run: python -m pytest --cov titiler.xarray --cov-report term-missing -s -vv

deploy:
needs: [tests]
#needs: [tests]
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev' || startsWith(github.ref, 'refs/tags/v')
#if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev' || startsWith(github.ref, 'refs/tags/v')
if: github.ref == 'refs/heads/feat/fsspec-filecache'

defaults:
run:
@@ -88,6 +96,16 @@ jobs:
python -m pip install --upgrade pip
python -m pip install -r requirements-cdk.txt

# Build and deploy to the feature environment whenever there is a push to feat/fsspec-filecache
- name: Build & Deploy Feature2
if: github.ref == 'refs/heads/feat/fsspec-filecache'
run: npm run cdk -- deploy titiler-xarray-feature2 --require-approval never
env:
TITILER_XARRAY_PYTHONWARNINGS: ignore
TITILER_XARRAY_DEBUG: True
STACK_ALARM_EMAIL: ${{ secrets.ALARM_EMAIL }}
STACK_STAGE: feature2

# Build and deploy to the development environment whenever there is a push to main or dev
- name: Build & Deploy Development
if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/dev'
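
The workflow above runs the test suite twice, once with `TITILER_XARRAY_ENABLE_FSSPEC_CACHE` set to `FALSE` and once with the default. The settings class that reads this variable is not part of the diff shown here, so the following is only a minimal sketch of how a string value from the environment might be turned into a boolean. The helper name `cache_enabled`, the accepted spellings, and the default of `True` are assumptions for illustration, not the project's actual behavior.

```python
import os
from typing import Mapping, Optional

# Hypothetical sets of accepted spellings; the real settings class may differ.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}

ENV_VAR = "TITILER_XARRAY_ENABLE_FSSPEC_CACHE"


def cache_enabled(env: Optional[Mapping[str, str]] = None, default: bool = True) -> bool:
    """Interpret the cache toggle from an environment mapping."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        # Variable not set: fall back to the assumed default.
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"Unrecognized value for {ENV_VAR}: {raw!r}")
```

Parsing case-insensitively means `FALSE`, `False`, and `false` all disable the cache, which avoids surprises if the CI system normalizes the value.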
3 changes: 3 additions & 0 deletions .gitignore
@@ -108,3 +108,6 @@ cdk.out/
node_modules
cdk.context.json
*.nc

*cache*
*logs.txt
2 changes: 1 addition & 1 deletion README.md
@@ -15,7 +15,7 @@ virtualenv .venv

python -m pip install -e . uvicorn
source .venv/bin/activate
uvicorn titiler.xarray.main:app --reload
FSSPEC_CACHE_DIRECTORY="fsspec_cache" uvicorn titiler.xarray.main:app --reload
```

To access the docs, visit http://127.0.0.1:8000/docs.
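
The README change points the app at a local cache directory through `FSSPEC_CACHE_DIRECTORY`. The reader code that consumes this directory is not shown in this diff, so the block below is only a sketch of one way such a directory could be wired into fsspec's URL chaining (`filecache::<url>` with a `filecache` options dict). The helper name `filecache_target` is hypothetical.

```python
import os
from typing import Any, Dict, Tuple


def filecache_target(url: str, cache_dir: str) -> Tuple[str, Dict[str, Any]]:
    """Build a chained fsspec URL and keyword arguments for whole-file caching.

    Hypothetical helper: it only prepares arguments and does not import fsspec.
    """
    cache_dir = os.path.abspath(os.path.expanduser(cache_dir))
    # Create the cache directory up front; a second call is harmless.
    os.makedirs(cache_dir, exist_ok=True)
    chained_url = f"filecache::{url}"
    kwargs = {"filecache": {"cache_storage": cache_dir}}
    return chained_url, kwargs
```

With fsspec installed, the result would typically be used as `fsspec.open(chained_url, **kwargs)`, so that the first read downloads the file into the cache directory and later reads hit the local copy.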
56 changes: 56 additions & 0 deletions infrastructure/aws/cdk/app.py
@@ -3,10 +3,13 @@
import os
from typing import Any, Dict, List, Optional

import aws_cdk
from aws_cdk import App, CfnOutput, Duration, Stack, Tags
from aws_cdk import aws_apigatewayv2_alpha as apigw
from aws_cdk import aws_cloudwatch as cloudwatch
from aws_cdk import aws_cloudwatch_actions as cloudwatch_actions
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_efs as efs
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda
from aws_cdk import aws_logs as logs
@@ -54,9 +57,51 @@ def __init__(
permissions = permissions or []
environment = environment or {}

vpc = ec2.Vpc(
[Review comment, Contributor Author] will remove this VPC

self,
"titiler-xarray-vpc",
max_azs=2, # Default is all AZs in the region
nat_gateways=1,
cidr="10.0.0.0/16",
# Define custom CIDR range for each subnet type
subnet_configuration=[
ec2.SubnetConfiguration(
name="Public", subnet_type=ec2.SubnetType.PUBLIC, cidr_mask=24
),
ec2.SubnetConfiguration(
name="Private",
subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT,
cidr_mask=24,
),
],
)

# Create and attach a file system
file_system = efs.FileSystem(
self,
"EfsFileSystem",
vpc=vpc,
lifecycle_policy=efs.LifecyclePolicy.AFTER_7_DAYS, # Or choose another policy
performance_mode=efs.PerformanceMode.GENERAL_PURPOSE,
)

access_point = file_system.add_access_point(
"AccessPoint",
path="/export/lambda",
create_acl={"owner_uid": "1001", "owner_gid": "1001", "permissions": "750"},
posix_user={
"uid": "1001",
"gid": "1001",
},
)

lambda_function = aws_lambda.Function(
self,
f"{id}-lambda",
vpc=vpc,
vpc_subnets=ec2.SubnetSelection(
subnet_type=ec2.SubnetType.PRIVATE_WITH_NAT
),
runtime=runtime,
code=aws_lambda.Code.from_docker_build(
path=os.path.abspath(context_dir),
@@ -69,6 +114,17 @@ def __init__(
timeout=Duration.seconds(timeout),
environment={**DEFAULT_ENV, **environment},
log_retention=logs.RetentionDays.ONE_WEEK,
ephemeral_storage_size=aws_cdk.Size.gibibytes(10),
filesystem=aws_lambda.FileSystem.from_efs_access_point(
access_point, "/mnt/efs"
), # Mounting it to /mnt/efs in Lambda
)

file_system.connections.allow_default_port_from(lambda_function)
file_system.grant(
lambda_function,
"elasticfilesystem:ClientMount",
"elasticfilesystem:ClientWrite",
)

for perm in permissions:
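
The stack above mounts the EFS access point at `/mnt/efs` inside the Lambda function. How the application picks its cache location is not visible in this diff. As a rough sketch, and assuming a hypothetical helper `pick_cache_dir` with a made-up subdirectory name, runtime code could prefer the mount when it is present and writable and otherwise fall back to the system temporary directory:

```python
import os
import tempfile


def pick_cache_dir(mount_path: str = "/mnt/efs", subdir: str = "fsspec_cache") -> str:
    """Return a writable cache directory, preferring the EFS mount if available.

    Hypothetical helper; `subdir` is an illustrative name, not taken from the PR.
    """
    if os.path.isdir(mount_path) and os.access(mount_path, os.W_OK):
        base = mount_path
    else:
        # No usable mount (for example when running locally): use temp storage.
        base = tempfile.gettempdir()
    path = os.path.join(base, subdir)
    os.makedirs(path, exist_ok=True)
    return path
```

A fallback like this keeps local development and tests working without an EFS mount, at the cost of the cache not being shared across invocations.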
5 changes: 3 additions & 2 deletions infrastructure/aws/lambda/Dockerfile
@@ -16,15 +16,16 @@ COPY titiler/ titiler/
# we have to force using old package version that seems `almost` compatible with Lambda env botocore
# https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
RUN pip install --upgrade pip
RUN pip install . "rio-tiler>=5.0.0" "cftime" "mangum>=0.10.0" "pandas==1.5.3" "botocore==1.29.76" "aiobotocore==2.5.0" "s3fs==2023.4.0" "fsspec==2023.4.0" "zarr==2.14.2" "xarray==0.19.0" -t /asset --no-binary pydantic
RUN pip install . "mangum>=0.10.0" "botocore==1.29.76" "aiobotocore==2.5.0" -t /asset --no-binary pydantic

# Reduce package size and remove useless files
RUN cd /asset && find . -type f -name '*.pyc' | while read f; do n=$(echo $f | sed 's/__pycache__\///' | sed 's/.cpython-[0-9]*//'); cp $f $n; done;
RUN cd /asset && find . -type d -a -name '__pycache__' -print0 | xargs -0 rm -rf
RUN cd /asset && find . -type f -a -name '*.py' -print0 | xargs -0 rm -f
RUN find /asset -type d -a -name 'tests' -print0 | xargs -0 rm -rf
RUN rm -rdf /asset/numpy/doc/ /asset/bin /asset/geos_license /asset/Misc
RUN rm -rdf /asset/boto3* /asset/botocore*
RUN rm -rdf /asset/boto3*
RUN rm -rdf /asset/botocore*

COPY infrastructure/aws/lambda/handler.py /asset/handler.py

9 changes: 6 additions & 3 deletions pyproject.toml
@@ -25,18 +25,21 @@ classifiers = [
]
dynamic = ["version"]
dependencies = [
"cftime",
"diskcache",
"h5netcdf",
"h5pyd",
"xarray",
"rioxarray",
"zarr",
"fsspec",
"s3fs",
"requests",
"aiohttp",
"requests",
"pydantic==2.0.2",
"titiler.core>=0.14.1,<0.15",
"starlette-cramjam>=0.3,<0.4",
"pydantic-settings~=2.0"
"pydantic-settings~=2.0",
"pandas==1.5.3",
]

[project.optional-dependencies]
16 changes: 16 additions & 0 deletions tests/conftest.py
@@ -1,5 +1,8 @@
"""titiler.xarray tests configuration."""

import os
import shutil

import pytest
from fastapi.testclient import TestClient

@@ -13,3 +16,16 @@ def app(monkeypatch):

with TestClient(app) as client:
yield client


def pytest_sessionstart(session):
"""Setup before tests run."""
test_cache_dir = "fsspec_test_cache"
os.environ["TITILER_XARRAY_FSSPEC_CACHE_DIRECTORY"] = test_cache_dir
os.mkdir(test_cache_dir)


def pytest_sessionfinish(session, exitstatus):
"""Cleanup step after all tests have been run."""
print("\nAll tests are done! Cleaning up...")
shutil.rmtree(os.environ["TITILER_XARRAY_FSSPEC_CACHE_DIRECTORY"])
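
One fragility in the session hooks above is that `os.mkdir` raises `FileExistsError` when a previous run left the directory behind, and `shutil.rmtree` raises if the directory is already gone. A later commit in this PR is titled "Make dir exists ok", which suggests the same concern. The following is a minimal sketch of tolerant setup and teardown helpers; the function names are hypothetical and the helpers are written without a pytest import so they can be exercised on their own.

```python
import os
import shutil

CACHE_ENV_VAR = "TITILER_XARRAY_FSSPEC_CACHE_DIRECTORY"


def setup_cache_dir(path: str) -> str:
    """Create the test cache directory (if needed) and export its location."""
    os.makedirs(path, exist_ok=True)  # no error if a stale directory exists
    os.environ[CACHE_ENV_VAR] = path
    return path


def teardown_cache_dir() -> None:
    """Remove the test cache directory and unset the environment variable."""
    path = os.environ.pop(CACHE_ENV_VAR, None)
    if path:
        # ignore_errors avoids a failure if the directory was already removed.
        shutil.rmtree(path, ignore_errors=True)
```

In `conftest.py`, `pytest_sessionstart` could call `setup_cache_dir("fsspec_test_cache")` and `pytest_sessionfinish` could call `teardown_cache_dir()`.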
13 changes: 13 additions & 0 deletions tests/test_app.py
@@ -7,6 +7,7 @@
test_zarr_store = os.path.join(DATA_DIR, "test_zarr_store.zarr")
test_reference_store = os.path.join(DATA_DIR, "reference.json")
test_netcdf_store = os.path.join(DATA_DIR, "testfile.nc")
test_remote_netcdf_store = "https://nex-gddp-cmip6.s3-us-west-2.amazonaws.com/NEX-GDDP-CMIP6/GISS-E2-1-G/historical/r1i1p1f2/pr/pr_day_GISS-E2-1-G_historical_r1i1p1f2_gn_1950.nc"
test_unconsolidated_store = os.path.join(DATA_DIR, "unconsolidated.zarr")
test_pyramid_store = os.path.join(DATA_DIR, "pyramid.zarr")

@@ -28,6 +29,14 @@
"params": {"url": test_netcdf_store, "variable": "data", "decode_times": False},
"variables": ["data"],
}
test_remote_netcdf_store_params = {
"params": {
"url": test_remote_netcdf_store,
"variable": "pr",
"decode_times": False,
},
"variables": ["pr"],
}
test_unconsolidated_store_params = {
"params": {
"url": test_unconsolidated_store,
@@ -73,6 +82,10 @@ def test_get_variables_netcdf(app):
return get_variables_test(app, test_netcdf_store_params)


def test_get_variables_remote_netcdf(app):
return get_variables_test(app, test_remote_netcdf_store_params)


def test_get_variables_unconsolidated(app):
return get_variables_test(app, test_unconsolidated_store_params)

8 changes: 8 additions & 0 deletions titiler/xarray/factory.py
@@ -229,6 +229,13 @@ def tiles_endpoint( # type: ignore
description="Whether to expect and open zarr store with consolidated metadata",
),
] = True,
anon: Annotated[
Optional[bool],
Query(
title="anon",
description="Use credentials to access data when false.",
),
] = True,
) -> Response:
"""Create map tile from a dataset."""
tms = self.supported_tms.get(tileMatrixSetId)
@@ -242,6 +249,7 @@ def tiles_endpoint( # type: ignore
time_slice=time_slice,
tms=tms,
consolidated=consolidated,
anon=anon,
) as src_dst:

image = src_dst.tile(
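
The new `anon` query parameter is forwarded to the reader, whose implementation is not part of this diff. As an illustration only, the flag could be translated into filesystem options by protocol, since `anon` is an option understood by s3fs but not by every fsspec backend. The helper `storage_options_for` below is hypothetical.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def storage_options_for(url: str, anon: bool = True) -> Dict[str, Any]:
    """Return filesystem options for a dataset URL.

    Hypothetical helper: only S3 URLs receive the `anon` flag; other
    protocols (https, local paths) get no extra options.
    """
    protocol = urlparse(url).scheme
    if protocol == "s3":
        # anon=False asks s3fs to use the caller's credentials.
        return {"anon": anon}
    return {}
```

Scoping the flag to S3 avoids passing an unexpected keyword to backends that do not accept it.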
47 changes: 32 additions & 15 deletions titiler/xarray/main.py
@@ -1,15 +1,16 @@
"""titiler app."""

import logging
import os
import shutil

import rioxarray
import xarray
import zarr
from fastapi import FastAPI
from starlette import status
from starlette.middleware.cors import CORSMiddleware
from starlette_cramjam.middleware import CompressionMiddleware

import titiler.xarray.reader as reader
from titiler.core.errors import DEFAULT_STATUS_CODES, add_exception_handlers
from titiler.core.factory import AlgorithmFactory, TMSFactory
from titiler.core.middleware import (
@@ -67,18 +68,6 @@
allow_headers=["*"],
)

app.add_middleware(
CompressionMiddleware,
minimum_size=0,
exclude_mediatype={
"image/jpeg",
"image/jpg",
"image/png",
"image/jp2",
"image/webp",
},
)

app.add_middleware(
CacheControlMiddleware,
cachecontrol=api_settings.cachecontrol,
@@ -91,7 +80,7 @@
app.add_middleware(
ServerTimingMiddleware,
calls_to_track={
"1-xarray-open_dataset": (xarray.open_dataset,),
"1-xarray-open_dataset": (reader.xarray_open_dataset,),
"2-rioxarray-reproject": (rioxarray.raster_array.RasterArray.reproject,),
},
)
@@ -107,3 +96,31 @@
def ping():
"""Health check."""
return {"ping": "pong!"}


@app.get(
"/clear_cache",
description="Clear Cache",
summary="Clear Cache.",
operation_id="clear cache",
tags=["Clear Cache"],
)
def clear_cache():
"""
Clear the cache.
"""
print("Clearing the cache...")
cache_dir = os.path.expanduser(api_settings.fsspec_cache_directory)
if os.path.exists(cache_dir):
# Iterate over each directory and file in the root of the EFS
for root_dir, dirs, files in os.walk(cache_dir, topdown=False):
for name in files:
file_path = os.path.join(root_dir, name)
os.remove(file_path)
print(f"Deleted file: {file_path}")

for name in dirs:
dir_path = os.path.join(root_dir, name)
shutil.rmtree(dir_path)
print(f"Deleted directory: {dir_path}")
return {"message": "cache cleared"}
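
The `/clear_cache` route above walks the cache tree bottom-up and removes files and directories one by one. An alternative, shown here only as a sketch, is to remove each top-level entry of the cache directory and leave the directory itself in place, which matters when it is a mount point such as an EFS path. The helper name `clear_directory` is hypothetical and not part of the PR.

```python
import os
import shutil


def clear_directory(cache_dir: str) -> int:
    """Delete everything inside `cache_dir` but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    cache_dir = os.path.expanduser(cache_dir)
    if not os.path.isdir(cache_dir):
        return 0
    removed = 0
    with os.scandir(cache_dir) as it:
        # Materialize the listing first so deletion does not race the iterator.
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)
        removed += 1
    return removed
```

Returning a count gives the route something concrete to report, for example `{"message": "cache cleared", "removed": n}`.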