- AWS S3 tools
- Rasterio from S3 investigations
- Utilities for data visualizations in notebooks
This repository provides a number of small libraries and CLI tools.
Full list of libraries and install instructions:
- `odc.ui`: tools for data visualization in notebook/lab
- `odc.index`: extra utils for working with datacube database
- `odc.geom`: geometry utils and prototypes
- `odc.algo`: algorithms (GeoMedian wrapper is here)
- `odc.io`: common IO utilities, used by apps mainly
- `odc.aws`: AWS/S3 utilities, used by apps mainly
- `odc.aio`: faster concurrent fetching from S3 with async, used by apps
- `odc.dscache`: experimental key-value store where `key=UUID`, `value=Dataset`
- `odc.dtools`: tools/experiments in the area of `dask.distributed`/dask<>datacube integration
- `odc.ppt`: parallel processing helper methods, internal lib
Installation requires using the custom package repo https://packages.dea.ga.gov.au.
```bash
pip install --extra-index-url="https://packages.dea.ga.gov.au" \
  odc-ui \
  odc-index \
  odc-geom \
  odc-algo \
  odc-io \
  odc-aws \
  odc-aio \
  odc-dscache \
  odc-dtools
```
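A quick sanity check after installing is to import the libraries; a minimal sketch, assuming each `odc-*` package provides the matching `odc.*` module listed above:

```bash
# Sanity check: all of the odc.* modules should import cleanly.
python3 -c "import odc.ui, odc.index, odc.geom, odc.algo, odc.io, odc.aws, odc.aio, odc.dscache, odc.dtools"
```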
NOTE: on Ubuntu 18.04 the default `pip` version is awfully old and does not support the `--extra-index-url` command line option, so make sure to upgrade `pip` first: `pip3 install --upgrade pip`.
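For example, to check which `pip` is active before upgrading (standard `pip` commands, nothing specific to these packages):

```bash
# Show the current pip version, then upgrade it in place.
pip3 --version
pip3 install --upgrade pip
```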
- For cloud (AWS only): `pip install --extra-index-url="https://packages.dea.ga.gov.au" odc-apps-cloud`
- For cloud (GCP, THREDDS and AWS): `pip install --extra-index-url="https://packages.dea.ga.gov.au" 'odc-apps-cloud[GCP,THREDDS]'`
- For `dc-index-from-tar` (indexing to datacube from tar archive): `pip install --extra-index-url="https://packages.dea.ga.gov.au" odc-apps-dc-tools`
NOTE: cloud tools depend on `aiobotocore`, which has a dependency on a specific version of `botocore`. Since `boto3` also depends on a specific version of `botocore`, having both `aiobotocore` and `boto3` in one environment can be a bit tricky. The easiest way to solve this is to install `aiobotocore[awscli,boto3]` before anything else, which will pull compatible versions of `boto3` and `awscli` into the environment.
```bash
pip install -U 'aiobotocore[awscli,boto3]'
```
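Afterwards, `pip check` is a quick way to verify that the resolved `botocore`/`boto3` versions ended up mutually compatible (standard `pip` functionality, not specific to these tools):

```bash
# Report any conflicting dependency requirements in the environment.
pip check
# Inspect which versions of the tightly coupled packages were resolved.
pip freeze | grep -iE 'aiobotocore|botocore|boto3|awscli'
```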
- `s3-find`: list S3 bucket with wildcard
- `s3-to-tar`: fetch documents from S3 and dump them to tar archive
- `gs-to-tar`: search GS for documents and dump them to tar archive
- `dc-index-from-tar`: read yaml documents from tar archive and add them to datacube
Example:
```bash
#!/bin/bash

s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'

s3-find "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage
```
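If the indexing step may need to be re-run (say, against another datacube environment), the archive can be captured on disk first. A minimal sketch, assuming `s3-to-tar` writes the tar stream to stdout as the pipeline above implies, and that `dc-index-from-tar` accepts an archive argument as in the Google Storage example below:

```bash
#!/bin/bash

s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'

# Capture the fetched metadata documents into a local tar archive ...
s3-find "${s3_src}" | s3-to-tar > metadata.tar

# ... then index from the archive; this step can be repeated without re-fetching.
dc-index-from-tar --env s2 --ignore-lineage metadata.tar
```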
The fastest way to list regularly placed files is to use a fixed-depth listing:
```bash
#!/bin/bash

# only works when your metadata is all at the same depth and has a fixed file name
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml'

s3-find --skip-check "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage
```
When using Google Storage:
```bash
#!/bin/bash

# Google Storage support
gs-to-tar --bucket data.deadev.com --prefix mangrove_cover

dc-index-from-tar --protocol gs --env mangroves --ignore-lineage metadata.tar.gz
```
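Before indexing, it can be useful to peek at what `gs-to-tar` fetched; the archive name `metadata.tar.gz` is the one used in the example above, and plain `tar` is enough to list its contents:

```bash
# List the first few yaml documents collected into the archive.
tar -tzf metadata.tar.gz | head
```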