update documentation
nicokant committed Dec 23, 2024
1 parent f4f27aa commit 2d620c9
Showing 3 changed files with 5 additions and 42 deletions.
1 change: 0 additions & 1 deletion docs/src/SUMMARY.md
@@ -1,5 +1,4 @@
 # Summary

 - [Setup the script](./setup.md)
-- [Query the data](./query.md)

18 changes: 0 additions & 18 deletions docs/src/query.md

This file was deleted.

28 changes: 5 additions & 23 deletions docs/src/setup.md
@@ -13,39 +13,21 @@
 ### How to use
 ```bash
 nix-shell
-make param1=value
-```
-
-#### Available parameters
-from `.env`:
-- `PG_HOST` hostname for postgresql connection
-- `PG_USER` username for postgresql connection
-- `ELEVATION_MODEL` path to the raster that contains the elevation model
-
-from command line:
-- `YEAR` year to process
-- `MONTH` month to process
-- `PG_DBNAME` database to connect to
-
-
-## Tips
-It's possible to use GDAL to merge on the fly different raster datasets
-```bash
-gdalbuildvrt dem.vrt /path/to/dir/*.tif
+task -a
 ```

 ### How does it work?
 The software relies on different technologies to efficiently work, in particular to overcome issues of scalability the procedure works by partitioning the dataset on the fly and then parallelizing the operations.
 - Nix, for creating a reproducible environment
-- Makefile, for describing a pipeline
+- Taskfile, for describing a pipeline
 - GNU Parallel, for running tasks on partitions in parallel
-- GDAL, for extracting data from postgres and for efficiently compute the pixel value of the DEM
-- DuckDB, for ultra efficient data operations in-memory
+- GDAL, for efficiently compute the pixel value of the DEM
+- DuckDB, for efficient data operations in-memory, postgres extraction
 - Apache Parquet format, for storing intermediate and final results


 A short description of the procedure follows:
-- convert the `track` table in postgis (containing linestrings) to a local parquet file with GDAL
+- convert the `track` table in postgis (containing linestrings) to a local parquet file with DuckDB
 - chunk the parquet in partitions of N elements in-memory
 - using GNU Parallel a DuckDB query is run on each chunk, the query produces a row for each point in the linestring in the chunk and outputs to a parquet file
 - each chunk is then sent to GDAL and the pixel value of the raster at the coordinates of each point is computed in parallel
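
The partition-then-parallelize pattern described in the procedure above can be sketched with standard tools only. This is a rough illustration, not the project's pipeline. A plain text file stands in for the parquet file, `split` stands in for the in-memory chunking, `xargs -P` stands in for GNU Parallel, and the hypothetical `process_chunk` function (which only counts lines) stands in for the DuckDB and GDAL steps. The function names `chunk_file`, `process_chunk`, and `run_parallel` are assumptions made for this example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split a line-oriented file into chunks of N lines.
# Stand-in for "chunk the parquet in partitions of N elements".
chunk_file() {
  local input=$1 size=$2 outdir=$3
  mkdir -p "$outdir"
  split -l "$size" "$input" "$outdir/chunk_"
}

# Hypothetical per-chunk step: writes the chunk's line count to <chunk>.out.
# In the real pipeline this is where the per-chunk query would run.
process_chunk() {
  local chunk=$1
  wc -l < "$chunk" > "$chunk.out"
}
export -f process_chunk

# Run process_chunk on every chunk, with up to JOBS processes at a time.
# xargs -P is used here in place of GNU Parallel.
run_parallel() {
  local outdir=$1 jobs=${2:-4}
  find "$outdir" -name 'chunk_*' ! -name '*.out' -print0 |
    xargs -0 -P "$jobs" -I{} bash -c 'process_chunk "$1"' _ {}
}
```

Calling `chunk_file input.txt 1000 chunks` followed by `run_parallel chunks 8` would process the input in partitions of 1000 lines, eight at a time. Because each chunk writes to its own output file, the workers need no coordination, which is the property that makes the approach scale.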