
Use bigger dtypes in routed models to expand number of supported pixels and implement docker container #1434

Merged
Changes from all commits (45 commits)
All commits are by phargogh.

576f4c4 (Oct 19, 2023): Adding a 2-phase docker build for testing.
495ea32 (Oct 19, 2023): Dockerfile syntaxerror RE:#1431
8cac295 (Oct 19, 2023): Installing apt packages with -y. RE:#1431
b9d38db (Oct 19, 2023): Allowing invest repo and build ref to be provided as build args. RE:#…
c9d72de (Oct 19, 2023): Trying to quote the strings instead to try to get default arg values …
3195875 (Oct 19, 2023): Build contexts are not shared between images. RE:#1431
44d5b1e (Oct 19, 2023): Adding venv, making dirname depend on repo. RE:#1431
55cb3d3 (Oct 19, 2023): Updating requirements and removing wheel after build. RE:#1431
a90a795 (Oct 19, 2023): Adding a docker container workflow building job.
efa1df2 (Oct 19, 2023): Creating the environment file. RE:#1431
6bfd67f (Oct 19, 2023): Removing the branch restriction for building the container. RE:#1431
bb64267 (Oct 19, 2023): Sanitizing the github ref. RE:#1431
a3f951f (Oct 19, 2023): Using sed for substring replacement instead of bash builtings. GHA d…
77be2d4 (Oct 19, 2023): Not removing the wheel, not writeable. RE:#1431
beea25d (Oct 19, 2023): Adding requests to requirements. RE:#1431
781789d (Oct 19, 2023): Adding requirements to the list of triggering files. RE:#1431
5f5817e (Oct 19, 2023): Adding note about the docker container to HISTORY. RE:#1431
c181d28 (Oct 19, 2023): Building latest pygeoprocessing and installing into container. RE:#1431
f6ea5ec (Oct 19, 2023): Passing invest repo and version to the build image. RE:#1431
eab9919 (Oct 19, 2023): Sediment deposition now uses more memory, but can handle larger indexes.
2d6cc1e (Oct 19, 2023): Noting change in HISTORY for SDR. RE:#1431
0d647f3 (Oct 19, 2023): Improving the number of pixels that can be included in the analysis.
6acc0a7 (Oct 19, 2023): Upping stack/heap memory usage for indexes in seasonal water yield's …
db1a05e (Oct 19, 2023): Using the latest pygeoprocessing commit on main.
c4b9e0f (Oct 21, 2023): Merge branch 'main' into bugfix/1431-sdr-and-probably-other-routed-mo…
88d409e (Oct 23, 2023): Merge branch 'main' of https://github.com/natcap/invest into bugfix/1…
ef2b0bf (Oct 23, 2023): Merge branch 'bugfix/1431-sdr-and-probably-other-routed-models-have-i…
274f669 (Oct 24, 2023): Monitoring a few other paths for rebuilding docker image.
e6f850c (Oct 24, 2023): Replacing native pygeoprocessing builds with better yml generation.
39ebcb3 (Oct 24, 2023): Clearing pip cache during the build.
65fd138 (Oct 24, 2023): Rebuilding container if conda yml script changes. RE#1431
4ca297d (Oct 24, 2023): Merge branch 'main' into bugfix/1431-sdr-and-probably-other-routed-mo…
f7119e8 (Oct 24, 2023): Adding a compiler when we are building from source.
ce7c59d (Oct 24, 2023): Removing cxx-compiler and git after build completes. RE#1431
544e647 (Oct 24, 2023): Merge branch 'main' of https://github.com/natcap/invest into bugfix/1…
1da2138 (Oct 24, 2023): Merge branch 'bugfix/1431-sdr-and-probably-other-routed-models-have-i…
4800f4a (Oct 24, 2023): Specifying the base prefix. RE:#1431
8bfd8d0 (Oct 24, 2023): Attempting to address a build warning about setuptools_scm/setuptools…
4c82149 (Oct 24, 2023): Allowing git lfs install to fail.
8301c75 (Oct 24, 2023): Restoring pygeoprocessing 2.4.0 requirement.
76c84ba (Oct 24, 2023): Explicitly prohibiting uploading to the registry when part of a PR, j…
ab8400f (Oct 24, 2023): Removing the pip constraint on pygeoprocessing. RE:#1431
89d840a (Oct 24, 2023): Adding a compiler if on linux or it's needed.
d9aeced (Oct 25, 2023): Fixing history message heading. RE:#1431
cd492aa (Oct 25, 2023): Removing failure-case handling for git lfs install. RE:#1431
73 changes: 73 additions & 0 deletions .github/workflows/build-docker.yml
@@ -0,0 +1,73 @@
name: Build containers

on:
workflow_dispatch:
push:
paths:
Review comment (Member): paths here is a filter indicating we only run when a file in one of these paths has changed?

Reply (Member Author): Yes, exactly. So the idea here was to only rebuild the container if there's something that will change the output of a model, like the source code, requirements, etc.

- 'docker/*'
- '.github/workflows/build-docker.yml'
- 'src/**/*'
- 'requirements*.txt'
- 'pyproject.toml'
- 'setup.py'
- 'scripts/convert-requirements-to-conda-yml.py'

concurrency:
# make sure only one run of this workflow for a given PR or a given branch
# can happen at one time. previous queued or started runs will be cancelled.
# github.workflow is the workflow name
# github.ref is the ref that triggered the workflow run
# on push, this is refs/heads/<branch name>
# on pull request, this is refs/pull/<pull request number>/merge
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

defaults:
run:
shell: bash -l {0}

env:
# Customize this name if needed.
# The repo name is a very reasonable default!
CONTAINER_NAME: invest

jobs:
build:
name: Build containers
runs-on: ubuntu-latest
permissions:
packages: write
steps:
- uses: actions/checkout@v3

- name: login to GitHub Container Registry
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}

- name: Create environment file
run: |
python scripts/convert-requirements-to-conda-yml.py requirements.txt > docker/environment.yml

- name: Build docker
run: |
# Replace / (invalid tag character) with .
SANITIZED_REF="$(echo ${{github.ref_name}} | sed 's|/|.|g')"
cd docker && docker build \
--build-arg="INVEST_REPO=${{ github.repository }}" \
--build-arg="INVEST_VERSION=${{ github.sha }}" \
-t ghcr.io/${{ github.repository_owner }}/${{ env.CONTAINER_NAME }}:latest \
-t ghcr.io/${{ github.repository_owner }}/${{ env.CONTAINER_NAME }}:${{ github.sha }} \
-t ghcr.io/${{ github.repository_owner }}/${{ env.CONTAINER_NAME }}:${SANITIZED_REF} \
.

- name: Test that GDAL and InVEST import
run: |
docker run --rm ghcr.io/${{ github.repository_owner }}/${{ env.CONTAINER_NAME }}:latest python -c "from osgeo import gdal"
docker run --rm ghcr.io/${{ github.repository_owner }}/${{ env.CONTAINER_NAME }}:latest python -m natcap.invest --version

- name: Push docker
if: github.event_name != 'pull_request'
run: docker image push --all-tags ghcr.io/${{ github.repository_owner }}/${{ env.CONTAINER_NAME }}
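The "Build docker" step above rewrites the git ref before using it as an image tag, because `/` (common in branch names like `bugfix/1431-...`) is not a valid character in a docker tag. A minimal Python sketch of what the workflow's `sed 's|/|.|g'` call does (the function name is illustrative, not part of the workflow):

```python
def sanitize_ref(ref_name: str) -> str:
    # Docker tag names may not contain '/', so replace each one with '.',
    # mirroring the workflow's `sed 's|/|.|g'` substitution.
    return ref_name.replace("/", ".")

# A branch ref like the one on this PR becomes a legal tag:
print(sanitize_ref("bugfix/1431-sdr-overflow"))  # bugfix.1431-sdr-overflow
```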
14 changes: 14 additions & 0 deletions HISTORY.rst
Original file line number Diff line number Diff line change
@@ -42,6 +42,9 @@ Unreleased Changes
have been replaced with ``numpy.prod``.
https://github.com/natcap/invest/issues/1410
* Add support for python 3.11 (`#1103 <https://github.com/natcap/invest/issues/1103>`_)
* Adding a docker container that is built on each commit where a change to
model code, requirements, or the docker configuration has been made.
https://github.com/natcap/invest/issues/1115
* Vector geometry types will now be validated for all models
(`#1374 <https://github.com/natcap/invest/issues/1374>`_)
* Datastack archives will now be correctly extracted
@@ -53,9 +56,20 @@ Unreleased Changes
* Fixed a task dependency issue where NDR would crash because of a race
condition when run with ``n_workers > 0``.
https://github.com/natcap/invest/issues/1426
* Fixed an issue in NDR's effective retention where, on rasters with more
than 2^31 pixels, the model would crash with an error relating to a
negative (overflowed) index. https://github.com/natcap/invest/issues/1431
* SDR
* RKLS, USLE, avoided erosion, and avoided export rasters will now have
nodata in streams (`#1415 <https://github.com/natcap/invest/issues/1415>`_)
* Fixed an issue in SDR's sediment deposition where, on rasters with more
than 2^31 pixels, the model would crash with an error relating to a
negative (overflowed) index. https://github.com/natcap/invest/issues/1431
* Seasonal Water Yield
* Fixed an issue in Seasonal Water Yield's baseflow routing and local
recharge functions where, on rasters with more than 2^31 pixels, the
model would crash with an error relating to a negative (overflowed)
index. https://github.com/natcap/invest/issues/1431
* Urban Cooling
* Fixed a bug where model would error out if any feature in the buildings
vector was missing a geometry; now they will be skipped
1 change: 1 addition & 0 deletions docker/.gitignore
@@ -0,0 +1 @@
environment.yml
26 changes: 26 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,26 @@
# syntax=docker/dockerfile:1

# Build the InVEST wheel in a separate container stage
FROM debian:12.2 as build
ARG INVEST_VERSION="main"
ARG INVEST_REPO="natcap/invest"
Review comment on lines +5 to +6 (Member): Cool, we're updating these placeholders from the GHA workflow call.
RUN apt update && apt install -y python3 python3-dev python3-pip python3-build build-essential git python3.11-venv
RUN cd / && \
git clone https://github.com/${INVEST_REPO}.git && \
cd $(basename ${INVEST_REPO}) && \
git checkout ${INVEST_VERSION} && \
python3 -m build

# Create the container for distribution that has runtime dependencies.
FROM mambaorg/micromamba:1.5.0-bookworm-slim
COPY --from=build /invest/dist/*.whl /tmp/

# The environment.yml file will be built during github actions.
COPY --chown=$MAMBA_USER:$MAMBA_USER environment.yml /tmp/environment.yml
RUN micromamba install -y -n base -c conda-forge -f /tmp/environment.yml && \
micromamba clean --all --yes && \
/opt/conda/bin/python -m pip install /tmp/*.whl && \
/opt/conda/bin/python -m pip cache purge && \
micromamba remove -y -n base cxx-compiler git

ENTRYPOINT ["/usr/local/bin/_entrypoint.sh"]
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -44,7 +44,7 @@ invest = "natcap.invest.cli:main"
# that we can provide a much easier build experience so long as GDAL is
# available at runtime.
requires = [
'setuptools>=45', 'wheel', 'setuptools_scm>=6.4.0', 'cython', 'babel',
'setuptools>=61', 'wheel', 'setuptools_scm>=8.0', 'cython', 'babel',
'oldest-supported-numpy'
]
build-backend = "setuptools.build_meta"
5 changes: 3 additions & 2 deletions requirements.txt
@@ -17,11 +17,12 @@ numpy>=1.11.0,!=1.16.0
Rtree>=0.8.2,!=0.9.1
shapely>=2.0.0
scipy>=1.9.0
pygeoprocessing==2.4.0 # pip-only
taskgraph[niced_processes]>=0.11.0 # pip-only
pygeoprocessing==2.4.0
taskgraph>=0.11.0
psutil>=5.6.6
chardet>=3.0.4
pint
Babel
Flask
flask_cors
requests
36 changes: 33 additions & 3 deletions scripts/convert-requirements-to-conda-yml.py
@@ -1,8 +1,9 @@
# encoding=UTF-8
"""convert-requirements-to-conda-yml.py"""

import sys
import argparse
import platform
import sys

YML_TEMPLATE = """channels:
- conda-forge
@@ -46,8 +47,7 @@ def build_environment_from_requirements(cli_args):
requirements_files = args.req

pip_requirements = set()
# conda likes it when you list pip if you're using pip.
conda_requirements = set(['pip'])
conda_requirements = set()
for requirement_file in requirements_files:
with open(requirement_file) as file:
for line in file:
@@ -58,7 +58,37 @@
continue

if line.endswith('# pip-only'):
# Conda prefers that we explicitly include pip as a
# requirement if we're using pip.
conda_requirements.add('pip')

pip_requirements.add(line)

# If an scm needs to be installed for pip to clone to a
# revision, add it to the conda package list.
#
# Bazaar (bzr, which pip supports) is not listed;
# deprecated as of 2016 and not available on conda-forge.
install_compiler = False
for prefix, scm_conda_pkg in [("git+", "git"),
("hg+", "mercurial"),
("svn+", "subversion")]:
if line.startswith(prefix):
conda_requirements.add(scm_conda_pkg)
install_compiler = True
break # The line can only match 1 prefix

# It's less common (like for pygeoprocessing) to have linux
# wheels. Install a compiler if we're on linux, to be
# safe.
if platform.system() == 'Linux' or install_compiler:
# Always make the compiler available
# cxx-compiler works on all OSes.
# NOTE: do not use this as a dependency in a
# conda-forge package recipe!
# https://anaconda.org/conda-forge/cxx-compiler
conda_requirements.add('cxx-compiler')

else:
conda_requirements.add(line)

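The classification logic this diff adds to `scripts/convert-requirements-to-conda-yml.py` can be sketched as a standalone function. This is a simplified reimplementation for illustration (the `classify` name and `system` parameter are mine; the real script reads `platform.system()` and handles more line forms):

```python
# Map pip's SCM URL prefixes to the conda-forge package that provides the SCM.
# Bazaar is omitted, matching the diff: deprecated and not on conda-forge.
SCM_PREFIXES = [("git+", "git"), ("hg+", "mercurial"), ("svn+", "subversion")]

def classify(lines, system="Linux"):
    """Split requirement lines into conda packages and pip-only packages."""
    pip_requirements, conda_requirements = set(), set()
    for line in (l.strip() for l in lines):
        if not line or line.startswith(("#", "-r")):
            continue  # skip blanks, comments, and included files (simplified)
        if line.endswith("# pip-only"):
            # Conda prefers pip to be listed explicitly when pip is used.
            conda_requirements.add("pip")
            pip_requirements.add(line)
            install_compiler = False
            for prefix, scm_conda_pkg in SCM_PREFIXES:
                if line.startswith(prefix):
                    # pip needs the SCM available to clone to a revision.
                    conda_requirements.add(scm_conda_pkg)
                    install_compiler = True
                    break  # a line can only match one prefix
            if system == "Linux" or install_compiler:
                # Source builds may need a compiler; cxx-compiler works on all OSes.
                conda_requirements.add("cxx-compiler")
        else:
            conda_requirements.add(line)
    return conda_requirements, pip_requirements
```

For example, a git-sourced pip-only requirement pulls in `pip`, `git`, and `cxx-compiler` on the conda side, while plain requirements pass through to conda unchanged.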
17 changes: 9 additions & 8 deletions src/natcap/invest/ndr/ndr_core.pyx
@@ -390,11 +390,11 @@ def ndr_eff_calculation(
cdef int *col_offsets = [1, 1, 0, -1, -1, -1, 0, 1]
cdef int *inflow_offsets = [4, 5, 6, 7, 0, 1, 2, 3]

cdef int n_cols, n_rows
cdef long n_cols, n_rows
flow_dir_info = pygeoprocessing.get_raster_info(mfd_flow_direction_path)
n_cols, n_rows = flow_dir_info['raster_size']

cdef stack[int] processing_stack
cdef stack[long] processing_stack
stream_info = pygeoprocessing.get_raster_info(stream_path)
# cell sizes must be square, so no reason to test at this point.
cdef float cell_size = abs(stream_info['pixel_size'][0])
@@ -427,13 +427,14 @@
cdef _ManagedRaster to_process_flow_directions_raster = _ManagedRaster(
to_process_flow_directions_path, 1, True)

cdef int col_index, row_index, win_xsize, win_ysize, xoff, yoff
cdef int global_col, global_row
cdef int flat_index, outflow_weight, outflow_weight_sum, flow_dir
cdef int ds_col, ds_row, i
cdef long col_index, row_index, win_xsize, win_ysize, xoff, yoff
cdef long global_col, global_row
cdef unsigned long flat_index
cdef long outflow_weight, outflow_weight_sum, flow_dir
cdef long ds_col, ds_row, i
cdef float current_step_factor, step_size, crit_len
cdef int neighbor_row, neighbor_col, neighbor_outflow_dir
cdef int neighbor_outflow_dir_mask, neighbor_process_flow_dir
cdef long neighbor_row, neighbor_col
cdef int neighbor_outflow_dir, neighbor_outflow_dir_mask, neighbor_process_flow_dir
cdef int outflow_dirs, dir_mask

for offset_dict in pygeoprocessing.iterblocks(
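The `int` to `long` widening above is the heart of the fix for #1431: a flat pixel index computed as `row * n_cols + col` exceeds the 32-bit signed range once a raster has more than 2^31 pixels, and two's-complement arithmetic wraps it to the negative index reported in the issue. A dependency-free sketch of the wraparound (the `as_int32` helper is illustrative, not project code):

```python
def as_int32(value):
    """Wrap an integer the way 32-bit two's-complement C arithmetic does."""
    return (value + 2**31) % 2**32 - 2**31

# A 50,000 x 50,000 raster has 2.5e9 pixels, more than 2**31 - 1 (~2.15e9).
n_cols = 50_000
flat_index = 49_999 * n_cols + 49_999  # last pixel: 2,499,999,999

# Stored in a C `int`, the index wraps negative -- the crash in #1431.
print(as_int32(flat_index))  # -1794967297
# A 64-bit `long` (on LP64 platforms) holds it with room to spare.
print(flat_index <= 2**63 - 1)  # True
```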
16 changes: 7 additions & 9 deletions src/natcap/invest/sdr/sdr_core.pyx
@@ -43,9 +43,6 @@ cdef int *ROW_OFFSETS = [0, -1, -1, -1, 0, 1, 1, 1]
cdef int *COL_OFFSETS = [1, 1, 0, -1, -1, -1, 0, 1]


cdef int is_close(double x, double y):
return abs(x-y) <= (1e-8+1e-05*abs(y))

# this is a least recently used cache written in C++ in an external file,
# exposing here so _ManagedRaster can use it
cdef extern from "LRUCache.h" nogil:
@@ -435,7 +432,7 @@ def calculate_sediment_deposition(
# is the original pixel `x`
cdef int *inflow_offsets = [4, 5, 6, 7, 0, 1, 2, 3]

cdef int n_cols, n_rows
cdef long n_cols, n_rows
flow_dir_info = pygeoprocessing.get_raster_info(mfd_flow_direction_path)
n_cols, n_rows = flow_dir_info['raster_size']
cdef int mfd_nodata = 0
@@ -444,11 +441,12 @@
sdr_path)['nodata'][0]
cdef float e_prime_nodata = pygeoprocessing.get_raster_info(
e_prime_path)['nodata'][0]
cdef int col_index, row_index, win_xsize, win_ysize, xoff, yoff
cdef int global_col, global_row, flat_index, j, k
cdef int seed_col = 0
cdef int seed_row = 0
cdef int neighbor_row, neighbor_col
cdef long col_index, row_index, win_xsize, win_ysize, xoff, yoff
cdef long global_col, global_row, j, k
cdef unsigned long flat_index
cdef long seed_col = 0
cdef long seed_row = 0
cdef long neighbor_row, neighbor_col
cdef int flow_val, neighbor_flow_val, ds_neighbor_flow_val
cdef int flow_weight, neighbor_flow_weight
cdef float flow_sum, neighbor_flow_sum
@@ -415,17 +415,20 @@ cpdef calculate_local_recharge(
"""
cdef int i_n, flow_dir_nodata, flow_dir_mfd
cdef int peak_pixel
cdef int xs, ys, xs_root, ys_root, xoff, yoff, flow_dir_s
cdef int xi, yi, xj, yj, flow_dir_j, p_ij_base
cdef int win_xsize, win_ysize, n_dir
cdef int raster_x_size, raster_y_size
cdef long xs, ys, xs_root, ys_root, xoff, yoff
cdef int flow_dir_s
cdef long xi, yi, xj, yj
cdef int flow_dir_j, p_ij_base
cdef long win_xsize, win_ysize
cdef int n_dir
cdef long raster_x_size, raster_y_size
cdef double pet_m, p_m, qf_m, et0_m, aet_i, p_i, qf_i, l_i, l_avail_i
cdef float qf_nodata, kc_nodata

cdef int j_neighbor_end_index, mfd_dir_sum
cdef float mfd_direction_array[8]

cdef queue[pair[int, int]] work_queue
cdef queue[pair[long, long]] work_queue
cdef _ManagedRaster et0_m_raster, qf_m_raster, kc_m_raster

cdef numpy.ndarray[numpy.npy_float32, ndim=1] alpha_month_array = (
@@ -554,7 +557,7 @@
break
if peak_pixel:
work_queue.push(
pair[int, int](xs_root, ys_root))
pair[long, long](xs_root, ys_root))

while work_queue.size() > 0:
xi = work_queue.front().first
@@ -685,7 +688,7 @@
if (xi_n < 0 or xi_n >= raster_x_size or
yi_n < 0 or yi_n >= raster_y_size):
continue
work_queue.push(pair[int, int](xi_n, yi_n))
work_queue.push(pair[long, long](xi_n, yi_n))


def route_baseflow_sum(
@@ -717,13 +720,14 @@
cdef float target_nodata = -1e32
cdef int stream_val, outlet
cdef float b_i, b_sum_i, l_j, l_avail_j, l_sum_j
cdef int xi, yi, xj, yj, flow_dir_i, p_ij_base
cdef long xi, yi, xj, yj
cdef int flow_dir_i, p_ij_base
cdef int mfd_dir_sum, flow_dir_nodata
cdef int raster_x_size, raster_y_size, xs_root, ys_root, xoff, yoff
cdef long raster_x_size, raster_y_size, xs_root, ys_root, xoff, yoff
cdef int n_dir
cdef int xs, ys, flow_dir_s, win_xsize, win_ysize
cdef int stream_nodata
cdef stack[pair[int, int]] work_stack
cdef stack[pair[long, long]] work_stack

# we know the PyGeoprocessing MFD raster flow dir type is a 32 bit int.
flow_dir_raster_info = pygeoprocessing.get_raster_info(flow_dir_mfd_path)
@@ -783,7 +787,7 @@
break
if not outlet:
continue
work_stack.push(pair[int, int](xs_root, ys_root))
work_stack.push(pair[long, long](xs_root, ys_root))

while work_stack.size() > 0:
xi = work_stack.top().first
Expand Down Expand Up @@ -871,4 +875,4 @@ def route_baseflow_sum(
if (0xF & (flow_dir_j >> (
4 * FLOW_DIR_REVERSE_DIRECTION[n_dir]))):
# pixel flows here, push on queue
work_stack.push(pair[int, int](xj, yj))
work_stack.push(pair[long, long](xj, yj))
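The commit message "Sediment deposition now uses more memory, but can handle larger indexes" captures the tradeoff in these `pair[int, int]` to `pair[long, long]` changes: on LP64 platforms a C `int` is 4 bytes and a `long` is 8, so each queued coordinate pair doubles in size. A rough back-of-envelope (figures are illustrative; C++ container overhead is not counted):

```python
def queue_bytes(n_entries, bytes_per_coord):
    # Each work-queue entry holds a coordinate pair (two integers).
    return n_entries * 2 * bytes_per_coord

# Hypothetical worst case of 10 million queued pixels:
old = queue_bytes(10_000_000, 4)  # pair[int, int], 4-byte ints
new = queue_bytes(10_000_000, 8)  # pair[long, long], 8-byte longs on LP64
print(old, new, new / old)  # 80000000 160000000 2.0
```

An extra ~80 MB in this hypothetical case is a modest price for correctly indexing rasters beyond 2^31 pixels.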