Merge pull request #9 from MAAP-Project/gedi-more-filtering
chuckwondo authored Jun 1, 2022
2 parents e7f6ce9 + 7e70e54 commit 5fca930
Showing 9 changed files with 403 additions and 29 deletions.
3 changes: 3 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,3 @@
default: true
MD024: # no-duplicate-heading/no-duplicate-header
  allow_different_nesting: true
15 changes: 13 additions & 2 deletions gedi-subset/CHANGELOG.md
@@ -7,7 +7,18 @@ variation of [Semantic Versioning], with the following difference: each version
is prefixed with `gedi-subset-` (e.g., `gedi-subset-0.1.0`) to allow for
distinct lines of versioning of independent work in sibling directories.

## [0.1.0] - 2022-05-26
## [gedi-subset-0.2.0] - 2022-06-01

### Added

- Added inputs `columns` and `query` to refine filtering/subsetting. See
`gedi-subset/README.md` for details.

### Changed

- Improved performance of subsetting/filtering logic, resulting in ~5x speedup.

## [gedi-subset-0.1.0] - 2022-06-01

### Added

@@ -17,4 +28,4 @@ distinct lines of versioning of independent work in sibling directories.
[Keep a Changelog]:
https://keepachangelog.com/en/1.0.0/
[Semantic Versioning]:
https://semver.org/spec/v2.0.0.html
https://semver.org/spec/v2.0.0.html
27 changes: 23 additions & 4 deletions gedi-subset/README.md
@@ -25,9 +25,25 @@ At a high level, the GEDI subsetting algorithm does the following:

To run a GEDI subsetting DPS job, you must supply the following inputs:

- `aoi`: URL to a GeoJSON file representing your area of interest
- `aoi` (**required**): URL to a GeoJSON file representing your area of interest
- `columns`: Comma-separated list of column names to include in output file.
(**Default:**
`agbd, agbd_se, l2_quality_flag, l4_quality_flag, sensitivity, sensitivity_a2`)
- `query`: Query expression for subsetting the rows in the output file.
**IMPORTANT:** The `columns` input must contain at least all of the columns
that appear in this query expression, otherwise an error will occur.
(**Default:** `l2_quality_flag == 1 and l4_quality_flag == 1 and sensitivity >
0.95 and sensitivity_a2 > 0.95`)
- `limit`: Maximum number of GEDI granule data files to download (among those
that intersect the specified AOI)
that intersect the specified AOI). (**Default:** 10,000)
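
The requirement that `columns` cover every column named in `query` can be checked before a job is submitted. The following is a minimal sketch and not part of this commit: the helper names `parse_columns` and `missing_query_columns` are hypothetical, and it assumes the query uses only plain Python-style identifiers (pandas query expressions also allow backtick-quoted names and `@` variables, which this sketch does not handle).

```python
import ast
from typing import List, Sequence, Set

# Default values as documented for the `columns` and `query` inputs.
DEFAULT_COLUMNS = (
    "agbd, agbd_se, l2_quality_flag, l4_quality_flag, sensitivity, sensitivity_a2"
)
DEFAULT_QUERY = (
    "l2_quality_flag == 1 and l4_quality_flag == 1 "
    "and sensitivity > 0.95 and sensitivity_a2 > 0.95"
)


def parse_columns(columns: str) -> List[str]:
    """Split a comma-separated `columns` value into trimmed column names."""
    return [name.strip() for name in columns.split(",") if name.strip()]


def missing_query_columns(columns: Sequence[str], query: str) -> Set[str]:
    """Return the names referenced in `query` that are absent from `columns`.

    Hypothetical helper: parses the query as a Python expression and collects
    every bare identifier it mentions.
    """
    tree = ast.parse(query, mode="eval")
    referenced = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return referenced - set(columns)


# The documented defaults are consistent with each other.
print(missing_query_columns(parse_columns(DEFAULT_COLUMNS), DEFAULT_QUERY))  # set()

# A query that names a column missing from `columns` is flagged.
print(missing_query_columns(["agbd", "sensitivity"], "sensitivity > 0.95 and l2_quality_flag == 1"))
# {'l2_quality_flag'}
```

An empty result means the query only references columns that will be present in the output; any names returned would need to be added to `columns`.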

**IMPORTANT:** When supplying input values via the ADE UI, for convenience, to
accept _all_ default values, you may leave _all_ optional inputs blank.
However, if you supply a value for _any_ optional input, you must enter a dash
(`-`) as the input value for _all other_ optional inputs. This ensures that
the input values remain correctly ordered for the underlying script to which the
inputs are supplied. Otherwise, your job may fail due to invalid script
arguments, or might produce unpredictable results.
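
The dash convention described above can be pictured as follows. This is a sketch under stated assumptions, not the script shipped in this commit: the function name `resolve_optional_inputs` is hypothetical, and treating both a blank value and `-` as "use the default" is an assumption based on the description above.

```python
from typing import Dict

# Defaults as documented for the optional inputs.
DEFAULTS: Dict[str, str] = {
    "columns": (
        "agbd, agbd_se, l2_quality_flag, l4_quality_flag, "
        "sensitivity, sensitivity_a2"
    ),
    "query": (
        "l2_quality_flag == 1 and l4_quality_flag == 1 "
        "and sensitivity > 0.95 and sensitivity_a2 > 0.95"
    ),
    "limit": "10000",
}


def resolve_optional_inputs(
    columns: str = "", query: str = "", limit: str = ""
) -> Dict[str, str]:
    """Replace blank or dash placeholders with the documented defaults.

    Hypothetical helper: every optional input keeps its position, so a dash
    stands in for "no value supplied" without shifting later arguments.
    """
    supplied = {"columns": columns, "query": query, "limit": limit}
    return {
        name: DEFAULTS[name] if value.strip() in ("", "-") else value.strip()
        for name, value in supplied.items()
    }


# Only `limit` is overridden; the other optional inputs are given as dashes.
resolved = resolve_optional_inputs(columns="-", query="-", limit="500")
print(resolved["limit"])  # 500
print(resolved["columns"] == DEFAULTS["columns"])  # True
```

Because each optional input keeps its position, a dash lets one input be overridden without the remaining values being read as the wrong arguments.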

If your AOI is a publicly available geoBoundary, see
[Getting the GeoJSON URL for a geoBoundary](#getting-the-geojson-url-for-a-geoboundary)
@@ -233,7 +249,7 @@ able to register the new version of the algorithm, as follows, within the ADE:
1. Pull the latest code from GitHub (to obtain merged PR, if necessary):

```bash
git pull origin
git pull origin main
git checkout main
```

@@ -242,6 +258,7 @@ able to register the new version of the algorithm, as follows, within the ADE:

```bash
git push --all ade
git push --tags ade
```

1. In the ADE's File Browser, navigate to
@@ -263,7 +280,9 @@ able to register the new version of the algorithm, as follows, within the ADE:

Country Boundaries from:

Runfola, D. et al. (2020) geoBoundaries: A global database of political administrative boundaries. PLoS ONE 15(4): e0231866. <https://doi.org/10.1371/journal.pone.0231866>
Runfola, D. et al. (2020) geoBoundaries: A global database of political
administrative boundaries. PLoS ONE 15(4): e0231866.
<https://doi.org/10.1371/journal.pone.0231866>

[geoBoundaries]:
https://www.geoboundaries.org
6 changes: 5 additions & 1 deletion gedi-subset/algorithm_config.yaml
@@ -1,6 +1,6 @@
description: Subset GEDI L4A granules within an area of interest (AOI)
algo_name: gedi-subset
version: gedi-subset-0.1.0
version: gedi-subset-0.2.0
environment: ubuntu
repository_url: https://repo.ops.maap-project.org/data-team/maap-documentation-examples.git
docker_url: mas.maap-project.org:5000/root/ade-base-images/r:latest
@@ -11,5 +11,9 @@ disk_space: 20GB
inputs:
  - name: aoi
    download: True
  - name: columns
    download: False
  - name: query
    download: False
  - name: limit
    download: False
40 changes: 35 additions & 5 deletions gedi-subset/gedi_utils.py
@@ -3,7 +3,7 @@
import os
import os.path
import warnings
from typing import Any, Callable, Mapping, Sequence, TypeVar, Union
from typing import Any, Callable, List, Mapping, Sequence, TypeVar, Union

import h5py
import numpy as np
@@ -67,7 +67,7 @@ def df_assign(col_name: str, val: Any, df: _DF) -> _DF:

@curry
def append_message(extra_message: str, e: Exception) -> Exception:
    message, *other_args = e.args if e.args else ("",)
    message, *other_args = e.args if e.args else ("",)  # pytype: disable=bad-unpacking
    new_message = f"{message}: {extra_message}" if message else extra_message
    e.args = (new_message, *other_args)

@@ -177,7 +177,7 @@ def spatial_filter(beam, aoi):

@curry
def subset_h5(
path: Union[str, os.PathLike], aoi: gpd.GeoDataFrame, filter_cols: Sequence[str]
path: Union[str, os.PathLike], aoi: gpd.GeoDataFrame, filter_cols: Sequence[str], expr: str
) -> gpd.GeoDataFrame:
"""
Extract the beam data only for the aoi and only columns of interest
@@ -226,10 +226,10 @@ def subset_h5(
col_val.append(value[:][indices].tolist())

# create a pandas dataframe
beam_df = pd.DataFrame(map(list, zip(*col_val)), columns=col_names)
beam_df = pd.DataFrame(map(list, zip(*col_val)), columns=col_names).query(expr)
# Inserting BEAM names
beam_df.insert(
0, "BEAM", np.repeat(str(v), len(beam_df.index)).tolist()
0, "BEAM", np.repeat(v[5:], len(beam_df.index)).tolist()
)
# Appending to the subset_df dataframe
subset_df = pd.concat([subset_df, beam_df])
@@ -250,6 +250,36 @@ def subset_h5(
return subset_gdf


def subset_hdf5(
    path: str,
    aoi: gpd.GeoDataFrame,
    columns: Sequence[str],
    expr: str,
) -> gpd.GeoDataFrame:
    def subset_beam(beam: h5py.Group) -> gpd.GeoDataFrame:
        def append_series(path: str, value: Union[h5py.Group, h5py.Dataset]) -> None:
            if (name := path.split("/")[-1]) in columns:
                series.append(pd.Series(value, name=name))

        series: List[pd.Series] = []
        beam.visititems(append_series)
        df = pd.concat(series, axis=1).query(expr)
        df.insert(0, "BEAM", beam.name[5:])

        x, y = df.lon_lowestmode, df.lat_lowestmode
        df.drop(["lon_lowestmode", "lat_lowestmode"], axis=1, inplace=True)
        gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(x, y), crs="EPSG:4326")

        return gdf[gdf.geometry.within(aoi.geometry[0])]

    with h5py.File(path) as hdf5:
        beams = (value for key, value in hdf5.items() if key.startswith("BEAM"))
        beam_dfs = (subset_beam(beam) for beam in beams)
        beams_df = pd.concat(beam_dfs, ignore_index=True, copy=False)

    return beams_df


def write_subset(infile, gdf):
"""
Write GeoDataFrame to Flatgeobuf