Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cogify): Update the cogify create cog to support topo raster. BM-1116 #3388

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

Wentao-Kuang
Copy link
Contributor

@Wentao-Kuang Wentao-Kuang commented Jan 8, 2025

Motivation

As a Basemaps user, I want to consume the NZTopo 50 & 250 Maps as a tile service.

Modifications

This work defines a new CLI command. The command's purpose is to generate STAC files for an NZTopo Map Series imagery collection. We have designed the command to process the following collections stored in the AWS S3 TopoReleaseArchive directory:

s3://topographic-upload/TopoReleaseArchive/NZTopo50_GeoTif_Gridless/
s3://topographic-upload/TopoReleaseArchive/NZTopo250_GeoTif_Gridless/

There is also a new Argo workflow in development that depends on this work. The workflow's purpose is to automate the standardisation of an NZTopo Map Series imagery collection. The workflow executes this command as a step in its process flow.

CLI Command: cog Updates

New Arguments

  • topo

    Add --topo for cog cli to create cogs for topo raster maps.

    Argument topo
    Type boolean
    Required no
    Options true or false

CLI Command: topo

Arguments

The arguments that can be passed to the command are as follows:

  • Title

    The name/title of the Map Series imagery collection.

    Argument title
    Type string
    Required yes
    Example Raster Topographic Maps 50k
  • Source

    The source directory URL of the Map Series imagery collection.

    Argument source
    Type string
    Required yes
    Example s3://topographic-upload/TopoReleaseArchive/NZTopo50_GeoTif_Gridless/
  • Target

    The target directory URL into which to save the generated directory structure of StacItem and StacCollection files.

    Argument target
    Type string
    Required yes
    Example s3://linz-workflows-scratch/<date>/<hash>/
  • Scale

    The scale of the Map Series imagery collection's map sheets.

    Argument scale
    Type string
    Required yes
    Options topo25, topo50, or topo250
  • Resolution

    The resolution of the Map Series imagery collection's map sheets.

    Argument resolution
    Type string
    Required yes
    Options gridded_600dpi or gridless_600dpi
  • Latest Only

    A flag used to indicate whether all of the generated files should be saved to the target location, or only that of each map sheet's latest version.

    Argument latestOnly
    Type boolean
    Required no
    Options true or false

Process

The command processes a collection as follows:

  1. Loads the collection of GeoTiff images into memory
  2. Extracts the map code, version, and metadata (e.g. EPSG) from each image
  3. Identifies the latest version of each map sheet by code
  4. Groups the images by EPSG
  5. Generates a StacItem file for each image, and a StacCollection file for each EPSG grouping
  6. Structures the groups of StacItem and StacCollection files into a directory tree
  7. Saves the files as they are structured to a target location

Outputs

The command groups the images by EPSG and then structures the generated StacItem and StacCollection files, as illustrated:

Schema Example

The command then saves the generated tree of folders and files into the target location directory.

Limitations

  • Tile Matrices

    For each image in a collection, the command extracts the EPSG of the image and converts it to a runtime Epsg Enum value. This value is then mapped to the EPSG's corresponding Tile Matrix definition. At this time, there is no such Tile Matrix definition for the Chatham Islands EPSG code, 3793. The task will need to be updated once Basemaps supports this definition.

@Wentao-Kuang Wentao-Kuang added the container build pull request container for testing label Jan 8, 2025
@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 9, 2025
@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 10, 2025
@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 10, 2025
@Wentao-Kuang Wentao-Kuang marked this pull request as ready for review January 12, 2025 21:01
@Wentao-Kuang Wentao-Kuang requested a review from a team as a code owner January 12, 2025 21:01
@Wentao-Kuang Wentao-Kuang changed the title feat(cogify): Update the cogify create cog to support topo raster. feat(cogify): Update the cogify create cog to support topo raster. BM-1116 Jan 12, 2025
@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 13, 2025
@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 13, 2025
@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 13, 2025
@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 27, 2025
@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 27, 2025
*
* @returns an array of StacItem objects
*/
async function loadTiffsToCreateStacs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quite a huge function and would be very easy to mix the ordering of args it would be best to use a context object here.


/**
* @param source: Source directory URL from which to load tiff files
* @example TODO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO? would be best to just remove this comment?


// write stac items into an JSON array
if (args.forceOutput || isArgo()) {
const targetURL = isArgo() ? fsa.toUrl('/tmp/topo-stac-creation/') : args.target;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally its best to not upper case variables targetURL -> targetUrl

As you get into weird screaming words when you join two or more together HTMLURL


// tiles.json makes the tiff files
await fsa.write(new URL('tiles.json', targetURL), JSON.stringify(stacItemPaths, null, 2));
await fsa.write(new URL('brokenTiffs.json', targetURL), JSON.stringify(brokenTiffs, null, 2));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do any of the other outputs use camelCasing.json ?


const isPowerOfTwo = (x: number): boolean => (x & (x - 1)) === 0;
const DEFAULT_TRIM_PIXEL_RIGHT = 1.7; // 1.7 pixels to trim from the right side of the topo raster imagery
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we generally use TitleCase here, I wonder why the linter is not picking this up?

if (source.length === 0) throw new Error('No source files given for :' + targetVrt.href);
return {
output: targetVrt,
command: 'gdalbuildvrt',
args: [urlToString(targetVrt), ...source.map(urlToString)],
args: [addalpha ? ['-addalpha'] : undefined, urlToString(targetVrt), ...source.map(urlToString)]
.filter((f) => f != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if flatMap would work here

.flatMap(f => f != null ? String(f): [])


[['foo'], undefined, ['bar'], [undefined, 'baz']].flatMap(f => f != null ? String(f): [])
[ 'foo', 'bar', ',baz' ]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works with one value in the nested array, and not working with others with multiple values like [ ['-of', 'vrt'], ]

/**
* The width of a map sheet in pixels.
*/
'source.width': number;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. inside json objects property names can cause some tools to have lots of problems, some tools use it to expect object nesting { source: { width: 100 } } could these be moved or renamed?

We should also be prefixing keys that we are adding eg linz_basemaps:map_code

const code = Epsg.tryGet(epsg);

if (code != null) {
logger?.info({ found: true, method: 'direct' }, 'extractEpsgFromTiff()');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to log which tiff, tiff.source.url.href in these log messages.

}

if (img.valueGeo(TiffTagGeo.GTRasterTypeGeoKey) === RasterTypeKey.PixelIsPoint) {
throw new Error("'Pixel is Point' raster grid spacing is not supported");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a good idea to throw with the tiff location so its easier to track broken tiffs.

try {
const size = tiff.images[0]?.size ?? null;

logger?.info({ found: size }, 'extractSizeFromTiff()');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mixing booleans and numbers will cause issues with logging.

} from './extract.js';
import { brokenTiffs, ByDirectory, TiffItem } from './types.js';

const slugs: { [key in EpsgCode]?: string } = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const slugs: { [key in EpsgCode]?: string } = {
const slugs: Partial<Record<EpsgCode, string>> = {

would this work?

const source = tiff.source.url;
const { mapCode, version } = extractMapCodeAndVersion(source.href, logger);

const bounds = extractBoundsFromTiff(tiff);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if one of these throws its going to break the loop early, we should process all items and collect all errors before exiting.

const latestURL = new URL(`${latest.mapCode}_${latest.version}.json`, allTargetURL);

// add link to all items pointing to the latest version
allStacItems.forEach((stacItem) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: we generally perffer for const (item of allStacItems) over .forEach

const byDirectory = new ByDirectory<TiffItem>();

// create items for each tiff and store them into 'all' by {epsg} and {map code}
for (const tiff of tiffs) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this appears to be duplicated above and should be simplified.


export const brokenTiffs = { noBounds: [] as string[], noEpsg: [] as string[], noSize: [] as string[] };

export class TiffItem {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should just be a object, there is not much point in making a class that only contains global state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original motive for TiffItem existing as a class rather than an object was to allow the use of JSON.stringify on the ByDirectory<TiffItem> instance. With this, we output an itemsByDirectory.json file that looks something like this:

{
  "all": {
    "2193": {
      "250-01": [
        "250-01_v1-01",
        ...
      ],
      "250-02": [
        "250-02_v1-01",
        ...
      ],
      ...
    },
    "3793": {
      "250-31": [
        "250-31_v1-01",
        ...
      ]
    }
  },
  "latest": {
    "2193": {
      "250-01": "250-01_v1-03",
      "250-02": "250-02_v1-03",
      ...
    },
    "3793": {
      "250-31": "250-31_v1-04"
    }
  }
}

This file serves as a sanity check, but does not provide any particular utility. If the file doesn't provide any value, I'm happy to convert the TiffItem class to an object.

@Wentao-Kuang Wentao-Kuang added container build pull request container for testing and removed container build pull request container for testing labels Jan 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
container build pull request container for testing
Development

Successfully merging this pull request may close these issues.

3 participants