Sentinel 2A: datastrip in item.id does not match datastrip in item.properties.s2:datastrip_id #396

Open
FlorisCalkoen opened this issue Jan 12, 2023 · 1 comment

@FlorisCalkoen

I was just debugging why I had duplicate tiles in a certain pipeline. It turned out this was related to the datastrip_id property, which depends on the downlink station (see the S2 specification). The data seems to be the same for both items, although I only checked this for one band. Why do you keep duplicate data when multiple downlink stations are used?

Anyway, I then noticed that the datastrip timestamp included in item.id does not match the one provided in item.properties.s2:datastrip_id (in the examples below they differ by one second). Not sure if this is important, but I thought it would be worth mentioning. Please see the example below.

```python
from copy import deepcopy

import pandas as pd
import planetary_computer
import pystac_client


def items_to_dataframe(items):
    """Flatten a list of STAC item dicts into a DataFrame sorted by datetime."""
    _items = [deepcopy(i) for i in items]
    df = pd.json_normalize(_items)
    for field in ["properties.datetime"]:
        if field in df:
            df[field] = pd.to_datetime(df[field])
    df = df.sort_values("properties.datetime")
    return df


catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

roi = {
    "type": "Polygon",
    "coordinates": [
        [
            [146.0678527, -15.3746464],
            [147.0909455, -15.3765786],
            [147.0913918, -16.369226],
            [146.0632786, -16.3671625],
            [146.0678527, -15.3746464],
        ]
    ],
}


search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=roi,
    datetime="2022-01-01/2022-11-01",
)

items = search.item_collection()

items_ = [i.to_dict() for i in items]
df = items_to_dataframe(items_)


def split_id(x):
    # IDs look like S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716
    return pd.Series(x["id"].split("_"))


df[
    [
        "mission_id",
        "product_level",
        "datetake_start_time",
        "relative_orbit_number",
        "tilenumber",
        "id_datastrip",
    ]
] = df.apply(split_id, axis=1)

# two examples for which I found the same data, but different datastrips
SAME_DATA_DIFFERENT_DATASTRIP = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]

df_ = df.loc[df["id"].isin(SAME_DATA_DIFFERENT_DATASTRIP)].copy()


# pull out the timestamp at position 6 of s2:datastrip_id,
# which makes the difference a bit easier to see
def split_s2_datastrip(x):
    return x["properties.s2:datastrip_id"].split("_")[6]


df_["s2_datastrip"] = df_.apply(split_s2_datastrip, axis=1)
df_[["id_datastrip", "s2_datastrip"]]
```
| id_datastrip | s2_datastrip |
|---|---|
| 20220227T190716 | 20220227T190717 |
| 20220212T221526 | 20220212T221527 |
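To make the mismatch explicit without hitting the API, a minimal sketch like the one below parses both timestamps from the two pairs in the table above and reports the offset. It assumes that the last underscore-separated field of the item ID and field 6 of s2:datastrip_id are both `YYYYMMDDTHHMMSS` timestamps; the helper names are hypothetical.

```python
from datetime import datetime

# Assumed format of both timestamps, e.g. "20220227T190716".
TS_FORMAT = "%Y%m%dT%H%M%S"


def id_timestamp(item_id: str) -> datetime:
    """Parse the last underscore-separated field of an item ID (hypothetical helper)."""
    return datetime.strptime(item_id.split("_")[-1], TS_FORMAT)


def datastrip_offset_seconds(item_id: str, datastrip_ts: str) -> float:
    """Seconds from the item-ID timestamp to the s2:datastrip_id timestamp."""
    delta = datetime.strptime(datastrip_ts, TS_FORMAT) - id_timestamp(item_id)
    return delta.total_seconds()


# The two (item.id, s2_datastrip) pairs from the table above.
pairs = [
    ("S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716", "20220227T190717"),
    ("S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526", "20220212T221527"),
]

for item_id, ds_ts in pairs:
    print(item_id, datastrip_offset_seconds(item_id, ds_ts))
```

For both pairs the offset is one second, so the two values look related rather than unrelated, but that is only an observation on these two items.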
@FlorisCalkoen FlorisCalkoen changed the title Sentinel 2A: datastrip element in item-id does not match datastrip element in metadata Sentinel 2A: datastrip in item.id does not match datastrip in item.properties.s2:datastrip_id Jan 12, 2023
@TomAugspurger

Thanks for the report. IIUC, there are potentially two issues:

  1. Duplicate data: where identical(?) data was downlinked to two stations and so given two different STAC IDs. Is that correct? I'll look into the pair of IDs you provided.
  2. The Planetary Computer's item IDs not matching the datastrip IDs.
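If the two copies do turn out to be identical, one hypothetical client-side workaround is to collapse items that share everything except the trailing timestamp in the ID (mission, product level, datatake start, relative orbit, tile) and keep one per group. The sketch below assumes the ID layout used in the issue and keeps the item with the latest trailing timestamp; whether "latest" is the right copy to keep is an assumption, not something confirmed here.

```python
from typing import Dict, Iterable, List, Tuple


def datatake_key(item_id: str) -> Tuple[str, ...]:
    """Every ID field except the trailing timestamp (hypothetical helper)."""
    return tuple(item_id.split("_")[:-1])


def latest_per_datatake(item_ids: Iterable[str]) -> List[str]:
    """Keep one ID per datatake key, preferring the largest trailing timestamp.

    The timestamps are fixed-width YYYYMMDDTHHMMSS strings, so string
    comparison gives the same order as chronological comparison.
    """
    best: Dict[Tuple[str, ...], str] = {}
    for item_id in item_ids:
        key = datatake_key(item_id)
        if key not in best or item_id.split("_")[-1] > best[key].split("_")[-1]:
            best[key] = item_id
    return sorted(best.values())


ids = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]
print(latest_per_datatake(ids))
```

With the pair from the report, both IDs map to the same key and only the `...20220227T190716` item is kept.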

For the second point, we use the sentinel-2 stactools package. That generates its ID at https://github.com/stactools-packages/sentinel2/blob/e99b50755b1934a40179310a450f64ce9e5cdf72/src/stactools/sentinel2/product_metadata.py#L107-L134. I'm not too familiar with the details, but I don't think that ID is intended to be identical to the s2:datastrip_id.
