I was just debugging why I had duplicate tiles in a certain pipeline. I found out that this was related to the datastrip_id property, which depends on the downlink station (see the S2 specification). The data seems to be the same for both items, although I only checked this for one band. Why do you keep duplicate data when multiple downlink stations are used?
Anyways... I then noticed that the datastrip ID included in item.id does not match the one provided in item.properties.s2:datastrip_id. Not sure if this is important, but I thought it would be worth mentioning. Please see the example below.
from copy import deepcopy

import pandas as pd
import planetary_computer
import pystac_client


def items_to_dataframe(items):
    # Flatten a list of STAC item dicts into a DataFrame sorted by acquisition time.
    _items = []
    for i in items:
        _i = deepcopy(i)
        _items.append(_i)
    df = pd.DataFrame(pd.json_normalize(_items))
    for field in ["properties.datetime"]:
        if field in df:
            df[field] = pd.to_datetime(df[field])
    df = df.sort_values("properties.datetime")
    return df


catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

roi = {
    "type": "Polygon",
    "coordinates": [
        [
            [146.0678527, -15.3746464],
            [147.0909455, -15.3765786],
            [147.0913918, -16.369226],
            [146.0632786, -16.3671625],
            [146.0678527, -15.3746464],
        ]
    ],
}

search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=roi,
    datetime="2022-01-01/2022-11-01",
)
items = search.item_collection()
items_ = [i.to_dict() for i in items]
df = items_to_dataframe(items_)


def split_id(x):
    # Split the item ID on "_" into its six components.
    return pd.Series(x.id.split("_"))


df[
    [
        "mission_id",
        "product_level",
        "datetake_start_time",
        "relative_orbit_number",
        "tilenumber",
        "id_datastrip",
    ]
] = df.apply(split_id, axis=1)

# two examples for which I found the same data, but different datastrips
SAME_DATA_DIFFERENT_DATASTRIP = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]
df_ = df.loc[df["id"].isin(SAME_DATA_DIFFERENT_DATASTRIP)].copy()


# makes it a bit easier to see the difference
def split_s2_datstrip(x):
    # The 7th "_"-separated field of s2:datastrip_id is the timestamp to compare.
    return x["properties.s2:datastrip_id"].split("_")[6]


df_["s2_datastrip"] = df_.apply(split_s2_datstrip, axis=1)
df_[["id_datastrip", "s2_datastrip"]]
| id_datastrip | s2_datastrip |
| --- | --- |
| 20220227T190716 | 20220227T190717 |
| 20220212T221526 | 20220212T221527 |
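To put a number on the mismatch, here is a minimal sketch that parses the two timestamp pairs from the output above and computes the difference in seconds. The helper name `offset_seconds` is hypothetical, and the `%Y%m%dT%H%M%S` format is an assumption based on how the values look; it only uses the four values shown here, not the full catalog.

```python
from datetime import datetime

# Timestamp pairs copied from the output above:
# (last component of item.id, timestamp field taken from s2:datastrip_id).
PAIRS = [
    ("20220227T190716", "20220227T190717"),
    ("20220212T221526", "20220212T221527"),
]


def offset_seconds(id_stamp, datastrip_stamp):
    # Hypothetical helper: assumes both stamps follow the YYYYMMDDTHHMMSS layout.
    fmt = "%Y%m%dT%H%M%S"
    delta = datetime.strptime(datastrip_stamp, fmt) - datetime.strptime(id_stamp, fmt)
    return delta.total_seconds()


offsets = [offset_seconds(a, b) for a, b in PAIRS]
print(offsets)
```

For these two items the values differ by one second each, which may or may not hold for other items in the collection.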
FlorisCalkoen changed the title from "Sentinel 2A: datastrip element in item-id does not match datastrip element in metadata" to "Sentinel 2A: datastrip in item.id does not match datastrip in item.properties.s2:datastrip_id" on Jan 12, 2023
Thanks for the report. IIUC, there are potentially two issues:
1. Duplicate data: where identical(?) data was downlinked to two stations and so given two different STAC IDs. Is that correct? I'll look into the pair of IDs you provided.
2. The Planetary Computer's item IDs not matching the datastrip IDs.
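For the first point, one rough way to list candidate duplicates is to group item IDs by everything except the last `_`-separated component. This is only a sketch: the function name `find_duplicate_candidates` is hypothetical, and it assumes that items sharing mission, level, datatake time, orbit and tile are candidates for the same data, which would still need to be confirmed by comparing the assets themselves.

```python
from collections import defaultdict


def find_duplicate_candidates(item_ids):
    # Hypothetical helper: group IDs by their prefix (all components but the last).
    groups = defaultdict(list)
    for item_id in item_ids:
        prefix, _, _suffix = item_id.rpartition("_")
        groups[prefix].append(item_id)
    # Keep only prefixes that occur more than once.
    return {prefix: ids for prefix, ids in groups.items() if len(ids) > 1}


# The two IDs from the report.
ids = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]
candidates = find_duplicate_candidates(ids)
for prefix, members in candidates.items():
    print(prefix, len(members))
```

Grouping on the ID prefix avoids any network access, so it can be run on the `id` column of an existing search result before deciding which item of each group to keep.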