Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HLSP support #189

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

jespinosaar
Copy link

Dear @pdowler ,

This is what I have currently implemented for HLSP support in caom2-repo and icewind libraries. Let's iterate on this to find the most convenient solution. Thanks a lot for your support!

Copy link
Member

@pdowler pdowler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary:

  • cannot selectively harvest deletion events with additional params
  • some (HLSP) content-specific code needs to be removed
  • there is a simple/crazy way to correctly express the intent in the config that almost works... maybe try to make that work?

@@ -295,8 +295,8 @@ public void setReadTimeout(int readTimeout) {
}


public List<DeletedObservation> getDeleted(String collection, Date start, Date end, Integer maxrec) {
return readDeletedEntityList(new DeletionListReader(), collection, start, end, maxrec);
public List<DeletedObservation> getDeleted(String collection, String telescope, Date start, Date end, Integer maxrec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Selective DeletedObservation (event) harvesting can't really work because the DeletedObservation record only has identifier information. That was rather informal in CAOM-2.4 but in WD-CAOM-2.5 the deleted event classes are formally specified.

If you look at the output of

https://mast.stsci.edu/portal/ArchivePartner/meta-service/del-endpoint/HLSP?maxrec=10&telescope=FOO

you will see it does not filter by telescope. It is not really feasible to implement it.

This change (and others related to deletion events) should be reverted; processing all deleted events in the collection is extra work but it is safe.

@@ -224,4 +229,30 @@ public void run() {
}
}

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted above, these methods can't really be in here.


private final InitDatabase initdb;
private final HarvestSource src;
private final HarvestDestination dest;
private final List<String> collections;
private final List<String> telescopes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core problem: the collection(s) and telescope(s) are correlated, but this is just two independent lists to select what to harvest. With nothing more, you would have to try to harvest the cartesian product of collecions-vs-telescopes. That's why the extra content specific constant and code were added. That's not viable.

The configuration needs to provide pairs of {collection,telescope} and not two independent lists.


On a lark, I tried this config because it matches the intent better:

org.opencadc.icewind.repoService=ivo://mast.stsci.edu/caom2repo
org.opencadc.icewind.collection=HLSP?telescope=HST

and it almost worked. It only failed because the URL construction in the RepoClient ignored that ? and added a second one. I added a log message to show the URL:

2025-01-08 20:30:36.883[main] WARN  RepoClient - harvest URL: https://mastpartners.stsci.edu/portal/ArchivePartner/meta-service/obs-endpoint/HLSP?telescope=HST?maxrec=102&end=2025-01-08T20:25:36.882
2025-01-08 20:30:37.365[main] ERROR ObservationHarvester - unexpected exception
ca.nrc.cadc.net.TransientException: Error: requested telescope not available

The error is likely because the service gets telescope value HST?maxrec=102.

So, this could be made to work just by fixing the URL creation (a little more care at line 483):

surl = surl + "?maxrec=" + (rec + 1);

and then it should work with no other changes. It would admittedly be a hidden feature and I would not document it anywhere (not yet at least).


The other effect is that the HarvestSource identifier ends up being ivo://mast.stsci.edu/caom2repo?HLSP?telescope=HST which is slightly wrong (two query string markers) and that might cause an issue if that URI ever needs to be parsed. So I'd probably be more careful in that getIdentifier(String collection) method to check the collection value and ensure a valid URI, maybe collection.replaceAll('?','&')???

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the above idea works, I think RepoClient would fall down trying to generate the URL(s) to observations (which cannot have that telescope param after the collection name). It would probably be necessary to check the collection name for an optional query string and remove it for creating observation URLs.

@jespinosaar
Copy link
Author

Many thanks for your feedback, @pdowler ! I will try with that approach, providing the combination of collection and telescope and changing the code and URLs accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants