Multiple copies of some deep stacks produced. #688

Open
wfastrononomer opened this issue Jan 19, 2025 · 9 comments

@wfastrononomer

wfastrononomer commented Jan 19, 2025

Started looking into #687 and found that some deep stacks have been repeated.

/disk67/vsa/products/stacks/20241114_v2/e20241114_00170007310_dp_st.fit
/disk67/vsa/products/stacks/20241114_v2/e20241114_00170007310_dp_st_cat.fits
/disk68/vsa/products/stacks/20241204_v2/e20241204_00170007310_dp_st.fit
/disk68/vsa/products/stacks/20241204_v2/e20241204_00170007310_dp_st_cat.fits

But productID=7310 appears twice: once processed on 20241114 and once on 20241204.

Apparently 2466 stack products are repeated, some more than once. Maybe some tiles too. Why?
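Duplicates can also be spotted directly on disk from the stack filenames. A minimal sketch, assuming from the two examples above that a stack image is named `e<YYYYMMDD>_` followed by a 5-digit programmeID and a 6-digit productID (`00170` + `007310`); the regular expression and the helpers `parse_stack_name` and `find_duplicates` are hypothetical, not part of the pipeline:

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed layout: e<YYYYMMDD>_<programmeID: 5 digits><productID: 6 digits>_dp_st.fit
STACK_RE = re.compile(r"^e(\d{8})_(\d{5})(\d{6})_dp_st\.fit$")


def parse_stack_name(path):
    """Return (date, programmeID, productID) for a deep stack image, else None."""
    m = STACK_RE.match(PurePosixPath(path).name)
    if m is None:
        return None  # catalogues (_cat.fits) and other products are ignored
    date, prog, prod = m.groups()
    return date, int(prog), int(prod)


def find_duplicates(paths):
    """Group stack images by (programmeID, productID); keep groups with >1 file."""
    groups = defaultdict(list)
    for path in paths:
        parsed = parse_stack_name(path)
        if parsed is not None:
            _date, prog, prod = parsed
            groups[(prog, prod)].append(path)
    return {key: sorted(files) for key, files in groups.items() if len(files) > 1}


paths = [
    "/disk67/vsa/products/stacks/20241114_v2/e20241114_00170007310_dp_st.fit",
    "/disk67/vsa/products/stacks/20241114_v2/e20241114_00170007310_dp_st_cat.fits",
    "/disk68/vsa/products/stacks/20241204_v2/e20241204_00170007310_dp_st.fit",
    "/disk68/vsa/products/stacks/20241204_v2/e20241204_00170007310_dp_st_cat.fits",
]
for (prog, prod), files in find_duplicates(paths).items():
    print(prog, prod, len(files))  # 170 7310 2
```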

@wfastrononomer
Author

wfastrononomer commented Jan 21, 2025

Check why some have been rerun. Why wasn't this picked up? If a run stopped part way, ProductProcessing should prevent this. If it completed and was ingested, it is possible. Perhaps we need to add more checks in cu13. Normally I expect cu13 to be run as part of AutoCurate, but we aren't doing that....

Check the different cuEventIDs and then the log files.

@wfastrononomer
Author

We will have to remove some of them too.

@wfastrononomer
Author

`select frameType,COUNT(*) from (select m.frameType,p.productID from ProgrammeFrame as p,Multiframe as m where p.programmeID=170 and p.productID>0 and p.releaseNum=2 and p.multiframeID=m.multiframeID and m.frameType like '%deep%stack' and m.deprecated=0 group by m.frameType,p.productID having COUNT(*)>1) as mprod group by frameType`

| frameType | COUNT(*) |
|---|---|
| deepstack | 2466 |
| tiledeepstack | 329 |
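The query counts, per frame type, the productIDs that have more than one non-deprecated frame in release 2. One way to check its logic is to run it against a tiny SQLite mock-up; the table contents below are invented, only the columns the query touches are modelled, and SQLite stands in for the archive's own database server:

```python
import sqlite3

# Toy in-memory stand-ins for the two archive tables (invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Multiframe (multiframeID INTEGER PRIMARY KEY,
                         frameType TEXT, deprecated INTEGER);
CREATE TABLE ProgrammeFrame (programmeID INTEGER, productID INTEGER,
                             releaseNum INTEGER, multiframeID INTEGER);
INSERT INTO Multiframe VALUES
  (1, 'deepstack', 0), (2, 'deepstack', 0), (3, 'deepstack', 0),
  (4, 'tiledeepstack', 0), (5, 'tiledeepstack', 0),
  (6, 'tiledeepstack', 1), (7, 'tiledeepstack', 0);
INSERT INTO ProgrammeFrame VALUES
  (170, 7310, 2, 1), (170, 7310, 2, 2),   -- pawprint stack made twice
  (170, 7311, 2, 3),                      -- made once: not a duplicate
  (170, 1001, 2, 4), (170, 1001, 2, 5),   -- tile made twice
  (170, 1002, 2, 6), (170, 1002, 2, 7);   -- old copy deprecated: not counted
""")

QUERY = """
SELECT frameType, COUNT(*) AS nDuplicatedProducts
FROM (SELECT m.frameType, p.productID
      FROM ProgrammeFrame AS p, Multiframe AS m
      WHERE p.programmeID = 170 AND p.productID > 0 AND p.releaseNum = 2
        AND p.multiframeID = m.multiframeID
        AND m.frameType LIKE '%deep%stack' AND m.deprecated = 0
      GROUP BY m.frameType, p.productID
      HAVING COUNT(*) > 1) AS mprod
GROUP BY frameType
ORDER BY frameType
"""
print(conn.execute(QUERY).fetchall())  # [('deepstack', 1), ('tiledeepstack', 1)]
```

Product 1002 shows that once all but one copy is deprecated, the product drops out of the duplicate count.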

@esutorius

esutorius commented Jan 21, 2025

Interestingly, that inner query gives results for productIDs > 1400, which haven't even been processed by cu13 yet...

No, the maximum tile productID is 1400. There are productIDs > 1400, but only for deep pawprints:

`select frameType,min(productID),max(productID),COUNT(*) from (select m.frameType,p.productID from ProgrammeFrame as p,Multiframe as m where p.programmeID=170 and p.productID>0 and p.releaseNum=2 and p.multiframeID=m.multiframeID and m.frameType like '%deep%stack' and m.deprecated=0 group by m.frameType,p.productID having COUNT(*)>1) as mprod group by frameType`

| frameType | min(productID) | max(productID) | COUNT(*) |
|---|---|---|---|
| deepstack | 4746 | 8424 | 2466 |
| tiledeepstack | 1001 | 1400 | 329 |

If you remove the `having count(*) > 1` clause, you'll see that there are already entries (with count = 1) for tile deep stacks with productIDs from 1400 up to 3711, even though I am only now running 1401-1600 and haven't ingested any tiles yet.
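To see the whole picture rather than only the duplicated products, the HAVING clause can be dropped and the per-product counts grouped a second time, so each row says how many products have 1, 2, 3... copies and over which productID range. A sketch in SQLite on a simplified, hypothetical table `frames(frameType, productID)` that stands in for the joined ProgrammeFrame/Multiframe rows; the data are invented:

```python
import sqlite3

# Hypothetical flattening of the join: one row per non-deprecated frame.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frames (frameType TEXT, productID INTEGER)")
conn.executemany("INSERT INTO frames VALUES (?, ?)", [
    ("tiledeepstack", 1001), ("tiledeepstack", 1001),
    ("tiledeepstack", 1400), ("tiledeepstack", 3711),
    ("deepstack", 7310), ("deepstack", 7310), ("deepstack", 7310),
    ("deepstack", 7311),
])

# Inner query: copies per product (no HAVING, so singletons are kept).
# Outer query: number of products per copy count, with their productID range.
QUERY = """
SELECT frameType, nCopies, COUNT(*) AS nProducts,
       MIN(productID) AS minID, MAX(productID) AS maxID
FROM (SELECT frameType, productID, COUNT(*) AS nCopies
      FROM frames GROUP BY frameType, productID) AS perProduct
GROUP BY frameType, nCopies
ORDER BY frameType, nCopies
"""
for row in conn.execute(QUERY):
    print(row)
# ('deepstack', 1, 1, 7311, 7311)
# ('deepstack', 3, 1, 7310, 7310)
# ('tiledeepstack', 1, 2, 1400, 3711)
# ('tiledeepstack', 2, 1, 1001, 1001)
```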

@wfastrononomer
Author

wfastrononomer commented Jan 21, 2025

Okay, are these the ones that we did for Dante?

Some maybe, but others seem to come from runs where the connection or something else broke. I guess the easiest way would be to deprecate all but the latest-dated version of each duplicated file once we've finished processing all the tiles.
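The "keep the latest, deprecate the rest" rule can be sketched as a small selection step. A minimal sketch, assuming each copy is already known as a (productID, processing date, file name) triple; `split_keep_deprecate` is hypothetical, and the productID 7311 file name is illustrative only:

```python
from collections import defaultdict


def split_keep_deprecate(copies):
    """Split stack copies into the one to keep per productID and the rest.

    copies: iterable of (productID, processingDate 'YYYYMMDD', fileName).
    Returns (keep, deprecate): a dict productID -> kept fileName, and a sorted
    list of fileNames to deprecate. Same-date copies are ordered by fileName,
    which is arbitrary; those would need another tiebreaker such as cuEventID.
    """
    by_product = defaultdict(list)
    for product_id, date, name in copies:
        by_product[product_id].append((date, name))
    keep, deprecate = {}, []
    for product_id, versions in by_product.items():
        versions.sort()  # 'YYYYMMDD' strings sort chronologically
        keep[product_id] = versions[-1][1]
        deprecate.extend(name for _, name in versions[:-1])
    return keep, sorted(deprecate)


copies = [
    (7310, "20241114", "e20241114_00170007310_dp_st.fit"),
    (7310, "20241204", "e20241204_00170007310_dp_st.fit"),
    (7311, "20241114", "e20241114_00170007311_dp_st.fit"),  # illustrative
]
keep, deprecate = split_keep_deprecate(copies)
print(keep[7310])  # e20241204_00170007310_dp_st.fit
print(deprecate)   # ['e20241114_00170007310_dp_st.fit']
```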

@wfastrononomer
Author

wfastrononomer commented Jan 21, 2025

I think I will try to add a check to cu13 that tests whether the products have already been produced. That code is currently in Automator, and for testing it is sometimes useful that it isn't in cu13, but perhaps I will enable it for runs on the main archives (WSA, VSA, VSAVVV).

Also, ProductProcessing would normally take care of this, but we have done a mixture of processing and ingestion, so this issue slipped through.
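The check being proposed boils down to filtering the requested productIDs against those already in the archive before any processing starts, with a switch to turn it off for tests. A minimal sketch; `products_to_run` and `check_ingested` are hypothetical names, not the actual cu13, Automator or ProductProcessing code, and where `already_ingested` comes from (for instance a ProgrammeFrame query like the ones earlier in this thread) is left open:

```python
def products_to_run(requested, already_ingested, check_ingested=True):
    """Return the requested productIDs, in order, that still need processing.

    Hypothetical pre-run guard: when check_ingested is True, any productID that
    already has an ingested product is skipped; when False (e.g. deliberate
    reprocessing during testing) everything requested is returned.
    """
    if not check_ingested:
        return list(requested)
    done = set(already_ingested)
    return [pid for pid in requested if pid not in done]


# e.g. a run over tiles 1401-1405 where 1401 and 1403 are already in the archive
print(products_to_run(range(1401, 1406), [1401, 1403]))  # [1402, 1404, 1405]
```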

@esutorius

Is there a keyword written to the FITS files (stacks/tiles) to indicate that they are completely processed and ready for ingest? From the CU13 output I gather that it writes/updates the files multiple times, so how do we know, despite all the different breaks and connection issues, when a file is really done?

@wfastrononomer
Author

Looking at cu13, there is already a checkIngested option for this scenario. I had forgotten about it.

In future, run cu13 with the -i option for VVVX.

@wfastrononomer wfastrononomer moved this from To do to In progress in VVVX DR6 Jan 21, 2025
@tms-epcc
Collaborator

tms-epcc commented Jan 24, 2025

24/JAN/24
@esutorius reported

  • see the -i option above
  • final step is
    • deprecate the duplicated files and keep the most recently processed ones

@github-project-automation github-project-automation bot moved this from In progress to Closed in VVVX DR6 Jan 24, 2025
@tms-epcc tms-epcc reopened this Jan 24, 2025
@tms-epcc tms-epcc moved this from Closed to In progress in VVVX DR6 Jan 24, 2025
Labels
None yet
Projects
Status: In progress
Development

No branches or pull requests

3 participants