-
-
Notifications
You must be signed in to change notification settings - Fork 400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor WFAU image retrieval to reject deprecated images #2809
base: main
Are you sure you want to change the base?
Changes from 5 commits
d070bdf
2fa5050
6d16328
bfd412e
e3e4905
3a7b298
c4db9fb
79fd880
75e4726
5ac89b0
ed94efe
8733870
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,147 @@ | ||
<HTML><HEAD> <script language="JavaScript" type="text/javascript"> | ||
<!-- | ||
window.focus(); | ||
//--> </script> | ||
</HEAD><h3>GetImage cut-out results</h3> | ||
<br><b>J2000 coords:</b> RA: 83.6330757 Dec:22.014436 | ||
<br><b>Programme:</b> All UKIDSS surveys | ||
<br><b>Filter:</b> all | ||
<br>Processing ... | ||
<br>Connecting to database: UKIDSSDR7PLUS<p> | ||
<table border="1"><tr bgcolor="#FFFFCC"><td><b>Link</b></td><td><b>multiframeID</b></td><td><b>frametype</b></td><td><b>obstype</b></td><td><b>filterid</b></td><td><b>shortname</b></td><td><b>dateObs</b></td><td><b>extNum</b></td> | ||
<tr bgcolor=#FFDDDD> | ||
<td><a href="http://surveys.roe.ac.uk/wsa/cgi-bin/getImage.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01818_sf_st.fit&mfid=1737581&extNo=4&lx=1339&hx=1638&ly=1952&hy=2251&rf=0&flip=1&uniq=556_22_18_31563_1&xpos=150.6&ypos=150.1&band=K&ra=83.6330757&dec=22.014436" target=show>show</a></td> | ||
<td nowrap>1737581</td> | ||
<td nowrap>leavstack</td> | ||
<td nowrap>OBJECT</td> | ||
<td nowrap>5</td> | ||
<td nowrap>K</td> | ||
<td nowrap>2007-10-11 13:12:05.5</td> | ||
<td nowrap>5</td> | ||
</table> | ||
1 rows returned. | ||
<?xml version="1.0" encoding="ISO-8859-1"?> | ||
|
||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" | ||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> | ||
|
||
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> | ||
<head> | ||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> | ||
<title>WSA ImageList</title> | ||
|
||
<script type="text/javascript" src="http://wsa.roe.ac.uk/imSwap.js"> | ||
</script> | ||
</head> | ||
|
||
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#000080" alink="#FF0000" onLoad="swap(0);" > | ||
<table cellpadding="0" cellspacing="0" border="0"><tr><td><b>WSA ImageList</b></td><td> </td> | ||
<td><img name="imgMain" src="http://wsa.roe.ac.uk/static.gif" border="1"></td></table><p> | ||
Not logged in: links will only be returned for frames that are publicly accessible<p> | ||
Archive Listing<p>Searching...<br> | ||
<b>Survey: </b>UKIDSS Galactic Clusters Survey, GCS<br> | ||
<b>Waveband: </b>K<br> | ||
<b>Minimum RA:</b> 5.551333333333333 hours <b>Maximum RA:</b> 5.599333333333333 hours <br> | ||
<b>Minimum Dec:</b> 21.68116666666667 degrees <b>Maximum Dec:</b> 22.347833333333334 degrees <br> | ||
<script language="JavaScript" type="text/javascript"> <!-- | ||
swap(1); | ||
--> </script> | ||
<br><b>Using database:</b> UKIDSSDR11PLUS | ||
<p><table><tr><td align="right"><b>View</b> column link</td><td align="left">shows jpeg images of multiframe in a new window plus links to download file(s)</td> | ||
<tr><td align="right"><b>Img</b> column link</td><td align="left">download the <a href="http://wsa.roe.ac.uk/qa.html#compress">RICE compressed</a> FITS image file. Use <b>View</b> column link to retrieve uncompressed images.</td> | ||
<tr><td align="right"><b>Cat</b> column link</td><td align="left">download the FITS catalogue file.</td> | ||
</table> | ||
<br>begin row 1<br><table border="1"><tr bgcolor="#FFFFCC"><td align="center">View</td><td align="center">Img</td><td align="center">Cat</td> | ||
<td align="middle"><b>multiframeID</b></td> | ||
<td align="middle"><b>frameType</a></b></td> | ||
<td align="middle"><b>obstype</a></b></td> | ||
<td align="middle"><b>raBase</a></b></td> | ||
<td align="middle"><b>decBase</a></b></td> | ||
<td align="middle"><b>shortname</a></b></td> | ||
<td align="middle"><b>exptime</a></b></td> | ||
<td align="middle"><b>dateObs</a></b></td> | ||
<td align="middle"><b>project</a></b></td> | ||
<td align="middle"><b>numDetectors</b></td> | ||
<td align="middle"><b>ukirtRunNo</a></b></td> | ||
</tr> | ||
<tr bgcolor=#FFDDDD> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/display.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01802_sf.fit&cat=NONE&comp=djoser:/disk04/wsa/products/jpgs/20071011_v1/w20071011_01802_sf&noExt=4&MFID=1737551&rID=48" target=display>view</a></td> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/fits_download.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01802_sf.fit&MFID=1737551&rID=48">FITS</a></td> | ||
<td> </td> | ||
<td nowrap align="right">1737551</td> | ||
<td nowrap align="right">leav</td> | ||
<td nowrap align="right">OBJECT</td> | ||
<td nowrap align="right">+5.5777306</td> | ||
<td nowrap align="right">+21.7913333</td> | ||
<td nowrap align="right">K</td> | ||
<td nowrap align="right">+10.000000</td> | ||
<td nowrap align="right">2007-10-11 13:08:30.0</td> | ||
<td nowrap align="right">U/UKIDSS/GCS21</td> | ||
<td nowrap align="right">4</td> | ||
<td nowrap align="right">1802</td> | ||
</tr> | ||
<tr bgcolor=#DDDDDD> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/display.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01802_sf_st.fit&cat=djoser:/disk05/wsa/ingest/fits/20071011_v1/w20071011_01802_sf_st_cat.fits&comp=djoser:/disk04/wsa/products/jpgs/20071011_v1/w20071011_01802_sf_st&noExt=4&MFID=1737553&rID=48" target=display>view</a></td> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/fits_download.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01802_sf_st.fit&MFID=1737553&rID=48">FITS</a></td> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/fits_download.cgi?file=djoser:/disk05/wsa/ingest/fits/20071011_v1/w20071011_01802_sf_st_cat.fits&MFID=1737553&rID=48">FITS</a></td> | ||
<td nowrap align="right">1737553</td> | ||
<td nowrap align="right">leavstack</td> | ||
<td nowrap align="right">OBJECT</td> | ||
<td nowrap align="right">+5.5777306</td> | ||
<td nowrap align="right">+21.7913333</td> | ||
<td nowrap align="right">K</td> | ||
<td nowrap align="right">+10.000000</td> | ||
<td nowrap align="right">2007-10-11 13:08:30.0</td> | ||
<td nowrap align="right">U/UKIDSS/GCS21</td> | ||
<td nowrap align="right">4</td> | ||
<td nowrap align="right">1802</td> | ||
</tr> | ||
<tr bgcolor=#FFDDDD> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/display.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01806_sf.fit&cat=NONE&comp=djoser:/disk04/wsa/products/jpgs/20071011_v1/w20071011_01806_sf&noExt=4&MFID=1737559&rID=48" target=display>view</a></td> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/fits_download.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01806_sf.fit&MFID=1737559&rID=48">FITS</a></td> | ||
<td> </td> | ||
<td nowrap align="right">1737559</td> | ||
<td nowrap align="right">leav</td> | ||
<td nowrap align="right">OBJECT</td> | ||
<td nowrap align="right">+5.5777306</td> | ||
<td nowrap align="right">+21.7913333</td> | ||
<td nowrap align="right">K</td> | ||
<td nowrap align="right">+10.000000</td> | ||
<td nowrap align="right">2007-10-11 13:09:18.6</td> | ||
<td nowrap align="right">U/UKIDSS/GCS21</td> | ||
<td nowrap align="right">4</td> | ||
<td nowrap align="right">1806</td> | ||
</tr> | ||
<tr bgcolor=#DDDDDD> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/display.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01818_sf.fit&cat=NONE&comp=djoser:/disk04/wsa/products/jpgs/20071011_v1/w20071011_01818_sf&noExt=4&MFID=1737579&rID=48" target=display>view</a></td> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/fits_download.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01818_sf.fit&MFID=1737579&rID=48">FITS</a></td> | ||
<td> </td> | ||
<td nowrap align="right">1737579</td> | ||
<td nowrap align="right">leav</td> | ||
<td nowrap align="right">OBJECT</td> | ||
<td nowrap align="right">+5.5935528</td> | ||
<td nowrap align="right">+21.7913333</td> | ||
<td nowrap align="right">K</td> | ||
<td nowrap align="right">+10.000000</td> | ||
<td nowrap align="right">2007-10-11 13:12:05.5</td> | ||
<td nowrap align="right">U/UKIDSS/GCS21</td> | ||
<td nowrap align="right">4</td> | ||
<td nowrap align="right">1818</td> | ||
</tr> | ||
<tr bgcolor=#FFDDDD> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/display.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01818_sf_st.fit&cat=djoser:/disk05/wsa/ingest/fits/20071011_v1/w20071011_01818_sf_st_cat.fits&comp=djoser:/disk04/wsa/products/jpgs/20071011_v1/w20071011_01818_sf_st&noExt=4&MFID=1737581&rID=48" target=display>view</a></td> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/fits_download.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01818_sf_st.fit&MFID=1737581&rID=48">FITS</a></td> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/fits_download.cgi?file=djoser:/disk05/wsa/ingest/fits/20071011_v1/w20071011_01818_sf_st_cat.fits&MFID=1737581&rID=48">FITS</a></td> | ||
<td nowrap align="right">1737581</td> | ||
<td nowrap align="right">leavstack</td> | ||
<td nowrap align="right">OBJECT</td> | ||
<td nowrap align="right">+5.5935528</td> | ||
<td nowrap align="right">+21.7913333</td> | ||
<td nowrap align="right">K</td> | ||
<td nowrap align="right">+10.000000</td> | ||
<td nowrap align="right">2007-10-11 13:12:05.5</td> | ||
<td nowrap align="right">U/UKIDSS/GCS21</td> | ||
<td nowrap align="right">4</td> | ||
<td nowrap align="right">1818</td> | ||
</tr> | ||
<tr bgcolor=#DDDDDD> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/display.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01822_sf.fit&cat=NONE&comp=djoser:/disk04/wsa/products/jpgs/20071011_v1/w20071011_01822_sf&noExt=4&MFID=1737587&rID=48" target=display>view</a></td> | ||
<td><a href="http://wsa.roe.ac.uk/cgi-bin/fits_download.cgi?file=/disk05/wsa/ingest/fits/20071011_v1/w20071011_01822_sf.fit&MFID=1737587&rID=48">FITS</a></td> | ||
<td> </td> | ||
<td nowrap align="right">1737587</td> | ||
<td nowrap align="right">leav</td> | ||
<td nowrap align="right">OBJECT</td> | ||
<td nowrap align="right">+5.5935528</td> | ||
<td nowrap align="right">+21.7913333</td> | ||
<td nowrap align="right">K</td> | ||
<td nowrap align="right">+10.000000</td> | ||
<td nowrap align="right">2007-10-11 13:12:53.8</td> | ||
<td nowrap align="right">U/UKIDSS/GCS21</td> | ||
<td nowrap align="right">4</td> | ||
<td nowrap align="right">1822</td> | ||
</tr> | ||
</table><br> row(s) 1 to 6 displayed.<br> | ||
<pre></pre><a href="javascript:history.go(-1)">Back to form</a> (uses Javascript)<script language="JavaScript" type="text/javascript"> | ||
<!-- | ||
swap(0) | ||
--> | ||
</script> | ||
</body></html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,12 +6,13 @@ | |
import time | ||
from math import cos, radians | ||
import requests | ||
from bs4 import BeautifulSoup | ||
from bs4 import BeautifulSoup, XMLParsedAsHTMLWarning | ||
from io import BytesIO, StringIO | ||
|
||
import astropy.units as u | ||
import astropy.coordinates as coord | ||
import astropy.io.votable as votable | ||
from astropy.io import ascii | ||
|
||
from ..query import QueryWithLogin | ||
from ..exceptions import InvalidQueryError, TimeoutError, NoResultsWarning | ||
|
@@ -290,10 +291,49 @@ def get_images_async(self, coordinates, *, waveband='all', frame_type='stack', | |
show_progress=show_progress) | ||
for url in image_urls] | ||
|
||
def get_image_list(self, coordinates, *, waveband='all', frame_type='stack', | ||
image_width=1 * u.arcmin, image_height=None, | ||
radius=None, database=None, | ||
programme_id=None, get_query_payload=False): | ||
def get_image_list(self, coordinates, *, radius=None, ignore_deprecated=True, **kwargs): | ||
""" | ||
See `get_image_table` for a full list of options. | ||
|
||
This method will return _only_ the URLs requested as a list of URLs. | ||
|
||
Parameters | ||
---------- | ||
ignore_deprecated : bool | ||
If set (default: True), only images with the ``deprecated`` flag | ||
set to zero will be included | ||
|
||
Returns | ||
------- | ||
url_list : list of image urls | ||
|
||
""" | ||
image_table = self.get_image_table(coordinates, radius=radius, **kwargs) | ||
|
||
if ignore_deprecated and radius is None: | ||
image_urls = image_table[image_table['deprecated'] == 0]['Link'] | ||
elif radius is not None: | ||
image_urls = image_table['Img'] | ||
else: | ||
image_urls = image_table['Link'] | ||
|
||
# different links for radius queries and simple ones | ||
if radius is not None: | ||
image_urls = [link for link in image_urls if | ||
('fits_download' in link and '_cat.fits' | ||
not in link and '_two.fit' not in link)] | ||
else: | ||
# Not sure this is necessary any more (as of #2809), but it seems | ||
# harmless and I'm not removing it until I'm sure | ||
image_urls = [link.replace("getImage", "getFImage") | ||
for link in image_urls] | ||
|
||
return image_urls | ||
|
||
def get_image_table(self, coordinates, *, waveband='all', frame_type='stack', | ||
image_width=1 * u.arcmin, image_height=None, | ||
radius=None, database=None, | ||
programme_id=None, get_query_payload=False): | ||
""" | ||
Function that returns a list of urls from which to download the FITS | ||
images. | ||
|
@@ -337,7 +377,9 @@ def get_image_list(self, coordinates, *, waveband='all', frame_type='stack', | |
|
||
Returns | ||
------- | ||
url_list : list of image urls | ||
table : Table | ||
An astropy table containing the metadata table, including URLs, of | ||
the requested files. | ||
|
||
""" | ||
|
||
|
@@ -399,19 +441,43 @@ def get_image_list(self, coordinates, *, waveband='all', frame_type='stack', | |
return request_payload | ||
|
||
response = self._wfau_send_request(query_url, request_payload) | ||
self._penultimate_response = response | ||
response = self._check_page(response.url, "row") | ||
self._last_response = response | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this really necessary? Also, try to avoid overwriting variables to help future debugging, the second There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no, this was for debugging, I'll remove it There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeap, keeping There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fair enough, I could drop that. But when else do you get to use |
||
|
||
return self.parse_imagequery_page(response.text, radius=radius) | ||
|
||
def parse_imagequery_page(self, html_in, radius=None): | ||
""" | ||
Parse the image metadata page | ||
""" | ||
ahref = re.compile(r'href="([a-zA-Z0-9_\.&\?=%/:-]+)"') | ||
|
||
image_urls = self.extract_urls(response.text) | ||
# different links for radius queries and simple ones | ||
if radius is not None: | ||
image_urls = [link for link in image_urls if | ||
('fits_download' in link and '_cat.fits' | ||
not in link and '_two.fit' not in link)] | ||
else: | ||
image_urls = [link.replace("getImage", "getFImage") | ||
for link in image_urls] | ||
html = "\n".join([ | ||
# for radius searches, "FITS" needs to be s/FITS/url/ | ||
row.replace(">FITS<", ">{}<".format(ahref.search(row).groups()[0])) if ">FITS<" in row else | ||
row | ||
for row in html_in.split("\n")]) | ||
with warnings.catch_warnings(): | ||
# this is really html; the xml parser doesn't work | ||
warnings.simplefilter(action="ignore", category=XMLParsedAsHTMLWarning) | ||
soup = BeautifulSoup(html, features='html5') | ||
httb = soup.findAll('table')[2] | ||
firstrow = httb.findAll('tr')[0] | ||
for td in firstrow.findAll('td'): | ||
td.name = 'th' | ||
return ascii.read(str(httb), format='html') | ||
|
||
return image_urls | ||
else: | ||
html = "\n".join([ | ||
# for ascii.read: th -> header | ||
row.replace("td", "th") if row.startswith("<table border") else | ||
# "show" is the default, but we want the URLs | ||
row.replace(">show<", ">{}<".format(ahref.search(row).groups()[0])) if ">show<" in row else | ||
row | ||
for row in html_in.split("\n")]) | ||
return ascii.read(html, format='html') | ||
|
||
def extract_urls(self, html_in): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I commented this before in another module: I think methods like this should not be class methods, definitely not public ones, but instead private functions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, I agree - happy to make this, and There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can't make |
||
""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spell out kwargs, we try to get rid of this way of hiding them so we should not introduce more of these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, ok. That's a lot of duplicated text, which I don't love, but fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but if it's public API, one would need useful docstrings, too. dumping kwarg out there is not exactly useful. OTOH, if it's an option of not getting slightly overlapping public API then certainly I would be in favour of not duplicating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_image_list
is more of an internal-pointing method. It is technically usable, and useful, to users, butget_image_table
is more useful and I prefer to point people there. I'd actually lean toward making_get_image_list
private before duplicating its docstring and keyword arguments. Let me know which you prefer