Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simbad: refactor to use TAP #2954

Merged
merged 28 commits into from
Jun 20, 2024
Merged

Simbad: refactor to use TAP #2954

merged 28 commits into from
Jun 20, 2024

Conversation

ManonMarchand
Copy link
Member

@ManonMarchand ManonMarchand commented Feb 22, 2024

Hi astroquery,

List of changes:

Adding criteria

Formerly, there was only one way to enter criteria on the output, and it was the query_criteria method.
This method is now deprecated, and works on a 'reproduce at the best of our capabilities but cannot garantie that everything will work' basis.

But it is replaced by a new criteria argument in every query_*** method (except query_tap because people can directly write that into their ADQL string).

This new interface looks like this:

from astroquery.simbad import Simbad
Simbad.query_region("NGC 3540", radius="1d", criteria="otype != 'Galaxy..'")
<Table length=568>
            main_id                     ra                dec         coo_err_maj coo_err_min coo_err_angle coo_wavelength     coo_bibcode    
                                       deg                deg             mas         mas          deg                                        
             object                  float64            float64         float32     float32       int16          str1             object      
------------------------------- ------------------ ------------------ ----------- ----------- ------------- -------------- -------------------
    Gaia DR3 762197128415141632 166.97067113513333  36.64013738050528      0.0392      0.0617            90              O 2020yCat.1350....0G
    Gaia DR3 761945928663296128  168.0368192975804  36.19277340338082      0.0493      0.0672            90              O 2020yCat.1350....0G
                    GRB 230606A           166.8095 36.484249999999996      2000.0      2000.0            --              X url:SWIFT          
                        CLS  37     168.2657239642     35.90652522302      0.8057      0.9704            90              O 2020yCat.1350....0G
         FIRST J110854.7+355104 167.22791666666666 35.851277777777774       570.0       510.0             0                1995ApJ...450..559B
         FIRST J110740.1+364843 166.91745833333334  36.81213888888889      1090.0      1090.0            90                1995ApJ...450..559B
                 [TKK2018] 3605          167.24513           37.00419          --          --            --              O 2018A&A...618A..81T
                 [TKK2018] 3621          167.66022           36.88636          --          --            --              O 2018A&A...618A..81T
            [T2015] nest 103154          167.92918           35.41882          --          --            --              O 2016A&A...588A..14T
    Gaia DR3 762127451160945024 167.38398588803042  36.33682265764583      0.0545      0.0578            90              O 2020yCat.1350....0G
...

Where the criteria string can be translated from the old syntax to the new one with the helper class:

from astroquery.simbad.utils import CriteriaTranslator
CriteriaTranslator.parse("region(GAL,180 0,2d) & otype = 'G' & (galdim_minaxis >= 10|galdim_majaxis >= 10)")
"CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', 86.40498828654475, 28.93617776179148, 2.0)) = 1 AND otype = 'G' AND (galdim_minaxis >= 10 OR galdim_majaxis >= 10)"

On the output

Adding columns to the output

Some 'votable_fields' options have changed. The old ones should still work and raise a warning indicating the new name in the TAP interface.

That'd look like this:

  • someone calling a votable field deprecated for years and that really does not exist in SIMBAD anymore:
from astroquery.simbad import Simbad
simbad = Simbad()
simbad.add_votable_field("einstein")
ValueError: 'einstein' is no longer a part of SIMBAD. It was moved into a separate VizieR catalog. It is possible to query it with the `astroquery.vizier` module.
  • someone calling with a column/table that has a different name between the old and new interfaces
from astroquery.simbad import Simbad
simbad = Simbad()
simbad.add_votable_field("id(1)")
<stdin>:1: DeprecationWarning: 'id(1)' has been renamed 'main_id'. You'll see it appearing with its new name in the output table

Every other possibility can be listed with:

from astroquery.simbad import Simbad
Simbad.list_votable_fields()
<Table length=97>
       name                                       description                                            type         
      object                                         object                                             object        
----------------- ---------------------------------------------------------------------------- -----------------------
              ids                                             all names concatenated with pipe                   table
         otypedef                               all names and definitions for the object types                   table
            ident                                        Identifiers of an astronomical object                   table
             flux                      Magnitude/Flux information about an astronomical object                   table
        allfluxes                             all flux/magnitudes U,B,V,I,J,H,K,u_,g_,r_,i_,z_                   table
          has_ref Associations between astronomical objects and their bibliographic references                   table
      mesDistance                   Collection of distances (pc, kpc or Mpc) by several means.                   table
...

(note that the number of possible options went from 107 to 97, among with 12 tables that really don't exist in Simbad since years. So the possibilities are now slightly bigger 🙂 )

Changes in output style

  • As hinted above, some columns don't have the same name in the two Simbad interfaces. They are very close though.
  • Unfortunately, the votable output of Simbad-TAP has lower case column names. This is a breaking change from previous interface.
  • The default output format for coordinates is degrees rather than sexadecimal like before.

All these could be hidden to the users by modifying the output table in query_tap on python side. Is it worth it?

Empty result

Empty result is valid in TAP. So the default ROW_LIMIT could not stay at zero to mean infinity. I copied VizieR module API and went with -1.

It also means that a query with no answer returns an empty table and not None like before. This broke JWST module testing, so this PR also has some changes here. Should not break anything, but I'm not an expert of their module.

Caching

Caching in the simbad module is now handled by python built-in lru_cache. This might be reverted if we add a caching mechanism to BaseVOQuery (something that'd keep things in a votable-xml format in the default astroquery cache folder maybe?). But I did not go all this way yet.

Changes to query_*** methods (except the criteria argument covered above)

Query_object

The script number ID in query_object is replaced by matched_id that contains the ID that corresponded to the wildcard expression.

It looks like this:

from astroquery.simbad import Simbad
simbad = Simbad()
simbad.add_to_output("otype")
simbad.ROW_LIMIT = 20
eso_clusters = simbad.query_object("ESO*", wildcard=True, criteria="otype='Cl*..'")
eso_clusters["main_id", "ra", "dec", "otype", "matched_id"]
<Table length=20>
     main_id               ra                 dec         otype   matched_id
                          deg                 deg                           
      object            float64             float64       object    object  
------------------ ------------------ ------------------- ------ -----------
         NGC  1776           74.66475  -66.42954166666667    Cl*  ESO  85-28
        ESO  57-81   96.2041666666667  -71.65833333333335    Cl*  ESO  57-81
         NGC  2097           86.02948           -62.78351    Cl*  ESO  86-28
         NGC  1865  78.10391666666668  -68.77200277777777    Cl*  ESO  56-78
         NGC   299 13.353083333333332  -72.19655555555556    Cl*   ESO  51-5
         NGC  1849  77.39450000000001  -66.31465833333334    Cl*  ESO  85-49
         NGC  1826  76.39174999999999  -66.22904166666666    Cl*  ESO  85-43
        ESO  85-41           76.35151           -63.28754    Cl*  ESO  85-41
         NGC  2121  87.05495833333332  -71.48111111111112    Cl*  ESO  57-40
         NGC  2019  82.98533333333333  -70.15902777777778    Cl* ESO  56-145
         NGC  1777              73.95  -74.28333333333333    Cl*   ESO  33-1
         NGC  2047  83.98316666666666  -70.18519722222223    Cl* ESO  56-167
Cl Westerlund    1             251.76 -45.852000000000004    Cl*  ESO 277-12
    Cl Terzan    5 267.02083333333337 -24.779999999999998    Cl*  ESO 520-27
         ESO  51-3 12.208333333333334  -69.86999999999999    Cl*   ESO  51-3
         NGC  2257  97.55258333333333  -64.32777777777777    Cl*  ESO  87-24
         NGC  2203  91.17570833333333  -75.43772222222222    Cl*   ESO  34-4
         NGC  1751  73.55379166666667  -69.80744444444444    Cl*  ESO  56-23
         NGC   416 16.995958333333334  -72.35569444444444    Cl*  ESO  29-32
         NGC   411  16.98183333333333  -71.76752777777777    Cl*  ESO  51-19

Query_objects

It now has a user_specified_id column as requested in #967 . The object_number_id replaces script_number_id (but this could be reverted, it's just strange as there is really one script so I took the liberty to change it)

Looks like this:

from astroquery.simbad import Simbad
Simbad.query_objects(("m1", "m2", "m503", "m21"))
<Table length=4>
main_id         ra                 dec         coo_err_maj coo_err_min coo_err_angle coo_wavelength     coo_bibcode     typed_id object_number_id
               deg                 deg             mas         mas          deg                                                                  
 object      float64             float64         float32     float32       int16          str1             object        object       int64      
------- ------------------ ------------------- ----------- ----------- ------------- -------------- ------------------- -------- ----------------
  M   1            83.6287             22.0147     18500.0     18500.0             0              R 1995AuJPh..48..143S       m1                1
  M   2 323.36258333333336 -0.8232499999999998          --          --            --              O 2010AJ....140.1830G       m2                2
                        --                  --          --          --            --                                        m503                3
  M  21            271.036 -22.505000000000003          --          --            --              O 2021A&A...647A..19T      m21                4

Note that the requested feature (I could not find the issue anymore, so maybe it's a complaint we received directly in CDS) that objects not found would return an empty line is there: M503 obviously does not exist.

Query_bibcode

The output is very different.

Former output:

from astroquery.simbad import Simbad
Simbad.query_bibcode("1894MNRAS..55...17L")
<Table length=1>
                                                               References                                                              
                                                                 str132                                                                
---------------------------------------------------------------------------------------------------------------------------------------
1894MNRAS..55...17L = DOI 10.1093/mnras/55.1.17\nMon. Not. R. Astron. Soc., 55, 17 (1894)\nLEWIS T.\nNote on the orbit of kappa Pegasi.

New output:

from astroquery.simbad import Simbad
Simbad.query_bibcode("1894MNRAS..55...17L")
<Table length=1>
      bibcode                doi          journal nbobject  page last_page               title                volume  year
       object               object         object  int32   int32   int32                 object               int32  int16
------------------- --------------------- ------- -------- ----- --------- ---------------------------------- ------ -----
1894MNRAS..55...17L 10.1093/mnras/55.1.17   MNRAS        1    17        -- Note on the orbit of kappa Pegasi.     55  1894

I know it's a very different output but I really hated the former one. I can try to reproduce the old one but ☹️

Also, it is now possible to retrieve the abstract with abstract=True.

Query_region, query_catalog, query_bibobj, query_object_ids

No big changes

In JWST

As Simbad is used in the tests, I just patched what I could how I could, may not be the prettiest way to go :/

Also, now an empty response returns an empty table and not None, so I reflected that in jwst core.py.

Issues linked to this PR

Fixes: #2198
Fixes: #1468
Fixes: #2041
Partially: #967 (It is done for query_objects, but not for query_region yet)
closes: #2292

@pep8speaks

This comment was marked as resolved.

@ManonMarchand ManonMarchand force-pushed the refactor_Simbad branch 2 times, most recently from 01a826c to 6b01523 Compare February 22, 2024 13:15
@ManonMarchand

This comment was marked as outdated.

@ManonMarchand ManonMarchand force-pushed the refactor_Simbad branch 2 times, most recently from 52edbf9 to 535ddfc Compare February 22, 2024 16:00
Copy link

codecov bot commented Feb 22, 2024

Codecov Report

Attention: Patch coverage is 97.12526% with 14 lines in your changes missing coverage. Please review.

Project coverage is 67.52%. Comparing base (0ffd5d4) to head (98ae3b4).
Report is 55 commits behind head on main.

Current head 98ae3b4 differs from pull request most recent head 6594ebb

Please upload reports for the commit 6594ebb to get more accurate results.

Files Patch % Lines
astroquery/simbad/core.py 95.95% 12 Missing ⚠️
astroquery/simbad/setup_package.py 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2954      +/-   ##
==========================================
+ Coverage   67.17%   67.52%   +0.34%     
==========================================
  Files         231      233       +2     
  Lines       18247    18224      -23     
==========================================
+ Hits        12258    12305      +47     
+ Misses       5989     5919      -70     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ManonMarchand

This comment was marked as outdated.

@bsipocz bsipocz added the simbad label Feb 23, 2024
@bsipocz
Copy link
Member

bsipocz commented Feb 23, 2024

The currently minimal astropy version supported in astroquery is 4.2.1. Is the jump negotiable? I don't see a lot of deprecations in the change log between these two versions.

This won't make the cut for 0.4.7, and after that one is out I'm planning to bump the minimum required versions (at least to astropy 5.0, but maybe even something newer), so no need to do workaround for support for old versions.

@bsipocz
Copy link
Member

bsipocz commented Feb 23, 2024

astroquery/simbad/tests/test_deprecated_simbad.py:482:78: W504 line break after binary operator
and here I should break before? Which one do you prefer? I'm happy with both.

We enforce W504 and ignore W503 as break before operator is a bit more legible logic.

@ManonMarchand
Copy link
Member Author

Hello!

I'm almost in a un-draft state :)

Here are the remaining points that I'm unsure about:

  • which version of astroquery do you think this should be merged in? Is there a way to leave an extended changes description somewhere? Like a more narrative changelog?
  • what did I do wrong with the docs? I don't have failures locally (only warnings about other modules)

@ManonMarchand ManonMarchand force-pushed the refactor_Simbad branch 3 times, most recently from d79cddc to 5ca38bc Compare April 8, 2024 14:55
@ManonMarchand ManonMarchand marked this pull request as ready for review April 8, 2024 14:59
@bsipocz
Copy link
Member

bsipocz commented Apr 8, 2024

Sorry for the delay in responding:

which version of astroquery do you think this should be merged in? Is there a way to leave an extended changes description somewhere? Like a more narrative changelog?

Next release, 0.4.8. At some point I'll refactor the versioning and related infrastructure, but don't have an ETA when it happens.
For a more detailed narrative about the changes I would suggest to add a section in the narrative documentation. In practice that is the place users will notice features, I doubt that many refer to the changelog to obtain extended descriptions.

what did I do wrong with the docs? I don't have failures locally (only warnings about other modules)

I suspect this also run into the same pyvo related RTD failure and was unrelated to your changes.

@bsipocz bsipocz added this to the v0.4.8 milestone Apr 8, 2024
@ManonMarchand ManonMarchand marked this pull request as draft April 16, 2024 09:49
@ManonMarchand
Copy link
Member Author

Switching back to draft, I'm tweaking a bit the support of the legacy 'query_criteria'

@ManonMarchand ManonMarchand force-pushed the refactor_Simbad branch 2 times, most recently from d3981a5 to aaf9c5b Compare June 3, 2024 13:16
ManonMarchand and others added 26 commits June 20, 2024 15:15
and move from os to pathlib
this commit also adds a patch to simbad's query_objects in the tests
simbad calls lru_cache from python core library, so no cache_location
Co-authored-by: Adam Ginsburg <[email protected]>
This was not tested remotely. And for now, this is extremely slow. Thus, the remote test for the wildcards in 'query_objects' is skipped except if SKIP_SLOW is set to False.

On the other hand, when there is no wildcard in 'query_objects' there is a way faster method (join the small table on the left). So this PR also implements this.
To use this faster method, the 'construct_query' had to become a bit general and now accepts a 'from_table' argument that defaults to 'basic'.
@bsipocz bsipocz dismissed keflavich’s stale review June 20, 2024 22:37

Review has been addressed

@bsipocz bsipocz merged commit 091e6b3 into astropy:main Jun 20, 2024
8 checks passed
@ManonMarchand ManonMarchand deleted the refactor_Simbad branch June 21, 2024 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants