Merge pull request #49 from lsst-dm/tickets/DM-47136
DM-47136: Add documentation on versioning and adding columns/tables.
ktlim authored Oct 26, 2024
2 parents d07190b + 9c542ac commit 8e1f5fc
Showing 4 changed files with 46 additions and 14 deletions.
27 changes: 23 additions & 4 deletions doc/contributor-guide/adding-columns.rst
@@ -1,6 +1,25 @@
###############
##############
Adding Columns
###############
##############

* Values should be usefully summarized
* Try to make everything into some kind of scalar
Structure
=========

- ConsDB content must relate to exposures, visits, or observations structured like exposures. General time series should go in the Engineering and Facilities Database (EFD).
- ConsDB content should generally be scalar values. Large amounts of data, especially arrays, images, or cubes, should generally go into the Large File Annex (LFA).
- Avoid expressing arrays as individual columns (e.g. ``something0``, ``something1``, ``something2``) where possible. Doing so drastically increases the number of columns (and there is `a limit <https://www.postgresql.org/docs/current/limits.html>`_), makes queries harder to write (``SELECT`` clauses must list each column individually, and ``WHERE`` clauses may need large ``OR`` or ``AND`` conditions), and can require a lot of database storage space.
- Columns should be named in all lowercase with underscore (``_``) separators, also known as "snake_case".
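
The naming and structure rules above could be checked mechanically before a column is proposed. The following is a minimal sketch, not part of ConsDB: the function name ``check_column_names`` and both regular expressions are assumptions made for illustration, and the trailing-digit test is only a heuristic (names such as ``visit1`` legitimately end in a digit).

```python
import re

# Lowercase words of letters/digits joined by underscores, starting with a letter.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# A trailing integer may indicate an array unrolled into columns.
TRAILING_INDEX = re.compile(r"^(.*?)(\d+)$")


def check_column_names(names):
    """Return (names that are not snake_case, groups that look like unrolled arrays)."""
    not_snake = [n for n in names if not SNAKE_CASE.match(n)]
    groups = {}
    for n in names:
        m = TRAILING_INDEX.match(n)
        if m:
            # Group by the stem that precedes the trailing index.
            groups.setdefault(m.group(1), []).append(n)
    # Only stems shared by more than one column are suspicious.
    unrolled = {stem: cols for stem, cols in groups.items() if len(cols) > 1}
    return not_snake, unrolled


not_snake, unrolled = check_column_names(
    ["exposure_id", "airTemp", "something0", "something1", "something2"]
)
```

Here ``not_snake`` flags ``airTemp`` and ``unrolled`` groups the three ``something*`` columns under the stem ``something``.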

Data sources
============

- Columns added to the ``exposure`` and ``ccdexposure`` tables must be derived from the Header Service running at the Summit for a given instrument, which extracts information from the EFD in real time and is designed to provide information critical for Alert Production. (This service also populates the ``visit1`` and ``ccdvisit1`` views.) Changes must typically be coordinated with both the Header Service and the ConsDB teams, in addition to being added to `sdm_schemas <https://github.com/lsst/sdm_schemas>`__.
- The source for the ``exposure_efd*`` tables is the EFD Transformation service running at the US Data Facility, which extracts information from the EFD in batches and is designed for all other EFD data. It has its own configuration.
- Ensure that the data source for the table to which the column is being added will in fact produce that column.

Column descriptions
===================

- Make sure the description is understandable to a non-staff scientist, and try to avoid internal jargon.
- Include `units <https://www.ivoa.net/documents/VOUnits/>`__ for measurements. Note that these should follow IVOA standards, not Astropy unit standards.
- Include a `Unified Content Descriptor (UCD) <https://ivoa.net/documents/UCD1+/20230125/index.html>`__ indicating the meaning of the column.
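
As a rough illustration of the checklist above, a column definition could be screened for missing metadata before review. This is a sketch under stated assumptions: the dictionary keys (``description``, ``unit``, ``ucd``), the example column, and the helper ``missing_metadata`` are hypothetical and do not reflect the actual sdm_schemas file format. Units are treated as optional here because only measurements need them.

```python
def missing_metadata(column):
    """Return the required metadata fields that are absent or blank."""
    required = ("description", "ucd")  # "unit" applies only to measurements
    return [key for key in required if not str(column.get(key, "")).strip()]


# Hypothetical column definition, for illustration only.
column = {
    "name": "air_temp",
    "description": "Outside air temperature at the start of the exposure.",
    "unit": "K",  # VOUnits string
    "ucd": "phys.temperature",
}

problems = missing_metadata(column)
```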
9 changes: 5 additions & 4 deletions doc/contributor-guide/adding-tables.rst
@@ -2,7 +2,8 @@
Adding Tables
##############

* Each source of data should have its own table(s)
* Each dimension combination (exposure, visit, exposure+detector, visit+detector, etc.) should have its own table(s)
* Normalize when possible
* De-normalize via views to make querying easier
- Each source of data should have its own table(s).
- Conversely, each new table being added should have its data source identified.
- Each dimension combination (exposure, visit, exposure+detector, visit+detector, etc.) should have its own table(s).
- Normalize when possible. Try not to repeat non-key columns between tables with the same dimensions.
- De-normalize via views to make querying easier.
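
The last two points can be sketched with a small in-memory database. This example uses SQLite from the Python standard library purely so that it is self-contained; ConsDB itself is not SQLite, and the table, view, and column names (``exposure_a``, ``exposure_b``, ``exposure_all``, ``exp_time``, ``air_temp``) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two hypothetical data sources with the same dimension (exposure):
    -- one table each, sharing only the key column.
    CREATE TABLE exposure_a (exposure_id INTEGER PRIMARY KEY, exp_time REAL);
    CREATE TABLE exposure_b (exposure_id INTEGER PRIMARY KEY, air_temp REAL);

    -- The view de-normalizes the two tables so users can query one relation.
    CREATE VIEW exposure_all AS
        SELECT a.exposure_id, a.exp_time, b.air_temp
        FROM exposure_a AS a
        LEFT JOIN exposure_b AS b ON a.exposure_id = b.exposure_id;

    INSERT INTO exposure_a VALUES (1, 30.0), (2, 15.0);
    INSERT INTO exposure_b VALUES (1, 281.5);
    """
)

rows = conn.execute("SELECT * FROM exposure_all ORDER BY exposure_id").fetchall()
```

Each source writes only to its own table, and the ``LEFT JOIN`` keeps exposures for which the second source has not yet produced a row (the missing value appears as ``None``).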
21 changes: 16 additions & 5 deletions doc/contributor-guide/inserting-information.rst
@@ -2,9 +2,20 @@
Inserting Information
#####################

* Sasquatch
* REST API
* Direct Kafka messages
Four tools can be used to insert information into ConsDB.

* ConsDB client library in summit_utils
* ConsDB REST API
- `Sasquatch <https://sasquatch.lsst.io/user-guide/sendingdata.html>`__

- Sasquatch will be configured to write via a Kafka Connector to tables in ConsDB. This should become the preferred interface for data sources to insert information. It provides isolation from SQL details (and does not require a SQL client library), and it can be used from any programming language. The Kafka messaging system provides resiliency.
- `REST Proxy <https://sasquatch.lsst.io/user-guide/restproxy.html>`__
- `Direct Kafka messages <https://sasquatch.lsst.io/user-guide/directconnection.html>`__

- `ConsDB Python client library <https://github.com/lsst-sitcom/summit_utils/blob/main/python/lsst/summit/utils/consdbClient.py>`__ in ``summit_utils``

- This library is currently implemented using the Web service API, but it can be changed in the future to use Sasquatch.

- `ConsDB Web service API <https://usdf-rsp.slac.stanford.edu/consdb/docs/>`__

- The Web service API (pqserver) provides some of the same advantages as Sasquatch, but it does not provide any buffering, retries, or resiliency. We hope to phase out its usage when Sasquatch becomes available.

- Direct SQL ``INSERT``. This is discouraged. Appropriate credentials would have to be arranged.
3 changes: 2 additions & 1 deletion doc/developer-guide/standards-practices.rst
@@ -2,4 +2,5 @@
Standards and Practices
#######################

Standards and practices
* The consdb repository will be tagged using `calendar-based versioning <https://calver.org>`__. We are using a ``YY.0M.N`` format with a two-digit short year, a zero-padded month number, and a 1-based sequence number within a month.
* Tags should be annotated git tags (``git tag -a``).
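
To make the ``YY.0M.N`` format concrete, the next tag name could be computed as in the sketch below. The helper ``next_tag`` and the pattern ``TAG_PATTERN`` are hypothetical, and the sketch assumes the sequence number ``N`` is written without zero padding, which the text above does not specify.

```python
import datetime
import re

# Two-digit year, zero-padded month (01-12), 1-based sequence number.
TAG_PATTERN = re.compile(r"^(\d{2})\.(0[1-9]|1[0-2])\.([1-9]\d*)$")


def next_tag(existing_tags, today):
    """Return the next YY.0M.N tag for the month containing ``today``."""
    prefix = f"{today.year % 100:02d}.{today.month:02d}."
    highest = 0
    for tag in existing_tags:
        match = TAG_PATTERN.match(tag)
        # Only well-formed tags from the current month affect the sequence.
        if match and tag.startswith(prefix):
            highest = max(highest, int(match.group(3)))
    return f"{prefix}{highest + 1}"


tag = next_tag(["24.09.5", "24.10.1", "24.10.2"], datetime.date(2024, 10, 26))
```

The resulting name would then be passed to ``git tag -a`` to create the annotated tag.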
