\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\title{Common Archive Observation Model}
\ivoagroup{Data Models}
\author[https://wiki.ivoa.net/twiki/bin/view/IVOA/PatrickDowler]{Patrick Dowler}
\editor{Patrick Dowler}
\editor{S\'everin Gaudet}
\previousversion[https://www.opencadc.org/caom2/]{CAOM-2.4}
\begin{document}
\begin{abstract}
???? Abstract ????
\end{abstract}
\section*{Acknowledgments}
???? Ack ????
\section*{Conformance-related definitions}
The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
The \emph{Virtual Observatory (VO)} is a
general term for a collection of federated resources that can be used
to conduct astronomical research, education, and outreach.
The \href{https://www.ivoa.net}{International
Virtual Observatory Alliance (IVOA)} is a global
collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
\section{Introduction}
The Common Archive Observation Model (CAOM) is a metadata model that describes
astronomical data stored in archives and makes that data findable and accessible
(the first two concepts of the FAIR principles: Findable, Accessible, Interoperable,
Reusable). CAOM places no constraints on the format of the data itself and thus does
not, by itself, make the data interoperable or reusable.
\subsection{Role within the VO Architecture}
\begin{figure}
\centering
% As of ivoatex 1.2, the architecture diagram is generated by ivoatex in
% SVG; copy ivoatex/archdiag-full.xml to role_diagram.xml and throw out
% all lines not relevant to your standard.
% Notes don't generally need this. If you don't copy role_diagram.xml,
% you must remove role_diagram.pdf from SOURCES in the Makefile.
\includegraphics[width=0.9\textwidth]{role_diagram.pdf}
\caption{Architecture diagram for this document}
\label{fig:archdiag}
\end{figure}
Fig.~\ref{fig:archdiag} shows the role this document plays within the
IVOA architecture \citep{2021ivoa.spec.1101D}.
???? and so on, LaTeX as you know and love it. ????
\section{Use Cases}
TODO
\subsection{Describing Data}
The primary design goal of CAOM is to describe astronomical data so that
users (astronomers) can query archives and discover data that is suitable
for a specific research goal or project.

The metadata includes descriptive quantities to help users discover data,
such as coverage in position, energy, and time; logical data types like
images, spectra, and time series; and some origin metadata like
telescope, instrument, and proposal information.

The model also includes a structure that captures some of the important relationships
between data products, such as different products of an observation that have had
different processing applied and new derived observations that are created by
combining data from several other observations. CAOM also describes the way a
data product is made up of one or more physically stored components while remaining
loosely coupled with the storage system itself.

CAOM provides a transparent way to express data access rights so users can see
which data they have access to and even query for data that will soon be available.

Most importantly, new kinds of data products and areas of study sometimes require
the model to evolve (and gain new features) to better describe new data and enable
the queries that new research requires. CAOM has a well-defined mechanism to support
evolution and still bring all legacy data forward with minimal effort.
\subsection{Implementation and Operations}
CAOM also supports a range of data management functions that make it implementable
and robust for large-scale archive operations. The integrity of metadata that is
created, stored, and accessed can be verified using a metadata checksum algorithm.
The metadata checksums can also be used to optimise interactions like database
transactions and to test database persistence and other forms of serialisation for
completeness and correctness.

The model is designed to support robust synchronisation of observation metadata
between the origin and other mirror sites through modification timestamps and
the use of metadata checksums to ensure that metadata transport does not introduce
any corruption or incompleteness (details elsewhere ???).
\section{Model Overview}
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM1core.png}
\caption{Main CAOM classes}
\label{fig:core}
\end{figure}
\subsection{Observation}
An Observation is the result of some activity to create data. Data acquisition by a
telescope creates a SimpleObservation, usually with one Plane describing the raw
data.
\subsection{Plane}
A Plane is a single data product within an Observation, usually with one or more
Artifacts that correspond to stored resources (usually files). Processing
that transforms the data in a Plane in some fashion usually creates a new Plane in
the same Observation. Common processing, like calibration to remove the instrument
signature, changes the calibration level of the Plane.
\subsection{Artifact}
An Artifact holds a reference to an externally stored resource: a file, a
database object (schema or table), a collection of files (directory), etc.
\subsection{Part}
A Part describes a logical subcomponent of an Artifact. For example, a single
extension in a multi-extension FITS file, or a file inside a tar file would be parts
of such an Artifact. The meaning of Part depends on the type of resource that the
Artifact refers to, as described by the Artifact.contentType.
Note: Part is not shown in the UML diagrams. TBD: keep, deprecate, or remove?
\subsection{Chunk}
A Chunk describes a single data array using WCS (World Coordinate System) concepts.
Note: Chunk is not shown in the UML diagrams. TBD: keep, deprecate or remove?
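
As an informal illustration of how the classes described above fit together,
the following sketch shows one possible instance. The example values are made up,
and the nesting of Chunk inside Part follows CAOM-2.4; it is an assumption here,
since the status of Part and Chunk in this version is still to be decided.
\begin{verbatim}
SimpleObservation              (data acquisition by a telescope)
  Plane                        raw data
    Artifact                   reference to a stored file
      Part                     e.g. one extension of a multi-extension FITS file
        Chunk                  one data array described with WCS
  Plane                        calibrated data (higher calibration level)
    Artifact                   reference to a stored file
\end{verbatim}
The second Plane reflects the usual case in which processing creates a new Plane
in the same Observation rather than a new Observation.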
\subsection{Data Types}
Some of the classes in the model are intended to be used as data types (e.g. as column
types in a database, exposed as such in a TAP service).
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM2datatypes.png}
\caption{CAOM Data Types}
\label{fig:datatypes}
\end{figure}
\subsection{Vocabularies}
CAOM uses a mixture of enumerations and vocabulary references. As it has evolved,
several concepts began as enumerations and were later converted to use a vocabulary
when it became clear that the IVOA Vocabularies process was a more appropriate way
to support the gradual evolution of the set of concepts needed by the community.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM3vocabularies.png}
\caption{Enumerations and Vocabularies}
\label{fig:vocab}
\end{figure}
\subsection{Entity}
The Entity concept defines the common metadata necessary to persist and validate
instances of classes within the model. In practice, the entity classes in the
model are related by one-to-many composition and thus set a lower bound for
implementations: in a relational mapping, for example, each entity has to be in a
separate table, so one table per entity is the minimum number of tables required
to persist complete instances.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{src/uml/CAOM4entities.png}
\caption{CAOM Entities}
\label{fig:entity}
\end{figure}
% include external generated file so this tex doc is easy to edit
\input{generated.tex}
\appendix
\section{Changes from Previous Versions}
\subsection{Changes from OpenCADC CAOM-2.4}
\subsubsection{General Changes}
\begin{itemize}
\item change \texttt{Plane.position.bounds} to be mandatory
\item change \texttt{Plane.energy.bounds} to be mandatory
\item change \texttt{Plane.time.bounds} to be mandatory
\item change \texttt{Plane.polarization.states} to require at least 1 value
\end{itemize}
The above changes mean that each of the position, energy, time, and polarization
objects has one mandatory field, so a query with a single \texttt{IS NOT NULL}
constraint on that field can detect whether the object is present.
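
As a sketch of such a query, assume a hypothetical relational mapping in which
planes are exposed through TAP as a table \texttt{caom2.Plane} with a column
\texttt{position\_bounds}. Both names are assumptions made for illustration and are
not defined by this document. Planes that carry position metadata could then be
counted in ADQL with:
\begin{verbatim}
SELECT COUNT(*)
FROM caom2.Plane
WHERE position_bounds IS NOT NULL
\end{verbatim}
The same pattern would apply to the mandatory field of the energy, time, and
polarization objects.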
\begin{itemize}
\item add \texttt{ArtifactDescription} entity to support providing descriptions with links
  (e.g. in a DataLink output)
\item add \texttt{Artifact.descriptionID} to refer to a shared \texttt{ArtifactDescription}
\item add \texttt{Proposal.reference} as optional proposal metadata (URI to web page, paper, etc.)
\item split \texttt{Entity} into a base \texttt{Entity} class with main properties and a \texttt{CaomEntity}
  suitable for having child entities (by composition); one or both could be extracted
  and re-used in other models (TBD)
\end{itemize}
\subsubsection{Radio Support}
For radio observations, many properties, such as field of view and spatial and
spectral resolution, depend on frequency. Modern wideband facilities can show
large frequency-dependent variation in these properties within a single observation.
\begin{itemize}
\item add \texttt{Plane.position.minBounds} (Shape) to describe variable coverage (\texttt{bounds} is already the maximum bounds)
\item add \texttt{Plane.position.maxAngularScale} (Interval) to describe min/max scale of signal/objects in the data
\item add \texttt{Plane.energy.resolution} (double) to describe the absolute resolution (representative value, probably mean/pixel)
\item add \texttt{Plane.energy.resolutionBounds} (Interval) to describe the min/max absolute resolution when it varies across the data
\item add \texttt{Plane.time.exposureBounds} (Interval) to describe the min/max exposure time when it varies across the data
\item change \texttt{Plane.energy.restwav} to \texttt{Plane.energy.rest} so the name makes sense with different profiles (quantities and units)
\item remove \texttt{Plane.position.timeDependent} as it was only used to explain why \texttt{Plane.position.bounds} was null because of tracking mode
\item add \texttt{Observation.telescope.trackingMode} and refer to a non-existent IVOA vocabulary to describe the
  tracking/pointing of the telescope during the observation; null indicates sidereal tracking (for backwards compatibility)
\item add \texttt{Plane.uv} (Visibility) to describe the UV plane (expected to be used only when \texttt{dataProductType=visibility})
\item add \texttt{Plane.uv.distance} (Interval) to describe the min and max distance in the UV plane
\item add \texttt{Plane.uv.distributionEccentricity} (double); mandatory or optional within Visibility?
\item add \texttt{Plane.uv.distributionFill} (double); mandatory or optional within Visibility?
\item change \texttt{Plane.polarization.states} to refer to a (non-existent) vocabulary (replaces PolarizationState enum) that could be extracted from WCS, ObsCore, and community usage/extensions
\end{itemize}
\subsubsection{Use of Identifiers}
\begin{itemize}
\item replace \texttt{Observation.observationID} (String) with \texttt{Observation.uri} (URI) to be the complete self-contained identifier; values would be used in \texttt{DerivedObservation.members} to refer to other observations
\item replace \texttt{Plane.productID} (String) with \texttt{Plane.uri} (URI) to be the complete self-contained identifier; values would be used in \texttt{Plane.provenance.inputs} to refer to other planes
\item remove \texttt{Plane.creatorID} because it is essentially redundant with \texttt{Plane.uri}
\end{itemize}
A \texttt{publisherID} value is strictly outside the core model because the value must be changed (generated) when CAOM metadata is synced from one publisher to a different publisher.
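
For illustration only, assume that the new URI values keep the \texttt{caom:} scheme
that CAOM-2.4 uses when referring to observations and planes. The list above does
not fix the form of the URI, and the collection and identifier values below are
made up. Under that assumption the two identifiers might look like:
\begin{verbatim}
Observation.uri = caom:EXAMPLE/obs-001
Plane.uri       = caom:EXAMPLE/obs-001/raw
\end{verbatim}
A \texttt{DerivedObservation.members} entry would then carry a value of the first
form, and a \texttt{Plane.provenance.inputs} entry a value of the second form.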
\subsubsection{Reconcile with IVOA Usage}
\begin{itemize}
\item change \texttt{Plane.dataProductType} to refer directly to the IVOA product-type vocabulary
\item change \texttt{Artifact.productType} to refer directly to the IVOA DataLink Core (semantics) vocabulary
\item change \texttt{Plane.observable.ucd} to refer directly to IVOA UCD1+
\item add \texttt{Plane.position.calibration} and refer to a non-existent IVOA vocabulary that could be extracted from the ObsCore optional section
\item add \texttt{Plane.energy.calibration} (as above)
\item add \texttt{Plane.time.calibration} (as above)
\item add \texttt{Plane.observable.calibration} (as above)
\item remove SampledInterval in favour of separate Interval and Interval[] columns in Energy, Time, CustomAxis
\item remove MultiPolygon in favour of separate Polygon and MultiShape columns; SegmentType and Vertex removed (unused)
\end{itemize}
% NOTE: IVOA recommendations must be cited from docrepo rather than ivoabib
% (REC entries there are for legacy documents only)
\bibliography{ivoatex/ivoabib,ivoatex/docrepo}
\end{document}