Commit
Start on notebook for using GCRCatalogs with data registry
1 parent
d403cfe
commit a8c3beb
Showing 1 changed file with 236 additions and 0 deletions.
docs/source/tutorial_notebooks/query_gcr_datasets.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,236 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "302fd57e-ae4f-4a22-95ad-a69573212a98", | ||
"metadata": {}, | ||
"source": [ | ||
"<div style=\"overflow: hidden;\">\n", | ||
" <img src=\"images/DREGS_logo_v2.png\" width=\"300\" style=\"float: left; margin-right: 10px;\">\n", | ||
"</div>\n", | ||
"\n", | ||
"# Getting started: Part X - Using dataregistry with GCRCatalogs\n", | ||
"\n", | ||
"Here we show how to access catalogs belonging to the `GCRCatalogs` package via the information stored in the dataregistry.\n", | ||
"\n", | ||
"### What we cover in this tutorial\n", | ||
"\n", | ||
"In this tutorial we will learn how to:\n", | ||
"\n", | ||
"1) Find and read the catalogs using the standard GCRCatalogs interface\n", | ||
"2) Query catalog metadata directly using the data registry, then use that metadata to find and read catalogs\n", | ||
"\n", | ||
"### Before we begin\n", | ||
"\n", | ||
"Currently (November 2024), the required versions of gcr-catalogs and dataregistry are only available in the `desc-python-bleed` kernel. Make sure you have selected that kernel before running this tutorial.\n", | ||
"\n", | ||
"If you want to run this tutorial interactively and haven't done so already, work through the [getting set up](https://lsstdesc.org/dataregistry/tutorial_setup.html) page of the documentation first." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a4eac203-202b-47fd-b896-a37f54fc6a57", | ||
"metadata": {}, | ||
"source": [ | ||
"## 1) Using the usual GCRCatalogs interface\n", | ||
"\n", | ||
"Note that with this method we do not call any data registry services directly, but the data registry database must still be accessible. That means you must have completed at least part of the tutorial setup referred to above, in particular creating the couple of small files needed for authentication. See details [here](http://lsstdesc.org/dataregistry/installation.html#one-time-setup)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "29c7fb42-23c0-4026-a008-c9274a26ad0f", | ||
"metadata": {}, | ||
"source": [ | ||
"### Configuring GCRCatalogs \n", | ||
"\n", | ||
"A quick way to check that everything is set up correctly is to run the cell below, which should load the GCRCatalogs package and print its version. The version should be at least 1.9.0." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "9231e5d8-a8f5-4eba-a020-0a7933b9a24a", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import GCRCatalogs\n", | ||
"print(f\"Working with GCRCatalogs version: {GCRCatalogs.__version__}\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c9b03081-ba94-46ed-9ef5-a6f0c48d6c4c", | ||
"metadata": {}, | ||
"source": [ | ||
"We need to tell `GCRCatalogs` whether to use the old-style metadata access method (reading config files) or to fetch metadata from the data registry. There are two ways to do this:\n", | ||
"\n", | ||
"1. Before running, set the environment variable `GCR_CONFIG_SOURCE` to one of the two allowed values: \"files\" or \"dataregistry\"\n", | ||
"2. Invoke the GCRCatalogs routine `ConfigSource.set_config_source`\n", | ||
"\n", | ||
"Here we use the second method.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f9901d89-b1d7-48c9-8110-ce16ecba3a7e", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Tell GCRCatalogs to use the data registry\n", | ||
"GCRCatalogs.ConfigSource.set_config_source(dr=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "17120e54-30d7-44a5-8875-d767ea7800d2", | ||
"metadata": {}, | ||
"source": [ | ||
"Now we can use any of the standard GCRCatalogs query routines." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "00c6d355-dca0-42a1-ae82-7fdbd1a46afa", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Find catalogs whose name starts with \"cosmo\"\n", | ||
"cosmos = GCRCatalogs.get_available_catalog_names(name_startswith=\"cosmo\")\n", | ||
"cosmos" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "6adb2f98-6103-41ef-a183-75cea430358a", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Load a catalog; find out something about it\n", | ||
"cat = GCRCatalogs.load_catalog(\"cosmoDC2_v1.1.4_small\")\n", | ||
"cat.native_filter_quantities" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "8a1a2f80-2909-416e-86d3-4a9b71a1a9ef", | ||
"metadata": {}, | ||
"source": [ | ||
"## 2) Using the data registry directly\n", | ||
"\n", | ||
"In other tutorials we learned how to connect to the DESC data registry using the `DataRegistry` class. Let's connect again using the defaults _except_ for the schema. Because the catalogs maintained by GCRCatalogs are stored in the DESC production shared area, their database entries are in the production schema, not in the (default) working schema.\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "bfad9170-a524-4aef-9217-38470373a6b8", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from dataregistry import DataRegistry\n", | ||
"from dataregistry.schema import DEFAULT_SCHEMA_PRODUCTION\n", | ||
"\n", | ||
"# Establish connection to the production schema\n", | ||
"datareg = DataRegistry(schema=DEFAULT_SCHEMA_PRODUCTION)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e06e4d6f-de30-4c34-94bf-0756d3b7c017", | ||
"metadata": {}, | ||
"source": [ | ||
"### Dataset attributes\n", | ||
"\n", | ||
"Recall that a `DataRegistry` instance has a member `Query` which provides all the query services.\n", | ||
"\n", | ||
"As described in \"Getting started: Part 3 - Simple queries\", you can ask for the values of dataset attributes, subject to one or more filters. You can find out what those attributes (\"columns\" in database parlance) are with one of those services:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "54a52029-2908-4056-bc68-4a87f6c3e6df", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"all_columns = datareg.Query.get_all_columns()\n", | ||
"print(all_columns)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "fa586592-2c2e-428b-b443-33ca26038add", | ||
"metadata": {}, | ||
"source": [ | ||
"That is a list of __all__ columns from __all__ tables, which may be more than we bargained for. Let's restrict it to columns in the `dataset` table." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "04082abb-585e-4d68-a4c4-874d22be70be", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"dataset_columns = [col for col in all_columns if col.startswith('dataset.')]\n", | ||
"print(dataset_columns)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c3baa2fd-d0fa-4acc-96be-e030cb63cf47", | ||
"metadata": {}, | ||
"source": [ | ||
"Among the more interesting columns for our purposes are `name`, `relative_path`, `access_api` and `access_api_configuration`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "bdbe8537-6195-4239-bbb8-976daacdfab7", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "desc-python-bleed", | ||
"language": "python", | ||
"name": "desc-python-bleed" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.12.7" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
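
The notebook demonstrates only the second way of selecting the metadata source (`ConfigSource.set_config_source`). For comparison, the first way, setting `GCR_CONFIG_SOURCE`, might look roughly like the sketch below. The helper `select_config_source` is hypothetical and not part of GCRCatalogs; the only details taken from the notebook are the variable name and its two allowed values. Exactly when GCRCatalogs reads the variable is not stated in the notebook, so the sketch assumes it is safest to set it before the package is imported.

```python
import os

# The two values the notebook lists as allowed for GCR_CONFIG_SOURCE.
ALLOWED_SOURCES = ("files", "dataregistry")


def select_config_source(source: str) -> str:
    """Hypothetical helper: validate the value, then export it to the environment."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"expected one of {ALLOWED_SOURCES}, got {source!r}")
    os.environ["GCR_CONFIG_SOURCE"] = source
    return source


# Choose the data registry as the metadata source.
# Assumption: do this before `import GCRCatalogs` runs in the same process.
select_config_source("dataregistry")
```

Validating the value up front means a typo such as "dataregisty" fails immediately in the notebook rather than later inside the catalog lookup.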