SITCOM-1786: Add CountRows command #10

ugyballoons · 2024-12-19T15:32:31Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

fred3m

I really like the idea of implementing the AggregateQuery. It's something that people have requested that doesn't exist yet and I think that it will be useful in a number of ways in the future.

But I'd like to see the PR (and the one for rubintv_visualization) make more use of it. For example, instead of having CountRowsCommand I think that it would be better to have something like AggregateQueryCommand that works for a general query. I don't think too much needs to change since you already designed AggregateQuery to handle more general commands.

Also, please implement tests for new features on the python side. I would test at least two commands, maybe count and sum (or max or min, or something like that). There is a mock database that is already setup during the tests so that should be doable.

python/lsst/rubintv/analysis/service/query.py

python/lsst/rubintv/analysis/service/commands/db.py

fred3m

Looks good except for the one comment that I made. Fix that (and the test) and it should be good to go.

python/lsst/rubintv/analysis/service/commands/db.py

fred3m

You're going to hate me for this, but I'm wondering yet again if there's a better way. We might just be able to update LoadColumnsCommand by giving it an option aggregator : str | None = None attribute (so add that line after https://github.com/lsst-ts/rubintv_analysis_service/blob/main/python/lsst/rubintv/analysis/service/commands/db.py#L40-L75). Then update the database.query method to take the same optional argument and if it is non-null, update https://github.com/lsst-ts/rubintv_analysis_service/blob/main/python/lsst/rubintv/analysis/service/database.py#L477 to

aggregate_func = getattr(sqlalchemy.func, aggregateor.lower())
query_model = sqlalchemy.select(*[aggregate_func(column) for column in table_columns]).select_from(select_from).where(query_model)

The nice thing about this is that you won't have to add any new commands or query options, on either the server side or client side, and it minimizes the duplication of code. It also put loading column data from queries in a single command so that if we make changes to the API later we only have one palce to do it.

fred3m · 2025-01-08T15:31:03Z

python/lsst/rubintv/analysis/service/commands/db.py

+        # Initialize a dictionary to store aggregate results
+        result = {}
+
+        for column in self.columns:


I think that you can modify this to run the same function on multiple columns (see my comments below)

fred3m · 2025-01-08T15:37:53Z

python/lsst/rubintv/analysis/service/query.py

+        base_query = sqlalchemy.select(aggregate_func(column).label(self.aggregate.lower())).select_from(
+            table
+        )


I think that you can actually run this on multiple columns to make this more efficient

Suggested change

base_query = sqlalchemy.select(aggregate_func(column).label(self.aggregate.lower())).select_from(

table

)

base_query = sqlalchemy.select(

*[aggregate_func(column).label(self.aggregate.lower())for column in columns]

).select_from(table)

Remove AggregatedQueryCommand

ugyballoons · 2025-01-10T16:42:23Z

There's a small amount of snail trail code in there from passing the aggregator reference around but you're absolutely right, this is smaller, clearer and more efficient code.

fred3m

Thanks for the changes Guy! I left a minor comment about docstrings and a request to updates the tests and creation of the select_from object. I'm worried that as is it won't quite work with queries using columns that are not in the columns parameter list.

fred3m · 2025-01-10T16:46:01Z

python/lsst/rubintv/analysis/service/database.py

    ) -> tuple[set[sqlalchemy.Column], set[str], list[sqlalchemy.Column]]:
        """Return the sqlalchemy models for a list of columns.

        Parameters
        ----------
        columns :
            The names of the columns in the database.
+        using_aggregator : bool


For this and other docstring updates in this module: there's no need to include the the type in the docstring since we're using type hints. The documenteer build system should be able to build the docs from the type hints,

fred3m · 2025-01-10T16:50:33Z

python/lsst/rubintv/analysis/service/database.py

        query_model = sqlalchemy.and_(*[col.isnot(None) for col in table_columns])
+        select_from = self.joins.build_join(table_names)


I don't know if this select clause can be here. It's possible that a query might use columns that were not in the original list of columns, so I'm not sure if the joins can be properly built in that case (for example imagine the query was over ra/dec but the column being searched is something else. That's why if query is not None we have to add the table names before querying.

I can see how this might be a problem for the aggregator. Maybe cache the table names before the query and data_id table names are optionally added and use that in your select clause and return dict?

I've moved the table joining to below the custom query logic so that all tables and columns are now included in the FROM clause.

fred3m · 2025-01-10T16:55:12Z

tests/test_command.py

@@ -37,6 +37,60 @@ def execute_command(self, command: dict, response_type: str) -> dict:
        return result["content"]


+class TestLoadColumnsWithAggregatorCommand(TestCommand):


Thanks for adding a test. Since all of the tests use the same columns and command, maybe you can include a setUp method to create them there once. Also, maybe run once without a query and once with, with a query that uses columns that you are not aggregating over, that way we can verify that the query method is working as expected.

fred3m

LGTM. Thanks for your patience and work!

Implement and register CountRowsCommand

adf73cc

ugyballoons requested a review from fred3m December 19, 2024 15:32

Run isort

865082f

fred3m requested changes Dec 19, 2024

View reviewed changes

python/lsst/rubintv/analysis/service/query.py Outdated Show resolved Hide resolved

python/lsst/rubintv/analysis/service/commands/db.py Outdated Show resolved Hide resolved

Generalise to aggregate query and provide tests

d80c203

ugyballoons requested a review from fred3m December 20, 2024 18:03

fred3m reviewed Dec 20, 2024

View reviewed changes

python/lsst/rubintv/analysis/service/commands/db.py Outdated Show resolved Hide resolved

ugyballoons added 3 commits December 20, 2024 18:35

Rewrite to include sub queries and docstrings

ae45418

Implement response_type correctly

6fec4af

Flake8

d8049eb

ugyballoons requested a review from fred3m January 7, 2025 15:43

fred3m requested changes Jan 8, 2025

View reviewed changes

Incorporate aggregated queries into main query

7652c6d

Remove AggregatedQueryCommand

fred3m requested changes Jan 10, 2025

View reviewed changes

ugyballoons added 3 commits January 15, 2025 17:15

Apply join after adding custom query

89f4d42

Refactor aggregator tests

f97269d

Run flake8

3ccdabd

ugyballoons force-pushed the tickets/SITCOM-1786 branch from 3104bff to 3ccdabd Compare January 15, 2025 17:28

Fix careless deletion

e6fc610

fred3m approved these changes Jan 15, 2025

View reviewed changes

ugyballoons merged commit 95a0fdb into main Jan 16, 2025
5 checks passed

ugyballoons deleted the tickets/SITCOM-1786 branch January 16, 2025 12:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SITCOM-1786: Add CountRows command #10

SITCOM-1786: Add CountRows command #10

ugyballoons commented Dec 19, 2024

fred3m left a comment

fred3m left a comment

fred3m left a comment

fred3m Jan 8, 2025

fred3m Jan 8, 2025

ugyballoons commented Jan 10, 2025

fred3m left a comment

fred3m Jan 10, 2025

fred3m Jan 10, 2025

fred3m Jan 10, 2025

ugyballoons Jan 15, 2025

fred3m Jan 10, 2025

fred3m left a comment

		query_model = sqlalchemy.and_(*[col.isnot(None) for col in table_columns])
		select_from = self.joins.build_join(table_names)

		@@ -37,6 +37,60 @@ def execute_command(self, command: dict, response_type: str) -> dict:
		return result["content"]


		class TestLoadColumnsWithAggregatorCommand(TestCommand):

SITCOM-1786: Add CountRows command #10

SITCOM-1786: Add CountRows command #10

Conversation

ugyballoons commented Dec 19, 2024

Checklist

fred3m left a comment

Choose a reason for hiding this comment

fred3m left a comment

Choose a reason for hiding this comment

fred3m left a comment

Choose a reason for hiding this comment

fred3m Jan 8, 2025

Choose a reason for hiding this comment

fred3m Jan 8, 2025

Choose a reason for hiding this comment

ugyballoons commented Jan 10, 2025

fred3m left a comment

Choose a reason for hiding this comment

fred3m Jan 10, 2025

Choose a reason for hiding this comment

fred3m Jan 10, 2025

Choose a reason for hiding this comment

fred3m Jan 10, 2025

Choose a reason for hiding this comment

ugyballoons Jan 15, 2025

Choose a reason for hiding this comment

fred3m Jan 10, 2025

Choose a reason for hiding this comment

fred3m left a comment

Choose a reason for hiding this comment