Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SITCOM-1786: Add CountRows command #10

Merged
merged 11 commits into from
Jan 16, 2025
Merged

SITCOM-1786: Add CountRows command #10

merged 11 commits into from
Jan 16, 2025

Conversation

ugyballoons
Copy link
Contributor

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@ugyballoons ugyballoons requested a review from fred3m December 19, 2024 15:32
Copy link
Collaborator

@fred3m fred3m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of implementing the AggregateQuery. It's something that people have requested that doesn't exist yet and I think that it will be useful in a number of ways in the future.

But I'd like to see the PR (and the one for rubintv_visualization) make more use of it. For example, instead of having CountRowsCommand I think that it would be better to have something like AggregateQueryCommand that works for a general query. I don't think too much needs to change since you already designed AggregateQuery to handle more general commands.

Also, please implement tests for new features on the python side. I would test at least two commands, maybe count and sum (or max or min, or something like that). There is a mock database that is already setup during the tests so that should be doable.

python/lsst/rubintv/analysis/service/query.py Outdated Show resolved Hide resolved
python/lsst/rubintv/analysis/service/commands/db.py Outdated Show resolved Hide resolved
@ugyballoons ugyballoons requested a review from fred3m December 20, 2024 18:03
Copy link
Collaborator

@fred3m fred3m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for the one comment that I made. Fix that (and the test) and it should be good to go.

python/lsst/rubintv/analysis/service/commands/db.py Outdated Show resolved Hide resolved
@ugyballoons ugyballoons requested a review from fred3m January 7, 2025 15:43
Copy link
Collaborator

@fred3m fred3m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're going to hate me for this, but I'm wondering yet again if there's a better way. We might just be able to update LoadColumnsCommand by giving it an option aggregator : str | None = None attribute (so add that line after https://github.com/lsst-ts/rubintv_analysis_service/blob/main/python/lsst/rubintv/analysis/service/commands/db.py#L40-L75). Then update the database.query method to take the same optional argument and if it is non-null, update https://github.com/lsst-ts/rubintv_analysis_service/blob/main/python/lsst/rubintv/analysis/service/database.py#L477 to

aggregate_func = getattr(sqlalchemy.func, aggregateor.lower())
query_model = sqlalchemy.select(*[aggregate_func(column) for column in table_columns]).select_from(select_from).where(query_model)

The nice thing about this is that you won't have to add any new commands or query options, on either the server side or client side, and it minimizes the duplication of code. It also put loading column data from queries in a single command so that if we make changes to the API later we only have one palce to do it.

# Initialize a dictionary to store aggregate results
result = {}

for column in self.columns:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you can modify this to run the same function on multiple columns (see my comments below)

Comment on lines 290 to 292
base_query = sqlalchemy.select(aggregate_func(column).label(self.aggregate.lower())).select_from(
table
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you can actually run this on multiple columns to make this more efficient

Suggested change
base_query = sqlalchemy.select(aggregate_func(column).label(self.aggregate.lower())).select_from(
table
)
base_query = sqlalchemy.select(
*[aggregate_func(column).label(self.aggregate.lower())for column in columns]
).select_from(table)

Remove AggregatedQueryCommand
@ugyballoons
Copy link
Contributor Author

There's a small amount of snail trail code in there from passing the aggregator reference around but you're absolutely right, this is smaller, clearer and more efficient code.

Copy link
Collaborator

@fred3m fred3m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes Guy! I left a minor comment about docstrings and a request to updates the tests and creation of the select_from object. I'm worried that as is it won't quite work with queries using columns that are not in the columns parameter list.

) -> tuple[set[sqlalchemy.Column], set[str], list[sqlalchemy.Column]]:
"""Return the sqlalchemy models for a list of columns.

Parameters
----------
columns :
The names of the columns in the database.
using_aggregator : bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and other docstring updates in this module: there's no need to include the the type in the docstring since we're using type hints. The documenteer build system should be able to build the docs from the type hints,

query_model = sqlalchemy.and_(*[col.isnot(None) for col in table_columns])
select_from = self.joins.build_join(table_names)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this select clause can be here. It's possible that a query might use columns that were not in the original list of columns, so I'm not sure if the joins can be properly built in that case (for example imagine the query was over ra/dec but the column being searched is something else. That's why if query is not None we have to add the table names before querying.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see how this might be a problem for the aggregator. Maybe cache the table names before the query and data_id table names are optionally added and use that in your select clause and return dict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the table joining to below the custom query logic so that all tables and columns are now included in the FROM clause.

@@ -37,6 +37,60 @@ def execute_command(self, command: dict, response_type: str) -> dict:
return result["content"]


class TestLoadColumnsWithAggregatorCommand(TestCommand):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a test. Since all of the tests use the same columns and command, maybe you can include a setUp method to create them there once. Also, maybe run once without a query and once with, with a query that uses columns that you are not aggregating over, that way we can verify that the query method is working as expected.

Copy link
Collaborator

@fred3m fred3m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your patience and work!

@ugyballoons ugyballoons merged commit 95a0fdb into main Jan 16, 2025
5 checks passed
@ugyballoons ugyballoons deleted the tickets/SITCOM-1786 branch January 16, 2025 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants