Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update queries #163

Merged
merged 12 commits into from
Nov 26, 2024
Merged

Update queries #163

merged 12 commits into from
Nov 26, 2024

Conversation

stuartmcalpine
Copy link
Collaborator

Make more options for querying

  • There is now a ~= query operator that can utalise the .ilike filter to allow non-case-sensitive filering with wildcards (i.e., the % character).
  • dregs ls can now filter on the dataset name, including % wildcards, using the --name option.
  • dregs_ls can return arbitrary columns using the --return_cols option

Copy link
Collaborator

@JoanneBogart JoanneBogart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one inline comment about mixing case-insensitivity with wildcard parsing.
A second issue has to do with escaping wildcards. Postgres treats both % and _ as potential wildcards. I believe sqlalchemy just passes the string as-is to Postgres. One can specify an escape character (by default \). I would say we should go with the default escape character - it's already something we exclude from names. Then before using .like or .ilike any underscore characters should be escaped (otherwise an underscore matches any single character; I think we can do without that capability). Maybe we also need to look for backslashes and, if found, escape them as well, but it's unlikely to come up except possibly in a field like description.
I'm also wondering whether we should use * as the wildcard character rather than %. One advantage is it's already excluded from names (but not from all string fields). The other is that people are used to it as a wildcard character. If we use it, the procedure would be

  • escape all _ and %
  • replace * with %
  • invoke .like or .ilike as appropriate
    There still is an issue with either * or % existing in the string when it's not intended to be a matching character. Maybe we just have to disallow "like" comparisons for all but a carefully-selected set of string columns.

return stmt.where(column_ref[0].__getattribute__(the_op)(value))
# Special case where we are partially matching with a wildcard
if f[1] == "~=":
return stmt.where(column_ref[0].ilike(value))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can assume that wildcard searches should also be case-insensitive. Unfortunately this probably means we need two new operators: 1. wildcard+case-insensitive (current definition of ~=), using sqlalchemy .ilike and 2. wildcard+case-sensitive, using sqlalchemy .like. If someone just wants case-insensitivity without wildcard searching they could use 1.
Another issue with using either .like and .ilike is escaping the special characters used in pattern-matching. I'll say more about that in a separate comment.

@stuartmcalpine
Copy link
Collaborator Author

  • Added the ~== operator for case sensitive wildcard searches
  • Changed the wildcard operator from % to * (and _ is also escaped)

There still is an issue with either * or % existing in the string when it's not intended to be a matching character. Maybe we just have to disallow "like" comparisons for all but a carefully-selected set of string columns.

Can either just leave it, or limit the columns. There is no limit on the columns currently, I don't know how ilike works on non-string columns...

I'd imagine this will primarily be used on the name column.

@JoanneBogart
Copy link
Collaborator

JoanneBogart commented Nov 4, 2024

I'm thinking it should only apply to a limited set of columns, but more than just name. I'd include at least owner and relative_path. Maybe also version_string, creator_uid and access_api

@stuartmcalpine
Copy link
Collaborator Author

I'm thinking it should only apply to a limited set of columns, but more than just name. I'd include at least owner and relative_path. Maybe also version_string, creator_uid and access_api

Looks like the partial match can only be done for string columns, so I've added a restriction to only allow the feature for certain string columns

Copy link
Collaborator

@JoanneBogart JoanneBogart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks.

@stuartmcalpine stuartmcalpine merged commit 955bda0 into main Nov 26, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants