Update queries #163
I left one inline comment about mixing case-insensitivity with wildcard parsing.
A second issue has to do with escaping wildcards. Postgres treats both % and _ as potential wildcards. I believe sqlalchemy just passes the string as-is to Postgres. One can specify an escape character (by default \). I would say we should go with the default escape character - it's already something we exclude from names. Then before using .like or .ilike any underscore characters should be escaped (otherwise an underscore matches any single character; I think we can do without that capability). Maybe we also need to look for backslashes and, if found, escape them as well, but it's unlikely to come up except possibly in a field like description.
I'm also wondering whether we should use * as the wildcard character rather than %. One advantage is it's already excluded from names (but not from all string fields). The other is that people are used to it as a wildcard character. If we use it, the procedure would be
- escape all _ and %
- replace * with %
- invoke .like or .ilike as appropriate
There still is an issue with either * or % existing in the string when it's not intended to be a matching character. Maybe we just have to disallow "like" comparisons for all but a carefully-selected set of string columns.
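To make the proposed procedure concrete, here is a minimal sketch of the translation step. The function name `to_like_pattern` is hypothetical and not part of the PR. It also escapes backslashes first, as suggested above, so that the escape characters it inserts afterwards are not themselves doubled.

```python
def to_like_pattern(user_pattern: str, escape: str = "\\") -> str:
    """Translate a user pattern using '*' as its wildcard into a SQL LIKE pattern.

    Hypothetical helper for illustration; not the code in this PR.
    """
    # Escape the escape character itself first, so the escapes added
    # in the next step are not doubled.
    escaped = user_pattern.replace(escape, escape + escape)
    # Neutralise the two characters that LIKE treats as wildcards.
    escaped = escaped.replace("%", escape + "%").replace("_", escape + "_")
    # Only now turn the user-facing wildcard into the SQL wildcard.
    return escaped.replace("*", "%")


print(to_like_pattern("my_data*"))
```

The result would then be handed to `.like` or `.ilike`. SQLAlchemy's `like` and `ilike` accept an optional `escape` argument, so the escape character could be stated explicitly rather than relying on the database default.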
src/dataregistry/query.py
Outdated
return stmt.where(column_ref[0].__getattribute__(the_op)(value))
# Special case where we are partially matching with a wildcard
if f[1] == "~=":
    return stmt.where(column_ref[0].ilike(value))
I don't think we can assume that wildcard searches should also be case-insensitive. Unfortunately this probably means we need two new operators: 1. wildcard+case-insensitive (current definition of ~=), using sqlalchemy .ilike and 2. wildcard+case-sensitive, using sqlalchemy .like. If someone just wants case-insensitivity without wildcard searching they could use 1.
Another issue with using either .like or .ilike is escaping the special characters used in pattern-matching. I'll say more about that in a separate comment.
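One way the two operators could be wired up is a small lookup from operator token to SQLAlchemy method name. This is only a sketch: `~=` is the operator from this PR, while the `~==` token and the `wildcard_method` helper are invented here for illustration.

```python
# "~=" is the PR's existing operator; "~==" is a hypothetical token
# chosen only for this sketch.
WILDCARD_OPS = {
    "~=": "ilike",   # wildcard + case-insensitive
    "~==": "like",   # wildcard + case-sensitive
}


def wildcard_method(op: str) -> str:
    """Return the name of the SQLAlchemy column method for a wildcard operator."""
    try:
        return WILDCARD_OPS[op]
    except KeyError:
        raise ValueError(f"'{op}' is not a wildcard operator") from None


print(wildcard_method("~="))
```

In the query code the lookup might then be used as `getattr(column_ref[0], wildcard_method(f[1]))(value)`, in place of the single hard-coded `.ilike` call.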
Can either just leave it, or limit the columns. There is no limit on the columns currently, I don't know how I'd imagine this will primarily be used on the |
I'm thinking it should only apply to a limited set of columns, but more than just |
Looks like the partial match can only be done for string columns, so I've added a restriction to only allow the feature for certain string columns.
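A restriction of this kind could be sketched as a simple allowlist check. The column names and the helper `is_wildcard_allowed` below are assumptions for illustration; the thread does not say which columns were actually chosen.

```python
# Hypothetical allowlist; the real set of permitted columns is not
# stated in this conversation.
WILDCARD_COLUMNS = frozenset({"dataset.name"})


def is_wildcard_allowed(column_name: str) -> bool:
    """Report whether wildcard matching is permitted on the given column."""
    return column_name in WILDCARD_COLUMNS


print(is_wildcard_allowed("dataset.name"))
```

A query builder could call this before applying `.like` or `.ilike` and raise an error for any column outside the set.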
Looks good. Thanks.
Make more options for querying
- ~= query operator that can utilise the .ilike filter to allow case-insensitive filtering with wildcards (i.e., the % character).
- dregs ls can now filter on the dataset name, including % wildcards, using the --name option.
- dregs ls can return arbitrary columns using the --return_cols option.