-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util.parse
should support accessing tags, which would require a less restrictive regex
#299
Comments
Thank you for the detailed report! Since we want to fully implement lakeFS's URI specification, we must also match their support for ref expressions. Your regex suggestion sounds great, would you be willing to send a PR? I agree that the error message is a bit sparse, and even misleading in the case you presented. We might need to check our callsites for the order of |
This works great now, thanks so much for the quick fix! |
Given a list of LakeFS repositories and LakeFS authentication information, this script will check to see if there are any duplicate study IDs amongst dbGaP XML files anywhere in those repositories. See https://renci.atlassian.net/browse/DUG-374 for details. LakeFS authentication information can be provided in one of the [two ways supposed by the LakeFS library and client](https://lakefs.io/blog/improved-python-experience/#configuring-a-lakefs-client): 1. By creating an ~/.lakectl.yaml file with the authentication information. 2. By setting the LAKECTL_SERVER_ENDPOINT_URL, LAKECTL_CREDENTIALS_ACCESS_KEY_ID and LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY environment variables. We require at least `lakefs-spec ~= 0.11.1`, because otherwise [`.`s in tags are not supported](aai-institute/lakefs-spec#299) (implemented in [lakefs-spec v0.11.1](https://github.com/aai-institute/lakefs-spec/releases/tag/v0.11.1)). This PR also removes Avalon from requirements.txt, since we no longer use that.
util.parse()
currently assumes that the branch name is "Only letters, digits, underscores and dash, no leading dash."lakefs-spec/src/lakefs_spec/util.py
Lines 118 to 130 in 06134be
However, users might want to use a tag name instead of a branch name, which has a much more complicated syntax. Notably, the current regex doesn't match the two example tags in the LakeFS docs (
v2.3
anddev-jane-before-v2.3-merge
). It might also be useful to support ref expressions.I would suggest making the regex less restrictive -- at a minimum
(\w[\w\-\.]*)
, but possibly going all the way to support ref expressions to include[~^]
. The error could also be rewritten to be a bit clearer (e.g. becauselakefs.ls(...)
calls this after calling_strip_protocol()
, I get the following error message:ValueError: expected path with structure lakefs://<repo>/<ref>/<resource>, got 'heal-studies/v2.0/'
-- which made me think that I had forgotten to add thelakefs://
prefix before realizing that the branch was being rejected for being incorrect).Alternatively, if I'm misunderstanding how tags and ref expressions should be used in lakefs-spec, please let me know -- and thank you so much for building this great library!
The text was updated successfully, but these errors were encountered: