-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow merge on expressions #388
Conversation
Deploying datachain-documentation with Cloudflare Pages
|
col: sqlalchemy.ColumnClause = ( | ||
sqlalchemy.column(column) | ||
if isinstance(column, str) | ||
else sqlalchemy.column(column.name, column.type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[F] It does not seem possible to overwrite the table of a column which already has the table set. I.e test_merge_with_itself_column
fails without this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is no longer required for this PR to work. I'd be happy to split it out if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for me personally I am OK to leave it here. It feels natural to have this change here.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #388 +/- ##
==========================================
+ Coverage 87.28% 87.31% +0.02%
==========================================
Files 92 92
Lines 9960 9981 +21
Branches 2033 2041 +8
==========================================
+ Hits 8694 8715 +21
+ Misses 912 911 -1
- Partials 354 355 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
dfcda52
to
6b31c8a
Compare
5b86816
to
da07321
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! 👍
Couple comments below, also Studio tests fails 😥. But it is not related to this PR, as far as I can see.
if "." in name: | ||
name_path = name.split(".") | ||
elif DEFAULT_DELIMITER in name: | ||
name_path = name.split(DEFAULT_DELIMITER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wonder, if it is possible to have both dot (.
) and DEFAULT_DELIMITER
(__
) in column name? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Seems like one is always swapped for the other.
*right_on | ||
).db_signals() # type: ignore[assignment] | ||
|
||
if len(right_on_columns) != len(on_columns): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we loosing this check after these changes or I miss something? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is moved down. If any of the columns fail to resolve then we will have an entry in errors
and we'll raise a DatasetMergeError
.
col: sqlalchemy.ColumnClause = ( | ||
sqlalchemy.column(column) | ||
if isinstance(column, str) | ||
else sqlalchemy.column(column.name, column.type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for me personally I am OK to leave it here. It feels natural to have this change here.
Closes #299
This PR allows users to merge DataChains using
Column
s and/or expressions (e.g.path.file_stem(dc.c("source.path"))
). Examples of the new syntax are shown inexamples/multimodal/clip_inference.py
&examples/multimodal/wds.py
. Once this change has been merged I will make the appropriate changes in thedatachain-examples
repo.It should be noted that the change is backwards compatible and
merge
will continue to accept strings for theon
andright_on
parameters.