Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file: support exporting files as a symlink #819

Merged
merged 1 commit into from
Jan 16, 2025
Merged

file: support exporting files as a symlink #819

merged 1 commit into from
Jan 16, 2025

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Jan 15, 2025

Closes #807.

Example Usage:

dc.export_files("output_dir", link_type="symlink")

@skshetry skshetry requested a review from a team January 15, 2025 06:09
Copy link

cloudflare-workers-and-pages bot commented Jan 15, 2025

Deploying datachain-documentation with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2d9547f
Status: ✅  Deploy successful!
Preview URL: https://974724d7.datachain-documentation.pages.dev
Branch Preview URL: https://symlink-export.datachain-documentation.pages.dev

View logs

Comment on lines +268 to +273
if link_type == "symlink":
try:
return self._symlink_to(dst)
except OSError as exc:
if exc.errno not in (errno.ENOTSUP, errno.EXDEV, errno.ENOSYS):
raise
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do in case of overwrite?

Looks like export allows you to overwrite files over and over again, but this _symlink_to will fail with FileExistsError.

self.ensure_cached()
source = self.get_local_path()
assert source, "File was not cached"
elif self.source.startswith("file://"):
Copy link
Member Author

@skshetry skshetry Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using self.source.startswith("file://") to avoid creating a filesystem.

Copy link

codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 63.15789% with 7 lines in your changes missing coverage. Please review.

Project coverage is 87.47%. Comparing base (88737b6) to head (2d9547f).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/datachain/lib/file.py 61.11% 5 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #819      +/-   ##
==========================================
- Coverage   87.54%   87.47%   -0.07%     
==========================================
  Files         128      128              
  Lines       11326    11344      +18     
  Branches     1533     1538       +5     
==========================================
+ Hits         9915     9923       +8     
- Misses       1026     1033       +7     
- Partials      385      388       +3     
Flag Coverage Δ
datachain 87.41% <63.15%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

assert (tmp_path / "dir" / "myfile.txt").is_symlink()

dst = Path(file.get_local_path()) if use_cache else path
assert (tmp_path / "dir" / "myfile.txt").resolve() == dst
Copy link
Member Author

@skshetry skshetry Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using resolve() instead of readlink() because WindowsPath.readlink() returns an extended path that fails to compare (note the //?/ prefix).

-- WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/popen-gw2/default_working_dir0/.datachain/cache/29/89fe8cd7eda0a1e2e28bf837e137f5bdd8d20c506348f67f5671f5a036d529')
++ WindowsPath('//?/C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/popen-gw2/default_working_dir0/.datachain/cache/29/89fe8cd7eda0a1e2e28bf837e137f5bdd8d20c506348f67f5671f5a036d529')

@@ -2418,6 +2419,7 @@ def export_files(
signal="file",
placement: FileExportPlacement = "fullpath",
use_cache: bool = True,
link_type: Literal["copy", "symlink"] = "copy",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we update / add docs while we do this (or as a followup)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 2d9547f

Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we on this - any ideas about a better name for the export_file? + it would be great to update docs. Thanks for a quick turnaround @skshetry

@skshetry
Copy link
Member Author

while we on this - any ideas about a better name for the export_file?

I could not think of a good name tbh. So, I left it at that.

Some of my suggestions are: download_files, checkout_files, get_files and link_files.
I prefer download_files makes more sense if copy is going to be a default link type, and link_files if link types other than copy is going to be a default link type.

@@ -2418,6 +2419,7 @@ def export_files(
signal="file",
placement: FileExportPlacement = "fullpath",
use_cache: bool = True,
link_type: Literal["copy", "symlink"] = "copy",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing I forgot to mention - if files fail to symlink, it will always fall back to copy.

@skshetry skshetry merged commit aad99e2 into main Jan 16, 2025
36 of 38 checks passed
@skshetry skshetry deleted the symlink-export branch January 16, 2025 04:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Symlink support
2 participants