Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring the auto generated relative paths #159

Merged
merged 7 commits into from
Oct 24, 2024

Conversation

stuartmcalpine
Copy link
Collaborator

  • Add .gen_paths directory as a prefix for all auto generated relative_paths. Users therefore cannot specify a manual relative path starting with this directory, however they are they free to specify any other relative path without having to worry about clashes with auto generated names

@JoanneBogart
Copy link
Collaborator

I took a look at your implementation for excluding paths starting with .gen_paths. I had something much simpler in mind: just exclude any string with starts with .gen_paths, no need to break it up into path components. Certainly we need to exclude a regular file called .gen_paths which I don't think is handled by your code, but to avoid confusion, even though it's not strictly necessary, I'd also like to exclude .gen_paths.hdf or even .gen_paths_user/some_file.hdf. I don't think it will cause problems for users; they have plenty of other naming choices.

@stuartmcalpine
Copy link
Collaborator Author

I took a look at your implementation for excluding paths starting with .gen_paths. I had something much simpler in mind: just exclude any string with starts with .gen_paths, no need to break it up into path components. Certainly we need to exclude a regular file called .gen_paths

Changes should now reflect this. I've added some more checks also for the relative_path in general (in dataset.py). Including illegal chars, but lmk if this is still the behavior you want here.

Users should not be able to put anything into the .gen_paths dir, or chose the relative_path just as .gen_paths

I'd also like to exclude .gen_paths.hdf or even .gen_paths_user/some_file.hdf. I don't think it will cause problems for users; they have plenty of other naming choices.

Not added this yet, any reason about .hdf?

@stuartmcalpine
Copy link
Collaborator Author

I've changed the way single files are ingested to preserve filename suffixes.

Now when a single file is ingested, the autogenerated part (i.e., the directory with name and version) is reserved for the directory name (within .gen_paths) and the file is simply copied into the folder, preserving the filename. e,g., something like .gen_paths/mydataset_1.0.0/this_was_the_filename.txt

Copy link
Collaborator

@JoanneBogart JoanneBogart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thanks

@stuartmcalpine stuartmcalpine merged commit 81949eb into main Oct 24, 2024
26 checks passed
@stuartmcalpine stuartmcalpine deleted the u/stuart/refactor_autogenerated_name branch October 24, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants