SciDataFlow RFC: #1 #25
vsbuffalo
announced in
Announcements
Dear @vincebuffalo,
Thanks for reaching out. First, I would be very interested in support for Amazon S3!
Regarding your proposal, I'm not sure I correctly understand your approach to supplementary data. Do you want to manually define a list of files (in data_manifest.yml) on the remote, so as to mirror part of a directory structure? (That's what I'm doing.) Otherwise, could you add an example?
Thanks,
-
Hello SciDataFlow users!
This is a request for comments, in particular about future directions of SciDataFlow (SDF).
As I use SDF for my own scientific research, I increasingly see the need to iron out some of the rough patches in the workflow. As someone who spends way too many hours doing computational biology work, I know how important it is to have a clean, fast, simple workflow. When I developed SciDataFlow, I tried to design the workflow to be Git-like (but easier), so it would be a breeze to use.
However, in my own use there is one continual sticking point that frustrates me: remote repositories do not support directory structures, but we need such hierarchical structures for file/data organization. To be clear, this constraint is imposed by the remote repositories, none of which (to my knowledge) supports hierarchical directory structures.
To get around this limitation, there are two workflows:
However, I increasingly want SciDataFlow to support more extensible, less constraining ways of storing remote data than the existing remote data repositories offer. One path forward that I see is to add support for data storage in Amazon S3 buckets.
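To make the appeal of S3 concrete, here is a minimal sketch in Python. It is an illustration only, not SciDataFlow code and not a committed design. S3 object keys are plain strings, but `/` in a key is conventionally treated as a path delimiter, so a project's directory layout can be carried in the key itself. The function names, the example file paths, and the `my-project` prefix are all hypothetical.

```python
from pathlib import PurePosixPath


def s3_key_for(relative_path: str, prefix: str = "") -> str:
    """Map a project-relative file path to an S3 object key.

    Hypothetical helper: the key keeps the directory components, so the
    hierarchy survives on the remote.
    """
    # Normalize Windows-style separators so keys always use "/".
    path = PurePosixPath(relative_path.replace("\\", "/"))
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"expected a path inside the project: {relative_path}")
    clean_prefix = prefix.strip("/")
    return f"{clean_prefix}/{path}" if clean_prefix else str(path)


def flat_name_collisions(paths):
    """Return the bare filenames that would clash on a remote without directories."""
    by_name = {}
    for p in paths:
        by_name.setdefault(PurePosixPath(p).name, []).append(p)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


# Hypothetical tracked files in a project.
tracked = [
    "data/raw/sample_a/counts.tsv",
    "data/raw/sample_b/counts.tsv",
    "data/supplementary/table_s1.csv",
]

keys = [s3_key_for(p, prefix="my-project") for p in tracked]
collisions = flat_name_collisions(tracked)

for key in keys:
    print(key)
print("would collide if flattened:", collisions)
```

With keys like these, the two `counts.tsv` files stay distinct on the remote, whereas a flat remote would force one of them to be renamed or archived.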
So this is a request for comment on this plan. Please let me know how this proposal sounds, if you'd use it, if you wouldn't, etc. If there are other sticking points in the SciDataFlow workflow, please let me know too (you can also start a new discussion thread if you like)!
PS: My work on SciDataFlow is unfortunately a bit delayed; as we all know, the academic scientific path does not really reward projects like this. I still have some PRs to incorporate, and I try my best to address serious bugs ASAP. I will do my very best to keep this software running perfectly.