Validation of DVC files and pipelines without downloading actual data #5949
Replies: 6 comments 3 replies
-
Similar logic can also be applied to |
-
Question 1 Related #4657. But here it is about Question 2, |
-
I don't think we should actually perform fetch and checkout, just pretend that we did and ignore the fact that some files are actually missing in the workspace. Frankly I'm not sure how to name the flag so that its behavior would be obvious. |
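One way to read "pretend that we did" is to compare only the hashes that are already recorded in the metafiles and never look at the workspace. The sketch below is illustrative only and is not how DVC implements `status`. The function name `changed_deps` is hypothetical, and it assumes the usual metafile layout (an `outs` list in a `.dvc` file, `stages` with `deps` in `dvc.lock`, each entry carrying `path` and `md5`), with both files in the same directory so that paths are directly comparable.

```python
def changed_deps(dvc_file: dict, lock_file: dict) -> list[str]:
    """Return paths whose hash in a .dvc file differs from the one in dvc.lock.

    Both arguments are already-parsed YAML mappings. Nothing is read from the
    workspace or the cache, so data files that were never pulled are ignored.
    """
    # Hash recorded for each tracked output in the .dvc file (assumed layout).
    tracked = {out["path"]: out.get("md5") for out in dvc_file.get("outs", [])}

    changed = []
    for stage in lock_file.get("stages", {}).values():
        for dep in stage.get("deps", []):
            path = dep["path"]
            # A mismatch means the stage was last run against different data.
            if path in tracked and tracked[path] != dep.get("md5"):
                changed.append(path)
    return changed
```

The mappings could come from any YAML parser, for example `yaml.safe_load` from PyYAML. A non-empty result would correspond to the "modified" case that a real implementation would report.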
-
Well, in a way #4657 is connected, yes, one could run |
-
I think the main issue with current |
-
Do any of you have a workaround for this problem that |
-
While using DVC I often have the following two questions:

1. Can I pull data from the remote? I.e., do all `*.dvc` files have corresponding data?

Currently I solve this using `dvc status -c | grep "missing:"` (as described in "status: flag to error-out if something is not up to date", #4436), but I think that's suboptimal, because the cache is also checked and it's a bit of a workaround. I can't just use the exit code, because I don't want to download all the data first, so all the files which are present in the remote will be marked as `deleted` and all the files which aren't as `missing`, which means the exit code is always non-zero.

2. If I pull the data, will the pipeline's outputs change?

I don't know any easy way to check this, because data which is not downloaded is considered `deleted` and not `modified`, even though the hash values in the `*.dvc` file and in `dvc.lock` don't match.

I propose implementing a `--pull` flag for `dvc status` which would make it behave as if `dvc pull` had been run just before it. In particular, that means no `deleted` entries in the output and no actual file hash computations before making the comparison.

This solves both the first and the second case: we can just use the exit code now.
I think it also doesn't complicate things much from an interface perspective.
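Until something like that exists, the workaround from the first question can be wrapped in a small script so that CI gets a meaningful exit code. This is a minimal sketch: the helper names `find_missing` and `check_remote` are hypothetical, and it only looks for the substring `missing:` exactly as the `grep` above does, without assuming anything else about the output format of `dvc status -c`.

```python
import subprocess


def find_missing(status_output: str) -> list[str]:
    """Return the lines of `dvc status -c` output that contain "missing:"."""
    return [
        line.strip()
        for line in status_output.splitlines()
        if "missing:" in line
    ]


def check_remote() -> int:
    """Run `dvc status -c` and return 1 if any entry is reported as missing."""
    # Same command as in the workaround; requires `dvc` on PATH.
    result = subprocess.run(
        ["dvc", "status", "-c"], capture_output=True, text=True, check=False
    )
    missing = find_missing(result.stdout)
    for line in missing:
        print(line)
    # Non-zero only when data is missing, so `deleted` entries are ignored.
    return 1 if missing else 0
```

A CI step could then call `sys.exit(check_remote())`. It still has the drawback mentioned above: the cache is checked as well, so this remains a workaround and not a replacement for the proposed flag.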