CRUD updates, dataset mutation and BlobTree API updates #38
This is a big batch of changes, implementing:

* `BlobTree` API changes to make it more coherent and simpler, including `open(write=true)`.
* `DataProject` dataset CRUD.

A lot of these changes are intertwined, so I've put all this here as a draft, but I'll probably need to break it apart into separate PRs.
## BlobTree
`BlobTree` now has a largely dictionary-like interface:

* `keys(tree)`
* `pairs(tree)`
* `haskey(tree, path)`
* `tree[path]`
* `newdir(tree, path)`, `newfile(tree, path)`
* `delete!(tree, path)`
Where `path` is either a relative path of type `RelPath`, or an `AbstractString` (in which case it'll be split on `/` to become a relative path).

Unlike `Dict`, iteration of `BlobTree` currently iterates values (not key-value pairs). This has some benefits; for example, it allows processing to be broadcast across the files in a directory.
The functions `isdir()` and `isfile()` determine whether a child of the tree is a directory or a file.

### Example
You can create a new temporary `BlobTree` via the `newdir()` function and fill it with combinations of `newfile()` or `newdir()`.
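Here's a rough sketch of what that looks like (this PR is still a draft, so the exact calls may shift; the file names, and my reading of how `isdir()`/`isfile()` are applied to children, are just for illustration):

```julia
using DataSets

tree = newdir()                      # new temporary BlobTree
newdir(tree, "images")               # add a subdirectory
newfile(tree, "images/cat.png")      # add a file inside it
newfile(tree, "README.md")           # string paths are split on '/' into a RelPath

haskey(tree, "images/cat.png")       # true
keys(tree)                           # names of the children of `tree`
tree["images"]                       # a child BlobTree
isdir(tree["images"])                # this child is a directory
isfile(tree["README.md"])            # this child is a file

# Iteration yields values (Blobs / BlobTrees), not key => value pairs,
# so processing can be broadcast over the files in a directory.
for child in tree["images"]
    @show child
end

delete!(tree, "README.md")
```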
You can also get access to a `BlobTree` by using `DataSets.from_path()` with a local directory name.
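For example, a rough sketch (the directory name here is just a placeholder, and `open(BlobTree, ...)` is the existing read API):

```julia
ds = DataSets.from_path("/home/me/photos")  # a directory, so this wraps a BlobTree
tree = open(BlobTree, ds)                   # open the dataset for reading
keys(tree)                                  # file and directory names under the path
```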
## `AbstractDataProject` interface additions
To support CRUD of datasets (#31) within data projects, the data project interface needs much more flexibility. I've added the following (see the sketch after the list for how they fit together):

* `DataSets.create()` to create datasets. This still needs some refinement, in particular the keyword parameters.
* `Base.setindex!()` to add a dataset to a project
* `DataSets.delete()` to delete datasets
* Implementations of these for `StackedDataProject`, `AbstractTOMLDataProject` and `TOMLDataProject`
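A rough sketch of how these might be used. The exact signatures (especially `create`'s keyword parameters) are still in flux, so the calls below are illustrative placeholders rather than the final API:

```julia
using DataSets

proj = DataSets.load_project("Data.toml")   # load a TOML-backed data project

ds = DataSets.create(proj, "scratch")       # placeholder form: create a new dataset
proj["scratch_copy"] = ds                   # Base.setindex!: add a dataset to the project
DataSets.delete(proj, "scratch")            # remove a dataset from the project
```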
Relatedly, I've added `DataSets.from_path()` to create a standalone `DataSet` from data on the local filesystem, inferring the type as `Blob` or `BlobTree`. This can be passed as a source to `create()` to make a copy.
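Continuing the sketch above, that might look something like this (the path is a placeholder, and whether the source is passed positionally or as a keyword is an assumption):

```julia
src = DataSets.from_path("/home/me/photos")      # standalone DataSet wrapping a BlobTree
DataSets.create(proj, "photos", source=src)      # placeholder: copy `src` into the project
```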
Still TODO here is `DataSets.config` (or some such) to update the metadata of a `DataSet`. (Alternatively, have the dataset know its owning data project and call back into that when it's updated?)

## Low level `AbstractDataDriver` interface
The low level driver interface is currently (in 0.2.6) just a function taking a user-defined callback. However, to support CRUD operations for `DataProject` it needs to be expanded quite a bit, in particular to be able to create and delete storage in the storage backend. This PR adds `AbstractDataDriver` and, so far, a single implementation, `FileSystemDriver`, with implementations of the new interface functions.

This interface is probably still a bit half-baked and needs some refinement.
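For a sense of the intended shape, here is a purely hypothetical driver sketch; the method names are placeholders I've made up, since the actual interface functions aren't listed here:

```julia
using DataSets

# Hypothetical only: subtype the new AbstractDataDriver and provide the ability
# to create and delete storage in the backend (an in-memory Dict in this toy).
struct InMemoryDriver <: DataSets.AbstractDataDriver
    storage::Dict{String,Vector{UInt8}}
end

# Placeholder function names, not the real DataSets driver API:
create_storage!(d::InMemoryDriver, name::AbstractString) = (d.storage[name] = UInt8[]; nothing)
delete_storage!(d::InMemoryDriver, name::AbstractString) = (delete!(d.storage, name); nothing)
```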