Simplify v1.0 API Dataset Creation #81
Unanswered
davidbuniat asked this question in Ideas
Replies: 0 comments
Discussion regarding v1.0 API interface for dataset creation.
Currently, to create a dataset we need to write the following code. We need to specify the dtypes (aka the architecture of the dataset) and the shape.
Use Case I
The user has data stored locally on the file system in a nested folder structure. Importing the data and inferring its data types could be automated.
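To make the idea concrete, here is a minimal sketch of what folder-based inference could look like. It uses only the Python standard library and is not the Hub API: the function name `infer_schema_from_folders` and the assumed layout `root/<class_name>/<file>` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from pathlib import Path


def infer_schema_from_folders(root):
    """Guess class labels and file types from a root/<class_name>/<file> layout.

    Hypothetical helper for illustration; not part of any existing API.
    """
    root = Path(root)
    # Treat every immediate subdirectory as one class label.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    # Count file extensions to hint at what kind of data the samples hold.
    extensions = Counter(f.suffix.lower() for f in root.rglob("*") if f.is_file())
    # Pair each file directly inside a class folder with its label index.
    samples = [
        (f.relative_to(root).as_posix(), index)
        for index, name in enumerate(class_names)
        for f in sorted((root / name).iterdir())
        if f.is_file()
    ]
    return {
        "class_names": class_names,
        "extensions": dict(extensions),
        "samples": samples,
    }
```

The returned dictionary could then be used to pick a dtype per tensor (for example, an image type for `.jpg` files and a class label type for the folder names) instead of asking the user to spell it out.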
Use Case II
The data is already in PyTorch or TensorFlow format. Can I directly transform it into the Hub format, e.g. with from_tensorflow(...) and from_pytorch(...)?
Use Case III
Creating a dataset from scratch and populating it as if it were a NumPy array. This requires specifying the entire dtype structure.
Use Case IV
I want to add an additional Tensor to my existing dataset.
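As a rough sketch of the behaviour this use case asks for, the snippet below models a dataset as a plain dictionary mapping tensor names to lists of samples. This representation and the function name `add_tensor` are assumptions made for illustration only; they are not the Hub API.

```python
def add_tensor(dataset, name, values):
    """Return a copy of `dataset` with one extra named tensor.

    `dataset` is assumed to be a dict of tensor name -> list of samples
    (a hypothetical stand-in for a real dataset object).
    """
    if name in dataset:
        raise KeyError(f"tensor {name!r} already exists")
    values = list(values)
    # Every tensor must hold exactly one entry per sample.
    lengths = {len(v) for v in dataset.values()}
    if lengths and len(values) not in lengths:
        raise ValueError(
            f"expected {lengths.pop()} samples, got {len(values)}"
        )
    # Build a new dict so the original dataset is left untouched.
    return {**dataset, name: values}
```

The main design question it surfaces is whether the new tensor must be fully populated at creation time, as here, or whether it may start empty and be filled later.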
Use Case V
I want to add samples to my dataset (maybe in a streaming fashion). Is there any point in bounding the sample size?
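One way to think about the question is sketched below, assuming "bounding the sample size" means an upper limit on the shape of each appended sample. The class `StreamingTensor`, its `max_shape` parameter, and the helper `shape_of` are all hypothetical names, and samples are represented as nested Python lists to keep the example dependency-free.

```python
def shape_of(sample):
    """Return the shape of a nested list, e.g. [[1, 2, 3]] -> (1, 3)."""
    shape = []
    while isinstance(sample, list):
        shape.append(len(sample))
        sample = sample[0] if sample else None
    return tuple(shape)


class StreamingTensor:
    """Append-only tensor with an optional per-sample shape bound (hypothetical)."""

    def __init__(self, max_shape=None):
        self.max_shape = max_shape  # None means samples are unbounded
        self.samples = []

    def append(self, sample):
        shape = shape_of(sample)
        if self.max_shape is not None:
            # Reject samples with the wrong rank or an oversized dimension.
            too_big = len(shape) != len(self.max_shape) or any(
                s > m for s, m in zip(shape, self.max_shape)
            )
            if too_big:
                raise ValueError(f"shape {shape} exceeds bound {self.max_shape}")
        self.samples.append(sample)
```

With a bound, storage can be planned up front and malformed samples fail early; without one, appending is simpler for the user but the storage layer has to cope with arbitrary shapes.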
Use Case VI
Applying a transformation to a whole sample or to a single tensor separately. This discussion leads into transformation function types.
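The two kinds of transformation can be contrasted with a small sketch. Here a dataset is assumed to be a list of samples, each sample a dict of tensor name to value; the function names `transform_samples` and `transform_tensor` are hypothetical and only illustrate the distinction.

```python
def transform_samples(dataset, fn):
    """Apply `fn` to each whole sample (a dict of tensor name -> value)."""
    return [fn(sample) for sample in dataset]


def transform_tensor(dataset, name, fn):
    """Apply `fn` to one named tensor only, leaving the others unchanged."""
    return [{**sample, name: fn(sample[name])} for sample in dataset]


dataset = [
    {"image": [0, 128, 255], "label": 1},
    {"image": [64, 64, 64], "label": 0},
]

# Per-tensor: scale pixel values without touching the labels.
scaled = transform_tensor(dataset, "image", lambda px: [v / 255 for v in px])

# Per-sample: the function sees every tensor and may combine them.
flagged = transform_samples(
    dataset, lambda s: {**s, "is_bright": max(s["image"]) > 200}
)
```

A per-sample function has a richer signature (dict in, dict out, possibly with new keys), while a per-tensor function is simpler to write and to validate against a known dtype.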
Use Case VII
I want to create just a single tensor. Do I have to create a dataset with a dictionary of a single element?