no metadata dir in a compressed bucket #18

Open
jinserk opened this issue Aug 28, 2020 · 6 comments
Labels: bug (Something isn't working), good first issue (Good for newcomers)

Comments

@jinserk

jinserk commented Aug 28, 2020

Hi again,

Sorry to bother you with several questions and bug reports, but this one looks critical.
I created a compressed data bucket and the data seems to be stored correctly, but when I retrieve the dataset its length is 0, and indexing it fails as follows:

Traceback (most recent call last):
  File "/home/jinserk/.pyenv/versions/3.8.5/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/jinserk/kyu/kyumlm/mlmanager/torch/workers.py", line 202, in run
    self.setup()
  File "/home/jinserk/kyu/kyumlm/mlmanager/torch/workers.py", line 190, in setup
    self.set_dataloaders()
  File "/home/jinserk/kyu/kyumlm/mlmanager/torch/workers.py", line 134, in set_dataloaders
    trainset, valset = self.set_datasets()
  File "/home/jinserk/kyu/kyumlm/tddft/ann/workers.py", line 88, in set_datasets
    print(dataset[0])
  File "/home/jinserk/kyu/kyumlm/tddft/ann/dataset.py", line 35, in __getitem__
    x = super().__getitem__(index)
  File "/home/jinserk/.pyenv/versions/kyumlm/lib/python3.8/site-packages/matorage/data/torch/dataset.py", line 81, in __getitem__
    return self._get_item_with_download(idx)
  File "/home/jinserk/.pyenv/versions/kyumlm/lib/python3.8/site-packages/matorage/data/torch/dataset.py", line 89, in _get_item_with_download
    _objectname, _relative_index = self._find_object(idx)
  File "/home/jinserk/.pyenv/versions/kyumlm/lib/python3.8/site-packages/matorage/data/data.py", line 128, in _find_object
    _key = self.end_indices[_key_idx]
IndexError: list index out of range

I checked briefly and found that the bucket has no metadata from which to read the dataset's meta info.
Can you fix this error? I have installed the latest code from the master branch.
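
For readers following the traceback, here is a minimal sketch of why missing metadata can surface as an IndexError. It assumes the dataset keeps a list of cumulative end indices, one per stored object, and looks up the object for a sample with a binary search. The function name `find_object` and the list layout are assumptions made for illustration; they are inferred from the last frame of the traceback and are not matorage's actual code.

```python
from bisect import bisect_right

def find_object(end_indices, idx):
    """Map a global sample index to (object position, index inside that object).

    end_indices[i] is assumed to be the cumulative sample count up to and
    including object i. This layout is a guess based on the traceback.
    """
    key_idx = bisect_right(end_indices, idx)
    end = end_indices[key_idx]  # IndexError when no stored object covers idx
    start = end_indices[key_idx - 1] if key_idx > 0 else 0
    return key_idx, idx - start

# Three objects holding samples 0-99, 100-249 and 250-399.
print(find_object([100, 250, 400], 120))

# No metadata was read, so the index list is empty and every lookup fails.
try:
    find_object([], 0)
except IndexError as exc:
    print("IndexError:", exc)
```

Under this assumption, a bucket without metadata yields an empty index list, so even `dataset[0]` has no object to map to.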

@jinserk
Author

jinserk commented Aug 28, 2020

One more minor error I found: when I export the DataConfig to JSON, the itemsize of a DataAttribute is not exported. Of course, I can add it manually.

@graykode
Owner

graykode commented Aug 28, 2020

@jinserk

No, questions about this project don't bother me at all. Rather, I'm happy that they help improve the project.

First question: if the bucket has no metadata, the save was probably interrupted partway through.
So it seems necessary to create a metadata recovery function for this case. Or maybe you forgot to call datasaver.disconnect.

Second question: yes, itemsize is missing. I'll add it as soon as possible.

To fix

  • create a metadata.recover function
  • also save the itemsize option to the JSON file

So, for the first question, please double-check that your code is written correctly before I modify this part.
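
To illustrate the failure mode described above, here is a minimal sketch of a saver that keeps its metadata in memory and writes it to the bucket only on disconnect. `ToySaver`, its file layout, and the `index.json` name are hypothetical stand-ins invented for this example; they are not matorage's DataSaver or its storage format. The only thing taken from the thread is that `disconnect` must be called for the metadata to exist.

```python
import json
import os
import tempfile

class ToySaver:
    """Hypothetical stand-in for a saver; this is not matorage's DataSaver."""

    def __init__(self, bucket_dir):
        self.bucket_dir = bucket_dir
        self.end_indices = []  # metadata lives in memory while saving
        self.total = 0

    def __call__(self, batch):
        # Write one data object per call and remember where it ends.
        name = f"object_{len(self.end_indices)}.json"
        with open(os.path.join(self.bucket_dir, name), "w") as f:
            json.dump(batch, f)
        self.total += len(batch)
        self.end_indices.append(self.total)

    def disconnect(self):
        # The metadata reaches the bucket only here.
        meta_dir = os.path.join(self.bucket_dir, "metadata")
        os.makedirs(meta_dir, exist_ok=True)
        with open(os.path.join(meta_dir, "index.json"), "w") as f:
            json.dump({"end_indices": self.end_indices}, f)

bucket = tempfile.mkdtemp()
saver = ToySaver(bucket)
try:
    saver([1, 2, 3])
    saver([4, 5])
finally:
    saver.disconnect()  # runs even if one of the save calls raises

print(sorted(os.listdir(bucket)))
```

Dropping the `finally` block leaves the data objects in place but no `metadata` directory, which matches the symptom reported in this issue.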

@graykode graykode self-assigned this Aug 28, 2020
@graykode graykode added the bug Something isn't working label Aug 28, 2020
@jinserk
Author

jinserk commented Aug 28, 2020

@graykode You're right! I forgot datasaver.disconnect. Thank you so much! By the way, could disconnect be called automatically from DataSaver.__del__()?

@graykode
Owner

@jinserk Thanks for the great suggestion.

As you suggested, adding it to DataSaver.__del__() doesn't seem to cause any problems in terms of concurrency (multiprocessing). I will incorporate this. Thanks!

@graykode
Owner

@jinserk

The Python destructor (__del__) is not guaranteed to run at the moment an object stops being used. Therefore, it seems better to manage this with Python's context manager protocol (__enter__, __exit__):

with DataSaver(...) as datasaver:
    datasaver(...)
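
For a saver class that does not implement `__enter__`/`__exit__` itself, the same guarantee can be sketched with a small wrapper built on `contextlib.contextmanager`. The helper name `disconnecting` and the `FakeSaver` class are hypothetical and exist only for this example; the sketch assumes nothing about the real saver beyond a `disconnect()` method, which is mentioned earlier in this thread.

```python
from contextlib import contextmanager

@contextmanager
def disconnecting(saver):
    """Yield the saver and call saver.disconnect() on exit, even after an error."""
    try:
        yield saver
    finally:
        saver.disconnect()

class FakeSaver:
    """Hypothetical stand-in that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def __call__(self, batch):
        self.calls.append(("save", batch))

    def disconnect(self):
        self.calls.append(("disconnect", None))

fake = FakeSaver()
with disconnecting(fake) as datasaver:
    datasaver({"x": [1, 2, 3]})

print(fake.calls)
```

Because the cleanup sits in a `finally` clause, `disconnect()` also runs when the body of the `with` block raises.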

@jinserk
Author

jinserk commented Aug 29, 2020

Looks great! I thought __del__ was called as soon as the instance is destroyed, but it isn't. Sorry for the confusion!
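
A small demonstration of the point about `__del__`, as a sketch for CPython: objects caught in a reference cycle are not finalized when their names are deleted, only when the cycle collector eventually runs. The `Resource` class is hypothetical and exists only for this example. Separately, the Python data model documentation notes that `__del__` is not guaranteed to be called for objects that still exist when the interpreter exits.

```python
import gc

events = []

class Resource:
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __del__(self):
        events.append(self.name)

gc.disable()  # keep the cycle collector from running on its own
a, b = Resource("a"), Resource("b")
a.partner, b.partner = b, a  # build a reference cycle
del a, b  # the names are gone, but the cycle keeps both objects alive
after_del = list(events)

gc.collect()  # the finalizers run only now
after_collect = sorted(events)
gc.enable()

print(after_del, after_collect)
```

Cleanup that must happen at a known point, such as writing metadata, is therefore safer in an explicit `disconnect()` call or a `with` block than in `__del__`.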


2 participants