Migrate docstrings to doctests #901

Closed
wants to merge 1 commit into from

Conversation

mathiasburger

Changes

Migrate docstrings to doctest

  • torchdata/dataloader2/adapter.py
  • torchdata/datapipes/iter/load/*

Use PEP8 style in code examples, e.g. add newlines between defs:

  from torchdata.datapipes.iter import IterableWrapper


  def filepath_fn(name: str) -> str:
      return dir_path + name


  name_to_data = {"1.txt": b"DATA1", "2.txt": b"DATA2", "3.txt": b"DATA3"}
  source_dp = IterableWrapper(sorted(name_to_data.items()))
  fsspec_saver_dp = source_dp.save_by_fsspec(filepath_fn=filepath_fn, mode="wb")
  res_file_paths = list(fsspec_saver_dp)

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 16, 2022
@mathiasburger
Author

Comparison of the documentation before and after the changes:

[13 screenshots of the rendered documentation, captured 2022-11-16 between 23:22 and 23:36; images not included]

torchdata/datapipes/iter/load/huggingface.py (outdated, resolved)
Comment on lines 207 to 213
os.remove(dir_path + "1.txt")
os.remove(dir_path + "2.txt")
os.remove(dir_path + "3.txt")
Contributor

Should these be os.remove(os.path.join(dir_path, "1.txt"))?

Author

@mathiasburger mathiasburger Mar 11, 2023

filepath_fn constructs a filename and not a directory, so concatenation is correct here; I did not change the example itself and only added matching test cleanup
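
The distinction drawn in this reply can be illustrated with a minimal sketch. The names `base_dir` and `file_prefix` below are hypothetical and are not the values used in the PR; the point is only that when the variable is a filename prefix rather than a directory, plain concatenation and `os.path.join` yield different paths, so the cleanup has to mirror whatever `filepath_fn` did.

```python
import os
import tempfile

# Hypothetical prefix for illustration: a directory plus a filename stem.
base_dir = tempfile.mkdtemp()
file_prefix = os.path.join(base_dir, "sample_")


def filepath_fn(name: str) -> str:
    # Plain concatenation: the prefix becomes part of the filename.
    return file_prefix + name


# Path produced by the example's filepath_fn.
concatenated = filepath_fn("1.txt")

# Path that os.path.join would produce; it treats the prefix as a directory.
joined = os.path.join(file_prefix, "1.txt")

# Remove the temporary directory created for this sketch.
os.rmdir(base_dir)
```

Under these assumptions, removing `joined` would target a file that was never written, which is why the cleanup reuses concatenation.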

torchdata/datapipes/iter/load/iopath.py (resolved)
torchdata/datapipes/iter/load/online.py (resolved)
Contributor

@ejguan ejguan left a comment

Really appreciate this effort. Left a few comments above


import os

os.remove(file_prefix + "1.txt")
Author

filepath_fn constructs a filename and not a directory, so concatenation is correct here; I did not change the example itself and only added matching test cleanup

Contributor

Makes sense to me. nit: Can we do a for loop here?
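
One way the suggested loop could look is sketched below. The prefix and file names are hypothetical stand-ins, and the files are created locally only so that the cleanup has something to remove; the actual docstring cleanup may differ.

```python
import os
import tempfile

# Hypothetical prefix; the real docstring defines its own location.
file_prefix = os.path.join(tempfile.mkdtemp(), "example_")
file_names = ["1.txt", "2.txt", "3.txt"]

# Create the files so the cleanup below has something to remove.
for name in file_names:
    with open(file_prefix + name, "wb") as f:
        f.write(b"DATA")

# Cleanup written as a loop instead of three separate os.remove calls.
for name in file_names:
    os.remove(file_prefix + name)

# Record what is left, then remove the now-empty temporary directory.
remaining = os.listdir(os.path.dirname(file_prefix))
os.rmdir(os.path.dirname(file_prefix))
```

Keeping the names in one list means the cleanup stays in sync if the example later writes more files.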

Contributor

@ejguan ejguan left a comment

Thank you. Left a few minor comments

torchdata/dataloader2/adapter.py (outdated, resolved)

torchdata/datapipes/iter/load/iopath.py (resolved)

from torchdata.datapipes.iter import IterableWrapper, S3FileLister

S3FileLister.__iter__ = lambda self: iter([])
Contributor

This is too hacky. How about constructing an empty list as s3_prefixes and adding a comment about the pattern of each path?

Author

IMHO, if we use an empty list of files, the example is not very descriptive. IterableWrapper(['s3://bucket-name/folder/', ...]) is necessary for the user to understand the example.

We could mock here, but that would amount to the same thing: emptying the list so that the example runs. Mocking inside the example would also alter it, so I find it difficult to reach a good solution. Please note that the test setup is not shown in the rendered example, and that we don't really want to alter the example itself.

Author

I'll try patching. As I cannot use a with block, I will use patch.start() in testsetup and patch.stop() in testcleanup
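
The start/stop approach described here might look roughly like the sketch below. `RemoteFileLister` is a hypothetical stand-in class, not a torchdata DataPipe, and the comments only mark where hidden setup and cleanup steps would conceptually sit; the patch actually committed in the PR may be written differently.

```python
from unittest.mock import patch


class RemoteFileLister:
    """Hypothetical stand-in for a DataPipe that would contact a remote service."""

    def __init__(self, prefixes):
        self.prefixes = prefixes

    def __iter__(self):
        raise RuntimeError("network access is not available in tests")


# Setup step (hidden from the rendered example): start the patch explicitly,
# since a `with` block cannot span separate setup and cleanup sections.
patcher = patch.object(RemoteFileLister, "__iter__", new=lambda self: iter([]))
patcher.start()

# Visible example code stays unchanged and keeps its descriptive input.
lister = RemoteFileLister(["s3://bucket-name/folder/"])
listed = list(lister)

# Cleanup step (also hidden): stop the patch so the original method returns.
patcher.stop()
```

Compared with assigning a lambda to `__iter__` directly, `patcher.stop()` restores the original method, so later examples in the same test run are not affected.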

Comment on lines 126 to 128
from torchdata.datapipes.iter import S3FileLoader

S3FileLoader.__iter__ = lambda self: iter([])
Contributor

Ditto

Author

Changed to patching in testsetup and cleaning up the patch in testcleanup.

Contributor

@NivekT NivekT left a comment

I think it will be great to add a section in the contribution guide (can be done in a separate PR) about doctests, briefly describing that it exists and is part of the docstring.

Perhaps somewhere in this section?
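
For readers unfamiliar with the idea, a doctest is an example embedded in a docstring that a test runner can execute and check. The sketch below uses Python's standard-library `doctest` module and a hypothetical `add_prefix` function purely for illustration; the project's documentation build may run its examples through a different runner.

```python
import doctest


def add_prefix(prefix: str, name: str) -> str:
    """Return ``prefix`` concatenated with ``name``.

    Example:
        >>> add_prefix("data_", "1.txt")
        'data_1.txt'
    """
    return prefix + name


# Collect and run the examples embedded in the docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add_prefix, name="add_prefix", globs={"add_prefix": add_prefix}):
    runner.run(test)

# Summary of how many examples were attempted and how many failed.
results = runner.summarize(verbose=False)
```

Because the example lives in the docstring, it is rendered in the documentation and checked by the same source text.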

@SvenDS9
Contributor

SvenDS9 commented Mar 17, 2023

> I think it will be great to add a section in the contribution guide (can be done in a separate PR) about doctests, briefly describing that it exists and is part of the docstring.
>
> Perhaps somewhere in this section?

I have added a section about doctests in 1aaa382 in #1069

@facebook-github-bot
Contributor

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@NivekT NivekT left a comment

LGTM!

@facebook-github-bot
Contributor

@NivekT merged this pull request in 837ede1.

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged topic: developer feature topic category
6 participants