Model Checkpoint docs are incorrectly rendered on deepspeed.readthedocs.io #6747

Open
akeshet opened this issue Nov 12, 2024 · 3 comments
Labels: bug (Something isn't working), documentation (Improvements or additions to documentation)


akeshet commented Nov 12, 2024

The top Google search hit for most DeepSpeed documentation searches is https://deepspeed.readthedocs.io/ or one of its sub-pages. I assume this export is maintained, or at least enabled, by the DeepSpeed team, hence this bug report.

The page on model checkpointing at https://deepspeed.readthedocs.io/en/latest/model-checkpointing.html#model-checkpointing shows empty sections for "Loading Training Checkpoints" and "Saving Training Checkpoints", which I have found confusing on a few occasions. In contrast, the .rst file in the repo has an autofunction declaration inside these sections (see https://github.com/microsoft/DeepSpeed/blob/b692cdea479fba8201584054d654f639e925a265/docs/code-docs/source/model-checkpointing.rst).

Evidently the autofunction directive there is not being rendered correctly by the docs export.

@akeshet akeshet changed the title Model Checkpoint docs are incorrectly rendered on https://deepspeed.readthedocs.io/ Model Checkpoint docs are incorrectly rendered on deepspeed.readthedocs.io Nov 12, 2024
SubhamCPP commented

Hi @akeshet, thank you for reporting this issue.

I've also noticed several other pages with missing documentation sections. Below are the affected pages along with the specific missing blocks:

Training API - Link to Documentation

  • Forward Propagation
  • Backward Propagation
  • Optimizer Step
  • Gradient Accumulation

ZeRO - Link to Documentation

  • For Modifying Partitioned States
  • GPU Memory Management

All of these missing sections seem to share the same root cause: the .. autofunction:: deepspeed.DeepSpeedEngine directives are not being rendered correctly by the documentation export. This may be related to the Sphinx configuration or to how the documentation build is set up for deepspeed.readthedocs.io.

It may be worth verifying which Sphinx extensions are enabled and making sure autodoc is configured correctly to render these DeepSpeedEngine methods.

It might be helpful to add a documentation or bug label to this issue for visibility.

@loadams loadams added bug Something isn't working documentation Improvements or additions to documentation labels Nov 14, 2024
@loadams loadams self-assigned this Nov 14, 2024
loadams (Contributor) commented Nov 15, 2024

@akeshet - yes, we maintain the readthedocs page, so thanks for reporting this here. I'll take a closer look at why the export is failing.

@SubhamCPP - thanks for the suggestions on the tags, I added them and will take a look.

loadams (Contributor) commented Nov 22, 2024

Yes, @SubhamCPP, it does appear to be an issue with that import. Building the docs with -v shows:

viewcode can't import deepspeed.DeepSpeedEngine, failed with error "No module named 'deepspeed.DeepSpeedEngine'"
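For context on the error above: Python's importer treats every component of a dotted name as a package or module, so a class such as deepspeed.DeepSpeedEngine cannot be imported as a module even though it resolves fine as an attribute of the deepspeed package (assuming it is re-exported from deepspeed/__init__.py, as the docs' usage suggests). The sketch below reproduces the same failure with the standard library's json.JSONDecoder, which json likewise re-exports from a submodule, and shows one way to resolve such a name: import the longest importable module prefix, then walk the rest with getattr. The helpers module_importable and resolve_dotted_name are hypothetical and illustrate the import behavior only; they are not Sphinx's or DeepSpeed's code.

```python
import importlib
import json  # stdlib stand-in: json re-exports JSONDecoder from json.decoder


def module_importable(name: str) -> bool:
    """Return True if `name` can be imported as a module (hypothetical helper)."""
    try:
        importlib.import_module(name)
    except ModuleNotFoundError:
        return False
    return True


def resolve_dotted_name(dotted: str):
    """Resolve 'pkg.Class.method' by importing the longest importable
    module prefix and looking up the remaining parts with getattr.
    Hypothetical helper, for illustration only."""
    parts = dotted.split(".")
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        if not module_importable(module_name):
            continue  # e.g. 'json.JSONDecoder' is a class, not a module
        obj = importlib.import_module(module_name)
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return module_name, obj
    raise ImportError(f"no importable module prefix in {dotted!r}")


# Same shape as the viewcode error: the class is not importable as a module...
print(module_importable("json.JSONDecoder"))  # False
# ...but it resolves as module 'json' followed by attribute lookups.
module_name, method = resolve_dotted_name("json.JSONDecoder.decode")
print(module_name, method is json.JSONDecoder.decode)  # json True
```

The helper catches only ModuleNotFoundError so that other import failures in a real module still surface; note that a missing third-party dependency inside a module also raises ModuleNotFoundError and would be skipped by this sketch.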

We will work on fixing this.
