optim: MARS #42

Merged
merged 2 commits into from
Nov 19, 2024

@stockeh (Owner) commented Nov 19, 2024

Added MARS and a common file for repeated ops (e.g., newton_schulz).
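For readers unfamiliar with the shared op: a Newton-Schulz iteration approximates the orthogonal polar factor U Vᵀ of a matrix G = U S Vᵀ using only matrix products. Below is a minimal NumPy sketch of the classic cubic iteration X ← 1.5X − 0.5XXᵀX. The newton_schulz helper in this PR is written for MLX and may use a different polynomial, step count, or normalization, so treat this function's name and parameters as illustrative only.

```python
import numpy as np


def newton_schulz(G: np.ndarray, steps: int = 30, eps: float = 1e-7) -> np.ndarray:
    """Approximate the orthogonal polar factor U @ V^T of G (G = U S V^T).

    Illustrative sketch: classic cubic Newton-Schulz iteration
    X <- 1.5 X - 0.5 X X^T X, which converges when every singular value
    of the starting X lies in (0, sqrt(3)).
    """
    # Dividing by the Frobenius norm puts every singular value at or below 1.
    X = G / (np.linalg.norm(G) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        # Iterate on the wide orientation so the Gram matrix X @ X.T is small.
        X = X.T
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    # Undo the transpose so the result has the same shape as G.
    return X.T if transposed else X
```

The Frobenius normalization keeps the input inside the iteration's convergence region, and transposing tall inputs keeps the per-step matrix products as cheap as possible.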

There is also a function to set the previous gradient; it only has an effect in exact mode (is_approx = False):

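from mlx.utils import tree_map  # assumes the MLX tree utilities


def set_last_grad(self, gradients: dict):
    """Set the last gradient for each parameter (exact MARS only)"""
    assert self._initialized, "Must be initialized before setting"
    if not self.is_approx:
        # MARS-approx reuses the previous step's gradient, so only the
        # exact variant needs an externally supplied last gradient.

        def update_last(gradient, state):
            state["last_grad"] = gradient

        # Walk the gradient tree alongside the optimizer state and store
        # each parameter's gradient in its state dict (in place).
        tree_map(update_last, gradients, self.state)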
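For context, a sketch of where the stored gradient enters the MARS update, following the paper's variance-reduced gradient (notation is mine, not the PR's; γ is the correction scale and β₁ the first-moment decay). The exact form needs the gradient of the previous parameters on the current batch ξₜ, an extra gradient evaluation the training loop has to supply, which is presumably what set_last_grad is for. MARS-approx substitutes the gradient already computed at the previous step:

```math
\begin{aligned}
c_t &= \nabla f(x_t, \xi_t) + \gamma \frac{\beta_1}{1 - \beta_1} \bigl( \nabla f(x_t, \xi_t) - \nabla f(x_{t-1}, \xi_t) \bigr) && \text{(exact)} \\
c_t &= \nabla f(x_t, \xi_t) + \gamma \frac{\beta_1}{1 - \beta_1} \bigl( \nabla f(x_t, \xi_t) - \nabla f(x_{t-1}, \xi_{t-1}) \bigr) && \text{(approx)} \\
\tilde{c}_t &= c_t / \max\bigl(1, \lVert c_t \rVert_2\bigr)
\end{aligned}
```

The clipped gradient c̃ₜ then drives AdamW-style first and second moments.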

However, this path was not tested extensively. The authors state in their paper (p. 8) that MARS-approx (is_approx = True) is recommended.
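To make the approximate mode concrete, here is a minimal NumPy sketch of one MARS-AdamW step on a single array. The gradient is corrected by γ·β₁/(1−β₁) times its change since the stored last gradient, clipped to unit L2 norm, and then fed through AdamW-style moments. The function name, the plain-dict state layout, the hyperparameter defaults, and treating the first step as having no correction are assumptions for illustration; this is not the PR's MLX implementation and does not reflect its defaults.

```python
import numpy as np


def mars_adamw_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.99,
                    gamma=0.025, eps=1e-8, weight_decay=0.0):
    """One MARS-AdamW step (approximate variant) for a single NumPy array.

    `state` is a plain dict (hypothetical layout) holding "m", "v",
    "last_grad" and "step"; it is updated in place and the new parameter
    array is returned.
    """
    m = state.get("m", np.zeros_like(param))
    v = state.get("v", np.zeros_like(param))
    # Assumption: on the first step there is no previous gradient, so reuse
    # the current one, which makes the correction term zero.
    last_grad = state.get("last_grad", grad)
    step = state.get("step", 0) + 1

    # Variance-reduced gradient: add a scaled copy of the change in gradient.
    c = grad + gamma * (beta1 / (1.0 - beta1)) * (grad - last_grad)
    # Clip the corrected gradient to unit L2 norm.
    norm = np.linalg.norm(c)
    if norm > 1.0:
        c = c / norm

    # AdamW-style first/second moments with bias correction.
    m = beta1 * m + (1.0 - beta1) * c
    v = beta2 * v + (1.0 - beta2) * c * c
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    # Decoupled weight decay, as in AdamW.
    new_param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)

    # Approximate mode: this step's gradient becomes the next step's last_grad.
    state.update(m=m, v=v, last_grad=grad, step=step)
    return new_param
```

In exact mode the only difference would be where last_grad comes from: rather than keeping the current step's gradient, the training loop would overwrite it with the gradient of the previous parameters on the current batch, which is the role set_last_grad appears to play in this PR.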

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

@stockeh merged commit ac8f677 into main on Nov 19, 2024
8 checks passed