Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable orbax cloud logger by default. #1034

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 4 additions & 15 deletions MaxText/checkpointing.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,6 @@

abstract_logger = ocp.logging.abstract_logger
cloud_logger = ocp.logging.cloud_logger
composite_logger = ocp.logging.composite_logger
standard_logger = ocp.logging.standard_logger


def create_orbax_checkpoint_manager(
Expand Down Expand Up @@ -264,32 +262,23 @@ def map_to_pspec(data):
return None, None


def setup_checkpoint_logger(config) -> composite_logger.CompositeLogger | None:
def setup_checkpoint_logger(config) -> cloud_logger.CloudLogger | None:
"""Setup checkpoint logger.
Args:
config
Returns:
CompositeLogger
CloudLogger
"""
orbax_cloud_logger = None
orbax_standard_logger = None
max_logging.log("Setting up checkpoint logger...")
if config.enable_checkpoint_cloud_logger:
logger_name = f"goodput_{config.run_name}"
options = cloud_logger.CloudLoggerOptions(job_name=config.run_name, logger_name=logger_name)
orbax_cloud_logger = cloud_logger.CloudLogger(options=options)
max_logging.log("Successfully set up checkpoint cloud logger.")
return orbax_cloud_logger

if config.enable_checkpoint_standard_logger:
orbax_standard_logger = standard_logger.StandardLogger()
max_logging.log("Successfully set up checkpoint standard logger.")

orbax_logger = None
if orbax_cloud_logger is not None and orbax_standard_logger is not None:
orbax_logger = composite_logger.CompositeLogger(orbax_cloud_logger, orbax_standard_logger)
max_logging.log("Successfully set up checkpoint composite logger.")

return orbax_logger
return orbax_cloud_logger


def load_params_from_path(load_parameters_from_path, abstract_unboxed_params):
Expand Down
3 changes: 1 addition & 2 deletions MaxText/configs/base.yml
Original file line number Diff line number Diff line change
Expand Up @@ -446,8 +446,7 @@ enable_jax_profiler: False
jax_profiler_port: 9999

# Checkpoint Structured logging
enable_checkpoint_cloud_logger: False
enable_checkpoint_standard_logger: False
enable_checkpoint_cloud_logger: True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it ever make sense to set this to False. If not then I would prefer removing the config option and not have a check in setting up the cloud logger

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few reasons:

  • Sometimes too much logging can cause performance issues.
  • If Cloud logging starts breaking the training, issues like cloud monitoring api RATE_LIMIT exceeding.
  • Moreover, like Goodput(which is utilising cloud logging) we should give user option to turn it off.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining!


# Single-controller
enable_single_controller: False
Expand Down
Loading