Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Train use mlflow #287

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

MohamedAliBouhaouala
Copy link
Contributor

@MohamedAliBouhaouala MohamedAliBouhaouala commented Oct 28, 2024

Motivation

This PR modifies the training process of intent classifier models in order to be able to expose the Mlflow UI and log metrics/artifacts/registered models there.

Type of change:

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@@ -0,0 +1,32 @@
version: '3.9'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add these ROOT/docker/docker-compose.nlu.yml and ROOT/docker/docker-compose.nlu.dev.yml

ports:
- "5002:5000" # Expose MLflow UI
volumes:
- ./mlruns:/mlruns # Mount local directory for MLflow artifacts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- ./mlruns:/mlruns # Mount local directory for MLflow artifacts
- mlruns:/mlruns # Mount local directory for MLflow artifacts

Add "mlruns:" to the volumes

@@ -34,6 +36,7 @@
'fr': "dbmdz/bert-base-french-europeana-cased",
}

mlflow.set_tracking_uri("http://0.0.0.0:5002")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mlflow.set_tracking_uri("http://0.0.0.0:5002")
mlflow.set_tracking_uri("http://0.0.0.0:5002")

This should be the docker hostname

self.save()

# Start MLflow run
with mlflow.start_run() as run:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be implement in the parent class so that it would work for all models ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally it should be implemented as a helper class. Otherwise, we're logging metrics during training. The base class doesn't have a fit method.

@@ -96,7 +96,7 @@
if os.path.isfile(os.path.join(model.save_dir, "checkpoint")):
model.restore()
else:
model.save()
model.save_model()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we need to rename this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion with keras' built-in save method. It's been throwing aberrant exceptions. Solved it by renaming the boilerplate method

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants