Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline Updates for RHEL AI 1.3 #230

Merged
merged 5 commits into from
Dec 9, 2024

Conversation

MichaelClifford
Copy link
Collaborator

@MichaelClifford MichaelClifford commented Dec 4, 2024

This PR updates the pipeline to use the RHEL AI 1.3 image.
In addition to updating the image, a couple other updates had to be made:

  • set XDG_CACHE_HOME to /tmp for the data_processing_task.
  • Update set_precomputed_skills_data_ratio() to use a temporary directory when needed in the sdg_op component.
  • Update PytorchJob manifest to account for changes in available parameters in main_ds.py.

@leseb
Copy link
Collaborator

leseb commented Dec 4, 2024

@MichaelClifford successful run for 652e94f

EDIT: not quite, this b235539 works 💯 though

@leseb leseb force-pushed the rhel1.3 branch 2 times, most recently from aee44dd to cb305a1 Compare December 5, 2024 15:29
On 1.3, we cannot edit
/usr/share/instructlab/sdg/default_data_recipes/skills.yaml, thus we had
to make adjustments to override the SDG DataMixer class to pass a
different skills file.

Also, sdg_sampling_size is now optional in the pipeline.

Signed-off-by: Sébastien Han <[email protected]>
@MichaelClifford
Copy link
Collaborator Author

@leseb b235539 works for the default SDG modes. When I tested out the agentic mode I got an FileAlreadyExists Error I think this last small change should fix it.

@leseb leseb marked this pull request as ready for review December 6, 2024 08:08
@MichaelClifford MichaelClifford changed the title WIP: Pipeline Updates for RHEL AI 1.3 Pipeline Updates for RHEL AI 1.3 Dec 6, 2024
chunk_word_count=1000,
server_ctx_size=4096,
)
# Tweak precomputed skills data ratio if needed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add more explanation for this since it differs from what's exposed in ilab CLI? Or, link to an issue to replace this once the equivalent is in ilab/sdg? This can be a follow-up.

Copy link
Collaborator

@sallyom sallyom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but will wait for @leseb to approve/merge wrt standalone changes.

# Override XDG_DATA_DIRS with the temporary directory
# This allows SDG to read the new skills.yaml since it's looking into XDG_DATA_DIRS
# and looks for a default_data_recipes directory with a skills.yaml file
os.environ["XDG_DATA_DIRS"] = f"{temp_dir}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the override here instead of doing data_processing_task.set_env_variable so that removing this code will be easier in the future. Nothing to change!

pipeline.py Outdated Show resolved Hide resolved
Do not use None since it is not supported by the pipeline. Use the
default 1.0 and compare against it to determine whether we need to
tweak it.

Signed-off-by: Sébastien Han <[email protected]>
Copy link
Member

@tumido tumido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@tumido tumido merged commit 3bb3be0 into opendatahub-io:main Dec 9, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants