Pipeline Updates for RHEL AI 1.3 #230
Signed-off-by: Michael Clifford <[email protected]>
6c6de40 to ba97769
EDIT: not quite, this b235539 works 💯 though
aee44dd to cb305a1
On 1.3, we cannot edit /usr/share/instructlab/sdg/default_data_recipes/skills.yaml, thus we had to make adjustments to override the SDG DataMixer class to pass a different skills file. Also, sdg_sampling_size is now optional in the pipeline. Signed-off-by: Sébastien Han <[email protected]>
319d3db to b1cc706
Signed-off-by: Michael Clifford <[email protected]>
b1cc706 to 262fb94
chunk_word_count=1000,
server_ctx_size=4096,
)
# Tweak precomputed skills data ratio if needed
Should we add more explanation for this, since it differs from what's exposed in the ilab CLI? Or link to an issue so this can be replaced once the equivalent is in ilab/sdg? This can be a follow-up.
LGTM, but will wait for @leseb to approve/merge wrt standalone changes.
# Override XDG_DATA_DIRS with the temporary directory
# This allows SDG to read the new skills.yaml since it's looking into XDG_DATA_DIRS
# and looks for a default_data_recipes directory with a skills.yaml file
os.environ["XDG_DATA_DIRS"] = f"{temp_dir}"
Let's keep the override here instead of doing data_processing_task.set_env_variable
so that removing this code will be easier in the future. Nothing to change!
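For readers following the XDG_DATA_DIRS override discussed above, here is a minimal sketch of the idea using only the standard library. It is not the code from this PR. The function name stage_skills_recipe is hypothetical, and the nested instructlab/sdg/default_data_recipes layout is an assumption inferred from the read-only /usr/share path mentioned earlier; the exact lookup SDG performs is not shown in this conversation.

```python
import os
import tempfile
from pathlib import Path

# Assumed relative layout, inferred from the read-only path named in this PR:
# /usr/share/instructlab/sdg/default_data_recipes/skills.yaml
RECIPE_RELATIVE_PATH = Path("instructlab/sdg/default_data_recipes/skills.yaml")


def stage_skills_recipe(recipe_text: str, temp_dir: str) -> Path:
    """Write a replacement skills.yaml under temp_dir and point XDG_DATA_DIRS at it.

    Hypothetical helper for illustration; not part of instructlab or this PR.
    """
    target = Path(temp_dir) / RECIPE_RELATIVE_PATH
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(recipe_text, encoding="utf-8")
    # Process-local override: only code running in this process sees it.
    os.environ["XDG_DATA_DIRS"] = str(temp_dir)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as temp_dir:
        # Placeholder content only; the real recipe schema is not shown here.
        staged = stage_skills_recipe("# hypothetical recipe content\n", temp_dir)
        print(staged)
```

Keeping the override inside the component, as suggested above, means the environment change and the temporary file are created and discarded together, so deleting the workaround later touches one place.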
Do not use None since it is not supported by the pipeline. Use the default 1.0 and compare against it to determine whether we need to tweak it. Signed-off-by: Sébastien Han <[email protected]>
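The default-comparison approach described in that commit message could look roughly like the sketch below. The names DEFAULT_SAMPLING_SIZE and needs_ratio_tweak are hypothetical, and math.isclose is a choice made here to avoid exact float equality; the PR itself may compare differently.

```python
import math

# Hypothetical constant: the commit message states the default is 1.0.
DEFAULT_SAMPLING_SIZE = 1.0


def needs_ratio_tweak(sdg_sampling_size: float = DEFAULT_SAMPLING_SIZE) -> bool:
    """Return True only when the caller supplied a non-default sampling size.

    A float default stands in for None, which the pipeline does not support
    as a parameter value according to the commit message.
    """
    return not math.isclose(sdg_sampling_size, DEFAULT_SAMPLING_SIZE)


if __name__ == "__main__":
    print(needs_ratio_tweak())     # default value, no tweak needed
    print(needs_ratio_tweak(0.5))  # non-default value, tweak needed
```

One trade-off of this pattern is that a caller who explicitly passes 1.0 is indistinguishable from one who passed nothing, which is acceptable here because both cases leave the recipe unchanged.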
/lgtm
/approve
This PR updates the pipeline to use the RHEL AI 1.3 image.
In addition to updating the image, a couple of other updates had to be made:
- XDG_CACHE_HOME is set to /tmp for the data_processing_task.
- set_precomputed_skills_data_ratio() uses a temporary directory when needed in the sdg_op component.
- main_ds.py.