(Merge #24 first)
Fix #5 (and partly address #2).

I say "partly" for #2 because this PR adds the option to run each model/API separately. In the future, we may want to split this up even further if we have multiple instances of the same model/API running, e.g. two VMs running separate ollama instances. In that case, we might even split on `model_name` too.

Changes:
- A `group_prompts_by_model` method in `Experiment` creates a dictionary of dictionaries, where the keys are the unique models in the experiment and each value is a prompt dictionary with `model` equal to that key
- `Settings` takes a `parallel` argument when initialising and has a `parallel` attribute, which indicates whether or not to process the experiments for different models "in parallel"

The "parallelisation" doesn't use multiprocessing; it just creates separate async tasks for each model, which we gather (so it's an `asyncio.gather` over models, each of which is itself an `asyncio.gather` over individual prompts: a nested `asyncio.gather`). Separate tqdm bars for each model keep us updated on progress.

To run the pipeline in "parallel":
`run_pipeline --parallel` or `run_pipeline -p`
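As a rough sketch of how the pieces fit together (function names, signatures, and prompt shapes here are illustrative, not the exact implementation): prompts are grouped by their `model` key, each model's group runs as its own `asyncio.gather` of prompts, and those per-model tasks are themselves gathered.

```python
import asyncio
from collections import defaultdict


def group_prompts_by_model(prompts: list[dict]) -> dict[str, list[dict]]:
    # Group a flat list of prompt dicts by their "model" key.
    # (The real method returns a dictionary of dictionaries; a list
    # per model is used here for simplicity.)
    grouped: dict[str, list[dict]] = defaultdict(list)
    for prompt in prompts:
        grouped[prompt["model"]].append(prompt)
    return dict(grouped)


async def process_prompt(prompt: dict) -> dict:
    # Stand-in for a real model/API call.
    await asyncio.sleep(0.01)
    return {**prompt, "response": "done"}


async def process_model(prompts: list[dict]) -> list[dict]:
    # Inner gather: all prompts for a single model run concurrently.
    return await asyncio.gather(*(process_prompt(p) for p in prompts))


async def run_experiment(prompts: list[dict]) -> list[list[dict]]:
    grouped = group_prompts_by_model(prompts)
    # Outer gather: one task per model, each itself a gather of that
    # model's prompts (the nested asyncio.gather described above).
    return await asyncio.gather(*(process_model(g) for g in grouped.values()))


prompts = [
    {"model": "gpt-4", "prompt": "hello"},
    {"model": "ollama-llama2", "prompt": "hi"},
    {"model": "gpt-4", "prompt": "hey"},
]
results = asyncio.run(run_experiment(prompts))
```

In the real pipeline each model's inner gather also drives its own tqdm progress bar; that's omitted here to keep the sketch dependency-free.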