
Parallel processing of experiment #31

Merged: 5 commits merged into main on Apr 26, 2024
Conversation

@rchan26 rchan26 commented Apr 25, 2024

(Merge #24 first)

Fixes #5 (and partly addresses #2).

I think "partly" for #2 because in this PR, we have the option to run each model/API separately. In the future, we may want to split this up even further if we have multiple instances of the same model/API running, e.g. have two VMs running separate ollama instances. In this case, we might even split up on model_name too.

Changes:

  • group_prompts_by_model method in Experiment creates a dictionary of dictionaries, where the keys are the unique models in the experiment and each value holds the prompt dictionaries whose model equals that key (see the sketch after this list)
  • Settings takes a parallel argument when initialising and has a parallel attribute, which indicates whether or not to process the experiments for different models "in parallel"
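
A minimal sketch of the grouping idea (illustrative only, not the actual Experiment.group_prompts_by_model implementation; the "model" key name is taken from the description above, and collecting each model's prompts into a list is an assumption):

```python
# Illustrative sketch of grouping prompts by model, as described above.
# Assumes each prompt dict has a "model" key; each model's prompts are
# collected into a list under that model's name.
from collections import defaultdict

def group_prompts_by_model(prompt_dicts):
    grouped = defaultdict(list)
    for prompt in prompt_dicts:
        grouped[prompt["model"]].append(prompt)
    return dict(grouped)

prompts = [
    {"model": "openai", "prompt": "hello"},
    {"model": "ollama", "prompt": "hi"},
    {"model": "openai", "prompt": "hey"},
]
print(group_prompts_by_model(prompts))
# {'openai': [{'model': 'openai', ...}, {'model': 'openai', ...}],
#  'ollama': [{'model': 'ollama', ...}]}
```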

The "parallelisation" doesn't use multiprocessing, it just creates separate async tasks for each model which we gather (so it's an asyncio.gather of models which themselves are an asyncio.gather of individual prompts - a nested asyncio.gather). We have separate tqdm bars for each model which keep us updated.

To run the pipeline in "parallel": run_pipeline --parallel or run_pipeline -p

@rchan26 rchan26 requested a review from fedenanni April 25, 2024 17:51
@rchan26 rchan26 changed the base branch from test-core to main April 25, 2024 17:51
@fedenanni fedenanni (Collaborator) left a comment

@rchan26 all looks and works fine; however, there's one thing I'm not 100% sure about. If OpenAI and Gemini have different max queries, at the moment I won't be able to specify that, right? Or did I miss something?

Currently I set a general max_queries from the CLI for all of them. But maybe with -p I should be able to provide model-specific max_queries?

rchan26 commented Apr 26, 2024

yes, you're right, currently it is just a single limit. we could configure Settings so that Settings.max_queries can either be an integer (the same limit for all models) or a dictionary of per-model limits if -p is raised, as sketched below.
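
Something like this, for example (a hypothetical sketch of the int-or-dict idea, not the actual Settings class; the names here are made up for illustration):

```python
# Hypothetical sketch: max_queries as either one limit for all models
# or a per-model dict with a fallback default.
from typing import Union

class Settings:
    def __init__(self, max_queries: Union[int, dict] = 10):
        self.max_queries = max_queries

    def max_queries_for(self, model: str, default: int = 10) -> int:
        # a dict gives per-model limits with a fallback; an int applies to all
        if isinstance(self.max_queries, dict):
            return self.max_queries.get(model, default)
        return self.max_queries

settings = Settings(max_queries={"openai": 100, "gemini": 20})
print(settings.max_queries_for("openai"))  # 100
print(settings.max_queries_for("ollama"))  # 10 (fallback default)
```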

shall we do this now or in another issue?

@rchan26 rchan26 merged commit b11a33e into main Apr 26, 2024
5 checks passed
@rchan26 rchan26 deleted the parallel-processing branch April 26, 2024 13:09
rchan26 added a commit that referenced this pull request May 20, 2024
rchan26 added a commit that referenced this pull request May 20, 2024
Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues.

Parallel processing of different APIs within an experiment
2 participants