(Merge #24 first)
Fix #5 (and partly address #2).

I say "partly" for #2 because this PR adds the option to run each model/API separately. In the future, we may want to split this up even further if we have multiple instances of the same model/API running, e.g. two VMs running separate ollama instances. In that case, we might even split on `model_name` too.

Changes:
- A `group_prompts_by_model` method in `Experiment` creates a dictionary of dictionaries, where the keys are the unique models in the experiment and each value is a prompt dictionary with `model` equal to that key
- `Settings` takes a `parallel` argument when initialising and has a `parallel` attribute, which indicates whether or not to process the experiments for different models "in parallel"

The "parallelisation" doesn't use multiprocessing; it just creates separate async tasks for each model, which we gather (so it's an `asyncio.gather` over models, each of which is itself an `asyncio.gather` over individual prompts: a nested `asyncio.gather`). Separate tqdm bars for each model keep us updated on progress.

To run the pipeline in "parallel":
`run_pipeline --parallel` or `run_pipeline -p`
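As a rough sketch of how the pieces fit together (function names, signatures, and prompt shapes here are illustrative, not the exact implementation): prompts are grouped by their `model` key, each model's group runs as its own `asyncio.gather` of prompts, and those per-model tasks are themselves gathered.

```python
import asyncio
from collections import defaultdict


def group_prompts_by_model(prompts: list[dict]) -> dict[str, list[dict]]:
    # Group a flat list of prompt dicts by their "model" key.
    # (The real method returns a dictionary of dictionaries; a list
    # per model is used here for simplicity.)
    grouped: dict[str, list[dict]] = defaultdict(list)
    for prompt in prompts:
        grouped[prompt["model"]].append(prompt)
    return dict(grouped)


async def process_prompt(prompt: dict) -> dict:
    # Stand-in for a real model/API call.
    await asyncio.sleep(0.01)
    return {**prompt, "response": "done"}


async def process_model(prompts: list[dict]) -> list[dict]:
    # Inner gather: all prompts for a single model run concurrently.
    return await asyncio.gather(*(process_prompt(p) for p in prompts))


async def run_experiment(prompts: list[dict]) -> list[list[dict]]:
    grouped = group_prompts_by_model(prompts)
    # Outer gather: one task per model, each itself a gather of that
    # model's prompts (the nested asyncio.gather described above).
    return await asyncio.gather(*(process_model(g) for g in grouped.values()))


prompts = [
    {"model": "gpt-4", "prompt": "hello"},
    {"model": "ollama-llama2", "prompt": "hi"},
    {"model": "gpt-4", "prompt": "hey"},
]
results = asyncio.run(run_experiment(prompts))
```

In the real pipeline each model's inner gather also drives its own tqdm progress bar; that's omitted here to keep the sketch dependency-free.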