-
A quick test tells me that, having initialized a GPT-3.5 NER component, passing texts one by one completes faster than using `nlp.pipe()`. I understand that requests to the OpenAI API are currently not batched under the hood. Should this be addressed, and how?

Test snippet:

```python
from datetime import datetime

import dotenv
import spacy

dotenv.load_dotenv()  # load the OpenAI API key from a .env file

n = 100

nlp = spacy.blank("en")
nlp.add_pipe(
    "llm",
    config={
        "task": {"@llm_tasks": "spacy.NER.v2", "labels": ["FRUIT"]},
        "model": {"@llm_models": "spacy.GPT-3-5.v1"},
    },
)
nlp.initialize()

text = "I like apples and oranges"

# Unbatched: call the pipeline on one text at a time.
start = datetime.now()
for _ in range(n):
    nlp(text)
end = datetime.now()
print("UNBATCHED time:", end - start)

# Batched: pass all texts through nlp.pipe().
start = datetime.now()
list(nlp.pipe([text] * n))
end = datetime.now()
print("BATCHED time:", end - start)
```

On my end this consistently shows the unbatched loop finishing faster than the batched run.
-
Hi @Zatteliet, thanks for this data point!
More precisely: all OpenAI models using the chat endpoint cannot be batched, as OpenAI only offers this for the completion endpoint at the moment. As soon as this is available we'll batch requests for models using the chat endpoint.
It seems like quite a significant time difference, though, and I'm surprised. We'll definitely look into this; batching shouldn't be slower than running the pipeline separately for each document here.
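For context, here is a rough sketch of the endpoint difference, not spacy-llm code. It assumes the legacy `openai` Python client (pre-1.0) that was current at the time, and the model names are only illustrative: the completion endpoint accepts a list of prompts in one request, while the chat endpoint takes exactly one conversation per request, which is why chat models can't be batched the same way.

```python
# Sketch only; assumes the legacy `openai` client (< 1.0) and that
# OPENAI_API_KEY is set in the environment.
import openai

prompts = [
    "Extract FRUIT entities: I like apples.",
    "Extract FRUIT entities: I like oranges.",
]

# Completion endpoint: one HTTP request can carry all prompts (batchable).
completion = openai.Completion.create(model="text-davinci-003", prompt=prompts)
for choice in completion["choices"]:
    print(choice["index"], choice["text"])

# Chat endpoint: one HTTP request per prompt (not batchable this way).
for prompt in prompts:
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(chat["choices"][0]["message"]["content"])
```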
-
Alright, I ran some tests. The issue with benchmarking OpenAI is that the API's performance fluctuates wildly, in my experience, so running the same tests can yield runtimes that differ by a factor of 5 or more.

Because of this I ran the prompts in a couple of variations, several times each, with different values of `n`. The average times were pretty comparable. If you haven't done so, I recommend running your script multiple times as well and averaging the runtimes for your comparison.

My conclusion is that using `.pipe()` is actually about as fast as running documents individually, which is the expected behavior.
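A rough sketch of that averaging approach, reusing `nlp`, `text`, and `n` from the snippet above (the run count of 5 is arbitrary):

```python
# Repeat both timings several times and compare the means, since a single
# run is dominated by API latency jitter. Assumes `nlp`, `text`, and `n`
# are defined as in the snippet above.
from datetime import datetime
from statistics import mean

def time_unbatched() -> float:
    start = datetime.now()
    for _ in range(n):
        nlp(text)
    return (datetime.now() - start).total_seconds()

def time_batched() -> float:
    start = datetime.now()
    list(nlp.pipe([text] * n))
    return (datetime.now() - start).total_seconds()

runs = 5  # arbitrary; more runs smooth out API fluctuations further
print("UNBATCHED mean (s):", mean(time_unbatched() for _ in range(runs)))
print("BATCHED mean (s):", mean(time_batched() for _ in range(runs)))
```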