-
A quick test tells me that, having initialized a GPT-3.5 NER component, passing texts one by one completes faster than using `nlp.pipe()`. I understand that requests to the OpenAI API are currently not batched under the hood. Should this be addressed, and how?

Test snippet:

```python
from datetime import datetime

import dotenv
import spacy

dotenv.load_dotenv()  # load the OpenAI API key from a .env file

n = 100

nlp = spacy.blank("en")
nlp.add_pipe(
    "llm",
    config={
        "task": {"@llm_tasks": "spacy.NER.v2", "labels": ["FRUIT"]},
        "model": {"@llm_models": "spacy.GPT-3-5.v1"},
    },
)
nlp.initialize()

text = "I like apples and oranges"

# Unbatched: call the pipeline on one text at a time.
start = datetime.now()
for _ in range(n):
    nlp(text)
end = datetime.now()
print("UNBATCHED time:", end - start)

# Batched: pass all texts through nlp.pipe().
start = datetime.now()
list(nlp.pipe([text] * n))
end = datetime.now()
print("BATCHED time:", end - start)
```

On my end this consistently shows the unbatched loop finishing faster than the batched run.
-
Hi @Zatteliet, thanks for this data point!
More precisely: all OpenAI models using the chat endpoint cannot be batched, as OpenAI only offers this for the completion endpoint at the moment. As soon as this is available we'll batch requests for models using the chat endpoint.
It seems like quite a significant time difference, though, and I'm surprised. We'll definitely look into this; batching shouldn't be slower than running the pipeline separately for each document here.
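For context, here is a rough sketch of the endpoint difference, not spacy-llm code. It assumes the legacy `openai` Python client (pre-1.0) that was current at the time, and the model names are only illustrative: the completion endpoint accepts a list of prompts in one request, while the chat endpoint takes exactly one conversation per request, which is why chat models can't be batched the same way.

```python
# Sketch only; assumes the legacy `openai` client (< 1.0) and that
# OPENAI_API_KEY is set in the environment.
import openai

prompts = [
    "Extract FRUIT entities: I like apples.",
    "Extract FRUIT entities: I like oranges.",
]

# Completion endpoint: one HTTP request can carry all prompts (batchable).
completion = openai.Completion.create(model="text-davinci-003", prompt=prompts)
for choice in completion["choices"]:
    print(choice["index"], choice["text"])

# Chat endpoint: one HTTP request per prompt (not batchable this way).
for prompt in prompts:
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(chat["choices"][0]["message"]["content"])
```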
-
Alright, I ran some tests. The issue with benchmarking OpenAI is that the API's performance fluctuates wildly, in my experience, so running the same tests can yield runtimes that differ by a factor of 5 or more.

Because of this I ran the prompts in a couple of variations, several times each, with different values of `n`. The average times were pretty comparable. If you haven't done so, I recommend running your script multiple times as well and averaging the runtimes for your comparison.

My conclusion is that using `.pipe()` is actually about as fast as running documents individually, which is the expected behavior.
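A rough sketch of that averaging approach, reusing `nlp`, `text`, and `n` from the snippet above (the run count of 5 is arbitrary):

```python
# Repeat both timings several times and compare the means, since a single
# run is dominated by API latency jitter. Assumes `nlp`, `text`, and `n`
# are defined as in the snippet above.
from datetime import datetime
from statistics import mean

def time_unbatched() -> float:
    start = datetime.now()
    for _ in range(n):
        nlp(text)
    return (datetime.now() - start).total_seconds()

def time_batched() -> float:
    start = datetime.now()
    list(nlp.pipe([text] * n))
    return (datetime.now() - start).total_seconds()

runs = 5  # arbitrary; more runs smooth out API fluctuations further
print("UNBATCHED mean (s):", mean(time_unbatched() for _ in range(runs)))
print("BATCHED mean (s):", mean(time_batched() for _ in range(runs)))
```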