detokenization parallelization #37
base: main
Conversation
Is #26 still needed with this multi-process change?
@NickNickGo It looks good to me in general. One major question is about the output order: we need to make sure the output order is the same as before.
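One common way to keep the output deterministic despite parallel workers is to tag each result with its original index and have the writer buffer out-of-order arrivals. A minimal sketch of this idea (the `write_in_order` helper and the list standing in for the output file are illustrative, not the PR's actual code):

```python
import heapq

def write_in_order(results, out):
    """Consume (index, text) results in arrival order and emit them
    in index order, buffering any results that arrive early."""
    buffer = []          # min-heap keyed on the original index
    next_index = 0
    for index, text in results:
        heapq.heappush(buffer, (index, text))
        # Flush every buffered result that is next in line.
        while buffer and buffer[0][0] == next_index:
            _, ready = heapq.heappop(buffer)
            out.append(ready)  # stand-in for fout.write(ready + "\n")
            next_index += 1

out = []
write_in_order([(1, "b"), (0, "a"), (3, "d"), (2, "c")], out)
# out is now ["a", "b", "c", "d"]
```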
fastseq_cli/transformers_generate.py
Outdated
data_queue = Queue()
msg_queue = Queue()
p_list = []
threads = cpu_count()
It may be better to allow users to specify the number of CPUs.
It shouldn't make a big difference, right? Although I can add an argument for it.
There should be some difference. Using all CPUs can waste resources, and it also brings overhead to create and manage these processes and to sync data across them.
There is a parameter defined for this in fairseq's parallel support. GPU machines have 32/64 or more CPUs. Do you get better speed with threads > 1?
I didn't notice significant changes in overall time when the number of threads was changed.
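If a CLI argument were added as suggested, it could look like the following sketch (the `--postprocess-workers` flag name is hypothetical; the PR currently hard-codes `cpu_count()`):

```python
import argparse
from multiprocessing import cpu_count

parser = argparse.ArgumentParser()
# Hypothetical flag: default to all CPUs, but let users dial it down.
parser.add_argument("--postprocess-workers", type=int, default=cpu_count(),
                    help="number of detokenization worker processes")

# Example: a user caps the worker count at 4.
args = parser.parse_args(["--postprocess-workers", "4"])
# args.postprocess_workers == 4
```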
fastseq_cli/transformers_generate.py
Outdated
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

class IOProcess (Process) :
    """ Write detokenized output to file in order."""
    def __init__ (self, msg_queue, fout):
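For reference, the `chunks` helper above simply splits a list into fixed-size batches, with the final batch possibly shorter than `n`:

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

batches = list(chunks([1, 2, 3, 4, 5], 2))
# batches == [[1, 2], [3, 4], [5]]
```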
Suggested change:
def __init__ (self, msg_queue, fout):
def __init__(self, msg_queue, fout):
Please remove the similar extra spaces in the other places as well.
fastseq_cli/transformers_generate.py
Outdated
def run (self) :
    while (True) :
        ind, dec = self.msg_queue.get()
        if dec == GENERATE_FINISHED :
Suggested change:
if dec == GENERATE_FINISHED :
if dec == GENERATE_FINISHED:
Linting checks are clean. Could you please add any additional formatting requirements to the rcfile? This will reduce formatting iterations.
Good suggestion. The rcfile is enhanced here (#38). One thing it does not cover is the whitespace between 1) a function name and its parentheses, and 2) a variable and the colon; you will need to check and remove those manually, but it should be easy.
Multi-worker preprocess: BART Large, batch size 128, 1k samples; throughput improved from 11.8 (from #40) to 12.3.
Will the numbers in the benchmarking scripts need to be updated?
fastseq_cli/transformers_generate.py
Outdated
return_tensors="pt",
truncation=True,
padding="max_length")
Add these parameters to the constructor instead of hard coding.
@feihugis Thanks, I incorporated all the nitpicks.
Last comments: 1) update the benchmarking scripts, as this PR will change the performance of all the transformers models; 2) add docs for the new classes and public APIs (e.g. a short description of each API, and the types and meanings of the input args and returns).
fastseq_cli/transformers_generate.py
Outdated
def __init__(self, examples, tokenizer, model_name, prefix):
    self.examples = examples
    self.tokenizer= tokenizer
    self.model_name = model_name
    self.prefix = prefix
    self.return_tensors="pt"
    self.truncation=True
    self.padding="max_length"
I mean something like:

def __init__(self, examples, tokenizer, model_name, prefix, return_tensors, truncation, padding):
    ...
    self.return_tensors = return_tensors
    ...
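Filled out, the reviewer's suggestion could look like the sketch below. The class name `PreprocessDataset` and the defaults are illustrative assumptions, not the PR's actual code; the point is that the tokenizer options become constructor parameters instead of hard-coded attributes:

```python
class PreprocessDataset:  # hypothetical name for illustration
    def __init__(self, examples, tokenizer, model_name, prefix,
                 return_tensors="pt", truncation=True, padding="max_length"):
        self.examples = examples
        self.tokenizer = tokenizer
        self.model_name = model_name
        self.prefix = prefix
        # Tokenizer options are now configurable rather than hard-coded,
        # while the defaults preserve the previous behavior.
        self.return_tensors = return_tensors
        self.truncation = truncation
        self.padding = padding

# Callers can override a single option without touching the others.
ds = PreprocessDataset([], None, "bart-large", "", padding="longest")
```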
Only HF benchmarks:
After:
@NickNickGo One minor question: are the numbers in the benchmark scripts based on #40 or not? If not, the benchmark scripts may fail when both PRs are merged.
self.return_tensors="pt"
self.truncation=True
Why hard-code these here? We can make these two parameters of the constructor.
class IOProcess (Process):
    """ Write detokenized output to file in order."""
    def __init__(self, msg_queue, fout):
missing docs
class PostProcess(Process):
    """ Parallel detokenization """
    def __init__(self, tokenizer, data_queue, msg_queue,
missing docs.
Async detokenization
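The overall pipeline in this PR is: generation pushes (index, batch) items onto a data queue, `PostProcess` workers detokenize them and forward the results to a message queue, and `IOProcess` writes them out in index order, with `GENERATE_FINISHED` as the shutdown sentinel. A self-contained sketch of the worker loop is below; it uses a thread and a toy "detokenizer" purely for illustration, whereas the PR uses `multiprocessing.Process` and a real tokenizer:

```python
from queue import Queue
from threading import Thread

GENERATE_FINISHED = "GENERATE_FINISHED"  # shutdown sentinel, as in the PR

def post_process(detok, data_queue, msg_queue):
    """Pull (index, batch) items, detokenize, and forward (index, text)
    to the writer; pass the sentinel through and stop."""
    while True:
        ind, batch = data_queue.get()
        if batch == GENERATE_FINISHED:
            msg_queue.put((ind, GENERATE_FINISHED))
            break
        msg_queue.put((ind, detok(batch)))

data_queue, msg_queue = Queue(), Queue()
# Toy detokenizer: uppercases the batch instead of real detokenization.
worker = Thread(target=post_process,
                args=(lambda b: b.upper(), data_queue, msg_queue))
worker.start()
data_queue.put((0, "hello"))
data_queue.put((1, GENERATE_FINISHED))
worker.join()
# msg_queue now holds (0, "HELLO") followed by (1, GENERATE_FINISHED)
```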