
detokenization parallelization #37

Open
Wants to merge 17 commits into main.

Conversation

NickNickGo (Contributor):

Async detokenization

NickNickGo requested a review from a team on September 8, 2020 at 22:18.
JiushengChen (Contributor) left a comment:

Does #26 still apply with this multi-process change?

feihugis (Contributor) left a comment:

@NickNickGo It looks good to me in general. One major question is about the output order: we need to make sure the output order is the same as before.

```python
data_queue = Queue()
msg_queue = Queue()
p_list = []
threads = cpu_count()
```
Contributor:

It may be better to allow users to specify the number of CPUs.
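One way to expose this is a CLI flag that defaults to all available CPUs. This is a sketch with a hypothetical `--postprocess-workers` flag; the name and default are illustrative, not from this PR:

```python
import argparse
from multiprocessing import cpu_count

parser = argparse.ArgumentParser()
# Hypothetical flag name; the PR's actual argument may differ.
parser.add_argument("--postprocess-workers", type=int, default=cpu_count(),
                    help="number of worker processes used for detokenization")
args = parser.parse_args([])  # empty list for illustration; a real CLI uses sys.argv
threads = args.postprocess_workers
print(threads)
```

With this, `threads = cpu_count()` becomes the default rather than a hard-coded choice.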

Contributor (Author):

It shouldn't make a big difference, right? Although I can add an argument for it.

Contributor:

There should be some difference: extra processes waste CPU resources, and it also adds overhead to create and manage these processes and to sync data across them.

Member:

There is a parameter defined for this in the fairseq parallel support. GPU machines have 32/64 or more CPUs. Do you get better speed with threads > 1?

Contributor (Author):

@feihugis I added support for this.
@yuyan2do I haven't yet analyzed the effect of the number of threads on speed; let me do that.

Contributor (Author):

I didn't notice significant changes in overall time when the number of threads is changed.


```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
```
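As a quick sanity check, the generator behaves like this (restated here so the snippet runs standalone):

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# The last chunk is shorter when len(lst) is not a multiple of n.
print(list(chunks([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```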

```python
class IOProcess (Process) :
    """ Write detokenized output to file in order."""
    def __init__ (self, msg_queue, fout):
```
Contributor:

Suggested change:

```diff
-    def __init__ (self, msg_queue, fout):
+    def __init__(self, msg_queue, fout):
```

Contributor:

Remove the similar spaces in other places.

```python
    def run (self) :
        while (True) :
            ind, dec = self.msg_queue.get()
            if dec == GENERATE_FINISHED :
```
Contributor:

Suggested change:

```diff
-            if dec == GENERATE_FINISHED :
+            if dec == GENERATE_FINISHED:
```
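For illustration, one way such an in-order writer can work is to buffer results that arrive out of order and flush whenever the next expected index becomes available. This is a minimal sketch of the idea, not the PR's actual implementation; the sentinel value and buffering details are assumptions:

```python
from multiprocessing import Process

GENERATE_FINISHED = "done"  # illustrative sentinel; the PR's actual value may differ

class IOProcess(Process):
    """Write detokenized output to the file in input order."""

    def __init__(self, msg_queue, fout):
        super().__init__()
        self.msg_queue = msg_queue
        self.fout = fout
        self.waiting = {}    # results that arrived out of order, keyed by index
        self.next_index = 0  # next index expected in the output file

    def run(self):
        while True:
            ind, dec = self.msg_queue.get()
            if dec == GENERATE_FINISHED:
                break
            self.waiting[ind] = dec
            # Flush every result that is now contiguous with what was written.
            while self.next_index in self.waiting:
                self.fout.write(self.waiting.pop(self.next_index))
                self.next_index += 1
        self.fout.flush()
```

Because results are keyed by index, the file contents do not depend on the order in which workers finish.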

NickNickGo (Contributor, Author):

Linting checks are clean. Could you please add any additional formatting requirements to the rcfile? This will reduce formatting iterations.

feihugis (Contributor) commented Sep 14, 2020:

> Linting checks are clean. Could you please add additional formatting requirements (if any) in rcfile, this will reduce formatting iterations.

Good suggestion. The rcfile is enhanced here (#38). One thing it does not cover is the whitespace between 1) a function name and its parentheses and 2) a variable and a colon, which you need to check and remove manually, but that should be easy.

NickNickGo (Contributor, Author) commented Sep 25, 2020:

Multi-worker preprocessing: BART Large, batch size 128, 1k samples; throughput changes from 11.8 samples/s (from #40) to 12.3 samples/s.

feihugis (Contributor) left a comment:

Will the numbers in the benchmarking scripts need to be updated?

Comment on lines 32 to 34:

```python
            return_tensors="pt",
            truncation=True,
            padding="max_length")
```
Contributor:

Add these parameters to the constructor instead of hard-coding them.

NickNickGo (Contributor, Author):

@feihugis thanks, I incorporated all nitpicks.

feihugis (Contributor) left a comment:

Last comments: 1) update the benchmarking scripts, as this PR will change the performance of all the transformers models; 2) add docs for the new classes and public APIs (e.g. a short description of the API, and the types and meanings of the input args and returns).

Comment on lines 18 to 25:

```python
    def __init__(self, examples, tokenizer, model_name, prefix):
        self.examples = examples
        self.tokenizer= tokenizer
        self.model_name = model_name
        self.prefix = prefix
        self.return_tensors="pt"
        self.truncation=True
        self.padding="max_length"
```
Contributor:

I mean something like

```python
def __init__(self, examples, tokenizer, model_name, prefix,
             return_tensors, truncation, padding):
    ...
    self.return_tensors = return_tensors
    ...
```

NickNickGo (Contributor, Author):

Only HF benchmarks:

Before (without #40, #37):

| Model | W/O FastSeq (samples/s) | W/ FastSeq (samples/s) | Speedup |
|---|---|---|---|
| Bart (hf) | 3.4 | 8.1 | 2.4x |
| DistilBart (hf) | 4.0 | 8.5 | 2.1x |
| T5 (hf) | 4.8 | 7.5 | 1.6x |

After:

| Model | W/O FastSeq (samples/s) | W/ FastSeq (samples/s) | Speedup |
|---|---|---|---|
| Bart (hf) | 3.4 | 11.0 | 3.2x |
| DistilBart (hf) | 4.0 | 13.5 | 3.4x |
| T5 (hf) | 4.8 | 17.0 | 3.5x |

feihugis (Contributor):

@NickNickGo One minor question: are the numbers in the benchmark scripts based on #40 or not? If not, the benchmark script may fail when both PRs are merged.

Comment on lines +30 to +31:

```python
        self.return_tensors="pt"
        self.truncation=True
```
Contributor:

Why hard-code these here? We can make these two parameters of the constructor.


```python
class IOProcess (Process):
    """ Write detokenized output to file in order."""
    def __init__(self, msg_queue, fout):
```
Contributor:

missing docs


```python
class PostProcess(Process):
    """ Parallel detokenization """
    def __init__(self, tokenizer, data_queue, msg_queue,
```
Contributor:

missing docs.
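For context, a hedged sketch of what such a detokenization worker might look like: pull `(index, batch)` pairs from `data_queue`, decode, and forward `(index, text)` to `msg_queue` for ordered writing. The sentinel handling and the `decode` call (modeled on the Hugging Face tokenizer API) are assumptions, not the PR's actual code:

```python
from multiprocessing import Process

GENERATE_FINISHED = "done"  # illustrative sentinel; the PR's actual value may differ

class PostProcess(Process):
    """Detokenize batches of token ids in parallel.

    Pulls (index, batch) pairs from data_queue and forwards
    (index, text) pairs to msg_queue for in-order writing.
    """

    def __init__(self, tokenizer, data_queue, msg_queue):
        super().__init__()
        self.tokenizer = tokenizer
        self.data_queue = data_queue
        self.msg_queue = msg_queue

    def run(self):
        while True:
            ind, batch = self.data_queue.get()
            if batch == GENERATE_FINISHED:
                # Put the sentinel back so sibling workers also stop.
                self.data_queue.put((ind, batch))
                break
            text = self.tokenizer.decode(batch, skip_special_tokens=True)
            self.msg_queue.put((ind, text))
```

The index travels with each batch so that the writer process can restore the original order regardless of which worker finishes first.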
