-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize encoder interpolation latency #309
optimize encoder interpolation latency #309
Conversation
encoding/rs/interpolation.go
Outdated
tmp.Mul(&wPow, &w) | ||
|
||
wPow.Set(&tmp) | ||
// We cam lookup the inverse power by counting RootOfUnity backward |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: cam -> can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How was the perf measured?
ys := polyEvals[g.ChunkLength*i : g.ChunkLength*(i+1)] | ||
err := rb.ReverseBitOrderFr(ys) | ||
if err != nil { | ||
results <- err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have continue
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I used "return", because if anything that has error, the entire MakeFrames becomes false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to use continue
. If you had enough errors for all of the workers to return, the program could hang on L104.
Continuing ensures that the workers continue to consume the requests.
Obviously, you could optimize this further if an error was likely here, but this would be a programming error so I don't think it needs to be optimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. good call.
} | ||
coeffs, err := g.GetInterpolationPolyCoeff(ys, uint32(j)) | ||
if err != nil { | ||
results <- err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same above
It is tested in preprod, by examining the encoder logs, see the link. |
|
||
func (g *Encoder) interpolyWorker( | ||
polyEvals []fr.Element, | ||
jobChan <-chan JobRequest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it just use a workerpool, instead of a dedicated channel to create job request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually a good idea, I think we should create a global persistent object under prover? what do you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why global/persistent? Is the concern in workpool creation/destroy cost? It's generally better not have global/persistent object if can be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say each requests is 256KB, 10MBps is equivalent to 40 parallel requests, and they need to share threads. If multiple objects are specified, it is unclear how to allocate them. I will mark it as todo, since I don't know the performance of the threadpool, compared to now.
a130de0
to
2ee68eb
Compare
@@ -225,6 +225,7 @@ func (p *ParametrizedProver) proofWorker( | |||
points: nil, | |||
err: err, | |||
} | |||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just moving the for loop below inside this code block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a else statement
encoding/rs/encode.go
Outdated
@@ -56,8 +57,8 @@ func (g *Encoder) Encode(inputFr []fr.Element) (*GlobalPoly, []Frame, []uint32, | |||
return nil, nil, nil, err | |||
} | |||
|
|||
log.Printf(" SUMMARY: Encode %v byte among %v numNode takes %v\n", | |||
len(inputFr)*encoding.BYTES_PER_COEFFICIENT, g.NumChunks, time.Since(start)) | |||
log.Printf(" SUMMARY: RSEncode %v byte among %v numNode with chunkSize %v takes %v\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we settle on the terminology convention that "size" means num of bytes, and "length" means num of symbols
@@ -77,29 +78,45 @@ func (g *Encoder) MakeFrames( | |||
indices := make([]uint32, 0) | |||
frames := make([]Frame, g.NumChunks) | |||
|
|||
for i := uint64(0); i < uint64(g.NumChunks); i++ { | |||
numWorker := uint64(g.NumRSWorker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to convert to uint64, since L83 makes the conversion already.
Also it probably has no need to use that many workers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is assigned to number chunks at L84
frames, | ||
) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move the k
defined on L76 down here could make it more readable
frames[k].Coeffs = coeffs | ||
} | ||
|
||
results <- nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be inside the for loop? It'll be then one result per JobRequest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are "numWorker" of threads, and "NumChunks" of jobs. If it is moved inside the for loop, the worker is terminated
encoding/rs/encode.go
Outdated
} | ||
jobChan <- jr | ||
k++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is k
needed? it's the same as i
encoding/rs/encode.go
Outdated
for i := uint64(0); i < g.NumChunks; i++ { | ||
j := rb.ReverseBitsLimited(uint32(g.NumChunks), uint32(i)) | ||
jr := JobRequest{ | ||
Index: uint64(i), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i
is already uint64
Why are these changes needed?
This PR optimize encoder interpolation latency by
Previously, encoding 2MB data would take 2-3sec. After the change, the computation can be contained about 300ms
Checks