-
-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
commitlog recovery time #129
Comments
So it appears one issue is this does not actually seem to do what it says: Lines 200 to 208 in 1c69c5c
The issue is that the memory-mapped index file is 10MB, so reading at |
Yeah we shouldn't have to rebuild every time. Initially, I didn't and then someone hit a bug where the index wasn't setup properly and submitted a PR doing this to hack a fix and I left it to optimize later. I can't quite remember what Kafka does here but binary searching for the last entry sounds good to me. |
I have it working with a binary search. Will submit a PR shortly. |
Awesome - you da man. |
@travisjeffery One snag I've run into: I'm using a fork of the commitlog package which I've modified index entries to also contain a i := sort.Search(n, func(i int) bool {
if err := idx.ReadEntryAtFileOffset(entry, int64(i*entryWidth)); err != nil {
panic(err)
}
return entry.Position == 0 && entry.Size == 0
}) Of course, Jocko doesn't have the i := sort.Search(n, func(i int) bool {
if err := idx.ReadEntryAtFileOffset(entry, int64(i*entryWidth)); err != nil {
panic(err)
}
return entry.Position == 0 && i > 0
}) However, the problem is handling the special case where the index has a single entry. We have no way of distinguishing between an empty index and an index with one entry in it since in both cases the first entry will have a Any ideas? |
I think in NewIndex we can check if the index file exists and if it does assume the index is legit and that the length of the file marks the position of the next entry. And we can tell the number of entries based on this, cause if there is no entry the file will be 0 bytes, with 1 entry it'll be 8 bytes. We gotta change the code in here: https://github.com/travisjeffery/jocko/blob/master/commitlog/index.go#L86-L89. Cause right now the index file itself is initialized to the full size, when we should leave index file as is and only initialize the mmap's length by passing the length into the MapRegion or MapAt apis. |
Not sure if the gommap lib which is pretty jank will support that though, which is what Kafka is able to do I think. Alternatively when the index is created we could write -1 as the first offset and overwrite that on the first entry. |
The recovery time of a commitlog is currently linear with the number of entries in the log. As a result, a log with many messages (e.g. millions) can take several seconds or more to recover. The reason is the index is rebuilt by reading back the log:
jocko/commitlog/segment.go
Lines 60 to 65 in 1c69c5c
It's unclear to me what the purpose of truncating the index and rebuilding it is. AFAICT, it just initializes the segment's
nextOffset
andposition
. However, the index sanity check is already reading the last entry which we can recovernextOffset
from. Is there another purpose for rebuilding the index?The text was updated successfully, but these errors were encountered: