DAOS-16982 csum: recalculate checksum on retrying #15786
base: google/2.6
Ticket title: 'We should not report checksum errors against the NVMe device for key verification'
I have already tested it by manually injecting failure, and I'm working on turning that into a unit test.
This PR fixes the retry logic by actually recalculating the checksum. It also removes the code that incorrectly records an NVMe error.

Change-Id: Ib0287851fea4d125eecda48c5ccb3c73ed85b8f8
Signed-off-by: Jinshan Xiong <[email protected]>
Branch updated from 7f74db4 to bb23b17.
@@ -5140,6 +5141,11 @@ obj_csum_update(struct dc_object *obj, daos_obj_update_t *args, struct obj_auxi_

```c
	if (!obj_csum_dedup_candidate(&obj->cob_co->dc_props, args->iods, args->nr))
		return 0;

	if (obj_auxi->csum_retry) {
		/* Release old checksum result and prepare for new calculation */
		daos_csummer_free_ic(obj->cob_co->dc_csummer, &obj_auxi->rw_args.iod_csums);
```
I think we probably want to do this after a couple of retries
It's easy to add, but I wonder whether it's really necessary, since checksum errors are rare events in themselves.
How about revising it to:

```c
if (obj_auxi->csum_retry && obj_auxi->csum_retry_cnt > 2) { ... }
```

Would that work for you?
```c
		/* Release old checksum result and prepare for new calculation */
		daos_csummer_free_ic(obj->cob_co->dc_csummer, &obj_auxi->rw_args.iod_csums);
	}

	return dc_obj_csum_update(obj->cob_co->dc_csummer, obj->cob_co->dc_props,
				  obj->cob_md.omd_id, args->dkey, args->iods, args->sgls, args->nr,
				  obj_auxi->reasb_req.orr_singv_los, &obj_auxi->rw_args.dkey_csum,
```
In the case of the actual issue we saw, it was the dkey_csum that needed to be recalculated. Is that happening here?
Yes, if I read the code correctly, because we release the previous calculation above.
Before requesting gatekeeper:
- `Features:` (or `Test-tag*`) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.

Gatekeeper: