Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rados: implement binding for rados_write_op_cmpext #581

Merged
merged 1 commit into from
Dec 8, 2021

Conversation

gman0
Copy link
Contributor

@gman0 gman0 commented Sep 27, 2021

This PR adds binding for rados_write_op_cmpext RADOS Write operation including unit tests.

Checklist

  • Added tests for features and functional changes
  • Public functions and types are documented
  • Standard formatting is applied to Go code

Related: ceph/ceph-csi#2148

Copy link
Collaborator

@phlogistonjohn phlogistonjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code generally looks good to me. However, we've recently started using a new api-stability policy were new API calls are put in a "preview" state and protected by a build tag.
Below I've suggested the required doc comment change, but we'll also need the new API and test placed in a file with the build tag "ceph_preview" - I'm open to alternative names but since this is an existing larger functioal area you can default to "write_op_preview.go" and "write_op_preview_test.go" if nothing better comes to mind.

I hope this isn't too much of a hassle. Please do let me know if you have any comments / questions about it.

rados/write_op.go Outdated Show resolved Hide resolved
@phlogistonjohn
Copy link
Collaborator

Sorry, one small thing in addition to the above: please squash the 2nd patch into the first as well as any subsequent patches for the doc comment and file splitting/build tag work. Thanks!

@gman0 gman0 force-pushed the rados_write_op_cmpext branch 2 times, most recently from b0e5562 to 2926a90 Compare September 28, 2021 12:42
@phlogistonjohn phlogistonjohn added the API This PR includes a change to the public API of a go-ceph package label Sep 28, 2021
@phlogistonjohn
Copy link
Collaborator

@Mergifyio rebase

@mergify
Copy link

mergify bot commented Sep 28, 2021

Command rebase: success

Branch has been successfully rebased

@leseb leseb force-pushed the rados_write_op_cmpext branch from 2926a90 to a4c3be6 Compare September 28, 2021 13:35
@gman0
Copy link
Contributor Author

gman0 commented Sep 28, 2021

@phlogistonjohn regarding the API stability policy you've mentioned: how long does it take for a feature to graduate from preview into GA?

@phlogistonjohn
Copy link
Collaborator

@phlogistonjohn regarding the API stability policy you've mentioned: how long does it take for a feature to graduate from preview into GA?

This isn't really firm yet, but I had previously thought that 2 releases would be sufficient (see: #525 (reply in thread) )
However, if an api is uncontroversial, I could see asking for a shorter incubation period as part of the PR discussion.
When PR #565 is merged (assuming it is) PRs will require a patch to include an "added in version" and "expected stable version" tracked in a json file. Thus the submitter can suggest a release for it to go stable.

Please do participate in #525 for deeper discussions for the api stability plan too!

@phlogistonjohn
Copy link
Collaborator

I forgot to mention, if you need to use a preview api in practice it's just a matter of specifying the ceph_preview build tag. Because go-ceph is not v1 yet we want to make sure we can change things in practice, but we also have many real world consumers and thus don't want to break things willy-nilly. Thus the policy of putting some apis behind a build tag so that we're not "locked in" until after the preview period is over.

// size_t cmp_len,
// uint64_t off,
// int * prval);
func (w *WriteOp) CmpExt(b []byte, offset uint64, prval *int32) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you choose an int32 and not just int?

oe.cBuffer,
oe.cDataLen,
oe.cOffset,
(*C.int)(unsafe.Pointer(prval)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are several problems luring here:

  • a C.int is possibly larger than an int32 which will cause the C code to overwrite the memory after prval.
  • prval is not returned immediately, but set, when the step is executed. That means the C code keeps storing prval after the function returned. Since prval is a Go pointer, the pointer passing rules don't allow that. So prval actually must be C allocated.

If I understand the existing code correctly, you are supposed to use &oe.rval here, which is exactly for this use. @phlogistonjohn can you confirm?

Copy link
Collaborator

@ansiwen ansiwen Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw that writeStep doesn't have an rval yet. So your would have to add it, like in the GetOmapStep. Again, @phlogistonjohn, please confirm.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have in fact a bug around that in the existing code. I filed it under #584.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good catches thanks.

The way I understand it, the rval/prval stuff helps you determine which specific step failed when you have a batch of many steps. Since we use the various *Step types to mediate the actual C batches into Go we should probably take that approach or have a specific step type for CmpExt.

I'll reply to the new issue separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @ansiwen! I guess you're referring to this?

Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers. The C code must preserve this property: it must not store any Go pointers in Go memory, even temporarily.

Please correct me if I'm wrong, but doesn't this restriction (not keeping pointers around after C function returns) only pertain to GC and the possibility of C code accessing memory after it's been free'd? The way I understood it, the struct is stored in r.steps, which is definitely kept around at least until Operate() is called (so it won't be hit by GC), after which that memory is never touched by the C code again. Isn't this enough to guarantee that use-after-free won't occur?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the integer size, is it guaranteed that C's and Go's int types would have the same size, or that Go's int is at least as big as C's?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gman0

I guess you're referring to this?

Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers. The C code must preserve this property: it must not store any Go pointers in Go memory, even temporarily.

No, in this case I'm referring to one paragraph later:

C code may not keep a copy of a Go pointer after the call returns.

and

C code may store Go pointers in C memory, subject to the rule above: it must stop storing the Go pointer when the C function returns.

Please correct me if I'm wrong, but doesn't this restriction (not keeping pointers around after C function returns) only pertain to GC and the possibility of C code accessing memory after it's been free'd? The way I understood it, the struct is stored in r.steps, which is definitely kept around at least until Operate() is called (so it won't be hit by GC), after which that memory is never touched by the C code again. Isn't this enough to guarantee that use-after-free won't occur?

That is the most obvious case of how pointers can get invalid, but not the only one. The GC is allowed to move objects around, and the current implementation in Go actually does it already in the case of objects on the stack. But also objects on the heap could be moved, although that is not implemented in practice yet. But another implementation of Go or later versions of Go might do that. The problem is, there is no contract that guarantees, that an object will always stay at the same address. The GC keeps track of all pointers in Go memory, so it could adjust them as necessary, but it cannot do so with pointers in C memory, which means they would become dangling.

Regarding the integer size, is it guaranteed that C's and Go's int types would have the same size, or that Go's int is at least as big as C's?

Actually I don't know, I also would have to look it up. And that's why we should use the type C.int, which is guaranteed to have the same size as an int in C. IIUC you will remove the prval from the external API, so it's no problem to use C.int. But in general, if I want to make sure that two types have the same size at compile time, I use this trick:

const _ = -(unsafe.Sizeof(int(0)) - unsafe.Sizeof(C.int(0)))

This always results in a constant ... overflows uintptr compile error, if the sizes are no equal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ansiwen Thank you, that's very insightful.

For checking type sizes, wouldn't it be enough to make sure that sizeof( Go's int ) >= sizeof( C's int ) -- at least in this case?

const _ = unsafe.Sizeof(int(0)) - unsafe.Sizeof(C.int(0))

Because otherwise with hard equality, I'm in fact getting an overflow error on an amd64 CentOS system. Printing these values revealed that on this particular system Go's int is 8 bytes while C's is 4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like 4596bd7 work?

@gman0 gman0 force-pushed the rados_write_op_cmpext branch from 4596bd7 to 5a2db33 Compare October 6, 2021 08:28
@gman0
Copy link
Contributor Author

gman0 commented Oct 6, 2021

In the last commit I've accidentally left in code that was unrelated to this PR.

@phlogistonjohn
Copy link
Collaborator

Just a friendly reminder that the next go-ceph release is less than a week away (2021-10-12). The following release will be in December.

If there's no strong need for this right away that's great, and not a problem. I just want to make sure we're all aware. Thanks!

@gman0
Copy link
Contributor Author

gman0 commented Oct 6, 2021

@phlogistonjohn thanks for the notice! I have two more PRs in queue (that are hopefully less controversial than this one), but I wasn't aware the deadline is so close.

@gman0 gman0 force-pushed the rados_write_op_cmpext branch 2 times, most recently from 86d4ade to 496dc49 Compare October 7, 2021 12:10
@gman0
Copy link
Contributor Author

gman0 commented Oct 7, 2021

The CI seems to be failing on this?

2021-10-07T12:26:18.3476195Z === RUN   TestMirroring
2021-10-07T12:26:28.3854085Z     mirror_workflow_test.go:63: 
2021-10-07T12:26:28.3855613Z         	Error Trace:	mirror_workflow_test.go:63
2021-10-07T12:26:28.3856426Z         	Error:      	Received unexpected error:
2021-10-07T12:26:28.3858905Z         	            	rados: ret=-95, Operation not supported: "Module 'mirroring' is not enabled (required by command 'fs snapshot mirror enable'): use `ceph mgr module enable mirroring` to enable it"
2021-10-07T12:26:28.3860089Z         	Test:       	TestMirroring
2021-10-07T12:26:30.5778381Z --- FAIL: TestMirroring (12.23s)

@phlogistonjohn
Copy link
Collaborator

The CI seems to be failing on this?

2021-10-07T12:26:18.3476195Z === RUN   TestMirroring
2021-10-07T12:26:28.3854085Z     mirror_workflow_test.go:63: 
2021-10-07T12:26:28.3855613Z         	Error Trace:	mirror_workflow_test.go:63
2021-10-07T12:26:28.3856426Z         	Error:      	Received unexpected error:
2021-10-07T12:26:28.3858905Z         	            	rados: ret=-95, Operation not supported: "Module 'mirroring' is not enabled (required by command 'fs snapshot mirror enable'): use `ceph mgr module enable mirroring` to enable it"
2021-10-07T12:26:28.3860089Z         	Test:       	TestMirroring
2021-10-07T12:26:30.5778381Z --- FAIL: TestMirroring (12.23s)

That's a common "test flake" that is tracked in #574 - I will restart the CI run.

@phlogistonjohn
Copy link
Collaborator

@Mergifyio rebase

@mergify
Copy link

mergify bot commented Oct 18, 2021

rebase

✅ Branch has been successfully rebased

@leseb leseb force-pushed the rados_write_op_cmpext branch from 496dc49 to 666cdc8 Compare October 18, 2021 18:11
@phlogistonjohn
Copy link
Collaborator

@Mergifyio rebase

@mergify
Copy link

mergify bot commented Nov 15, 2021

rebase

✅ Branch has been successfully rebased

@ktdreyer ktdreyer force-pushed the rados_write_op_cmpext branch from 666cdc8 to 8329507 Compare November 15, 2021 23:13
@phlogistonjohn
Copy link
Collaborator

PR #597 has been merged after reducing it's scope a bit. I think you've already incorporated the changes needed to match the fixes in that PR, but I recommend double checking.

To pass the CI we need to ensure the new API is documented. I've recently added docs pertaining to that:
https://github.com/ceph/go-ceph/blob/master/docs/development.md#documentation-conventions

Please review and feel free to ask any questions or for any assistance as I realize this process is new.

@phlogistonjohn
Copy link
Collaborator

Hi @gman0 - do you have some time to revisit this in the next couple of weeks? I think it would be disappointing if we couldn't manage to get this PR in shape before the next release. Please let us know if there's any confusion about the process, as there have been recent changes, so we can help!

@gman0
Copy link
Contributor Author

gman0 commented Nov 29, 2021

Hi @phlogistonjohn thank you for the reminder and apologies for the delay, I'll try to work on this ASAP. I'm just busy with other tasks at the moment.

There's actually about 6 more bindings for read and write RADOS ops I'd like to add though... I'm fully aware of December's release, and have doubts all these changes will land in time.

@phlogistonjohn
Copy link
Collaborator

Hi @phlogistonjohn thank you for the reminder and apologies for the delay, I'll try to work on this ASAP. I'm just busy with other tasks at the moment.

Totally understood.

There's actually about 6 more bindings for read and write RADOS ops I'd like to add though... I'm fully aware of December's release, and have doubts all these changes will land in time.

File them at whatever pace works for you, we'll try to be responsive and if it doesn't seem like it don't hesitate to "@"-ping us. Since this one has been open a while I just want to give it a fair chance to land in the Dec. release!

@gman0 gman0 force-pushed the rados_write_op_cmpext branch from 8329507 to e1d7ac3 Compare November 30, 2021 16:23
@gman0
Copy link
Contributor Author

gman0 commented Nov 30, 2021

I have two questions:

  1. The development docs mention make api-update. Do I run this and commit the changes as a part of this PR too? Or is this done only by you guys when cutting a release?
  2. I'm not sure about the WriteOpCmpExtStep struct I've added to this PR. Do we add a struct for each write operation? It's a lot of boilerplate code, and there are more ops that return result via int* in args. Is it ok?

@gman0
Copy link
Contributor Author

gman0 commented Nov 30, 2021

PTAL @phlogistonjohn @ansiwen

@gman0
Copy link
Contributor Author

gman0 commented Nov 30, 2021

Regarding the CI: I can't see TestWriteOpCmpExt being run nor which test failed?

@phlogistonjohn
Copy link
Collaborator

I have two questions:

1. The [development docs](https://github.com/ceph/go-ceph/blob/master/docs/development.md#documentation-conventions) mention `make api-update`. Do I run this and commit the changes as a part of this PR too? Or is this done only by you guys when cutting a release?

You and other contributors like you are expected to run the command and commit the results.

2. I'm not sure about the `WriteOpCmpExtStep` struct I've added to this PR. Do we add a struct for each write operation? It's a lot of boilerplate code, and there are more ops that return result via `int*` in args. Is it ok?

I'll take a look.

Regarding the CI: I can't see TestWriteOpCmpExt being run nor which test failed?

I think you hit a test flake / timeout issue. I've restarted the tests. If we see the same problem occur repeatedly on your pr it might be related, but at this time I doubt it, I think you got unlucky.

@phlogistonjohn
Copy link
Collaborator

WriteOpCmpExtStep seems OK to me. We do want a new step type for each new kind of step. I think your pr is good as-is wrt WriteOpCmpExtStep. If we do find ourselves repeating a lot of code in the future, we can create a private "helper" type that gets embedded in the various AbcXyzStep types. Does that make sense?

@gman0
Copy link
Contributor Author

gman0 commented Nov 30, 2021

Thanks! Sounds good. I'll run make api-update and push.

@gman0
Copy link
Contributor Author

gman0 commented Dec 1, 2021

@phlogistonjohn sorry but I'm having trouble getting make api-update to finish:

# make api-update
make RESULTS_DIR="/tmp/go-ceph/_results" ENTRYPOINT_ARGS="--test-run=IMPLEMENTS --micro-osd=/bin/true " test-container
make[1]: Entering directory '/tmp/go-ceph'
docker run --security-opt label=disable --rm --hostname test_ceph_aio \
	-v /tmp/go-ceph:/go/src/github.com/ceph/go-ceph:z -v /tmp/go-ceph/_results:/results:z  \
	go-ceph-ci:octopus --test-run=IMPLEMENTS --micro-osd=/bin/true 
*** running: /bin/true /tmp/ceph
skipping tests, executing implements tool
go: downloading github.com/stretchr/testify v1.4.0
go: downloading golang.org/x/sys v0.0.0-20200501145240-bc7a7d42d5c3
go: downloading github.com/gofrs/uuid v3.2.0+incompatible
go: downloading github.com/aws/aws-sdk-go v1.35.24
go: downloading github.com/davecgh/go-spew v1.1.0
go: downloading github.com/pmezard/go-difflib v1.0.0
go: downloading gopkg.in/yaml.v2 v2.2.8
go: downloading github.com/jmespath/go-jmespath v0.4.0
go: downloading github.com/gofrs/uuid v1.2.0
github.com/ceph/go-ceph
github.com/ceph/go-ceph/internal/retry
golang.org/x/sys/internal/unsafeheader
github.com/ceph/go-ceph/common/commands
github.com/ceph/go-ceph/internal/cutil
github.com/ceph/go-ceph/internal/errutil
golang.org/x/sys/unix
github.com/ceph/go-ceph/contrib/implements/internal/implements
github.com/ceph/go-ceph/contrib/implements
github.com/ceph/go-ceph/internal/callbacks
github.com/ceph/go-ceph/rgw
github.com/aws/aws-sdk-go/aws/awserr
github.com/ceph/go-ceph/internal/timespec
github.com/aws/aws-sdk-go/internal/shareddefaults
github.com/aws/aws-sdk-go/internal/ini
github.com/aws/aws-sdk-go/internal/sync/singleflight
github.com/aws/aws-sdk-go/aws/endpoints
github.com/aws/aws-sdk-go/internal/sdkio
github.com/aws/aws-sdk-go/aws/credentials
github.com/jmespath/go-jmespath
github.com/aws/aws-sdk-go/aws/client/metadata
github.com/aws/aws-sdk-go/internal/strings
github.com/ceph/go-ceph/rados
github.com/aws/aws-sdk-go/internal/sdkmath
github.com/aws/aws-sdk-go/aws/awsutil
github.com/aws/aws-sdk-go/aws
github.com/aws/aws-sdk-go/aws/request
github.com/aws/aws-sdk-go/private/protocol
github.com/aws/aws-sdk-go/private/protocol/rest
github.com/aws/aws-sdk-go/aws/signer/v4
github.com/ceph/go-ceph/rgw/admin
github.com/ceph/go-ceph/internal/commands
github.com/ceph/go-ceph/contrib
github.com/ceph/go-ceph/cephfs
github.com/ceph/go-ceph/rbd
github.com/ceph/go-ceph/cephfs/admin
github.com/ceph/go-ceph/rbd/admin
rm -f ./implements
go build -o implements ./contrib/implements
make[1]: Leaving directory '/tmp/go-ceph'
./contrib/apiage.py --mode=update \
	--current-tag="$(git describe --tags --abbrev=0)"
error: no source data found (path: ./_results/implements.json)
make: *** [Makefile:221: api-update] Error 1

go-ceph/_results/ is indeed empty, there's no implements.json.

@gman0 gman0 force-pushed the rados_write_op_cmpext branch from e1d7ac3 to 0b8e223 Compare December 1, 2021 09:30
@gman0
Copy link
Contributor Author

gman0 commented Dec 1, 2021

By the way, the CI was failing because of gofmt. Apparently I missed adding //go:build ceph_preview build tag in the newly added .go files. Now it's failing on ptrguard tests though:

=== RUN   TestMirroring
568
    mirror_workflow_test.go:103: 
569
        	Error Trace:	mirror_workflow_test.go:103
570
        	Error:      	Received unexpected error:
571
        	            	cephfs: ret=-17, File exists
572
        	Test:       	TestMirroring

@gman0
Copy link
Contributor Author

gman0 commented Dec 1, 2021

go-ceph/entrypoint.sh

Lines 243 to 253 in 20da1c6

implements_tool() {
if command -v castxml >/dev/null ; then
mkdir -p "${RESULTS_DIR}"
show ./implements --list \
--report-json "${RESULTS_DIR}/implements.json" \
--report-text "${RESULTS_DIR}/implements.txt" \
cephfs rados rbd cephfs/admin rbd/admin rgw/admin
# output the brief summary info onto stdout
grep '^[A-Z]' "${RESULTS_DIR}/implements.txt"
fi
}

implements_tool silently exists if it can't find castxml in $PATH.

@gman0 gman0 force-pushed the rados_write_op_cmpext branch from 0b8e223 to 0955c0b Compare December 1, 2021 11:33
@gman0
Copy link
Contributor Author

gman0 commented Dec 1, 2021

I've run make api-update (but had to pass castxml to test-container as a workaround for #614). I believe all comments are now addressed? PTAL @phlogistonjohn @ansiwen

phlogistonjohn
phlogistonjohn previously approved these changes Dec 1, 2021
Copy link
Collaborator

@phlogistonjohn phlogistonjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a bunch for sticking with this.

Copy link
Collaborator

@ansiwen ansiwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some cosmetic suggestions. Otherwise LGTM.

Comment on lines 25 to 26
// IntSize is the size of C.int
IntSize = C.sizeof_int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this. The constants in cutil are for files that don't import C, but you are only using it in a file that imports C anyway, so you can just use C.sizeof_int directly there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, I didn't know that.

)

// Compile-time assertion ensuring that Go's `int` is at least as large as C's.
const _ = unsafe.Sizeof(int(0)) - unsafe.Sizeof(C.int(0))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be

Suggested change
const _ = unsafe.Sizeof(int(0)) - unsafe.Sizeof(C.int(0))
const _ = unsafe.Sizeof(int(0)) - C.sizeof_int

but doesn't matter actuallly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably nicer. Maybe later you / we could find a better place for these compile-time assertions, or is cutil/aliases.go ok?

}

func (s *WriteOpCmpExtStep) update() error {
s.Result = (int)(*s.prval)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s.Result = (int)(*s.prval)
s.Result = int(*s.prval)

@gman0 gman0 force-pushed the rados_write_op_cmpext branch from 0955c0b to 2b0d7fd Compare December 8, 2021 16:42
@mergify mergify bot dismissed phlogistonjohn’s stale review December 8, 2021 16:43

Pull request has been modified.

This commit implements binding for rados_write_op_cmpext RADOS Write operation.
Includes a unit test.

Signed-off-by: Robert Vasek <[email protected]>
@gman0 gman0 force-pushed the rados_write_op_cmpext branch from 2b0d7fd to 9d4b149 Compare December 8, 2021 17:06
@gman0
Copy link
Contributor Author

gman0 commented Dec 8, 2021

@ansiwen @phlogistonjohn I've addressed the comments, can you please take a look again?

Copy link
Collaborator

@ansiwen ansiwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Collaborator

@phlogistonjohn phlogistonjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton for your contribution here.

@mergify mergify bot merged commit 0443277 into ceph:master Dec 8, 2021
@gman0
Copy link
Contributor Author

gman0 commented Dec 8, 2021

Thank you both for all the feedback and reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API This PR includes a change to the public API of a go-ceph package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants