-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add json:""
marshaling annotations to response structs
#21
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
VectorIds []*string | ||
Usage *Usage | ||
NextPaginationToken *string | ||
VectorIds []*string `json:"vector_ids,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: your question in the PR description, from what I've read at least it seems like adding omitempty
is a good call. I think it avoids ambiguous values -- like is a vector ID 0
, or does it not exist? I'm always a fan of being explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are places where not using omitempty
does make sense as it'll drop zero values that maybe you want to include.
For this PR I was mainly focused on making sure we're using omitempty
for pointers and structs where it makes sense, primarily to avoid null
values in the resulting JSON.
{ | ||
name: "Fields omitted", | ||
input: FetchVectorsResponse{}, | ||
want: `{}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we account for a case when some fields are present, but others are not (or are empty)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are cases where we do test for that like in TestMarshalingDescribeIndexStatsResponse
, but for something like this where all fields have omitempty
we just test for an empty struct.
}, | ||
Usage: &Usage{ReadUnits: toUInt32(5)}, | ||
}, | ||
want: `{"matches":[{"vector":{"id":"vec-1","values":[0.01,0.01,0.01]},"score":0.1},{"vector":{"id":"vec-2","values":[0.02,0.02,0.02]},"score":0.2}],"usage":{"read_units":5}}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth at least having the want
values as globals at the top? Or fixtures or whatever? Just to downsize the amount of duplicated code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely a lot of duplicative code. I'd prefer not to move the want
blocks around though as they're specific to the struct that's being tested, and it'd probably be more confusing to have to look them up elsewhere than in-line.
I can try and think of ways to clean some of this up though.
pinecone/models.go
Outdated
Size *int64 `json:"size,omitempty"` | ||
Status CollectionStatus `json:"status"` | ||
Dimension *int32 `json:"dimension,omitempty"` | ||
VectorCount *int32 `json:"vector_count,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be an exception to the omitempty
thing I was saying above -- if vector_count
is empty, that actually is meaningful, right? Like, that means there are no vectors in the collection, right? If so, I think we should actually not have omitempty
there, since the value of 0
indicates the actual state of things (vs an ambiguous state)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason for using omitempty
here is VectorCount
is a pointer, and I've tried to use omitempty
with all our pointers. The reason is otherwise an empty pointer is marshaled to null
in the JSON results.
For collections specifically, this is mainly to cover a specific issue with VectorCount
and other fields possibly being empty when creating a collection from an empty index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked the Collection
struct and the toCollection
function to map the pointers into zero values for the ints to make things easier to work with for consumers.
SparseValues *SparseValues | ||
Metadata *Metadata | ||
Id string `json:"id"` | ||
Values []float32 `json:"values,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we always had to have dense values, no? Like, you can technically "only" have sparse values in an index if you upload a dense vector that's just filled with 0
s and has a dimension of 1
or 2
or something, but I think you have to have a dense vector in order to upload a sparse vector (so I think omitempty
here is not good)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(b/c you should never have an empty dense vector value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do have to have dense values, yes. Similar to the pointer issue I mentioned here: #21 (comment) I was mainly doing this defensively to avoid instances where we'd have values: null
in the JSON, but I don't think that's ever expected to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the generated code that we map between uses omitempty
for this, so I think it's fine to leave it as-is for now:
Values []float32 `protobuf:"fixed32,2,rep,packed,name=values,proto3" json:"values,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments
…e stripping out empty pointers on conversion, and properly converting the struct keys to lower and snake case, add a bunch of unit tests to valid marshaling
7ac687d
to
567551d
Compare
…ional structs to default to 0 integer values rather than passing a pointer, clean up tests
Problem
There's been some UX feedback regarding marshaling of some of our response structs.
Primarily, we're not using
json:"key_name"
annotation for our struct fields so they default to capitalized keys in JSON output when marshaled. IE:{ "Usage": { "ReadUnits": 5 }}
. We'd like these lower case and matched with our API spec.There are also instances where we're returning pointers for certain
*int32
fields which can be irritating to consumers needing to dereference and nil-check. In certain cases we can default any nil pointer values to 0 before returning things.There's no testing for the marshaling on any of our custom structs.
Solution
I may have gone a bit overboard with the testing here. The bulk of this is tests and the tests are all very similar, definitely open to opinions on removing if people find them unnecessary, but changing up how the JSON is produced is pretty easy and it's an important point of integration in my mind, so maybe worth testing for.
I also wasn't clear if adding these annotations to our request structs made sense either, so I left those out for now. Overall I'm still working on getting a better understanding of Go syntax and expectations.
models.go
andindex_connection.go
.Collection
andUsage
structs to default integer values rather than using pointers.models_test.go
file to run through some validation on struct marshaling, also add unit tests toindex_connection_test.go
.Originally I thought
omitempty
for some of these pointers may be helpful from a consumer standpoint, but I'm actually not sure the more I've thought about it. Does it make sense to omit more or less things? Other feedback here would be much appreciated.Type of Change
Test Plan
Unit tests to explicitly stress marshaling the structs via
json.Marshal()
.