Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add json:"" marshaling annotations to response structs #21

Merged
merged 3 commits into from
May 7, 2024

Conversation

austin-denoble
Copy link
Contributor

@austin-denoble austin-denoble commented May 3, 2024

Problem

There's been some UX feedback regarding marshaling of some of our response structs.

Primarily, we're not using json:"key_name" annotation for our struct fields so they default to capitalized keys in JSON output when marshaled. IE: { "Usage": { "ReadUnits": 5 }}. We'd like these lower case and matched with our API spec.

There are also instances where we're returning pointers for certain *int32 fields which can be irritating to consumers needing to dereference and nil-check. In certain cases we can default any nil pointer values to 0 before returning things.

There's no testing for the marshaling on any of our custom structs.

Solution

I may have gone a bit overboard with the testing here. The bulk of this is tests and the tests are all very similar, definitely open to opinions on removing if people find them unnecessary, but changing up how the JSON is produced is pretty easy and it's an important point of integration in my mind, so maybe worth testing for.

I also wasn't clear if adding these annotations to our request structs made sense either, so I left those out for now. Overall I'm still working on getting a better understanding of Go syntax and expectations.

  • Add JSON marshaling annotations to the response structs in models.go and index_connection.go.
  • Update Collection and Usage structs to default integer values rather than using pointers.
  • Add new models_test.go file to run through some validation on struct marshaling, also add unit tests to index_connection_test.go.

Originally I thought omitempty for some of these pointers may be helpful from a consumer standpoint, but I'm actually not sure the more I've thought about it. Does it make sense to omit more or less things? Other feedback here would be much appreciated.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Unit tests to explicitly stress marshaling the structs via json.Marshal().


Copy link
Contributor

@haruska haruska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

VectorIds []*string
Usage *Usage
NextPaginationToken *string
VectorIds []*string `json:"vector_ids,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: your question in the PR description, from what I've read at least it seems like adding omitempty is a good call. I think it avoids ambiguous values -- like is a vector ID 0, or does it not exist? I'm always a fan of being explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are places where not using omitempty does make sense as it'll drop zero values that maybe you want to include.

For this PR I was mainly focused on making sure we're using omitempty for pointers and structs where it makes sense, primarily to avoid null values in the resulting JSON.

{
name: "Fields omitted",
input: FetchVectorsResponse{},
want: `{}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we account for a case when some fields are present, but others are not (or are empty)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are cases where we do test for that like in TestMarshalingDescribeIndexStatsResponse, but for something like this where all fields have omitempty we just test for an empty struct.

},
Usage: &Usage{ReadUnits: toUInt32(5)},
},
want: `{"matches":[{"vector":{"id":"vec-1","values":[0.01,0.01,0.01]},"score":0.1},{"vector":{"id":"vec-2","values":[0.02,0.02,0.02]},"score":0.2}],"usage":{"read_units":5}}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth at least having the want values as globals at the top? Or fixtures or whatever? Just to downsize the amount of duplicated code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely a lot of duplicative code. I'd prefer not to move the want blocks around though as they're specific to the struct that's being tested, and it'd probably be more confusing to have to look them up elsewhere than in-line.

I can try and think of ways to clean some of this up though.

Size *int64 `json:"size,omitempty"`
Status CollectionStatus `json:"status"`
Dimension *int32 `json:"dimension,omitempty"`
VectorCount *int32 `json:"vector_count,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be an exception to the omitempty thing I was saying above -- if vector_count is empty, that actually is meaningful, right? Like, that means there are no vectors in the collection, right? If so, I think we should actually not have omitempty there, since the value of 0 indicates the actual state of things (vs an ambiguous state)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for using omitempty here is VectorCount is a pointer, and I've tried to use omitempty with all our pointers. The reason is otherwise an empty pointer is marshaled to null in the JSON results.

For collections specifically, this is mainly to cover a specific issue with VectorCount and other fields possibly being empty when creating a collection from an empty index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked the Collection struct and the toCollection function to map the pointers into zero values for the ints to make things easier to work with for consumers.

SparseValues *SparseValues
Metadata *Metadata
Id string `json:"id"`
Values []float32 `json:"values,omitempty"`
Copy link
Contributor

@aulorbe aulorbe May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we always had to have dense values, no? Like, you can technically "only" have sparse values in an index if you upload a dense vector that's just filled with 0s and has a dimension of 1 or 2 or something, but I think you have to have a dense vector in order to upload a sparse vector (so I think omitempty here is not good)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(b/c you should never have an empty dense vector value)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do have to have dense values, yes. Similar to the pointer issue I mentioned here: #21 (comment) I was mainly doing this defensively to avoid instances where we'd have values: null in the JSON, but I don't think that's ever expected to happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the generated code that we map between uses omitempty for this, so I think it's fine to leave it as-is for now:

Values []float32 `protobuf:"fixed32,2,rep,packed,name=values,proto3" json:"values,omitempty"`

Copy link
Contributor

@aulorbe aulorbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments

…e stripping out empty pointers on conversion, and properly converting the struct keys to lower and snake case, add a bunch of unit tests to valid marshaling
@austin-denoble austin-denoble force-pushed the adenoble/json-marshaling-annotations branch from 7ac687d to 567551d Compare May 6, 2024 20:59
…ional structs to default to 0 integer values rather than passing a pointer, clean up tests
@austin-denoble austin-denoble merged commit cbb7f75 into main May 7, 2024
3 checks passed
@austin-denoble austin-denoble deleted the adenoble/json-marshaling-annotations branch May 7, 2024 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants