Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docstrings for index_connection.go #31

Merged
merged 22 commits into from
Jun 28, 2024

Conversation

aulorbe
Copy link
Contributor

@aulorbe aulorbe commented Jun 24, 2024

Problem

We do not have our publicly exported funcs, methods, or structs documented in godoc comments.

Asana ticket: https://app.asana.com/0/1203260648987893/1207649763804261/f

Solution

Add 'em!

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

CI passes.

Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very helpful, @aulorbe! Just a few requests. And let me know if/when you want a closer copy review.

func (idx *IndexConnection) Close() error {
err := idx.grpcConn.Close()
return err
}

// UpsertVectors indexes vectors into a Pinecone index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think you could add metadata and sparse values to the example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

// fmt.Println("Error:", err)
// }
//
// queryVector := []float32{1.0, 2.0}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include a metadata filter? A hybrid query would be nice as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks you!

@aulorbe aulorbe requested a review from austin-denoble June 25, 2024 17:46
@aulorbe aulorbe marked this pull request as ready for review June 25, 2024 17:46
Copy link
Contributor

@austin-denoble austin-denoble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for getting all of these added! Took an initial pass through it, but I should probably do another.

Overall looking good, I left some similar feedback to the Client PR. Mainly, I think we need to remove all the usage of getEnvVars in these examples, and we could probably look at using DescribeIndex over ListIndexes in some of these code examples for targeting the index host.

Let me know if you'd like to sync to discuss the feedback holistically for these PRs.

// IndexConnection holds the parameters for Pinecone IndexConnection object.
//
// Fields:
// - Namespace: The namespace of the index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Namespace line looks too indented. I think for wording we could be a bit more explicit, like the TypeScript client here: https://github.com/pinecone-io/pinecone-ts-client/blob/4fe3e04e05c4c8748262e5a00446bcb656d379eb/src/data/index.ts#L131

"The namespace where index operations will be performed." Just something to make the targeting aspect for the connection a bit more explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: indentation -- I doubled checked and without an extra whitespace in the front, they don't turn into bullet pts in the godoc rendering, so I think they gotta stay as-is.

Changed the namespace wording, thank you (and the add'l metadata wording)!

Comment on lines 20 to 23
// - additionalMetadata: Additional metadata to be sent with each request.
// - dataClient: The gRPC client for the index.
// - grpcConn: The gRPC connection.
type IndexConnection struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct is public but these fields are technically private in that they're not exported. I think we can still add doc comments for them.

nitpick'y wording:
"Additional metadata to be sent with each RPC request."

// ctx := context.Background()
//
// clientParams := pinecone.NewClientParams{
// ApiKey: getEnvVars("PINECONE_API_KEY"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave this feedback on the Client PR: we should not be using getEnvVars() for this.

I'd either use a string directly or use os.Getenv():

ApiKey: "YOUR_API_KEY",

or

ApiKey: os.Getenv("PINECONE_API_KEY")

We try and pull PINECONE_API_KEY from the environment inside the SDK though, so it's a little redundant maybe to show it here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I'll just use

ApiKey: "YOUR_API_KEY"

@@ -49,11 +56,88 @@ func newIndexConnection(in newIndexParameters) (*IndexConnection, error) {
return &idx, nil
}

// Close closes the connection to a Pinecone index.
//
// Returns a pointer to an IndexConnection object and an error if the connection cannot be closed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect, the Close() only returns an error if there is one, otherwise it returns nil:

I would also update this to explicitly refer to this closing the grpc.ClientConn.

Comment on lines 63 to 64
// Parameters:
// idx: The IndexConnection object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idx: IndexConnection isn't a parameter here, it's the receiver for the function, meaning it's part of the IndexConnection struct. I don't think we need to refer to the receiver in doc comments at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thank you so much!

// clientParams := pinecone.NewClientParams{
// ApiKey: getEnvVars("PINECONE_API_KEY"),
// SourceTag: "your_source_identifier", // optional
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Check indentations.

//
// count, err := idx.UpsertVectors(ctx, vectors)
// if err != nil {
// fmt.Println("Error:", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Similar to previous feedback - anywhere we're doing an "Error:" let's look at adding a little more context around what failed so grepping the code examples is a bit quicker.

Noticed this issue across the comments in this file.

// allowing for the request to be canceled or to timeout according to the context's deadline.
// - in: The vectors to index.
//
// Returns the number of vectors upserted and an error if the request fails.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: or an error

Comment on lines 121 to 126
// idxs, err := pc.ListIndexes(ctx)
//
// idx, err := pc.Index(idxs[0].Host)
// if err != nil {
// fmt.Println("Error:", err)
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a missing err check after the ListIndexes call.

Also, I think we could use DescribeIndex to just pull a single index and then that makes the example code a bit clearer for these, let me know what you think.

//
// res, err := idx.QueryByVectorValues(ctx, &pinecone.QueryByVectorValuesRequest{
// Vector: queryVector,
// TopK: topK, // number of vectors (nearest neighbors) you want returned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be referencing "nearest neighbors"? I don't think we do in other SDKs, and it may be superfluous here. Isn't it based on the similarity metric anyways?

It's in a few places in the comments in this file, I'm thinking we should maybe just drop it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah can do!

@aulorbe aulorbe changed the title Add godoc comments to index_connection.go Add docstrings for index_connection.go Jun 25, 2024
Copy link
Contributor

@austin-denoble austin-denoble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! These look great, left a few comments and nits which you can take or leave. I think we could always iterate on these as we move forwards anyways, and you've spent plenty of time getting these looking nice.

// Parameters:
// - ctx: A context.Context object controls the request's lifetime,
// allowing for the request to be canceled or to timeout according to the context's deadline.
// - in: The vectors to index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still "index".

Comment on lines 128 to 129
// Values: []float32{1.0, 2.0},
// },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Indentation here is actually a bit off (checked locally). I think just indent the Values line, and move the bracket back.

Comment on lines 234 to 235
// - Prefix: The prefix by which to filter.
// - Limit: The maximum number of vectors to return.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two could use a little more clarity due to how the default behavior works, what do you think?

Suggested change
// - Prefix: The prefix by which to filter.
// - Limit: The maximum number of vectors to return.
// - Prefix: The prefix by which to filter. If unspecified, an empty string will be used which will list all vector ids in // the namespace.
// - Limit: The maximum number of vectors to return. If unspecified, the server will use a default value.

Referencing what I wrote 😅: https://github.com/pinecone-io/pinecone-ts-client/blob/4fe3e04e05c4c8748262e5a00446bcb656d379eb/src/data/list.ts#L10

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh great! I didn't know that was the default behavior, actually. Thanks!

// which is passed into the QueryByVectorValues method.
//
// Fields:
// - Vector: The ID of the vector for which you want to find similar vectors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be talking about the vector itself, not the ID:

Suggested change
// - Vector: The ID of the vector for which you want to find similar vectors.
// - Vector: The query vector for which you want to find similar vectors.

Or however you want to word it.

// which is passed into the QueryByVectorId method.
//
// Fields:
// - VectorId: The ID of the vector for which you want to find similar vectors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We mention the id being "unique" in some of our other doc strings.

Comment on lines +514 to +516
// Returns an error if the request fails,
// otherwise returns nil. This method will also return nil if the passed vector ID does not exist in the index or
// namespace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Line breaks on this are a little wonky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah unfortunately this is just the auto-wrapping on my linter. I think I should keep it just so it's consistent throughout this PR. WDYT?

Comment on lines +518 to +519
// Note: You must instantiate an IndexWithNamespace connection in order to delete vectors by ID in namespaces other
// than the default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 621 to 624
//
// Returns an error if the request fails, otherwise returns nil.
//
// Parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also put the note about needing to instantiate an index with a namespace, etc here?

// Note: You must instantiate an IndexWithNamespace connection in order to delete vectors by ID in namespaces other
// than the default.

//
// Fields:
// - Dimension: The dimension of the index.
// - IndexFullness: The fullness level of the index. Note: only available on pods-based indexes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Does "pod-based" feel better than "pods-based"? Might just be me.

Also I didn't know this was the case, is it just always 0 for serverless?

Copy link
Contributor Author

@aulorbe aulorbe Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like pods-based b/c it's more technically correct but 🤷 .

Copy link
Contributor Author

@aulorbe aulorbe Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you try to run DescribeIndexStatsFiltered on a Serverless index, you get this error:

Failed to describe index "serverless-test". Error: rpc error: code =
InvalidArgument desc = Serverless and Starter indexes do not support describing index stats with metadata filtering

@aulorbe aulorbe requested a review from jseldess June 27, 2024 17:37
Copy link
Contributor

@jseldess jseldess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// fmt.Println("Error:", err)
// }
//
// queryVector := []float32{1.0, 2.0}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks you!

Copy link
Contributor

@austin-denoble austin-denoble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing all the changes!

@aulorbe aulorbe merged commit 81bd572 into main Jun 28, 2024
3 checks passed
@aulorbe aulorbe deleted the Audrey/IndexConnection-docstrings branch June 28, 2024 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants