Implement Bulk Import, Regenerate core for 2024-10 API (#79)
## Problem
We are releasing a new version of the API this month: `2024-10`.

There are 3 primary new features that are included in this release:
- Import
- Inference
  - Embed
  - Rerank

This PR implements the operations to support import. Sorry about the
size, but you can basically ignore all of the generated code under
`internal/gen` unless you're curious about the new structure of the
generated core files. Follow the `codegen/build-clients.sh` script for
those details.

## Solution
Since the import operations are technically part of the data plane but
only supported via REST, they are represented in the OpenAPI spec and
not our protos file. Because of this, we need to change the `Client`
and `IndexConnection` structs to support these operations:
traditionally, the code `IndexConnection` wraps was targeting gRPC-only
db data operations. We now need to generate REST code for the data
plane as well so we can interact with imports.

- Update the `codegen/build-clients.sh` script to handle building new
modules for both `internal/gen/db_data/grpc` and
`internal/gen/db_data/rest`.
- Update `Client` struct and move `NewClientBaseParams` into a field
that can be shared more easily when constructing the `IndexConnection`.
- Add `buildDataClientBaseOptions` to handle constructing the necessary
rest client options for the underlying `dbDataClient`.
- Add an `ensureHostHasHttps` helper as we need to make sure this is
present for the index `Host` that's passed, which was not necessary for
grpc.
- Update `Index` method to call `buildDataClientBaseOptions` and pass
the new client into `newIndexConnection`.
- Update `IndexConnection` to support both REST and gRPC interfaces
under the hood (`restClient`, `grpcClient`).
- Update `newIndexConnection` to support attaching the new `restClient`
to the `IndexConnection` struct.
- Update `IndexConnection` to support all import operations:
`StartImport`, `ListImports`, `DescribeImport`, `CancelImport`.
- Add end-to-end integration test for validating the import flow against
serverless indexes.
- Some nitpicky code cleanup, renaming of things around the new rest vs.
grpc paradigm, etc.
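
To make the `ensureHostHasHttps` bullet above concrete, here is one possible way such a helper could behave. This is a sketch written for this description, not the implementation in the PR: in particular, upgrading an explicit `http://` prefix to `https://` is an assumption, and the host names are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// ensureHostHasHttps returns host with an https:// scheme prefix.
// Assumed behavior: an existing https:// prefix is kept, an http://
// prefix is upgraded, and a bare host gets https:// prepended.
func ensureHostHasHttps(host string) string {
	switch {
	case strings.HasPrefix(host, "https://"):
		return host
	case strings.HasPrefix(host, "http://"):
		return "https://" + strings.TrimPrefix(host, "http://")
	default:
		return "https://" + host
	}
}

func main() {
	// Hypothetical index hosts, used only for illustration.
	hosts := []string{
		"my-index.example.io",
		"http://my-index.example.io",
		"https://my-index.example.io",
	}
	for _, h := range hosts {
		fmt.Println(ensureHostHasHttps(h))
	}
}
```

All three inputs print `https://my-index.example.io`. A gRPC dial target does not need a scheme, which is why this normalization only matters once the REST client is involved.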

## Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)

## Test Plan
`just test` - make sure CI passes

To see examples of how to use the new methods, check the doc comments.


---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
  - https://app.asana.com/0/0/1208325183834377
  - https://app.asana.com/0/0/1208541827330963
austin-denoble committed Oct 23, 2024
1 parent 6886f84 commit feb3a85
Showing 15 changed files with 2,966 additions and 2,010 deletions.
1 change: 0 additions & 1 deletion .github/workflows/ci.yaml
@@ -34,7 +34,6 @@ jobs:
         run: |
           go get ./pinecone
       - name: Run tests
-        continue-on-error: true
         run: go test -count=1 -v ./pinecone
         env:
           PINECONE_API_KEY: ${{ secrets.API_KEY }}
2 changes: 1 addition & 1 deletion codegen/apis
Submodule apis updated from 3b7369 to 3002f1
17 changes: 13 additions & 4 deletions codegen/build-clients.sh
@@ -7,16 +7,19 @@ db_control_module="db_control"
 db_data_module="db_data"
 inference_module="inference"

-# generated output destination paths
+# generated grpc output destination paths
 # db_data_destination must align with the option go_package in the proto file:
 # https://github.com/pinecone-io/apis/blob/d1d005e75cc9fe9a5c486ef9218fe87b57765961/src/release/db/data/data.proto#L3
-db_data_destination="internal/gen/data"
+db_data_destination="internal/gen/${db_data_module}"
 db_control_destination="internal/gen/${db_control_module}"
 inference_destination="internal/gen/${inference_module}"

 # version file
 version_file="internal/gen/api_version.go"
-# generated oas files
+
+# generated oas file destination paths
+db_data_rest_destination="${db_data_destination}/rest"
+db_data_oas_file="${db_data_rest_destination}/${db_data_module}_${version}.oas.go"
 db_control_oas_file="${db_control_destination}/${db_control_module}_${version}.oas.go"
 inference_oas_file="${inference_destination}/${inference_module}_${version}.oas.go"

@@ -92,6 +95,9 @@ EOL
 update_apis_repo
 verify_spec_version $version

+# Clear internal/gen/* contents
+rm -rf internal/gen/*
+
 # Generate db_control oas client
 rm -rf "${db_control_destination}"
 mkdir -p "${db_control_destination}"
@@ -102,9 +108,12 @@ rm -rf "${inference_destination}"
 mkdir -p "${inference_destination}"
 generate_oas_client $inference_module $inference_oas_file

-# Generate db_data proto client
+# Generate db_data oas and proto clients
 rm -rf "${db_data_destination}"
 mkdir -p "${db_data_destination}"
+mkdir -p "${db_data_rest_destination}"
+
+generate_oas_client $db_data_module $db_data_oas_file
 generate_proto_client $db_data_module

 # Generate version file