Commit
Improve upsert throughput by 3x (#334)
## Problem

Python SDK upsert throughput is low compared to other SDKs. For example, I can achieve 880 vector upserts/sec with the Python SDK, compared to 3500 upserts/sec with the Java SDK.

Profiling the Python SDK while it performs these upserts shows a large percentage of time spent in gRPC / protobuf serialization and deserialization.

## Solution

Upgrade protobuf from v3 to v4. This adds a number of performance improvements in parsing and serialization, as documented at https://protobuf.dev/news/2022-05-06/#python-updates

This increases upsert() throughput by roughly 3x (measured by upserting 1M 768-dimension vectors to a pod-based index in batches of 500):

* Before: 880 vectors/sec
* After: 2580 vectors/sec

As per the documentation, this results in an incompatible change with the _generated_ Python code, so this depends on a related change to pinecone-protos to change the version of protobuf used to generate the Python code there.

## Type of Change

- [x] None of the above: Performance improvement.

## Test Plan

Use existing regression tests.
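The vectors/sec figures under Solution could be reproduced with a small timing harness along the following lines. This is a minimal sketch, not the script used for the numbers above: `make_vectors`, `batched`, and `measure_upsert_throughput` are hypothetical helper names, and the `upsert` argument stands in for whatever callable actually sends a batch to the index.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical record shape for this sketch: (id, values).
Vector = Tuple[str, List[float]]


def make_vectors(count: int, dimension: int) -> List[Vector]:
    """Build placeholder vectors; real benchmarks would use realistic data."""
    return [(f"id-{i}", [0.0] * dimension) for i in range(count)]


def batched(items: Sequence[Vector], batch_size: int) -> Iterator[Sequence[Vector]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def measure_upsert_throughput(
    upsert: Callable[[Sequence[Vector]], object],
    vectors: Sequence[Vector],
    batch_size: int = 500,
) -> float:
    """Send all vectors in batches and return the observed vectors/sec."""
    start = time.perf_counter()
    for batch in batched(vectors, batch_size):
        upsert(batch)  # assumed to block until the batch is accepted
    elapsed = time.perf_counter() - start
    return len(vectors) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in sink so the sketch runs without a network connection.
    received: List[Vector] = []
    rate = measure_upsert_throughput(received.extend, make_vectors(10_000, 768))
    print(f"{rate:.0f} vectors/sec (in-memory stand-in, not a real index)")
```

Running the same harness against the same index before and after the protobuf upgrade, with the SDK's real upsert call passed in place of the stand-in, would give a like-for-like comparison.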