Add pdoc integration in GitHub workflows (#207)
## Problem
We're working on generating documentation references from the codebase as
part of CI for each client.

For Python, we can use [pdoc](https://pdoc.dev/) to generate our client
reference as a static website, which can then be deployed directly to
GitHub Pages.
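
For reference, the docs can be generated locally with the same command the new CI action runs (a quick sketch; the flags mirror the `build-docs` action added below):

```bash
# Generate the client reference as static HTML under ./docs,
# parsing docstrings as Google-style (same invocation as the CI action).
poetry run pdoc pinecone/ --favicon ./favicon-32x32.png --docformat google -o ./docs
```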

We also need to update CI workflows to handle building and deploying
documentation artifacts.

## Solution
- Add `pdoc` as a dependency under
`[tool.poetry.group.dev.dependencies]` in `pyproject.toml`.
- Add a new `build-docs` action under `.github/actions/`.
- Add a `build-docs` job to `pr.yaml` to verify the docs build alongside
tests on PRs.
- Add a new `merge` workflow under `.github/workflows/`.
- Update some of the existing docstrings in `index.py` and `manage.py`.

**Note:** There's a lot more to clean up in the docstrings and in the
overall formatting of the `pdoc` output. This PR was getting large, so I
opted to keep it focused on the CI pieces and on getting things building
properly in the pipeline. I'll follow up with more fine-grained comment
coverage of the top-level modules.

My default VS Code formatter is set to Black, and I think Prettier went a
little wild with some of the YAML files. I can try to back out those
changes, but most of them seem fine to me, though slightly different from
the styling Black applied the other day.

## Type of Change
- [X] New feature (non-breaking change which adds functionality)
- [X] Infrastructure change (CI configs, etc)
- [X] Non-code change (docs, etc)

## Test Plan
Make sure the `Pull Request / Build docs with pdoc` check passes as part of
this PR.
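
If you'd rather watch from the terminal, something like this should also surface the new check (hypothetical usage, assuming a recent GitHub CLI authenticated against the repo):

```bash
# List the status of every check on this PR, including
# "Pull Request / Build docs with pdoc"; --watch polls until completion.
gh pr checks 207 --watch
```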

I forked the repo and validated that the `merge` workflow ran as expected
and deployed the docs properly.
You can see that run here:
https://github.com/austin-denoble/pinecone-python-client/actions/runs/6359304970

You can see the deployed docs on GitHub Pages, validating the overall flow,
here:
https://austin-denoble.github.io/pinecone-python-client/pinecone.html
austin-denoble authored Oct 5, 2023
1 parent 03da225 commit 1b625d8
Showing 13 changed files with 368 additions and 153 deletions.
17 changes: 17 additions & 0 deletions .github/actions/build-docs/action.yml
@@ -0,0 +1,17 @@
name: 'Build client documentation'
description: 'Generates client documentation using pdoc'
runs:
  using: 'composite'
  steps:
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.x

    - name: Setup Poetry
      uses: ./.github/actions/setup-poetry

    - name: Build html documentation
      shell: bash
      run: |
        poetry run pdoc pinecone/ --favicon ./favicon-32x32.png --docformat google -o ./docs
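
Since the output is plain static HTML, a quick local preview of what the action produced needs nothing more than a static file server, e.g. (an illustrative check, not part of the action itself):

```bash
# Serve the generated docs at http://localhost:8000 for a visual spot-check.
python -m http.server 8000 --directory docs
```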
4 changes: 2 additions & 2 deletions .github/workflows/alpha-release.yaml
@@ -7,7 +7,7 @@ on:
        description: 'Git ref to build (branch name or SHA)'
        required: true
        type: string
-       default: 'main'
+       default: 'main'
      releaseLevel:
        description: 'Release level'
        required: true
@@ -38,4 +38,4 @@ jobs:
      TWINE_REPOSITORY: 'pypi'
    secrets:
      PYPI_USERNAME: ${{ secrets.PROD_PYPI_USERNAME }}
-     PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PASSWORD }}
+     PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PASSWORD }}
22 changes: 22 additions & 0 deletions .github/workflows/merge.yaml
@@ -0,0 +1,22 @@
name: 'Merge to main'

on:
  push:
    branches:
      - main
  workflow_dispatch: {}

jobs:
  build-and-deploy-documentation:
    runs-on: ubuntu-latest
    steps:
      - name: Generate pdoc documentation
        uses: ./.github/actions/build-docs

      - name: Deploy documentation to gh-pages
        uses: s0/git-publish-subdir-action@develop
        env:
          REPO: self
          BRANCH: gh-pages
          FOLDER: docs
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
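
For context, `s0/git-publish-subdir-action` publishes the contents of `FOLDER` to the target branch. A rough manual equivalent, shown only to illustrate what the step does (assumes the `gh-pages` branch already exists and `docs/` holds the built site):

```bash
# Check out gh-pages into a temporary worktree, swap in the fresh docs, push.
git worktree add /tmp/gh-pages gh-pages
rm -rf /tmp/gh-pages/*          # the worktree's .git file is a dotfile, so it survives
cp -r docs/* /tmp/gh-pages/
cd /tmp/gh-pages && git add --all && git commit -m "Deploy pdoc output" && git push origin gh-pages
cd - && git worktree remove /tmp/gh-pages
```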
20 changes: 17 additions & 3 deletions .github/workflows/pr.yaml
@@ -1,7 +1,21 @@
-name: Pull-request CI
+name: Pull Request

-on: pull_request
+on:
+  pull_request: {}
+  push:
+    branches:
+      - main
+  workflow_dispatch: {}

jobs:
  run-tests:
-    uses: './.github/workflows/testing.yaml'
+    uses: './.github/workflows/testing.yaml'
+
+  build-docs:
+    name: Build docs with pdoc
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Build docs with pdoc
+        uses: './.github/actions/build-docs'
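
Both jobs can also be exercised locally before pushing, for example with [act](https://github.com/nektos/act) (a hypothetical invocation; assumes act and Docker are installed):

```bash
# Simulate the pull_request trigger and run only the new docs job.
act pull_request -j build-docs
```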
4 changes: 2 additions & 2 deletions .github/workflows/release.yaml
@@ -7,7 +7,7 @@ on:
        description: 'Git ref to build (branch name or SHA)'
        required: true
        type: string
-       default: 'main'
+       default: 'main'
      releaseLevel:
        description: 'Release level'
        required: true
@@ -37,4 +37,4 @@ jobs:
      prereleaseSuffix: ''
    secrets:
      PYPI_USERNAME: ${{ secrets.PROD_PYPI_USERNAME }}
-     PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PASSWORD }}
+     PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PASSWORD }}
8 changes: 6 additions & 2 deletions .gitignore
@@ -136,8 +136,12 @@ venv.bak/
# Rope project settings
.ropeproject

-# mkdocs documentation
-/site
+# pdocs documentation
+# We want to exclude any locally generated artifacts, but we rely on
+# keeping documentation assets in the docs/ folder.
+docs/*
+!docs/pinecone-python-client-fork.png
+!docs/favicon-32x32.png

# mypy
.mypy_cache/
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,2 +1,2 @@
-include LICENSE.txt requirements.txt requirements-grpc.txt pinecone/__version__ pinecone/__environment__
+include LICENSE.txt pinecone/__version__ pinecone/__environment__
recursive-exclude tests *
Binary file added docs/favicon-32x32.png
8 changes: 8 additions & 0 deletions pinecone/__init__.py
@@ -1,3 +1,6 @@
"""
.. include:: ../README.md
"""
from pinecone.core.utils.constants import CLIENT_VERSION as __version__
from .config import *
from .exceptions import *
@@ -12,7 +15,12 @@

# Kept for backwards-compatibility
UpsertResult = None
"""@private"""
DeleteResult = None
"""@private"""
QueryResult = None
"""@private"""
FetchResult = None
"""@private"""
InfoResult = None
"""@private"""
155 changes: 72 additions & 83 deletions pinecone/index.py
@@ -108,61 +108,50 @@ def upsert(
namespace: Optional[str] = None,
batch_size: Optional[int] = None,
show_progress: bool = True,
-**kwargs
+**kwargs,
) -> UpsertResponse:
"""
The upsert operation writes vectors into a namespace.
If a new value is upserted for an existing vector id, it will overwrite the previous value.
API reference: https://docs.pinecone.io/reference/upsert
To upsert in parallel follow: https://docs.pinecone.io/docs/insert-data#sending-upserts-in-parallel
A vector can be represented by a 1) Vector object, a 2) tuple or 3) a dictionary
If a tuple is used, it must be of the form `(id, values, metadata)` or `(id, values)`.
where id is a string, vector is a list of floats, metadata is a dict,
and sparse_values is a dict of the form `{'indices': List[int], 'values': List[float]}`.
Examples:
>>> ('id1', [1.0, 2.0, 3.0], {'key': 'value'}, {'indices': [1, 2], 'values': [0.2, 0.4]})
>>> ('id1', [1.0, 2.0, 3.0], None, {'indices': [1, 2], 'values': [0.2, 0.4]})
>>> ('id1', [1.0, 2.0, 3.0], {'key': 'value'}), ('id2', [1.0, 2.0, 3.0])
If a Vector object is used, a Vector object must be of the form
`Vector(id, values, metadata, sparse_values)`, where metadata and sparse_values are optional
arguments.
Examples:
>>> index.upsert([('id1', [1.0, 2.0, 3.0], {'key': 'value'}),
('id2', [1.0, 2.0, 3.0]),
])
>>> Vector(id='id1', values=[1.0, 2.0, 3.0], metadata={'key': 'value'})
>>> Vector(id='id2', values=[1.0, 2.0, 3.0])
>>> Vector(id='id3', values=[1.0, 2.0, 3.0], sparse_values=SparseValues(indices=[1, 2], values=[0.2, 0.4]))
**Note:** the dimension of each vector must match the dimension of the index.
If a dictionary is used, it must be in the form `{'id': str, 'values': List[float], 'sparse_values': {'indices': List[int], 'values': List[float]}, 'metadata': dict}`
Examples:
>>> index.upsert([('id1', [1.0, 2.0, 3.0], {'key': 'value'}), ('id2', [1.0, 2.0, 3.0])])
>>>
>>> index.upsert([{'id': 'id1', 'values': [1.0, 2.0, 3.0], 'metadata': {'key': 'value'}},
{'id': 'id2',
'values': [1.0, 2.0, 3.0],
'sprase_values': {'indices': [1, 8], 'values': [0.2, 0.4]},
])
>>> index.upsert([Vector(id='id1',
values=[1.0, 2.0, 3.0],
metadata={'key': 'value'}),
Vector(id='id2',
values=[1.0, 2.0, 3.0],
sparse_values=SparseValues(indices=[1, 2], values=[0.2, 0.4]))])
>>> {'id': 'id2', 'values': [1.0, 2.0, 3.0], 'sparse_values': {'indices': [1, 8], 'values': [0.2, 0.4]}])
>>> index.upsert([Vector(id='id1', values=[1.0, 2.0, 3.0], metadata={'key': 'value'}),
>>> Vector(id='id2', values=[1.0, 2.0, 3.0], sparse_values=SparseValues(indices=[1, 2], values=[0.2, 0.4]))])
API reference: https://docs.pinecone.io/reference/upsert
Args:
vectors (Union[List[Vector], List[Tuple]]): A list of vectors to upsert.
A vector can be represented by a 1) Vector object, a 2) tuple or 3) a dictionary
1) if a tuple is used, it must be of the form (id, values, metadata) or (id, values).
where id is a string, vector is a list of floats, metadata is a dict,
and sparse_values is a dict of the form {'indices': List[int], 'values': List[float]}.
Examples: ('id1', [1.0, 2.0, 3.0], {'key': 'value'}, {'indices': [1, 2], 'values': [0.2, 0.4]}),
('id1', [1.0, 2.0, 3.0], None, {'indices': [1, 2], 'values': [0.2, 0.4]})
('id1', [1.0, 2.0, 3.0], {'key': 'value'}), ('id2', [1.0, 2.0, 3.0]),
2) if a Vector object is used, a Vector object must be of the form
Vector(id, values, metadata, sparse_values),
where metadata and sparse_values are optional arguments
Examples: Vector(id='id1',
values=[1.0, 2.0, 3.0],
metadata={'key': 'value'})
Vector(id='id2',
values=[1.0, 2.0, 3.0])
Vector(id='id3',
values=[1.0, 2.0, 3.0],
sparse_values=SparseValues(indices=[1, 2], values=[0.2, 0.4]))
Note: the dimension of each vector must match the dimension of the index.
3) if a dictionary is used, it must be in the form
{'id': str, 'values': List[float], 'sparse_values': {'indices': List[int], 'values': List[float]},
'metadata': dict}
namespace (str): The namespace to write to. If not specified, the default namespace is used. [optional]
batch_size (int): The number of vectors to upsert in each batch.
If not specified, all vectors will be upserted in a single batch. [optional]
@@ -279,9 +268,9 @@ def _vector_transform(item: Union[Vector, Tuple]):
vectors=list(map(_vector_transform, vectors)),
**args_dict,
_check_type=_check_type,
-**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)

@staticmethod
@@ -331,36 +320,36 @@ def delete(
delete_all: Optional[bool] = None,
namespace: Optional[str] = None,
filter: Optional[Dict[str, Union[str, float, int, bool, List, dict]]] = None,
-**kwargs
+**kwargs,
) -> Dict[str, Any]:
"""
-The Delete operation deletes vectors from the index, from a single namespace.
-No error raised if the vector id does not exist.
-Note: for any delete call, if namespace is not specified, the default namespace is used.
-Delete can occur in the following mutual exclusive ways:
-1. Delete by ids from a single namespace
-2. Delete all vectors from a single namespace by setting delete_all to True
-3. Delete all vectors from a single namespace by specifying a metadata filter
-(note that for this option delete all must be set to False)
-API reference: https://docs.pinecone.io/reference/delete_post
-Examples:
->>> index.delete(ids=['id1', 'id2'], namespace='my_namespace')
->>> index.delete(delete_all=True, namespace='my_namespace')
->>> index.delete(filter={'key': 'value'}, namespace='my_namespace')
-Args:
-ids (List[str]): Vector ids to delete [optional]
-delete_all (bool): This indicates that all vectors in the index namespace should be deleted.. [optional]
-Default is False.
-namespace (str): The namespace to delete vectors from [optional]
-If not specified, the default namespace is used.
-filter (Dict[str, Union[str, float, int, bool, List, dict]]):
-If specified, the metadata filter here will be used to select the vectors to delete.
-This is mutually exclusive with specifying ids to delete in the ids param or using delete_all=True.
-See https://www.pinecone.io/docs/metadata-filtering/.. [optional]
+The Delete operation deletes vectors from the index, from a single namespace.
+No error raised if the vector id does not exist.
+Note: for any delete call, if namespace is not specified, the default namespace is used.
+Delete can occur in the following mutual exclusive ways:
+1. Delete by ids from a single namespace
+2. Delete all vectors from a single namespace by setting delete_all to True
+3. Delete all vectors from a single namespace by specifying a metadata filter
+(note that for this option delete all must be set to False)
+API reference: https://docs.pinecone.io/reference/delete_post
+Examples:
+>>> index.delete(ids=['id1', 'id2'], namespace='my_namespace')
+>>> index.delete(delete_all=True, namespace='my_namespace')
+>>> index.delete(filter={'key': 'value'}, namespace='my_namespace')
+Args:
+ids (List[str]): Vector ids to delete [optional]
+delete_all (bool): This indicates that all vectors in the index namespace should be deleted.. [optional]
+Default is False.
+namespace (str): The namespace to delete vectors from [optional]
+If not specified, the default namespace is used.
+filter (Dict[str, Union[str, float, int, bool, List, dict]]):
+If specified, the metadata filter here will be used to select the vectors to delete.
+This is mutually exclusive with specifying ids to delete in the ids param or using delete_all=True.
+See https://www.pinecone.io/docs/metadata-filtering/.. [optional]
Keyword Args:
Supports OpenAPI client keyword arguments. See pinecone.core.client.models.DeleteRequest for more details.
@@ -377,9 +366,9 @@ def delete(
DeleteRequest(
**args_dict,
**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS and v is not None},
-_check_type=_check_type
+_check_type=_check_type,
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)

@validate_and_convert_errors
@@ -419,7 +408,7 @@ def query(
include_values: Optional[bool] = None,
include_metadata: Optional[bool] = None,
sparse_vector: Optional[Union[SparseValues, Dict[str, Union[List[float], List[int]]]]] = None,
-**kwargs
+**kwargs,
) -> QueryResponse:
"""
The Query operation searches a namespace, using a query vector.
@@ -501,9 +490,9 @@ def _query_transform(item):
QueryRequest(
**args_dict,
_check_type=_check_type,
-**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)
return parse_query_response(response, vector is not None or id)

@@ -515,7 +504,7 @@ def update(
set_metadata: Optional[Dict[str, Union[str, float, int, bool, List[int], List[float], List[str]]]] = None,
namespace: Optional[str] = None,
sparse_values: Optional[Union[SparseValues, Dict[str, Union[List[float], List[int]]]]] = None,
-**kwargs
+**kwargs,
) -> Dict[str, Any]:
"""
The Update operation updates vector in a namespace.
@@ -563,9 +552,9 @@ def update(
id=id,
**args_dict,
_check_type=_check_type,
-**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)

@validate_and_convert_errors
@@ -596,9 +585,9 @@ def describe_index_stats(
DescribeIndexStatsRequest(
**args_dict,
**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
-_check_type=_check_type
+_check_type=_check_type,
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)

@staticmethod
20 changes: 11 additions & 9 deletions pinecone/manage.py
@@ -16,13 +16,13 @@
"describe_index",
"list_indexes",
"scale_index",
"IndexDescription",
"create_collection",
"describe_collection",
"list_collections",
"delete_collection",
"configure_index",
"CollectionDescription",
"IndexDescription",
]


@@ -84,12 +84,12 @@ def create_index(
:param name: the name of the index.
:type name: str
:param dimension: the dimension of vectors that would be inserted in the index
-:param index_type: type of index, one of {"approximated", "exact"}, defaults to "approximated".
+:param index_type: type of index, one of `{"approximated", "exact"}`, defaults to "approximated".
The "approximated" index uses fast approximate search algorithms developed by Pinecone.
The "exact" index uses accurate exact search algorithms.
It performs exhaustive searches and thus it is usually slower than the "approximated" index.
:type index_type: str, optional
-:param metric: type of metric used in the vector index, one of {"cosine", "dotproduct", "euclidean"}, defaults to "cosine".
+:param metric: type of metric used in the vector index, one of `{"cosine", "dotproduct", "euclidean"}`, defaults to "cosine".
Use "cosine" for cosine similarity,
"dotproduct" for dot-product,
and "euclidean" for euclidean distance.
@@ -111,7 +111,8 @@ def create_index(
:param source_collection: Collection name to create the index from
:type metadata_config: str, optional
:type timeout: int, optional
-:param timeout: Timeout for wait until index gets ready. If None, wait indefinitely; if >=0, time out after this many seconds; if -1, return immediately and do not wait. Default: None
+:param timeout: Timeout for wait until index gets ready. If None, wait indefinitely; if >=0, time out after this many seconds;
+    if -1, return immediately and do not wait. Default: None
"""
api_instance = _get_api_instance()

@@ -160,7 +161,8 @@ def delete_index(name: str, timeout: int = None):
:param name: the name of the index.
:type name: str
-:param timeout: Timeout for wait until index gets ready. If None, wait indefinitely; if >=0, time out after this many seconds; if -1, return immediately and do not wait. Default: None
+:param timeout: Timeout for wait until index gets ready. If None, wait indefinitely; if >=0, time out after this many seconds;
+    if -1, return immediately and do not wait. Default: None
:type timeout: int, optional
"""
api_instance = _get_api_instance()
@@ -199,8 +201,8 @@ def list_indexes():
def describe_index(name: str):
"""Describes a Pinecone index.
-:param: the name of the index
-:return: Description of an index
+:param name: the name of the index to describe.
+:return: Returns an `IndexDescription` object
"""
api_instance = _get_api_instance()
response = api_instance.describe_index(name)
@@ -235,8 +237,8 @@ def scale_index(name: str, replicas: int):

def create_collection(name: str, source: str):
"""Create a collection
-:param: name: Name of the collection
-:param: source: Name of the source index
+:param name: Name of the collection
+:param source: Name of the source index
"""
api_instance = _get_api_instance()
api_instance.create_collection(create_collection_request=CreateCollectionRequest(name=name, source=source))
254 changes: 206 additions & 48 deletions poetry.lock

7 changes: 4 additions & 3 deletions pyproject.toml
@@ -70,7 +70,8 @@ googleapis-common-protos = ">=1.53.0"
lz4 = ">=3.1.3"
protobuf = "~=3.20.0"

-[tool.poetry.dev-dependencies]
+[tool.poetry.group.dev.dependencies]
+pdoc = "^14.1.0"
pytest = "6.2.4"
pytest-asyncio = "0.15.1"
pytest-cov = "2.10.1"
@@ -84,8 +85,8 @@ pandas = ">=1.3.5"
# which will only be installed if you run `poetry install --extras "grpc"`,
# from the base dependencies.
#
-# Note that Poetry expects the dependencies defined in either tool.poetry.dependencies
-# or tool.poetry.dev-dependencies, but they're only referenced by name in the grpc entry under
+# Note that Poetry expects the dependencies to be defined in either tool.poetry.dependencies
+# or tool.poetry.group.dev.dependencies, but they're only referenced by name in the grpc entry under
# tool.poetry.extras
[tool.poetry.extras]
grpc = ["grpcio", "grpc-gateway-protoc-gen-openapiv2", "googleapis-common-protos", "lz4", "protobuf"]
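
To reproduce the dependency change locally, the dev-group entry above corresponds to roughly this (a sketch; Poetry 1.2+ group syntax):

```bash
# Add pdoc to the dev dependency group and sync the environment.
poetry add --group dev "pdoc@^14.1.0"
poetry install
```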
