Add pdoc integration in GitHub workflows (#207)
## Problem
We're working on generating documentation references from the codebase as
part of CI for each client.

For Python, we can use [pdoc](https://pdoc.dev/) to generate our client
reference as a static website, which can then be deployed directly to
GitHub Pages.
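
For reference, the docs can be generated locally with the same command the new CI action runs (a quick sketch; the flags mirror the `build-docs` action added below):

```bash
# Generate the client reference as static HTML under ./docs,
# parsing docstrings as Google-style (same invocation as the CI action).
poetry run pdoc pinecone/ --favicon ./favicon-32x32.png --docformat google -o ./docs
```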

We also need to update CI workflows to handle building and deploying
documentation artifacts.

## Solution
- Add `pdoc` as a dependency under
`[tool.poetry.group.dev.dependencies]` in `pyproject.toml`.
- Add a new `build-docs` action under `.github/actions/`.
- Add a `build-docs` job to `pr.yaml` to verify the docs build alongside
tests on PRs.
- Add a new `merge` workflow under `.github/workflows/`.
- Update some of the existing docstrings in `index.py` and `manage.py`.

**Note:** There's a lot more to clean up in the docstrings and in the
overall formatting of the `pdoc` output. This PR was getting large, so I
opted to keep it focused on the CI pieces and on getting things building
properly in the pipeline. I'll follow up with more fine-grained comment
coverage of the top-level modules.

My default VS Code formatter is set to Black, and I think Prettier went a
little wild with some of the YAML files. I can try to back out those
changes, but most of them seem fine to me, though slightly different from
the styling Black applied the other day.

## Type of Change
- [X] New feature (non-breaking change which adds functionality)
- [X] Infrastructure change (CI configs, etc)
- [X] Non-code change (docs, etc)

## Test Plan
Make sure the `Pull Request / Build docs with pdoc` check passes as part of
this PR.
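
If you'd rather watch from the terminal, something like this should also surface the new check (hypothetical usage, assuming a recent GitHub CLI authenticated against the repo):

```bash
# List the status of every check on this PR, including
# "Pull Request / Build docs with pdoc"; --watch polls until completion.
gh pr checks 207 --watch
```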

I forked the repo and validated that the `merge` workflow ran as expected
and deployed the docs properly.
You can see that run here:
https://github.com/austin-denoble/pinecone-python-client/actions/runs/6359304970

You can see the deployed docs on GitHub Pages, validating the overall flow,
here:
https://austin-denoble.github.io/pinecone-python-client/pinecone.html
austin-denoble authored Oct 5, 2023
1 parent 03da225 commit 1b625d8
Showing 13 changed files with 368 additions and 153 deletions.
17 changes: 17 additions & 0 deletions .github/actions/build-docs/action.yml
@@ -0,0 +1,17 @@
name: 'Build client documentation'
description: 'Generates client documentation using pdoc'
runs:
  using: 'composite'
  steps:
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: 3.x

    - name: Setup Poetry
      uses: ./.github/actions/setup-poetry

    - name: Build html documentation
      shell: bash
      run: |
        poetry run pdoc pinecone/ --favicon ./favicon-32x32.png --docformat google -o ./docs
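
Since the output is plain static HTML, a quick local preview of what the action produced needs nothing more than a static file server, e.g. (an illustrative check, not part of the action itself):

```bash
# Serve the generated docs at http://localhost:8000 for a visual spot-check.
python -m http.server 8000 --directory docs
```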
4 changes: 2 additions & 2 deletions .github/workflows/alpha-release.yaml
@@ -7,7 +7,7 @@ on:
        description: 'Git ref to build (branch name or SHA)'
        required: true
        type: string
-       default: 'main'
+       default: 'main'
      releaseLevel:
        description: 'Release level'
        required: true
@@ -38,4 +38,4 @@ jobs:
      TWINE_REPOSITORY: 'pypi'
    secrets:
      PYPI_USERNAME: ${{ secrets.PROD_PYPI_USERNAME }}
-     PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PASSWORD }}
+     PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PASSWORD }}
22 changes: 22 additions & 0 deletions .github/workflows/merge.yaml
@@ -0,0 +1,22 @@
name: 'Merge to main'

on:
  push:
    branches:
      - main
  workflow_dispatch: {}

jobs:
  build-and-deploy-documentation:
    runs-on: ubuntu-latest
    steps:
      - name: Generate pdoc documentation
        uses: ./.github/actions/build-docs

      - name: Deploy documentation to gh-pages
        uses: s0/git-publish-subdir-action@develop
        env:
          REPO: self
          BRANCH: gh-pages
          FOLDER: docs
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
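
For context, `s0/git-publish-subdir-action` publishes the contents of `FOLDER` to the target branch. A rough manual equivalent, shown only to illustrate what the step does (assumes the `gh-pages` branch already exists and `docs/` holds the built site):

```bash
# Check out gh-pages into a temporary worktree, swap in the fresh docs, push.
git worktree add /tmp/gh-pages gh-pages
rm -rf /tmp/gh-pages/*          # the worktree's .git file is a dotfile, so it survives
cp -r docs/* /tmp/gh-pages/
cd /tmp/gh-pages && git add --all && git commit -m "Deploy pdoc output" && git push origin gh-pages
cd - && git worktree remove /tmp/gh-pages
```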
20 changes: 17 additions & 3 deletions .github/workflows/pr.yaml
@@ -1,7 +1,21 @@
-name: Pull-request CI
+name: Pull Request

-on: pull_request
+on:
+  pull_request: {}
+  push:
+    branches:
+      - main
+  workflow_dispatch: {}

jobs:
  run-tests:
-    uses: './.github/workflows/testing.yaml'
+    uses: './.github/workflows/testing.yaml'
+
+  build-docs:
+    name: Build docs with pdoc
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Build docs with pdoc
+        uses: './.github/actions/build-docs'
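
Both jobs can also be exercised locally before pushing, for example with [act](https://github.com/nektos/act) (a hypothetical invocation; assumes act and Docker are installed):

```bash
# Simulate the pull_request trigger and run only the new docs job.
act pull_request -j build-docs
```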
4 changes: 2 additions & 2 deletions .github/workflows/release.yaml
@@ -7,7 +7,7 @@ on:
        description: 'Git ref to build (branch name or SHA)'
        required: true
        type: string
-       default: 'main'
+       default: 'main'
      releaseLevel:
        description: 'Release level'
        required: true
@@ -37,4 +37,4 @@ jobs:
      prereleaseSuffix: ''
    secrets:
      PYPI_USERNAME: ${{ secrets.PROD_PYPI_USERNAME }}
-     PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PASSWORD }}
+     PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PASSWORD }}
8 changes: 6 additions & 2 deletions .gitignore
@@ -136,8 +136,12 @@ venv.bak/
# Rope project settings
.ropeproject

-# mkdocs documentation
-/site
+# pdocs documentation
+# We want to exclude any locally generated artifacts, but we rely on
+# keeping documentation assets in the docs/ folder.
+docs/*
+!docs/pinecone-python-client-fork.png
+!docs/favicon-32x32.png

# mypy
.mypy_cache/
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1,2 +1,2 @@
-include LICENSE.txt requirements.txt requirements-grpc.txt pinecone/__version__ pinecone/__environment__
+include LICENSE.txt pinecone/__version__ pinecone/__environment__
recursive-exclude tests *
Binary file added docs/favicon-32x32.png
8 changes: 8 additions & 0 deletions pinecone/__init__.py
@@ -1,3 +1,6 @@
"""
.. include:: ../README.md
"""
from pinecone.core.utils.constants import CLIENT_VERSION as __version__
from .config import *
from .exceptions import *
@@ -12,7 +15,12 @@

# Kept for backwards-compatibility
UpsertResult = None
"""@private"""
DeleteResult = None
"""@private"""
QueryResult = None
"""@private"""
FetchResult = None
"""@private"""
InfoResult = None
"""@private"""
155 changes: 72 additions & 83 deletions pinecone/index.py
@@ -108,61 +108,50 @@ def upsert(
namespace: Optional[str] = None,
batch_size: Optional[int] = None,
show_progress: bool = True,
-**kwargs
+**kwargs,
) -> UpsertResponse:
"""
The upsert operation writes vectors into a namespace.
If a new value is upserted for an existing vector id, it will overwrite the previous value.
API reference: https://docs.pinecone.io/reference/upsert
To upsert in parallel follow: https://docs.pinecone.io/docs/insert-data#sending-upserts-in-parallel
A vector can be represented by a 1) Vector object, a 2) tuple or 3) a dictionary
If a tuple is used, it must be of the form `(id, values, metadata)` or `(id, values)`.
where id is a string, vector is a list of floats, metadata is a dict,
and sparse_values is a dict of the form `{'indices': List[int], 'values': List[float]}`.
Examples:
>>> ('id1', [1.0, 2.0, 3.0], {'key': 'value'}, {'indices': [1, 2], 'values': [0.2, 0.4]})
>>> ('id1', [1.0, 2.0, 3.0], None, {'indices': [1, 2], 'values': [0.2, 0.4]})
>>> ('id1', [1.0, 2.0, 3.0], {'key': 'value'}), ('id2', [1.0, 2.0, 3.0])
If a Vector object is used, a Vector object must be of the form
`Vector(id, values, metadata, sparse_values)`, where metadata and sparse_values are optional
arguments.
Examples:
>>> index.upsert([('id1', [1.0, 2.0, 3.0], {'key': 'value'}),
('id2', [1.0, 2.0, 3.0]),
])
>>> Vector(id='id1', values=[1.0, 2.0, 3.0], metadata={'key': 'value'})
>>> Vector(id='id2', values=[1.0, 2.0, 3.0])
>>> Vector(id='id3', values=[1.0, 2.0, 3.0], sparse_values=SparseValues(indices=[1, 2], values=[0.2, 0.4]))
**Note:** the dimension of each vector must match the dimension of the index.
If a dictionary is used, it must be in the form `{'id': str, 'values': List[float], 'sparse_values': {'indices': List[int], 'values': List[float]}, 'metadata': dict}`
Examples:
>>> index.upsert([('id1', [1.0, 2.0, 3.0], {'key': 'value'}), ('id2', [1.0, 2.0, 3.0])])
>>>
>>> index.upsert([{'id': 'id1', 'values': [1.0, 2.0, 3.0], 'metadata': {'key': 'value'}},
{'id': 'id2',
'values': [1.0, 2.0, 3.0],
'sprase_values': {'indices': [1, 8], 'values': [0.2, 0.4]},
])
>>> index.upsert([Vector(id='id1',
values=[1.0, 2.0, 3.0],
metadata={'key': 'value'}),
Vector(id='id2',
values=[1.0, 2.0, 3.0],
sparse_values=SparseValues(indices=[1, 2], values=[0.2, 0.4]))])
>>> {'id': 'id2', 'values': [1.0, 2.0, 3.0], 'sparse_values': {'indices': [1, 8], 'values': [0.2, 0.4]}])
>>> index.upsert([Vector(id='id1', values=[1.0, 2.0, 3.0], metadata={'key': 'value'}),
>>> Vector(id='id2', values=[1.0, 2.0, 3.0], sparse_values=SparseValues(indices=[1, 2], values=[0.2, 0.4]))])
API reference: https://docs.pinecone.io/reference/upsert
Args:
vectors (Union[List[Vector], List[Tuple]]): A list of vectors to upsert.
A vector can be represented by a 1) Vector object, a 2) tuple or 3) a dictionary
1) if a tuple is used, it must be of the form (id, values, metadata) or (id, values).
where id is a string, vector is a list of floats, metadata is a dict,
and sparse_values is a dict of the form {'indices': List[int], 'values': List[float]}.
Examples: ('id1', [1.0, 2.0, 3.0], {'key': 'value'}, {'indices': [1, 2], 'values': [0.2, 0.4]}),
('id1', [1.0, 2.0, 3.0], None, {'indices': [1, 2], 'values': [0.2, 0.4]})
('id1', [1.0, 2.0, 3.0], {'key': 'value'}), ('id2', [1.0, 2.0, 3.0]),
2) if a Vector object is used, a Vector object must be of the form
Vector(id, values, metadata, sparse_values),
where metadata and sparse_values are optional arguments
Examples: Vector(id='id1',
values=[1.0, 2.0, 3.0],
metadata={'key': 'value'})
Vector(id='id2',
values=[1.0, 2.0, 3.0])
Vector(id='id3',
values=[1.0, 2.0, 3.0],
sparse_values=SparseValues(indices=[1, 2], values=[0.2, 0.4]))
Note: the dimension of each vector must match the dimension of the index.
3) if a dictionary is used, it must be in the form
{'id': str, 'values': List[float], 'sparse_values': {'indices': List[int], 'values': List[float]},
'metadata': dict}
namespace (str): The namespace to write to. If not specified, the default namespace is used. [optional]
batch_size (int): The number of vectors to upsert in each batch.
If not specified, all vectors will be upserted in a single batch. [optional]
@@ -279,9 +268,9 @@ def _vector_transform(item: Union[Vector, Tuple]):
vectors=list(map(_vector_transform, vectors)),
**args_dict,
_check_type=_check_type,
-**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)

@staticmethod
@@ -331,36 +320,36 @@ def delete(
delete_all: Optional[bool] = None,
namespace: Optional[str] = None,
filter: Optional[Dict[str, Union[str, float, int, bool, List, dict]]] = None,
-**kwargs
+**kwargs,
) -> Dict[str, Any]:
"""
-The Delete operation deletes vectors from the index, from a single namespace.
-No error raised if the vector id does not exist.
-Note: for any delete call, if namespace is not specified, the default namespace is used.
-Delete can occur in the following mutual exclusive ways:
-1. Delete by ids from a single namespace
-2. Delete all vectors from a single namespace by setting delete_all to True
-3. Delete all vectors from a single namespace by specifying a metadata filter
-(note that for this option delete all must be set to False)
-API reference: https://docs.pinecone.io/reference/delete_post
-Examples:
->>> index.delete(ids=['id1', 'id2'], namespace='my_namespace')
->>> index.delete(delete_all=True, namespace='my_namespace')
->>> index.delete(filter={'key': 'value'}, namespace='my_namespace')
-Args:
-ids (List[str]): Vector ids to delete [optional]
-delete_all (bool): This indicates that all vectors in the index namespace should be deleted.. [optional]
-Default is False.
-namespace (str): The namespace to delete vectors from [optional]
-If not specified, the default namespace is used.
-filter (Dict[str, Union[str, float, int, bool, List, dict]]):
-If specified, the metadata filter here will be used to select the vectors to delete.
-This is mutually exclusive with specifying ids to delete in the ids param or using delete_all=True.
-See https://www.pinecone.io/docs/metadata-filtering/.. [optional]
+The Delete operation deletes vectors from the index, from a single namespace.
+No error raised if the vector id does not exist.
+Note: for any delete call, if namespace is not specified, the default namespace is used.
+Delete can occur in the following mutual exclusive ways:
+1. Delete by ids from a single namespace
+2. Delete all vectors from a single namespace by setting delete_all to True
+3. Delete all vectors from a single namespace by specifying a metadata filter
+(note that for this option delete all must be set to False)
+API reference: https://docs.pinecone.io/reference/delete_post
+Examples:
+>>> index.delete(ids=['id1', 'id2'], namespace='my_namespace')
+>>> index.delete(delete_all=True, namespace='my_namespace')
+>>> index.delete(filter={'key': 'value'}, namespace='my_namespace')
+Args:
+ids (List[str]): Vector ids to delete [optional]
+delete_all (bool): This indicates that all vectors in the index namespace should be deleted.. [optional]
+Default is False.
+namespace (str): The namespace to delete vectors from [optional]
+If not specified, the default namespace is used.
+filter (Dict[str, Union[str, float, int, bool, List, dict]]):
+If specified, the metadata filter here will be used to select the vectors to delete.
+This is mutually exclusive with specifying ids to delete in the ids param or using delete_all=True.
+See https://www.pinecone.io/docs/metadata-filtering/.. [optional]
Keyword Args:
Supports OpenAPI client keyword arguments. See pinecone.core.client.models.DeleteRequest for more details.
@@ -377,9 +366,9 @@ def delete(
DeleteRequest(
**args_dict,
**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS and v is not None},
-_check_type=_check_type
+_check_type=_check_type,
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)

@validate_and_convert_errors
@@ -419,7 +408,7 @@ def query(
include_values: Optional[bool] = None,
include_metadata: Optional[bool] = None,
sparse_vector: Optional[Union[SparseValues, Dict[str, Union[List[float], List[int]]]]] = None,
-**kwargs
+**kwargs,
) -> QueryResponse:
"""
The Query operation searches a namespace, using a query vector.
@@ -501,9 +490,9 @@ def _query_transform(item):
QueryRequest(
**args_dict,
_check_type=_check_type,
-**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)
return parse_query_response(response, vector is not None or id)

@@ -515,7 +504,7 @@ def update(
set_metadata: Optional[Dict[str, Union[str, float, int, bool, List[int], List[float], List[str]]]] = None,
namespace: Optional[str] = None,
sparse_values: Optional[Union[SparseValues, Dict[str, Union[List[float], List[int]]]]] = None,
-**kwargs
+**kwargs,
) -> Dict[str, Any]:
"""
The Update operation updates vector in a namespace.
@@ -563,9 +552,9 @@ def update(
id=id,
**args_dict,
_check_type=_check_type,
-**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)

@validate_and_convert_errors
@@ -596,9 +585,9 @@ def describe_index_stats(
DescribeIndexStatsRequest(
**args_dict,
**{k: v for k, v in kwargs.items() if k not in _OPENAPI_ENDPOINT_PARAMS},
-_check_type=_check_type
+_check_type=_check_type,
),
-**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS}
+**{k: v for k, v in kwargs.items() if k in _OPENAPI_ENDPOINT_PARAMS},
)

@staticmethod
20 changes: 11 additions & 9 deletions pinecone/manage.py
@@ -16,13 +16,13 @@
"describe_index",
"list_indexes",
"scale_index",
"IndexDescription",
"create_collection",
"describe_collection",
"list_collections",
"delete_collection",
"configure_index",
"CollectionDescription",
"IndexDescription",
]


@@ -84,12 +84,12 @@ def create_index(
:param name: the name of the index.
:type name: str
:param dimension: the dimension of vectors that would be inserted in the index
-:param index_type: type of index, one of {"approximated", "exact"}, defaults to "approximated".
+:param index_type: type of index, one of `{"approximated", "exact"}`, defaults to "approximated".
The "approximated" index uses fast approximate search algorithms developed by Pinecone.
The "exact" index uses accurate exact search algorithms.
It performs exhaustive searches and thus it is usually slower than the "approximated" index.
:type index_type: str, optional
-:param metric: type of metric used in the vector index, one of {"cosine", "dotproduct", "euclidean"}, defaults to "cosine".
+:param metric: type of metric used in the vector index, one of `{"cosine", "dotproduct", "euclidean"}`, defaults to "cosine".
Use "cosine" for cosine similarity,
"dotproduct" for dot-product,
and "euclidean" for euclidean distance.
@@ -111,7 +111,8 @@ def create_index(
:param source_collection: Collection name to create the index from
:type metadata_config: str, optional
:type timeout: int, optional
-:param timeout: Timeout for wait until index gets ready. If None, wait indefinitely; if >=0, time out after this many seconds; if -1, return immediately and do not wait. Default: None
+:param timeout: Timeout for wait until index gets ready. If None, wait indefinitely; if >=0, time out after this many seconds;
+    if -1, return immediately and do not wait. Default: None
"""
api_instance = _get_api_instance()

@@ -160,7 +161,8 @@ def delete_index(name: str, timeout: int = None):
:param name: the name of the index.
:type name: str
-:param timeout: Timeout for wait until index gets ready. If None, wait indefinitely; if >=0, time out after this many seconds; if -1, return immediately and do not wait. Default: None
+:param timeout: Timeout for wait until index gets ready. If None, wait indefinitely; if >=0, time out after this many seconds;
+    if -1, return immediately and do not wait. Default: None
:type timeout: int, optional
"""
api_instance = _get_api_instance()
@@ -199,8 +201,8 @@ def list_indexes():
def describe_index(name: str):
"""Describes a Pinecone index.
-:param: the name of the index
-:return: Description of an index
+:param name: the name of the index to describe.
+:return: Returns an `IndexDescription` object
"""
api_instance = _get_api_instance()
response = api_instance.describe_index(name)
@@ -235,8 +237,8 @@ def scale_index(name: str, replicas: int):

def create_collection(name: str, source: str):
"""Create a collection
-:param: name: Name of the collection
-:param: source: Name of the source index
+:param name: Name of the collection
+:param source: Name of the source index
"""
api_instance = _get_api_instance()
api_instance.create_collection(create_collection_request=CreateCollectionRequest(name=name, source=source))
254 changes: 206 additions & 48 deletions poetry.lock

7 changes: 4 additions & 3 deletions pyproject.toml
@@ -70,7 +70,8 @@ googleapis-common-protos = ">=1.53.0"
lz4 = ">=3.1.3"
protobuf = "~=3.20.0"

-[tool.poetry.dev-dependencies]
+[tool.poetry.group.dev.dependencies]
+pdoc = "^14.1.0"
pytest = "6.2.4"
pytest-asyncio = "0.15.1"
pytest-cov = "2.10.1"
@@ -84,8 +85,8 @@ pandas = ">=1.3.5"
# which will only be installed if you run `poetry install --extras "grpc"`,
# from the base dependencies.
#
-# Note that Poetry expects the dependencies defined in either tool.poetry.dependencies
-# or tool.poetry.dev-dependencies, but they're only referenced by name in the grpc entry under
+# Note that Poetry expects the dependencies to be defined in either tool.poetry.dependencies
+# or tool.poetry.group.dev.dependencies, but they're only referenced by name in the grpc entry under
# tool.poetry.extras
[tool.poetry.extras]
grpc = ["grpcio", "grpc-gateway-protoc-gen-openapiv2", "googleapis-common-protos", "lz4", "protobuf"]
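
To reproduce the dependency change locally, the dev-group entry above corresponds to roughly this (a sketch; Poetry 1.2+ group syntax):

```bash
# Add pdoc to the dev dependency group and sync the environment.
poetry add --group dev "pdoc@^14.1.0"
poetry install
```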
