Skip to content
This repository has been archived by the owner on Nov 13, 2024. It is now read-only.

Remove create_with_new_index() #57

Merged
merged 15 commits into from
Oct 15, 2023
Merged

Remove create_with_new_index() #57

merged 15 commits into from
Oct 15, 2023

Conversation

igiloh-pinecone
Copy link
Contributor

@igiloh-pinecone igiloh-pinecone commented Oct 3, 2023

Problem

It's complicated to have both a from_config() constructor and a create_with_new_index() constructor.

Solution

If a KnowledgeBase is instantiated with an index_name of a non-existing index, as long as the connection with pinecone is ok - no error is raised. Instead, we simply print a message "please create the index".
Since we had a check whether the index is connected \ existed before any operation anyway - this doesn't change the error handling much.

The only edge case here is that a user instantiating a KnowledgeBase with a misspelled index_name, and not explicitly checking kb.verify_connection_health() - would only get the error when he tries to perform upsert\query. The CLI and the app would do the explicit check, so no late error there.
I think we can live with it

Type of Change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Test Plan

Modified tests

I changed `create_resin_index()` to be a method, which happens after KB instantiation.
We were checking for `if self._index in None` in all of our functions anyway - so this doesn't change the error logic.

If the user will create a KB instance with a name of a non-existing index, it will be printed that he has to create it first
The index name was already passed, there is no point passing it again to create_index()
@igiloh-pinecone igiloh-pinecone requested a review from acatav October 3, 2023 13:43
@igiloh-pinecone igiloh-pinecone marked this pull request as draft October 3, 2023 13:43
@igiloh-pinecone igiloh-pinecone changed the title WIP: Remove create_with_new_index() Remove create_with_new_index() Oct 4, 2023
This makes more sense, since now the constructor is very "fast" (logic only), and the slow process of actually connecting\creating happens explicitly later
@igiloh-pinecone igiloh-pinecone marked this pull request as ready for review October 10, 2023 10:00
Following the latest .connect() change
acatav
acatav previously approved these changes Oct 10, 2023
resin/knoweldge_base/knowledge_base.py Outdated Show resolved Hide resolved
resin/knoweldge_base/knowledge_base.py Show resolved Hide resolved
@acatav acatav self-requested a review October 10, 2023 15:05
@acatav acatav dismissed their stale review October 11, 2023 08:14

changes required

resin/knoweldge_base/knowledge_base.py Outdated Show resolved Hide resolved
raise RuntimeError(
f"KnowledgeBase is already connected to index {self.index_name}. "
)
self._index = self._connect_index()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the way in the client side to restart the actual connection? should we support some flag that force re-connection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there is one...
I'll look in the client code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, there isn't really a way to do that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if you create a new Index object or call pinecone.init ? so you think holding the index object over days should work fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretically - it should. The connection hangs up after ~10-15 minutes anyway.
And if it won't - that's not something we're going to solve in resin...

igiloh-pinecone added a commit that referenced this pull request Oct 11, 2023
Got feedback in comments of PR #57 that this code is a bit unreadable, so I decided to implement explicitly in the constructors.
It's a bit more code duplication, but much more readable.

While at it, I also made some arguments mandatory, while keeping them optional in the config
Following feedback on PR #57
@@ -54,6 +48,7 @@ def __init__(self,
chunker: Optional[Chunker] = None,
reranker: Optional[Reranker] = None,
default_top_k: int = 5,
index_params: Optional[dict] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we take this params only on the create method? getting it here makes an illusion that you pass index params and get a KB with it, but the truth is you can use it only in create method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will explain it in the docstring.
It has to be here, so you can do that:

kb = KnowledgeBase.from_config(cfg)
kb.create_index()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, not ideal but good enough for now. I think it means that we want the defaults to be in the config.yaml

Comment on lines 114 to 124
except Exception as e:
try:
pinecone_whoami()
except Exception:
raise RuntimeError(
"Failed to connect to Pinecone. "
"Please check your credentials and try again"
) from e
raise RuntimeError(
"The index did not respond. Please check your credentials and try again"
) from e

if self._index_name not in list_indexes():
raise RuntimeError(
f"index {self._index_name} does not exist anymore"
"and was probably deleted. "
"Please create it first using `create_with_new_index()`"
) from e
raise RuntimeError("Index unexpectedly did not respond. "
"Please try again in few moments") from e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think it's better to have this detailed validation where we can tell the user exactly where is the problem. We hide pinecone API from them and also the full name of their index so we make it hard for them to debug where is the actual problem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We give very detailed errors in .connect(), it seemed like a duplication.
If someone has passed connect(), then for sure all of his credentials etc were already checked. This method would only be called after the initial connection was successfully created (at least once).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. In the minimum I would change the message to something like "unexpected error" and not something about credential which is not the most likely thing in this situation.
I still think that providing a detailed reason in this point it's important and we should try to provide it without the need for the user to manually try to connect again, so I wonder if we can call here connect or do some small change to connect so it won't set a new index in this situation. Also use connect as some retry mechanism is an option.

Anyway, these are my thoughts, you can make any decision and I'm ok with it

resin/knoweldge_base/knowledge_base.py Outdated Show resolved Hide resolved
If the index already exists - we will error out anyway.
@igiloh-pinecone igiloh-pinecone merged commit 1ec5e90 into dev Oct 15, 2023
9 checks passed
@igiloh-pinecone igiloh-pinecone deleted the remove_create_new branch October 15, 2023 18:39
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants