[Tech Question] Query biasing problem #502
Replies: 3 comments 1 reply
-
Hi Gangxin, please correct me if my understanding is wrong, but it seems like you want Khoj to be able to respond to some questions without referencing your personal knowledge base? If so, the two techniques Khoj currently uses to answer general questions are:
-
Thank you for your response!
I want to understand how Khoj distinguishes whether a question is related to my personal knowledge base or not. In more detail, we use keyword-based methods or TF-IDF to determine whether new questions are relevant to our knowledge, and against this backdrop I've built a binary classification system to detect whether a new question is related or not. But I am not sure whether my binary classifier can beat the LLM's own recognition. Does the LLM already handle this well? I'm also unsure how to specify the fields that the LLM identifies. I have checked several materials on this, but found nothing useful.
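For reference, the relevance check I have been experimenting with looks roughly like this (just a sketch with scikit-learn, not Khoj's code; the example documents and the 0.2 threshold are illustrative placeholders):

```python
# Minimal sketch of a keyword / TF-IDF relevance check against the knowledge base.
# The documents and threshold below are placeholders, not real settings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_docs = [
    "Notes on the French Revolution and its causes.",
    "Timeline of the Roman Empire.",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(knowledge_docs)

def is_related(question: str, threshold: float = 0.2) -> bool:
    """Return True if the question looks related to the personal knowledge base."""
    query_vec = vectorizer.transform([question])
    best_score = cosine_similarity(query_vec, doc_matrix).max()
    return best_score >= threshold

print(is_related("What caused the French Revolution?"))   # likely True
print(is_related("How do I write a for loop in Python?"))  # likely False
```

My question is whether a check like this adds anything on top of what the LLM already does.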
-
Very tempted to say that having a bigger corpus won't hurt.
-
Hi AI leaders,
I am wondering how you address the issue of query bias.
From what I understand, we build word embeddings of our documents so the system can acquire new knowledge, and when we submit a query it is fed directly into that embedding search. How do we determine the contents of the fields? And how do you enhance the accuracy of the results?
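To make sure I'm describing this correctly, here is roughly what I mean by the query being fed straight into the embedding search (only a sketch with sentence-transformers; the model name, notes, and threshold are placeholders I chose, not Khoj's actual configuration):

```python
# Sketch only: model name and the 0.3 threshold are my own placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
notes = ["My notes on the French Revolution", "My notes on the Roman Empire"]
note_embeddings = model.encode(notes, convert_to_tensor=True)

query = "Tell me a joke"  # a general question, unrelated to the notes
query_embedding = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, note_embeddings)[0]

# Even the best-matching note scores low, so the query is probably general,
# but without a routing step the retrieved note would still be injected.
print(scores.max().item())
```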
For instance, if I feed "history" knowledge into the model but only want to ask a general question rather than a history-specific one, there seems to be no mechanism in place to manage this distinction. My proposal is to build a binary text classification system to handle the query, and to give the model strong instructions when we prompt it.
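Concretely, the routing I have in mind is something like the sketch below (all names are illustrative placeholders, not Khoj's actual code):

```python
# Sketch of query routing: classify the question first, then build a prompt
# with or without personal notes and a strong instruction either way.
def is_personal_question(question: str) -> bool:
    # Stand-in for the binary classifier (keywords, TF-IDF, or a small model).
    return "history" in question.lower()

def search_notes(question: str) -> list[str]:
    # Stand-in for the embedding search over the knowledge base.
    return [f"(top-matching note for: {question})"]

def build_prompt(question: str, notes: list[str]) -> str:
    if notes:
        context = "\n".join(notes)
        return (
            "Use the notes below if they help answer the question.\n\n"
            f"Notes:\n{context}\n\nQuestion: {question}"
        )
    # Strong instruction when the classifier says the question is general.
    return (
        "Answer from general knowledge; do not reference my personal notes.\n\n"
        f"Question: {question}"
    )

question = "Tell me a joke"
notes = search_notes(question) if is_personal_question(question) else []
print(build_prompt(question, notes))
```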
What are your thoughts on this? Or does the default LLM already handle that?
Many thanks,
Gangxin