
best working prompt text #1819

Open
Amitt1412 opened this issue Nov 18, 2024 · 2 comments
@Amitt1412

Hello folks,

I used my PDFs to get the data and a set of questions and answers, and I implemented the DSPy flow. Could you please help me understand the following:

  1. Why does retrieval happen instead of sending the chunk directly to the model? Our main aim is to get the response generated from the LLM as per the business need, and the retrieval step may miss the chunk the question was actually written from. (See the sketch after these snippets for what I mean.)
index = torch.load('index.pt', weights_only=True)
max_characters = 4000 # >98th percentile of document lengths

@functools.lru_cache(maxsize=None)
def search(query, k=5):
    query_embedding = torch.tensor(Embed(input=query, model="text-embedding-3-small").data[0]['embedding'])
    topk_scores, topk_indices = torch.matmul(index, query_embedding).topk(k)
    topK = [dict(score=score.item(), **corpus[idx]) for idx, score in zip(topk_indices, topk_scores)]
    return [doc['text'][:max_characters] for doc in topK]
  2. I want to save the best prompt. What is the command to save it once I have run the following?
tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=24, num_candidates=2)  # use fewer threads if your rate limit is small

optimized_rag = tp.compile(RAG(), trainset=trainset, valset=valset, num_trials=20,
                           max_bootstrapped_demos=2, max_labeled_demos=2,
                           requires_permission_to_run=False)
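
To make point 1 concrete, here is a hypothetical sketch (not from the tutorial) of what I mean by sending the chunk directly; DirectContextRAG and gold_context are illustrative names, assuming each example carries the chunk it was written from (and is built with .with_inputs("question", "gold_context")):

class DirectContextRAG(dspy.Module):
    def __init__(self):
        self.respond = dspy.ChainOfThought('context, question -> response')

    def forward(self, question, gold_context):
        # No search step: the chunk the question was written from is passed in directly.
        return self.respond(context=gold_context, question=question)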

Suggestions would be appreciated.

@arnavsinghvi11
Collaborator

Hi @Amitt1412

Regarding #1, are you coming across this with any of the DSPy retrieval model integrations? Seems more relevant to the retrieval model / strategy used but lmk if there are any additional details to address this in DSPy!

Regarding #2, you can save/load programs in DSPy through:

optimized_rag.save(YOUR_SAVE_PATH)

loaded_optimized_rag = RAG()
loaded_optimized_rag.load(path=YOUR_SAVE_PATH)
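
If you also want to see the optimized prompt text itself (assuming a recent DSPy version), one option is to run the program once and inspect the last LM call:

optimized_rag(question="what are high memory and low memory on linux?")
dspy.inspect_history(n=1)  # prints the full prompt (instructions + demos) sent to the model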

Hope this helps!

@Amitt1412
Author

Amitt1412 commented Nov 19, 2024

Hi @arnavsinghvi11

Thank you for the response. Here is the relevant documentation:
https://dspy.ai/tutorials/rag/#keeping-an-eye-on-cost

A part of the code snippet is as follows:


import dspy
import ujson  # used below to parse the JSONL corpus
import torch
import functools
from litellm import embedding as Embed

with open("test_collection.jsonl") as f:
    corpus = [ujson.loads(line) for line in f]

index = torch.load('index.pt', weights_only=True)
max_characters = 4000 # >98th percentile of document lengths

@functools.lru_cache(maxsize=None)
def search(query, k=5):
    # Embed the query, score it against the prebuilt corpus index, and return the top-k passage texts.
    query_embedding = torch.tensor(Embed(input=query, model="text-embedding-3-small").data[0]['embedding'])
    topk_scores, topk_indices = torch.matmul(index, query_embedding).topk(k)
    topK = [dict(score=score.item(), **corpus[idx]) for idx, score in zip(topk_indices, topk_scores)]
    return [doc['text'][:max_characters] for doc in topK]

class RAG(dspy.Module):
    def __init__(self, num_docs=5):
        self.num_docs = num_docs
        self.respond = dspy.ChainOfThought('context, question -> response')

    def forward(self, question):
        # Retrieve the top num_docs passages for the question, then answer with them as context.
        context = search(question, k=self.num_docs)
        return self.respond(context=context, question=question)

rag = RAG()
rag(question="what are high memory and low memory on linux?")

In this part we are searching for / retrieving the context relevant to the question. We then compare the questions and answers under the current prompt and try to find a better prompt from the set of generated prompts.

With retrieval, it may happen that the right context is not retrieved at all, and therefore response generation in the pipeline will suffer. This can affect the overall prompt optimization flow.

So, it is a genuine curiosity: why are we retrieving the context based on the query instead of directly sending the chunk that the question relates to? Could you please help me understand your perspective?

Otherwise, you could also help me understand the execution flow.
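
To be concrete about the comparison step I mean, a rough, illustrative metric sketch (assuming examples expose a response field as in the tutorial's setup; the containment check is only a placeholder, the tutorial itself uses a semantic metric):

def metric(example, prediction, trace=None):
    # Illustrative only: score 1.0 when the gold answer text appears in the generated response.
    return float(example.response.lower() in prediction.response.lower())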

Regards
