One thing I can recommend is using promptflow to create an evaluation flow that helps you analyse how accurate your model is. Once you have batch test data and a good evaluation flow with metrics, you can start tuning your prompt and your entire flow. It is almost like Test Driven Development, but for LLMs.

You should also restrict what your model generates as much as possible, using a jinja2 template and possibly the "few-shot" technique described here: https://www.promptingguide.ai/techniques/fewshot It is also a good idea to make your assistant always quote and cite the source (the original PDF) where the information was found.

To sum it up: the task is not trivial, and the tools and techniques required will take you some time to learn, but the benefits are great, so I encourage you to do it :)
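The few-shot-plus-citation idea above can be sketched in plain Python string assembly (the reply suggests jinja2 for the templating; the structure is the same either way). The example questions, answers, and PDF names below are made up for illustration, not from any real corpus.

```python
# Minimal few-shot prompt builder with a citation requirement.
# All example Q/A pairs and source file names are hypothetical.

FEW_SHOT_EXAMPLES = [
    {
        "question": "What is the boiling point of water at sea level?",
        "answer": "100 degrees Celsius [source: physics_basics.pdf]",
    },
    {
        "question": "Who proposed the theory of general relativity?",
        "answer": "Albert Einstein [source: relativity_intro.pdf]",
    },
]

def build_prompt(context: str, question: str) -> str:
    """Assemble a few-shot prompt that instructs the model to cite its source."""
    parts = [
        "Answer ONLY from the context below. If the answer is not in the "
        "context, say you don't know. Always cite the source PDF.",
        f"Context:\n{context}",
    ]
    # Each few-shot example demonstrates the expected answer-with-citation format.
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Q: {ex['question']}\nA: {ex['answer']}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Water boils at 100 C at sea level. [physics_basics.pdf]",
    "At what temperature does water boil?",
)
print(prompt)
```

The same template would typically live in a jinja2 file so the retrieved context and examples can be injected per request.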
Hello,
I am building a RAG system with llama-cpp-python and LangChain's LlamaCpp, with GPU support, over a few hundred PDFs of scientific information.
I am trying to have the AI generate text that avoids hallucination and remains factual, while still explaining things well and in varied ways. I have optimized the LLM's parameters to the best of my knowledge, based on information found online.
I use the Zephyr LLM and LangChain RetrievalQA (I haven't had much luck with memory, so no memory), and FAISS for the vector database.
I was wondering how to avoid hallucination, and whether these parameters seem appropriate for the intended purpose of interrogating a large set of data while sticking to the facts?
Also, any other recommendations? For example on retrieval of references, or on adding other external sources, such as browsing a wiki online.
I load the model with the following parameters:
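The poster's actual parameter block did not survive in the thread, so here is a hedged sketch of the kind of conservative settings often chosen for factual RAG with LangChain's LlamaCpp wrapper. Every value (and the model file name) below is an assumption for illustration, not the poster's real configuration.

```python
# Hypothetical llama-cpp-python / LangChain LlamaCpp settings aimed at
# factual, low-hallucination answers. The model path and all values are
# assumptions, not the original poster's configuration.

llm_params = {
    "model_path": "zephyr-7b-beta.Q5_K_M.gguf",  # hypothetical GGUF file
    "n_gpu_layers": -1,      # offload all layers to the GPU
    "n_ctx": 4096,           # context window large enough for retrieved chunks
    "temperature": 0.1,      # low temperature reduces creative drift
    "top_p": 0.9,            # nucleus sampling cutoff
    "repeat_penalty": 1.1,   # discourage loops without distorting facts
    "max_tokens": 512,       # cap answer length
}

# With langchain installed, the dict would be unpacked into the wrapper:
# from langchain_community.llms import LlamaCpp
# llm = LlamaCpp(**llm_params)
print(llm_params["temperature"])
```

Keeping temperature low and constraining the prompt to the retrieved context tend to matter more for factuality than any single sampling knob.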
PS: llama-cpp-python is great!