Commit

💄
yackermann committed Jan 14, 2024
1 parent c02bf86 commit ba0e5c1
Showing 3 changed files with 352 additions and 44 deletions.
72 changes: 70 additions & 2 deletions S7.ipynb
@@ -3687,11 +3687,79 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0)"
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"chain = RetrievalQA.from_chain_type(\n",
" llm=llm,\n",
" retriever=retriever,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"LLaVA is a model that is specifically designed for instruction-following on multimodal data. It accurately follows user instructions and provides comprehensive responses, rather than simply describing the scene. LLaVA is trained with a small multimodal instruction-following dataset and demonstrates similar reasoning results to multimodal GPT-4. In contrast, GPT-4 focuses more on chat capabilities and generating responses based on textual input. LLaVA's ability to understand visual content that is not covered in its training data and its impressive OCR (optical character recognition) ability are some of its notable features.\""
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"What makes LLava different from GPT-4?\")"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The architecture of LLaVA consists of two main components: a Language Model (LM) and a Vision-Language Model (VLM). The LM is responsible for processing and generating text, while the VLM is responsible for understanding and generating visual content. The LM is pretrained on a large corpus of text data, while the VLM is pretrained on a large dataset of image-text pairs. These pretrained models are then fine-tuned on a specific task or dataset to improve their performance. The combination of the LM and VLM allows LLaVA to understand and generate responses that incorporate both textual and visual information.'"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"What is the architecture of LLava?\")"
]
}
],
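The added cells only reference an existing `retriever`; its construction is not part of this hunk. As a rough, self-contained sketch of the workflow the new cells rely on, assuming the retriever is backed by a Chroma vector store with OpenAI embeddings (a placeholder document stands in for whatever the notebook actually indexes):

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Chroma

# Placeholder corpus; the notebook presumably indexes the LLaVA paper instead.
docs = [Document(page_content="LLaVA is a multimodal instruction-following model.")]

# Assumed retriever setup (not shown in this diff): Chroma + OpenAI embeddings.
vectorstore = Chroma.from_documents(docs, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# These lines mirror the diff: a RetrievalQA chain over the retriever.
llm = ChatOpenAI(temperature=0)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, verbose=True)

print(chain.run("What makes LLaVA different from GPT-4?"))

Running the sketch requires an OPENAI_API_KEY and the chromadb package; it is an illustration of the pattern in the diff, not the notebook's exact setup.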