Commit

💄
yackermann committed Jan 14, 2024
1 parent c02bf86 commit ba0e5c1
Showing 3 changed files with 352 additions and 44 deletions.
72 changes: 70 additions & 2 deletions S7.ipynb
@@ -3687,11 +3687,79 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"llm = ChatOpenAI(temperature=0)"
"llm = ChatOpenAI(temperature=0)\n",
"\n",
"chain = RetrievalQA.from_chain_type(\n",
" llm=llm,\n",
" retriever=retriever,\n",
" verbose=True,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"\"LLaVA is a model that is specifically designed for instruction-following on multimodal data. It accurately follows user instructions and provides comprehensive responses, rather than simply describing the scene. LLaVA is trained with a small multimodal instruction-following dataset and demonstrates similar reasoning results to multimodal GPT-4. In contrast, GPT-4 focuses more on chat capabilities and generating responses based on textual input. LLaVA's ability to understand visual content that is not covered in its training data and its impressive OCR (optical character recognition) ability are some of its notable features.\""
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"What makes LLava different from GPT-4?\")"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
"\u001b[1m> Entering new RetrievalQA chain...\u001b[0m\n",
"\n",
"\u001b[1m> Finished chain.\u001b[0m\n"
]
},
{
"data": {
"text/plain": [
"'The architecture of LLaVA consists of two main components: a Language Model (LM) and a Vision-Language Model (VLM). The LM is responsible for processing and generating text, while the VLM is responsible for understanding and generating visual content. The LM is pretrained on a large corpus of text data, while the VLM is pretrained on a large dataset of image-text pairs. These pretrained models are then fine-tuned on a specific task or dataset to improve their performance. The combination of the LM and VLM allows LLaVA to understand and generate responses that incorporate both textual and visual information.'"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.run(\"What is the architecture of LLava?\")"
]
}
],
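The added cells only reference an existing `retriever`; its construction is not part of this hunk. As a rough, self-contained sketch of the workflow the new cells rely on, assuming the retriever is backed by a Chroma vector store with OpenAI embeddings (a placeholder document stands in for whatever the notebook actually indexes):

from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Chroma

# Placeholder corpus; the notebook presumably indexes the LLaVA paper instead.
docs = [Document(page_content="LLaVA is a multimodal instruction-following model.")]

# Assumed retriever setup (not shown in this diff): Chroma + OpenAI embeddings.
vectorstore = Chroma.from_documents(docs, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# These lines mirror the diff: a RetrievalQA chain over the retriever.
llm = ChatOpenAI(temperature=0)
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, verbose=True)

print(chain.run("What makes LLaVA different from GPT-4?"))

Running the sketch requires an OPENAI_API_KEY and the chromadb package; it is an illustration of the pattern in the diff, not the notebook's exact setup.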