From a05c20ad36ec0e893938b0c54d9d5d5c1d02e241 Mon Sep 17 00:00:00 2001 From: Ryan Lin Date: Tue, 14 Jan 2025 17:29:15 +0800 Subject: [PATCH 1/2] build_RAG_with_milvus_and_firecrawl --- .../build_RAG_with_milvus_and_firecrawl.ipynb | 737 ++++++++++++++++++ 1 file changed, 737 insertions(+) create mode 100644 bootcamp/tutorials/integration/build_RAG_with_milvus_and_firecrawl.ipynb diff --git a/bootcamp/tutorials/integration/build_RAG_with_milvus_and_firecrawl.ipynb b/bootcamp/tutorials/integration/build_RAG_with_milvus_and_firecrawl.ipynb new file mode 100644 index 000000000..08332e550 --- /dev/null +++ b/bootcamp/tutorials/integration/build_RAG_with_milvus_and_firecrawl.ipynb @@ -0,0 +1,737 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9e1ac4fceb263946", + "metadata": { + "collapsed": false + }, + "source": [ + "\"Open \n", + " \"GitHub" + ] + }, + { + "cell_type": "markdown", + "id": "c1e6d7d051e1b5d9", + "metadata": { + "collapsed": false + }, + "source": [ + "# Building RAG with Milvus and Firecrawl\n", + "\n", + "[Firecrawl](https://www.firecrawl.dev/) empowers developers to build AI applications with clean data scraped from any website. With advanced scraping, crawling, and data extraction capabilities, Firecrawl simplifies the process of converting website content into clean markdown or structured data for downstream AI workflows.\n", + "\n", + "In this tutorial, we’ll show you how to build a Retrieval-Augmented Generation (RAG) pipeline using Milvus and Firecrawl. The pipeline integrates Firecrawl for web data scraping, Milvus for vector storage, and OpenAI for generating insightful, context-aware responses.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d7f4056ca7aca338", + "metadata": { + "collapsed": false + }, + "source": [ + "## Preparation" + ] + }, + { + "cell_type": "markdown", + "id": "c9d871e3c89b776", + "metadata": { + "collapsed": false + }, + "source": [ + "### Dependencies and Environment\n", + "\n", + "To start, install the required dependencies by running the following command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "initial_id", + "metadata": {}, + "outputs": [], + "source": [ + "! pip install firecrawl-py pymilvus openai requests tqdm" + ] + }, + { + "cell_type": "markdown", + "id": "6281694e62787a4b", + "metadata": { + "collapsed": false + }, + "source": [ + "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime** (click on the \"Runtime\" menu at the top of the screen, and select \"Restart session\" from the dropdown menu)." + ] + }, + { + "cell_type": "markdown", + "id": "75596dd3e4d1d8de", + "metadata": { + "collapsed": false + }, + "source": [ + "### Setting Up API Keys\n", + "\n", + "To use Firecrawl to scrape data from the specified URL, you need to obtain a [FIRECRAWL_API_KEY](https://www.firecrawl.dev/) and set it as an environment variable. Also, we will use OpenAI as the LLM in this example. You should prepare the [OPENAI_API_KEY](https://platform.openai.com/docs/quickstart) as an environment variable as well." 
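+    ,
+    "\n",
+    "\n",
+    "The next cell sets both keys directly in the notebook. If you'd rather not store keys in the notebook itself, one simple alternative is to prompt for them at runtime (a minimal sketch using the standard-library `getpass` module):\n",
+    "\n",
+    "```python\n",
+    "import os\n",
+    "from getpass import getpass\n",
+    "\n",
+    "# Prompt for the keys instead of hardcoding them in the notebook\n",
+    "os.environ[\"FIRECRAWL_API_KEY\"] = getpass(\"Firecrawl API key: \")\n",
+    "os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key: \")\n",
+    "```"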
+ ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "e14a99a051098a87", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:44:45.285696Z", + "start_time": "2025-01-13T22:44:45.276604Z" + } + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.environ[\"FIRECRAWL_API_KEY\"] = \"fc-***********\"\n", + "os.environ[\"OPENAI_API_KEY\"] = \"sk-***********\"" + ] + }, + { + "cell_type": "markdown", + "id": "baf040ed69755a72", + "metadata": { + "collapsed": false + }, + "source": [ + "### Prepare the LLM and Embedding Model\n", + "\n", + "We initialize the OpenAI client to prepare the embedding model." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "6ac107df-9e41-4afe-be70-1ff2cfd8a727", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:44:46.274166Z", + "start_time": "2025-01-13T22:44:46.226576Z" + } + }, + "outputs": [], + "source": [ + "from openai import OpenAI\n", + "\n", + "openai_client = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "id": "5fcd7358761d1e78", + "metadata": { + "collapsed": false + }, + "source": [ + "Define a function to generate text embeddings using OpenAI client. We use the [text-embedding-3-small](https://platform.openai.com/docs/guides/embeddings) model as an example." + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "d90e40d0-44a4-43a2-8567-8c8d758d1bd4", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:44:47.061510Z", + "start_time": "2025-01-13T22:44:47.057608Z" + } + }, + "outputs": [], + "source": [ + "def emb_text(text):\n", + " return (\n", + " openai_client.embeddings.create(input=text, model=\"text-embedding-3-small\")\n", + " .data[0]\n", + " .embedding\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "3d4cc6ea-e4fe-4721-8c5a-e930c1c3ff9b", + "metadata": {}, + "source": [ + "Generate a test embedding and print its dimension and first few elements." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "8be4219c-9ead-45bb-aea1-15dd1143b3f1", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:44:48.590762Z", + "start_time": "2025-01-13T22:44:47.892755Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1536\n", + "[0.009889289736747742, -0.005578675772994757, 0.00683477520942688, -0.03805781528353691, -0.01824733428657055, -0.04121600463986397, -0.007636285852640867, 0.03225184231996536, 0.018949154764413834, 9.352207416668534e-05]\n" + ] + } + ], + "source": [ + "test_embedding = emb_text(\"This is a test\")\n", + "embedding_dim = len(test_embedding)\n", + "print(embedding_dim)\n", + "print(test_embedding[:10])" + ] + }, + { + "cell_type": "markdown", + "id": "e4a0bb710a360517", + "metadata": { + "collapsed": false + }, + "source": [ + "## Scrape Data Using Firecrawl" + ] + }, + { + "cell_type": "markdown", + "id": "d00e1e8eccd189f3", + "metadata": { + "collapsed": false + }, + "source": [ + "### Initialize the Firecrawl Application\n", + "We will use the `firecrawl` library to scrape data from the specified URL in markdown format. 
Begin by initializing the Firecrawl application:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "100fa4fd9c1e003",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-13T22:44:49.780980Z",
+     "start_time": "2025-01-13T22:44:49.778066Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "from firecrawl import FirecrawlApp\n",
+    "\n",
+    "app = FirecrawlApp(api_key=os.environ[\"FIRECRAWL_API_KEY\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1b56f94698cc6ec1",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "### Scrape the Target Website\n",
+    "Scrape the content from the target URL. The website [LLM-powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) provides an in-depth exploration of autonomous agent systems built using large language models (LLMs). We will use this content to build a RAG system.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "fdc4e3ff7d0e3cd1",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-13T22:44:53.059840Z",
+     "start_time": "2025-01-13T22:44:50.985724Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Scrape a website:\n",
+    "scrape_status = app.scrape_url(\n",
+    "    \"https://lilianweng.github.io/posts/2023-06-23-agent/\",\n",
+    "    params={\"formats\": [\"markdown\"]},\n",
+    ")\n",
+    "\n",
+    "markdown_content = scrape_status[\"markdown\"]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "229449707ff84d2d",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "### Process the Scraped Content\n",
+    "\n",
+    "To make the scraped content manageable for insertion into Milvus, we simply use \"# \" to separate the content, which can roughly separate the content of each main part of the scraped markdown file."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "1f03911422e6a973",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-13T22:44:53.197411Z",
+     "start_time": "2025-01-13T22:44:53.192577Z"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Section 1:\n",
+      "Table of Contents\n",
+      "\n",
+      "- [Agent System Overview](#agent-system-overview)\n",
+      "- [Component One: Planning](#component-one-planning) - [Task Decomposition](#task-decomposition)\n",
+      "  - [Self-Reflection](#self-reflection)\n",
+      "- [Component Two: Memory](#component-two-memory) - [Types of Memory](#types-of-memory)\n",
+      "  - [...\n",
+      "--------------------------------------------------\n",
+      "Section 2:\n",
+      "Agent System Overview [\\#](\\#agent-system-overview)\n",
+      "\n",
+      "In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n",
+      "\n",
+      "- **Planning**\n",
+      "  - Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling effi...\n",
+      "--------------------------------------------------\n",
+      "Section 3:\n",
+      "Component One: Planning [\\#](\\#component-one-planning)\n",
+      "\n",
+      "A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\n",
+      "\n",
+      "#...\n",
+      "--------------------------------------------------\n"
+     ]
+    }
+   ],
+   "source": [
+    "def split_markdown_content(content):\n",
+    "    return [section.strip() for section in content.split(\"# \") if section.strip()]\n",
+    "\n",
+    "\n",
+    "# Process the scraped markdown content\n",
+    "sections = split_markdown_content(markdown_content)\n",
+    "\n",
+    "# Print the first few sections to understand the structure\n",
+    "for i, section in enumerate(sections[:3]):\n",
+    "    print(f\"Section {i+1}:\")\n",
+    "    print(section[:300] + \"...\")\n",
+    "    print(\"-\" * 50)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b1353faca577a4d7",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "## Load Data into Milvus"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "08d46563-ac2f-4951-a1fc-920577750a44",
+   "metadata": {},
+   "source": [
+    "### Create the collection"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "id": "1ca09192-a150-46eb-86c5-de9f10546a41",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-13T22:44:56.074594Z",
+     "start_time": "2025-01-13T22:44:56.049664Z"
+    },
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "from pymilvus import MilvusClient\n",
+    "\n",
+    "milvus_client = MilvusClient(uri=\"./milvus_demo.db\")\n",
+    "collection_name = \"my_rag_collection\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7d4e1e20-4fb3-4a20-b8fd-5e20497b0c62",
+   "metadata": {},
+   "source": [
+    "> As for the argument of `MilvusClient`:\n",
+    "> - Setting the `uri` as a local file, e.g. `./milvus.db`, is the most convenient method, as it automatically utilizes [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.\n",
+    "> - If you have a large amount of data, you can set up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In this setup, please use the server uri, e.g. `http://localhost:19530`, as your `uri`.\n",
+    "> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and API key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fadf5572-f0ea-4799-9247-2e92e58bfe20",
+   "metadata": {},
+   "source": [
+    "Check if the collection already exists and drop it if it does."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "d7e81fae-b312-408a-82f3-c301928bb9df",
+   "metadata": {
+    "ExecuteTime": {
+     "end_time": "2025-01-13T22:44:58.905857Z",
+     "start_time": "2025-01-13T22:44:58.900442Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "if milvus_client.has_collection(collection_name):\n",
+    "    milvus_client.drop_collection(collection_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "41e082b4-93e1-438a-800b-ec34174d43f8",
+   "metadata": {},
+   "source": [
+    "Create a new collection with specified parameters.\n",
+    "\n",
+    "If we don’t specify any field information, Milvus will automatically create a default `id` field for the primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values."
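+    ,
+    "\n",
+    "\n",
+    "We use the inner product (IP) metric because OpenAI embeddings are normalized to unit length, so ranking by inner product is equivalent to ranking by cosine similarity. A quick sanity check, reusing the `test_embedding` computed earlier, should print a norm very close to 1.0:\n",
+    "\n",
+    "```python\n",
+    "import math\n",
+    "\n",
+    "# Unit-length vectors make inner product equivalent to cosine similarity\n",
+    "print(math.sqrt(sum(x * x for x in test_embedding)))\n",
+    "```"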
+ ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "97bd0406-ad92-4b18-8702-2dbb666e5f46", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:45:00.959301Z", + "start_time": "2025-01-13T22:45:00.430915Z" + } + }, + "outputs": [], + "source": [ + "milvus_client.create_collection(\n", + " collection_name=collection_name,\n", + " dimension=embedding_dim,\n", + " metric_type=\"IP\", # Inner product distance\n", + " consistency_level=\"Strong\", # Strong consistency level\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5a488258-64d5-4539-942f-dd50d90300e4", + "metadata": {}, + "source": [ + "### Insert data" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "a2ac980d-5969-4fe9-97b7-5a65c78c9224", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:45:11.546190Z", + "start_time": "2025-01-13T22:45:03.364884Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processing sections: 100%|██████████| 17/17 [00:08<00:00, 2.09it/s]\n" + ] + }, + { + "data": { + "text/plain": [ + "{'insert_count': 17, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], 'cost': 0}" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "data = []\n", + "\n", + "for i, section in enumerate(tqdm(sections, desc=\"Processing sections\")):\n", + " embedding = emb_text(section)\n", + " data.append({\"id\": i, \"vector\": embedding, \"text\": section})\n", + "\n", + "# Insert data into Milvus\n", + "milvus_client.insert(collection_name=collection_name, data=data)" + ] + }, + { + "cell_type": "markdown", + "id": "01b502fb-21b4-48dc-b624-93b510538fbe", + "metadata": {}, + "source": [ + "## Build RAG" + ] + }, + { + "cell_type": "markdown", + "id": "7120f6b1-4f07-4b70-b1f1-dc47a9cd7524", + "metadata": {}, + "source": [ + "### Retrieve data for a query\n", + "\n", + "Let’s specify a query question about the website we just scraped.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "33f1f275-9038-4836-9189-d55efed7fab7", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:52:53.599519Z", + "start_time": "2025-01-13T22:52:53.593602Z" + } + }, + "outputs": [], + "source": [ + "question = \"What are the main components of autonomous agents?\"" + ] + }, + { + "cell_type": "markdown", + "id": "0446d285-9e27-45be-a5f1-29bbf55fa2e9", + "metadata": {}, + "source": [ + "Search for the question in the collection and retrieve the semantic top-3 matches." 
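+    ,
+    " Because the collection uses the inner product metric, a larger `distance` value in the results indicates a closer semantic match."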
+ ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "ad2e5d1e-6550-41d8-928b-a34562b86790", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:52:54.955970Z", + "start_time": "2025-01-13T22:52:54.444028Z" + } + }, + "outputs": [], + "source": [ + "search_res = milvus_client.search(\n", + " collection_name=collection_name,\n", + " data=[emb_text(question)],\n", + " limit=3,\n", + " search_params={\"metric_type\": \"IP\", \"params\": {}},\n", + " output_fields=[\"text\"],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "113d8b42-0397-4015-b7aa-8861563bcf14", + "metadata": {}, + "source": [ + "Let’s take a look at the search results of the query\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "ac85b108-8fbc-4eb9-82e1-6b9058c25388", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-13T22:52:56.101959Z", + "start_time": "2025-01-13T22:52:56.098492Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[\n", + " [\n", + " \"Agent System Overview [\\\\#](\\\\#agent-system-overview)\\n\\nIn a LLM-powered autonomous agent system, LLM functions as the agent\\u2019s brain, complemented by several key components:\\n\\n- **Planning**\\n - Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\n - Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n- **Memory**\\n - Short-term memory: I would consider all the in-context learning (See [Prompt Engineering](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)) as utilizing short-term memory of the model to learn.\\n - Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n- **Tool use**\\n - The agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\\n\\n![](agent-overview.png)Fig. 1. Overview of a LLM-powered autonomous agent system.\",\n", + " 0.6343474388122559\n", + " ],\n", + " [\n", + " \"Table of Contents\\n\\n- [Agent System Overview](#agent-system-overview)\\n- [Component One: Planning](#component-one-planning) - [Task Decomposition](#task-decomposition)\\n - [Self-Reflection](#self-reflection)\\n- [Component Two: Memory](#component-two-memory) - [Types of Memory](#types-of-memory)\\n - [Maximum Inner Product Search (MIPS)](#maximum-inner-product-search-mips)\\n- [Component Three: Tool Use](#component-three-tool-use)\\n- [Case Studies](#case-studies) - [Scientific Discovery Agent](#scientific-discovery-agent)\\n - [Generative Agents Simulation](#generative-agents-simulation)\\n - [Proof-of-Concept Examples](#proof-of-concept-examples)\\n- [Challenges](#challenges)\\n- [Citation](#citation)\\n- [References](#references)\\n\\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT), [GPT-Engineer](https://github.com/AntonOsika/gpt-engineer) and [BabyAGI](https://github.com/yoheinakajima/babyagi), serve as inspiring examples. 
The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\",\n",
+      "        0.5715497732162476\n",
+      "    ],\n",
+      "    [\n",
+      "        \"Challenges [\\\\#](\\\\#challenges)\\n\\nAfter going through key ideas and demos of building LLM-centered agents, I start to see a couple common limitations:\\n\\n- **Finite context length**: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\\n\\n- **Challenges in long-term planning and task decomposition**: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.\\n\\n- **Reliability of natural language interface**: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.\",\n",
+      "        0.5009307265281677\n",
+      "    ]\n",
+      "]\n"
+     ]
+    }
+   ],
+   "source": [
+    "import json\n",
+    "\n",
+    "retrieved_lines_with_distances = [\n",
+    "    (res[\"entity\"][\"text\"], res[\"distance\"]) for res in search_res[0]\n",
+    "]\n",
+    "print(json.dumps(retrieved_lines_with_distances, indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1fecbb753851d312",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "### Use LLM to get a RAG response\n",
+    "\n",
+    "Convert the retrieved documents into a string format.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 53,
+   "id": "c7d406532a6c0d59",
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2025-01-13T22:53:04.763673Z",
+     "start_time": "2025-01-13T22:53:04.759580Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "context = \"\\n\".join(\n",
+    "    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "34aabd19ee91b7d",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "Define system and user prompts for the Language Model. The user prompt is assembled with the documents retrieved from Milvus.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "id": "42c4f2f501f80607",
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2025-01-13T22:53:05.704670Z",
+     "start_time": "2025-01-13T22:53:05.700521Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "SYSTEM_PROMPT = \"\"\"\n",
+    "Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.\n",
+    "\"\"\"\n",
+    "USER_PROMPT = f\"\"\"\n",
+    "Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.\n",
+    "<context>\n",
+    "{context}\n",
+    "</context>\n",
+    "<question>\n",
+    "{question}\n",
+    "</question>\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5fe76342b1c2ea6c",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "Use OpenAI ChatGPT to generate a response based on the prompts.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 55,
+   "id": "c340830791e55c03",
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2025-01-13T22:53:10.538528Z",
+     "start_time": "2025-01-13T22:53:06.674481Z"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The main components of a LLM-powered autonomous agent system are the Planning, Memory, and Tool use. \n",
+      "\n",
+      "1. Planning: The agent breaks down large tasks into smaller, manageable subgoals, and can self-reflect and learn from past mistakes, refining its actions for future steps.\n",
+      "\n",
+      "2. Memory: This includes short-term memory, which the model uses for in-context learning, and long-term memory, which allows the agent to retain and recall information over extended periods. \n",
+      "\n",
+      "3. Tool use: This component allows the agent to call external APIs for additional information that is not available in the model weights, like current information, code execution capacity, and access to proprietary information sources.\n"
+     ]
+    }
+   ],
+   "source": [
+    "response = openai_client.chat.completions.create(\n",
+    "    model=\"gpt-4o\",\n",
+    "    messages=[\n",
+    "        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
+    "        {\"role\": \"user\", \"content\": USER_PROMPT},\n",
+    "    ],\n",
+    ")\n",
+    "print(response.choices[0].message.content)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

From 92d70c3a2601b0d68f5899cbb177df966000e89a Mon Sep 17 00:00:00 2001
From: Ryan Lin
Date: Tue, 14 Jan 2025 17:30:47 +0800
Subject: [PATCH 2/2] build_RAG_with_milvus_and_crawl4ai

---
 .../build_RAG_with_milvus_and_crawl4ai.ipynb  | 814 ++++++++++++++++++
 1 file changed, 814 insertions(+)
 create mode 100644 bootcamp/tutorials/integration/build_RAG_with_milvus_and_crawl4ai.ipynb

diff --git a/bootcamp/tutorials/integration/build_RAG_with_milvus_and_crawl4ai.ipynb b/bootcamp/tutorials/integration/build_RAG_with_milvus_and_crawl4ai.ipynb
new file mode 100644
index 000000000..6e0d72b22
--- /dev/null
+++ b/bootcamp/tutorials/integration/build_RAG_with_milvus_and_crawl4ai.ipynb
@@ -0,0 +1,814 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "9e1ac4fceb263946",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "\"Open \n",
+    " \"GitHub"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1e6d7d051e1b5d9",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "# Building RAG with Milvus and Crawl4AI\n",
+    "\n",
+    "[Crawl4AI](https://crawl4ai.com/mkdocs/) delivers blazing-fast, AI-ready web crawling for LLMs.
Open-source and optimized for RAG, it simplifies scraping with advanced extraction and real-time performance.\n", + "\n", + "In this tutorial, we’ll show you how to build a Retrieval-Augmented Generation (RAG) pipeline using Milvus and Crawl4AI. The pipeline integrates Crawl4AI for web data crawling, Milvus for vector storage, and OpenAI for generating insightful, context-aware responses.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d7f4056ca7aca338", + "metadata": { + "collapsed": false + }, + "source": [ + "## Preparation" + ] + }, + { + "cell_type": "markdown", + "id": "c9d871e3c89b776", + "metadata": { + "collapsed": false + }, + "source": [ + "### Dependencies and Environment\n", + "\n", + "To start, install the required dependencies by running the following command:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "initial_id", + "metadata": {}, + "outputs": [], + "source": [ + "! pip install -U crawl4ai pymilvus openai requests tqdm" + ] + }, + { + "cell_type": "markdown", + "id": "6281694e62787a4b", + "metadata": { + "collapsed": false + }, + "source": [ + "> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime** (click on the \"Runtime\" menu at the top of the screen, and select \"Restart session\" from the dropdown menu)." + ] + }, + { + "cell_type": "markdown", + "source": [ + "To fully set up crawl4ai, run the following commands:" + ], + "metadata": { + "collapsed": false + }, + "id": "5f5a85b9a62416a3" + }, + { + "cell_type": "code", + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[36m[INIT].... → Running post-installation setup...\u001b[0m\r\n", + "\u001b[36m[INIT].... → Installing Playwright browsers...\u001b[0m\r\n", + "\u001b[32m[COMPLETE] ● Playwright installation completed successfully.\u001b[0m\r\n", + "\u001b[36m[INIT].... → Starting database initialization...\u001b[0m\r\n", + "\u001b[32m[COMPLETE] ● Database initialization completed successfully.\u001b[0m\r\n", + "\u001b[32m[COMPLETE] ● Post-installation setup completed!\u001b[0m\r\n", + "\u001b[0m\u001b[36m[INIT].... → Running Crawl4AI health check...\u001b[0m\r\n", + "\u001b[36m[INIT].... → Crawl4AI 0.4.247\u001b[0m\r\n", + "\u001b[36m[TEST].... ℹ Testing crawling capabilities...\u001b[0m\r\n", + "\u001b[36m[EXPORT].. ℹ Exporting PDF and taking screenshot took 0.80s\u001b[0m\r\n", + "\u001b[32m[FETCH]... ↓ https://crawl4ai.com... | Status: \u001b[32mTrue\u001b[0m | Time: 4.22s\u001b[0m\r\n", + "\u001b[36m[SCRAPE].. ◆ Processed https://crawl4ai.com... | Time: 14ms\u001b[0m\r\n", + "\u001b[32m[COMPLETE] ● https://crawl4ai.com... | Status: \u001b[32mTrue\u001b[0m | Total: \u001b[33m4.23s\u001b[0m\u001b[0m\r\n", + "\u001b[32m[COMPLETE] ● ✅ Crawling test passed!\u001b[0m\r\n", + "\u001b[0m" + ] + } + ], + "source": [ + "# Run post-installation setup\n", + "! crawl4ai-setup\n", + "\n", + "# Verify installation\n", + "! crawl4ai-doctor" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2025-01-14T09:25:09.133696Z", + "start_time": "2025-01-14T09:25:02.169833Z" + } + }, + "id": "7eeefc4b638fbd22", + "execution_count": 1 + }, + { + "cell_type": "markdown", + "id": "75596dd3e4d1d8de", + "metadata": { + "collapsed": false + }, + "source": [ + "### Setting Up OpenAI API Key\n", + "\n", + "We will use OpenAI as the LLM in this example. You should prepare the [OPENAI_API_KEY](https://platform.openai.com/docs/quickstart) as an environment variable." 
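+    ,
+    "\n",
+    "\n",
+    "The next cell sets the key directly in the notebook. Alternatively, you can reuse a key already exported in your shell and prompt only when it is missing (a minimal sketch using the standard-library `getpass` module):\n",
+    "\n",
+    "```python\n",
+    "import os\n",
+    "from getpass import getpass\n",
+    "\n",
+    "# Fall back to an interactive prompt when the key is not already set\n",
+    "if not os.getenv(\"OPENAI_API_KEY\"):\n",
+    "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key: \")\n",
+    "```"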
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "e14a99a051098a87", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:09.139980Z", + "start_time": "2025-01-14T09:25:09.134353Z" + } + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = \"sk-***********\"" + ] + }, + { + "cell_type": "markdown", + "id": "baf040ed69755a72", + "metadata": { + "collapsed": false + }, + "source": [ + "### Prepare the LLM and Embedding Model\n", + "\n", + "We initialize the OpenAI client to prepare the embedding model." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "6ac107df-9e41-4afe-be70-1ff2cfd8a727", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:11.087058Z", + "start_time": "2025-01-14T09:25:10.712174Z" + } + }, + "outputs": [], + "source": [ + "from openai import OpenAI\n", + "\n", + "openai_client = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "id": "5fcd7358761d1e78", + "metadata": { + "collapsed": false + }, + "source": [ + "Define a function to generate text embeddings using OpenAI client. We use the [text-embedding-3-small](https://platform.openai.com/docs/guides/embeddings) model as an example." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "d90e40d0-44a4-43a2-8567-8c8d758d1bd4", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:13.161855Z", + "start_time": "2025-01-14T09:25:13.153595Z" + } + }, + "outputs": [], + "source": [ + "def emb_text(text):\n", + " return (\n", + " openai_client.embeddings.create(input=text, model=\"text-embedding-3-small\")\n", + " .data[0]\n", + " .embedding\n", + " )" + ] + }, + { + "cell_type": "markdown", + "id": "3d4cc6ea-e4fe-4721-8c5a-e930c1c3ff9b", + "metadata": {}, + "source": [ + "Generate a test embedding and print its dimension and first few elements." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8be4219c-9ead-45bb-aea1-15dd1143b3f1", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:14.948153Z", + "start_time": "2025-01-14T09:25:14.442989Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1536\n", + "[0.009889289736747742, -0.005578675772994757, 0.00683477520942688, -0.03805781528353691, -0.01824733428657055, -0.04121600463986397, -0.007636285852640867, 0.03225184231996536, 0.018949154764413834, 9.352207416668534e-05]\n" + ] + } + ], + "source": [ + "test_embedding = emb_text(\"This is a test\")\n", + "embedding_dim = len(test_embedding)\n", + "print(embedding_dim)\n", + "print(test_embedding[:10])" + ] + }, + { + "cell_type": "markdown", + "id": "e4a0bb710a360517", + "metadata": { + "collapsed": false + }, + "source": [ + "## Crawl Data Using Crawl4AI" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "100fa4fd9c1e003", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:18.538915Z", + "start_time": "2025-01-14T09:25:17.589353Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[INIT].... → Crawl4AI 0.4.247\n", + "[FETCH]... ↓ https://lilianweng.github.io/posts/2023-06-23-agen... | Status: True | Time: 0.07s\n", + "[COMPLETE] ● https://lilianweng.github.io/posts/2023-06-23-agen... 
| Status: True | Total: 0.08s\n" + ] + } + ], + "source": [ + "from crawl4ai import *\n", + "\n", + "\n", + "async def crawl():\n", + " async with AsyncWebCrawler() as crawler:\n", + " result = await crawler.arun(\n", + " url=\"https://lilianweng.github.io/posts/2023-06-23-agent/\",\n", + " )\n", + " return result.markdown\n", + "\n", + "\n", + "markdown_content = await crawl()" + ] + }, + { + "cell_type": "markdown", + "id": "229449707ff84d2d", + "metadata": { + "collapsed": false + }, + "source": [ + "### Process the Crawled Content\n", + "\n", + "To make the crawled content manageable for insertion into Milvus, we simply use \"# \" to separate the content, which can roughly separate the content of each main part of the crawled markdown file." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "1f03911422e6a973", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:24.778624Z", + "start_time": "2025-01-14T09:25:24.753005Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Section 1:\n", + "[Lil'Log](https://lilianweng.github.io/posts/2023-06-23-agent/ \"Lil'Log \\(Alt + H\\)\")\n", + " * |\n", + "\n", + "\n", + " * [ Posts ](https://lilianweng.github.io/posts/2023-06-23-agent/ \"Posts\")\n", + " * [ Archive ](https://lilianweng.github.io/posts/2023-06-23-agent/)\n", + " * [Component One: Planning](https://lilianweng.github.io/posts/2023...\n", + "--------------------------------------------------\n", + "Section 3:\n", + "Agent System Overview[#](https://lilianweng.github.io/posts/2023-06-23-agent/<#agent-system-overview>)\n", + "In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n", + " * **Planning**\n", + " * Subgoal and decomposition: The agent breaks down large t...\n", + "--------------------------------------------------\n" + ] + } + ], + "source": [ + "def split_markdown_content(content):\n", + " return [section.strip() for section in content.split(\"# \") if section.strip()]\n", + "\n", + "\n", + "# Process the crawled markdown content\n", + "sections = split_markdown_content(markdown_content)\n", + "\n", + "# Print the first few sections to understand the structure\n", + "for i, section in enumerate(sections[:3]):\n", + " print(f\"Section {i+1}:\")\n", + " print(section[:300] + \"...\")\n", + " print(\"-\" * 50)" + ] + }, + { + "cell_type": "markdown", + "id": "b1353faca577a4d7", + "metadata": { + "collapsed": false + }, + "source": [ + "## Load Data into Milvus" + ] + }, + { + "cell_type": "markdown", + "id": "08d46563-ac2f-4951-a1fc-920577750a44", + "metadata": {}, + "source": [ + "### Create the collection" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1ca09192-a150-46eb-86c5-de9f10546a41", + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2025-01-14T09:25:29.325267Z", + "start_time": "2025-01-14T09:25:27.696938Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:numexpr.utils:Note: NumExpr detected 10 cores but \"NUMEXPR_MAX_THREADS\" not set, so enforcing safe limit of 8.\n", + "INFO:numexpr.utils:NumExpr defaulting to 8 threads.\n" + ] + } + ], + "source": [ + "from pymilvus import MilvusClient\n", + "\n", + "milvus_client = MilvusClient(uri=\"./milvus_demo.db\")\n", + "collection_name = \"my_rag_collection\"" + ] + }, + { + "cell_type": "markdown", + "id": "7d4e1e20-4fb3-4a20-b8fd-5e20497b0c62", + "metadata": {}, + "source": [ + "> As for the 
argument of `MilvusClient`:\n", + "> - Setting the `uri` as a local file, e.g.`./milvus.db`, is the most convenient method, as it automatically utilizes [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in this file.\n", + "> - If you have large scale of data, you can set up a more performant Milvus server on [docker or kubernetes](https://milvus.io/docs/quickstart.md). In this setup, please use the server uri, e.g.`http://localhost:19530`, as your `uri`.\n", + "> - If you want to use [Zilliz Cloud](https://zilliz.com/cloud), the fully managed cloud service for Milvus, adjust the `uri` and `token`, which correspond to the [Public Endpoint and Api key](https://docs.zilliz.com/docs/on-zilliz-cloud-console#free-cluster-details) in Zilliz Cloud." + ] + }, + { + "cell_type": "markdown", + "id": "fadf5572-f0ea-4799-9247-2e92e58bfe20", + "metadata": {}, + "source": [ + "Check if the collection already exists and drop it if it does." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "d7e81fae-b312-408a-82f3-c301928bb9df", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:31.948948Z", + "start_time": "2025-01-14T09:25:31.935055Z" + } + }, + "outputs": [], + "source": [ + "if milvus_client.has_collection(collection_name):\n", + " milvus_client.drop_collection(collection_name)" + ] + }, + { + "cell_type": "markdown", + "id": "41e082b4-93e1-438a-800b-ec34174d43f8", + "metadata": {}, + "source": [ + "Create a new collection with specified parameters.\n", + "\n", + "If we don’t specify any field information, Milvus will automatically create a default `id` field for primary key, and a `vector` field to store the vector data. A reserved JSON field is used to store non-schema-defined fields and their values." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "97bd0406-ad92-4b18-8702-2dbb666e5f46", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:36.692203Z", + "start_time": "2025-01-14T09:25:36.167918Z" + } + }, + "outputs": [], + "source": [ + "milvus_client.create_collection(\n", + " collection_name=collection_name,\n", + " dimension=embedding_dim,\n", + " metric_type=\"IP\", # Inner product distance\n", + " consistency_level=\"Strong\", # Strong consistency level\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5a488258-64d5-4539-942f-dd50d90300e4", + "metadata": {}, + "source": [ + "### Insert data" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "a2ac980d-5969-4fe9-97b7-5a65c78c9224", + "metadata": { + "ExecuteTime": { + "end_time": "2025-01-14T09:25:48.142063Z", + "start_time": "2025-01-14T09:25:38.181051Z" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Processing sections: 0%| | 0/18 [00:00)\\nIn a LLM-powered autonomous agent system, LLM functions as the agent\\u2019s brain, complemented by several key components:\\n * **Planning**\\n * Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\\n * Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\\n * **Memory**\\n * Short-term memory: I would consider all the in-context learning (See [Prompt Engineering](https://lilianweng.github.io/posts/2023-06-23-agent/)) as utilizing short-term memory of the model to learn.\\n * Long-term memory: This provides the 
agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\\n * **Tool use**\\n * The agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including current information, code execution capability, access to proprietary information sources and more.\\n\\n![](https://lilianweng.github.io/posts/2023-06-23-agent/agent-overview.png) Fig. 1. Overview of a LLM-powered autonomous agent system.\",\n", + " 0.6433743238449097\n", + " ],\n", + " [\n", + " \"LLM Powered Autonomous Agents \\nDate: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng \\nTable of Contents\\n * [Agent System Overview](https://lilianweng.github.io/posts/2023-06-23-agent/<#agent-system-overview>)\\n * [Component One: Planning](https://lilianweng.github.io/posts/2023-06-23-agent/<#component-one-planning>)\\n * [Task Decomposition](https://lilianweng.github.io/posts/2023-06-23-agent/<#task-decomposition>)\\n * [Self-Reflection](https://lilianweng.github.io/posts/2023-06-23-agent/<#self-reflection>)\\n * [Component Two: Memory](https://lilianweng.github.io/posts/2023-06-23-agent/<#component-two-memory>)\\n * [Types of Memory](https://lilianweng.github.io/posts/2023-06-23-agent/<#types-of-memory>)\\n * [Maximum Inner Product Search (MIPS)](https://lilianweng.github.io/posts/2023-06-23-agent/<#maximum-inner-product-search-mips>)\\n * [Component Three: Tool Use](https://lilianweng.github.io/posts/2023-06-23-agent/<#component-three-tool-use>)\\n * [Case Studies](https://lilianweng.github.io/posts/2023-06-23-agent/<#case-studies>)\\n * [Scientific Discovery Agent](https://lilianweng.github.io/posts/2023-06-23-agent/<#scientific-discovery-agent>)\\n * [Generative Agents Simulation](https://lilianweng.github.io/posts/2023-06-23-agent/<#generative-agents-simulation>)\\n * [Proof-of-Concept Examples](https://lilianweng.github.io/posts/2023-06-23-agent/<#proof-of-concept-examples>)\\n * [Challenges](https://lilianweng.github.io/posts/2023-06-23-agent/<#challenges>)\\n * [Citation](https://lilianweng.github.io/posts/2023-06-23-agent/<#citation>)\\n * [References](https://lilianweng.github.io/posts/2023-06-23-agent/<#references>)\\n\\n\\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as [AutoGPT](https://lilianweng.github.io/posts/2023-06-23-agent/), [GPT-Engineer](https://lilianweng.github.io/posts/2023-06-23-agent/) and [BabyAGI](https://lilianweng.github.io/posts/2023-06-23-agent/), serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\",\n", + " 0.5462194085121155\n", + " ],\n", + " [\n", + " \"Component One: Planning[#](https://lilianweng.github.io/posts/2023-06-23-agent/<#component-one-planning>)\\nA complicated task usually involves many steps. 
An agent needs to know what they are and plan ahead.\\n#\",\n",
+      "        0.5223420858383179\n",
+      "    ]\n",
+      "]\n"
+     ]
+    }
+   ],
+   "source": [
+    "import json\n",
+    "\n",
+    "retrieved_lines_with_distances = [\n",
+    "    (res[\"entity\"][\"text\"], res[\"distance\"]) for res in search_res[0]\n",
+    "]\n",
+    "print(json.dumps(retrieved_lines_with_distances, indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1fecbb753851d312",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "### Use LLM to get a RAG response\n",
+    "\n",
+    "Convert the retrieved documents into a string format.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "c7d406532a6c0d59",
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2025-01-14T09:26:08.436450Z",
+     "start_time": "2025-01-14T09:26:08.427915Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "context = \"\\n\".join(\n",
+    "    [line_with_distance[0] for line_with_distance in retrieved_lines_with_distances]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "34aabd19ee91b7d",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "Define system and user prompts for the Language Model. The user prompt is assembled with the documents retrieved from Milvus.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "42c4f2f501f80607",
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2025-01-14T09:26:11.452524Z",
+     "start_time": "2025-01-14T09:26:11.445572Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "SYSTEM_PROMPT = \"\"\"\n",
+    "Human: You are an AI assistant. You are able to find answers to the questions from the contextual passage snippets provided.\n",
+    "\"\"\"\n",
+    "USER_PROMPT = f\"\"\"\n",
+    "Use the following pieces of information enclosed in <context> tags to provide an answer to the question enclosed in <question> tags.\n",
+    "<context>\n",
+    "{context}\n",
+    "</context>\n",
+    "<question>\n",
+    "{question}\n",
+    "</question>\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5fe76342b1c2ea6c",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "Use OpenAI ChatGPT to generate a response based on the prompts.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "c340830791e55c03",
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "end_time": "2025-01-14T09:26:26.095984Z",
+     "start_time": "2025-01-14T09:26:23.196524Z"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions \"HTTP/1.1 200 OK\"\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "The main components of autonomous agents are:\n",
+      "\n",
+      "1. **Planning**:\n",
+      "   - Subgoal and decomposition: Breaking down large tasks into smaller, manageable subgoals.\n",
+      "   - Reflection and refinement: Self-criticism and reflection to learn from past actions and improve future steps.\n",
+      "\n",
+      "2. **Memory**:\n",
+      "   - Short-term memory: In-context learning using prompt engineering.\n",
+      "   - Long-term memory: Retaining and recalling information over extended periods using an external vector store and fast retrieval.\n",
+      "\n",
+      "3. 
**Tool use**:\n",
+      "   - Calling external APIs for information not contained in the model weights, accessing current information, code execution capabilities, and proprietary information sources.\n"
+     ]
+    }
+   ],
+   "source": [
+    "response = openai_client.chat.completions.create(\n",
+    "    model=\"gpt-4o\",\n",
+    "    messages=[\n",
+    "        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n",
+    "        {\"role\": \"user\", \"content\": USER_PROMPT},\n",
+    "    ],\n",
+    ")\n",
+    "print(response.choices[0].message.content)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}