From cca75e28f7746c8ed65b1969440251b09e97fb4f Mon Sep 17 00:00:00 2001
From: Korey Stegared-Pace
Date: Mon, 30 Sep 2024 15:03:58 +0200
Subject: [PATCH 1/3] added mistral sample

---
 20-mistral/README.md                          |  0
 .../python/githubmodels-assignment.ipynb      | 62 +++++++++++++++++++
 2 files changed, 62 insertions(+)
 create mode 100644 20-mistral/README.md
 create mode 100644 20-mistral/python/githubmodels-assignment.ipynb

diff --git a/20-mistral/README.md b/20-mistral/README.md
new file mode 100644
index 000000000..e69de29bb
diff --git a/20-mistral/python/githubmodels-assignment.ipynb b/20-mistral/python/githubmodels-assignment.ipynb
new file mode 100644
index 000000000..7179edcfe
--- /dev/null
+++ b/20-mistral/python/githubmodels-assignment.ipynb
@@ -0,0 +1,62 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "ename": "",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31mRunning cells with 'Python 3.12.6' requires the ipykernel package.\n",
+      "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n",
+      "\u001b[1;31mCommand: '/opt/homebrew/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "from azure.ai.inference import ChatCompletionsClient\n",
+    "from azure.ai.inference.models import SystemMessage, UserMessage\n",
+    "from azure.core.credentials import AzureKeyCredential\n",
+    "\n",
+    "endpoint = \"https://models.inference.ai.azure.com\"\n",
+    "model_name = \"Mistral-large\"\n",
+    "token = os.environ[\"GITHUB_TOKEN\"]\n",
+    "\n",
+    "client = ChatCompletionsClient(\n",
+    "    endpoint=endpoint,\n",
+    "    credential=AzureKeyCredential(token),\n",
+    ")\n",
+    "\n",
+    "response = client.complete(\n",
+    "    messages=[\n",
+    "        SystemMessage(content=\"You are a helpful assistant.\"),\n",
+    "        UserMessage(content=\"What is the capital of France?\"),\n",
+    "    ],\n",
+    "    temperature=1.0,\n",
+    "    top_p=1.0,\n",
+    "    max_tokens=1000,\n",
+    "    model=model_name\n",
+    ")\n",
+    "\n",
+    "print(response.choices[0].message.content)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.12.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}

From 6c8454abd538a3f6269d0af33afb931de0894189 Mon Sep 17 00:00:00 2001
From: Korey Stegared-Pace
Date: Wed, 2 Oct 2024 11:00:24 +0000
Subject: [PATCH 2/3] Added Mistral Lesson

---
 .../python/githubmodels-assignment.ipynb | 509 +++++++++++++++++-
 1 file changed, 498 insertions(+), 11 deletions(-)

diff --git a/20-mistral/python/githubmodels-assignment.ipynb b/20-mistral/python/githubmodels-assignment.ipynb
index 7179edcfe..ca965527a 100644
--- a/20-mistral/python/githubmodels-assignment.ipynb
+++ b/20-mistral/python/githubmodels-assignment.ipynb
@@ -1,26 +1,110 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Building with Mistral Models \n",
+    "\n",
+    "## Introduction \n",
+    "\n",
+    "This lesson will cover: \n",
+    "- Exploring the different Mistral Models \n",
+    "- Understanding the use-cases and scenarios for each model \n",
+    "- Code samples showing the unique features of each model "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The Mistral Models \n",
+    "\n",
+    "In this lesson, we will explore 3 different Mistral models: \n",
+    "**Mistral Large**, **Mistral Small** and **Mistral Nemo**. 
\n", + "\n", + "Each of these models are available free on the Github Model marketplace. The code in this notebook will be using this models to run the code. Here are more details on using Github Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst). \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Mistral Large 2 (2407)\n", + "Mistral Large 2 is currently the flagship model from Mistral and is designed for enterprise use. \n", + "\n", + "The model is an upgrade to the original Mistral Large by offering \n", + "- Larger Context Window - 128k vs 32k \n", + "- Better performance on Math and Coding Tasks - 76.9% average accuracy vs 60.4% \n", + "- Increased multilingual performance - languages include: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi.\n", + "\n", + "With these features, Mistral Large excels at \n", + "- *Retrieval Augmented Generation (RAG)* - due to the larger context window\n", + "- *Function Calling* - this model has native function calling which allows integration with external tools and APIs. These calls can be made both in parallel or one after another in a sequential order. \n", + "- *Code Generation* - this model excels on Python, Java, TypeScript and C++ generation. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### RAG Example using Mistral Large 2 " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this example, we are using Mistral Large 2 to run a RAG pattern over a text document. The question is written in Korean and asks about the author's activities before college. \n", + "\n", + "It uses Cohere Embeddings Model to create embeddings of the text document as well as the question. For this sample, it uses the faiss Python package as a vector store. \n", + "\n", + "The prompt sent to the Mistral model includes both the questions and the retrieved chunks that are similar to the question. The Model then provides a natural language response. " + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 50, "metadata": {}, "outputs": [ { - "ename": "", - "evalue": "", - "output_type": "error", - "traceback": [ - "\u001b[1;31mRunning cells with 'Python 3.12.6' requires the ipykernel package.\n", - "\u001b[1;31mRun the following command to install 'ipykernel' into the Python environment. \n", - "\u001b[1;31mCommand: '/opt/homebrew/bin/python3 -m pip install ipykernel -U --user --force-reinstall'" + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: faiss-cpu in /home/codespace/.python/current/lib/python3.12/site-packages (1.8.0.post1)\n", + "Requirement already satisfied: numpy<2.0,>=1.0 in /home/codespace/.python/current/lib/python3.12/site-packages (from faiss-cpu) (1.26.4)\n", + "Requirement already satisfied: packaging in /home/codespace/.local/lib/python3.12/site-packages (from faiss-cpu) (24.1)\n", + "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ + "pip install faiss-cpu" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The author primarily engaged in two activities before college: writing and programming. 
In terms of writing, they wrote short stories, albeit not very good ones, with minimal plot and characters expressing strong feelings. For programming, they started writing programs on the IBM 1401 used for data processing during their 9th grade, at the age of 13 or 14. They used an early version of Fortran and typed programs on punch cards, later loading them into the card reader to run the program.\n" + ] + } + ], + "source": [ + "import requests\n", + "import numpy as np\n", + "import faiss\n", "import os\n", + "\n", "from azure.ai.inference import ChatCompletionsClient\n", "from azure.ai.inference.models import SystemMessage, UserMessage\n", "from azure.core.credentials import AzureKeyCredential\n", + "from azure.ai.inference import EmbeddingsClient\n", "\n", "endpoint = \"https://models.inference.ai.azure.com\"\n", "model_name = \"Mistral-large\"\n", @@ -31,10 +115,158 @@ " credential=AzureKeyCredential(token),\n", ")\n", "\n", - "response = client.complete(\n", + "response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')\n", + "text = response.text\n", + "\n", + "chunk_size = 2048\n", + "chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]\n", + "len(chunks)\n", + "\n", + "embed_model_name = \"cohere-embed-v3-multilingual\" \n", + "\n", + "embed_client = EmbeddingsClient(\n", + " endpoint=endpoint,\n", + " credential=AzureKeyCredential(token)\n", + ")\n", + "\n", + "embed_response = embed_client.embed(\n", + " input=chunks,\n", + " model=embed_model_name\n", + ")\n", + "\n", + "\n", + "\n", + "text_embeddings = []\n", + "for item in embed_response.data:\n", + " length = len(item.embedding)\n", + " text_embeddings.append(item.embedding)\n", + "text_embeddings = np.array(text_embeddings)\n", + "\n", + "\n", + "d = text_embeddings.shape[1]\n", + "index = faiss.IndexFlatL2(d)\n", + "index.add(text_embeddings)\n", + "\n", + "question = \"저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요??\"\n", + "\n", + "question_embedding = embed_client.embed(\n", + " input=[question],\n", + " model=embed_model_name\n", + ")\n", + "\n", + "question_embeddings = np.array(question_embedding.data[0].embedding)\n", + "\n", + "\n", + "D, I = index.search(question_embeddings.reshape(1, -1), k=2) # distance, index\n", + "retrieved_chunks = [chunks[i] for i in I.tolist()[0]]\n", + "\n", + "prompt = f\"\"\"\n", + "Context information is below.\n", + "---------------------\n", + "{retrieved_chunks}\n", + "---------------------\n", + "Given the context information and not prior knowledge, answer the query.\n", + "Query: {question}\n", + "Answer:\n", + "\"\"\"\n", + "\n", + "\n", + "chat_response = client.complete(\n", " messages=[\n", " SystemMessage(content=\"You are a helpful assistant.\"),\n", - " UserMessage(content=\"What is the capital of France?\"),\n", + " UserMessage(content=prompt),\n", + " ],\n", + " temperature=1.0,\n", + " top_p=1.0,\n", + " max_tokens=1000,\n", + " model=model_name\n", + ")\n", + "\n", + "print(chat_response.choices[0].message.content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Mistral Small \n", + "Mistral Small is another model in the Mistral family of models under the premier/enterprise category. As the name implies, this model is a Small Language Model (SLM). 
The advantages of using Mistral Small are that it is: \n",
+    "- Cost saving compared to Mistral LLMs like Mistral Large and NeMo - an 80% price drop\n",
+    "- Low latency - faster responses compared to Mistral's LLMs\n",
+    "- Flexible - can be deployed across different environments with fewer restrictions on required resources. \n",
+    "\n",
+    "\n",
+    "Mistral Small is great for: \n",
+    "- Text-based tasks such as summarization, sentiment analysis and translation. \n",
+    "- Applications that make frequent requests, due to its cost effectiveness \n",
+    "- Low-latency code tasks like code review and code suggestions \n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Comparing Mistral Small and Mistral Large \n",
+    "\n",
+    "To show the differences in latency between Mistral Small and Mistral Large, run the cells below. \n",
+    "\n",
+    "You should see a difference of 3-5 seconds in response times. Also note the response lengths and style for the same prompt. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os \n",
+    "endpoint = \"https://models.inference.ai.azure.com\"\n",
+    "model_name = \"Mistral-small\"\n",
+    "token = os.environ[\"GITHUB_TOKEN\"]\n",
+    "\n",
+    "client = ChatCompletionsClient(\n",
+    "    endpoint=endpoint,\n",
+    "    credential=AzureKeyCredential(token),\n",
+    ")\n",
+    "\n",
+    "response = client.complete(\n",
+    "    messages=[\n",
+    "        SystemMessage(content=\"You are a helpful coding assistant.\"),\n",
+    "        UserMessage(content=\"Can you write a Python function for the fizz buzz test?\"),\n",
+    "    ],\n",
+    "    temperature=1.0,\n",
+    "    top_p=1.0,\n",
+    "    max_tokens=1000,\n",
+    "    model=model_name\n",
+    ")\n",
+    "\n",
+    "print(response.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from azure.ai.inference import ChatCompletionsClient\n",
+    "from azure.ai.inference.models import SystemMessage, UserMessage\n",
+    "from azure.core.credentials import AzureKeyCredential\n",
+    "\n",
+    "endpoint = \"https://models.inference.ai.azure.com\"\n",
+    "model_name = \"Mistral-large\"\n",
+    "token = os.environ[\"GITHUB_TOKEN\"]\n",
+    "\n",
+    "client = ChatCompletionsClient(\n",
+    "    endpoint=endpoint,\n",
+    "    credential=AzureKeyCredential(token),\n",
+    ")\n",
+    "\n",
+    "response = client.complete(\n",
+    "    messages=[\n",
+    "        SystemMessage(content=\"You are a helpful coding assistant.\"),\n",
+    "        UserMessage(content=\"Can you write a Python function for the fizz buzz test?\"),\n",
+    "    ],\n",
+    "    temperature=1.0,\n",
+    "    top_p=1.0,\n",
+    "    max_tokens=1000,\n",
+    "    model=model_name\n",
+    ")\n",
+    "\n",
+    "print(response.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Mistral NeMo\n",
+    "\n",
+    "Compared to the other two models discussed in this lesson, Mistral NeMo is the only free model with an Apache 2.0 license. \n",
+    "\n",
+    "It is viewed as an upgrade to Mistral 7B, Mistral's earlier open-source LLM. \n",
+    "\n",
+    "Some other features of the NeMo model are: \n",
+    "\n",
+    "- *More efficient tokenization:* This model uses the Tekken tokenizer over the more commonly used tiktoken. This allows for better performance over more languages and code. \n",
+    "\n",
+    "- *Finetuning:* The base model is available for finetuning. This allows for more flexibility for use-cases where finetuning may be needed. 
\n", + "\n", + "- *Native Function Calling* - Like Mistral Large, this model has been trained on function calling. This makes it unique as being one of the first open source models to do so. \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Mistral NeMo\n", + "\n", + "Compared to the other two models discussed in this lesson, Mistral NeMo is the only free model with an Apache2 License. \n", + "\n", + "It is viewed as an upgrade to the earlier open source LLM from Mistral, Mistral 7B. \n", + "\n", + "Some other feature of the NeMo model are: \n", + "\n", + "- *More efficient tokenization:* This model uses the Tekken tokenizer over the more commonly used tiktoken. This allows for better performance over more languages and code. \n", + "\n", + "- *Finetuning:* The base model is available for finetuning. This allows for more flexibility for use-cases where finetuning may be needed. \n", + "\n", + "- *Native Function Calling* - Like Mistral Large, this model has been trained on function calling. This makes it unique as being one of the first open source models to do so. \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Comparing Tokenizers \n", + "\n", + "In this sample, we will look at how Mistral NeMo handles tokenization compared to Mistral Large. \n", + "\n", + "Both samples take the same prompt but you shoud see that NeMo returns back less tokens vs Mistral Large. " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Collecting mistral-common\n", + " Downloading mistral_common-1.4.4-py3-none-any.whl.metadata (4.6 kB)\n", + "Requirement already satisfied: jsonschema<5.0.0,>=4.21.1 in /home/codespace/.local/lib/python3.12/site-packages (from mistral-common) (4.23.0)\n", + "Requirement already satisfied: numpy>=1.25 in /home/codespace/.local/lib/python3.12/site-packages (from mistral-common) (2.1.1)\n", + "Requirement already satisfied: pillow<11.0.0,>=10.3.0 in /home/codespace/.local/lib/python3.12/site-packages (from mistral-common) (10.4.0)\n", + "Requirement already satisfied: pydantic<3.0.0,>=2.6.1 in /home/codespace/.python/current/lib/python3.12/site-packages (from mistral-common) (2.9.2)\n", + "Requirement already satisfied: requests<3.0.0,>=2.0.0 in /home/codespace/.local/lib/python3.12/site-packages (from mistral-common) (2.32.3)\n", + "Collecting sentencepiece==0.2.0 (from mistral-common)\n", + " Downloading sentencepiece-0.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)\n", + "Collecting tiktoken<0.8.0,>=0.7.0 (from mistral-common)\n", + " Downloading tiktoken-0.7.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)\n", + "Requirement already satisfied: typing-extensions<5.0.0,>=4.11.0 in /home/codespace/.python/current/lib/python3.12/site-packages (from mistral-common) (4.12.2)\n", + "Requirement already satisfied: attrs>=22.2.0 in /home/codespace/.local/lib/python3.12/site-packages (from jsonschema<5.0.0,>=4.21.1->mistral-common) (24.2.0)\n", + "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/codespace/.local/lib/python3.12/site-packages (from jsonschema<5.0.0,>=4.21.1->mistral-common) (2023.12.1)\n", + "Requirement already satisfied: referencing>=0.28.4 in /home/codespace/.local/lib/python3.12/site-packages (from jsonschema<5.0.0,>=4.21.1->mistral-common) (0.35.1)\n", + "Requirement already satisfied: rpds-py>=0.7.1 in 
/home/codespace/.local/lib/python3.12/site-packages (from jsonschema<5.0.0,>=4.21.1->mistral-common) (0.20.0)\n",
+      "Requirement already satisfied: annotated-types>=0.6.0 in /home/codespace/.python/current/lib/python3.12/site-packages (from pydantic<3.0.0,>=2.6.1->mistral-common) (0.7.0)\n",
+      "Requirement already satisfied: pydantic-core==2.23.4 in /home/codespace/.python/current/lib/python3.12/site-packages (from pydantic<3.0.0,>=2.6.1->mistral-common) (2.23.4)\n",
+      "Requirement already satisfied: charset-normalizer<4,>=2 in /home/codespace/.local/lib/python3.12/site-packages (from requests<3.0.0,>=2.0.0->mistral-common) (3.3.2)\n",
+      "Requirement already satisfied: idna<4,>=2.5 in /home/codespace/.local/lib/python3.12/site-packages (from requests<3.0.0,>=2.0.0->mistral-common) (3.10)\n",
+      "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/codespace/.local/lib/python3.12/site-packages (from requests<3.0.0,>=2.0.0->mistral-common) (2.2.3)\n",
+      "Requirement already satisfied: certifi>=2017.4.17 in /home/codespace/.local/lib/python3.12/site-packages (from requests<3.0.0,>=2.0.0->mistral-common) (2024.8.30)\n",
+      "Collecting regex>=2022.1.18 (from tiktoken<0.8.0,>=0.7.0->mistral-common)\n",
+      "  Downloading regex-2024.9.11-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)\n",
+      "Downloading mistral_common-1.4.4-py3-none-any.whl (6.0 MB)\n",
+      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.0/6.0 MB\u001b[0m \u001b[31m63.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+      "\u001b[?25hDownloading sentencepiece-0.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
+      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m19.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+      "\u001b[?25hDownloading tiktoken-0.7.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n",
+      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m16.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+      "\u001b[?25hDownloading regex-2024.9.11-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (797 kB)\n",
+      "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m797.0/797.0 kB\u001b[0m \u001b[31m15.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+      "\u001b[?25hInstalling collected packages: sentencepiece, regex, tiktoken, mistral-common\n",
+      "Successfully installed mistral-common-1.4.4 regex-2024.9.11 sentencepiece-0.2.0 tiktoken-0.7.0\n",
+      "Note: you may need to restart the kernel to use updated packages.\n"
+     ]
+    }
+   ],
+   "source": [
+    "pip install mistral-common"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "128\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Import needed packages:\n",
+    "from mistral_common.protocol.instruct.messages import (\n",
+    "    UserMessage,\n",
+    ")\n",
+    "from mistral_common.protocol.instruct.request import ChatCompletionRequest\n",
+    "from mistral_common.protocol.instruct.tool_calls import (\n",
+    "    Function,\n",
+    "    Tool,\n",
+    ")\n",
+    "from mistral_common.tokens.tokenizers.mistral import MistralTokenizer\n",
+    "\n",
+    "# Load Mistral tokenizer\n",
+    "\n",
+    "model_name = \"open-mistral-nemo\"\n",
+    "\n",
+    "tokenizer = MistralTokenizer.from_model(model_name)\n",
+    "\n",
+    "# Tokenize a list of messages\n",
+    "tokenized = 
tokenizer.encode_chat_completion(\n", + " ChatCompletionRequest(\n", + " tools=[\n", + " Tool(\n", + " function=Function(\n", + " name=\"get_current_weather\",\n", + " description=\"Get the current weather\",\n", + " parameters={\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"location\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city and state, e.g. San Francisco, CA\",\n", + " },\n", + " \"format\": {\n", + " \"type\": \"string\",\n", + " \"enum\": [\"celsius\", \"fahrenheit\"],\n", + " \"description\": \"The temperature unit to use. Infer this from the users location.\",\n", + " },\n", + " },\n", + " \"required\": [\"location\", \"format\"],\n", + " },\n", + " )\n", + " )\n", + " ],\n", + " messages=[\n", + " UserMessage(content=\"What's the weather like today in Paris\"),\n", + " ],\n", + " model=model_name,\n", + " )\n", + ")\n", + "tokens, text = tokenized.tokens, tokenized.text\n", + "\n", + "# Count the number of tokens\n", + "print(len(tokens))" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "135\n" + ] + } + ], + "source": [ + "# Import needed packages:\n", + "from mistral_common.protocol.instruct.messages import (\n", + " UserMessage,\n", + ")\n", + "from mistral_common.protocol.instruct.request import ChatCompletionRequest\n", + "from mistral_common.protocol.instruct.tool_calls import (\n", + " Function,\n", + " Tool,\n", + ")\n", + "from mistral_common.tokens.tokenizers.mistral import MistralTokenizer\n", + "\n", + "# Load Mistral tokenizer\n", + "\n", + "model_name = \"mistral-large-latest\"\n", + "\n", + "tokenizer = MistralTokenizer.from_model(model_name)\n", + "\n", + "# Tokenize a list of messages\n", + "tokenized = tokenizer.encode_chat_completion(\n", + " ChatCompletionRequest(\n", + " tools=[\n", + " Tool(\n", + " function=Function(\n", + " name=\"get_current_weather\",\n", + " description=\"Get the current weather\",\n", + " parameters={\n", + " \"type\": \"object\",\n", + " \"properties\": {\n", + " \"location\": {\n", + " \"type\": \"string\",\n", + " \"description\": \"The city and state, e.g. San Francisco, CA\",\n", + " },\n", + " \"format\": {\n", + " \"type\": \"string\",\n", + " \"enum\": [\"celsius\", \"fahrenheit\"],\n", + " \"description\": \"The temperature unit to use. Infer this from the users location.\",\n", + " },\n", + " },\n", + " \"required\": [\"location\", \"format\"],\n", + " },\n", + " )\n", + " )\n", + " ],\n", + " messages=[\n", + " UserMessage(content=\"What's the weather like today in Paris\"),\n", + " ],\n", + " model=model_name,\n", + " )\n", + ")\n", + "tokens, text = tokenized.tokens, tokenized.text\n", + "\n", + "# Count the number of tokens\n", + "print(len(tokens))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Learning does not stop here, continue the Journey\n", + "\n", + "After completing this lesson, check out our [Generative AI Learning collection](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) to continue leveling up your Generative AI knowledge!" 
+ ] } ], "metadata": { @@ -53,8 +532,16 @@ "name": "python3" }, "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", "name": "python", - "version": "3.12.6" + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.1" } }, "nbformat": 4, From eab82719f8ca2c62017c0a536a47a0395bb1b3e9 Mon Sep 17 00:00:00 2001 From: Korey Stegared-Pace Date: Wed, 2 Oct 2024 11:18:21 +0000 Subject: [PATCH 3/3] cleaned up readme --- 20-mistral/README.md | 348 +++++++++++++++++++++++++++++++++++++++++++ README.md | 4 + 2 files changed, 352 insertions(+) diff --git a/20-mistral/README.md b/20-mistral/README.md index e69de29bb..386169edd 100644 --- a/20-mistral/README.md +++ b/20-mistral/README.md @@ -0,0 +1,348 @@ +# Building with Mistral Models + +## Introduction + +This lesson will cover: +- Exploring the different Mistral Models +- Understanding the use-cases and scenarios for each model +- Code samples show the unique features of each model. + +## The Mistral Models + +In this lesson, we will explore 3 different Mistral models: +**Mistral Large**, **Mistral Small** and **Mistral Nemo**. + +Each of these models are available free on the Github Model marketplace. The code in this notebook will be using this models to run the code. Here are more details on using Github Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst). + + +## Mistral Large 2 (2407) +Mistral Large 2 is currently the flagship model from Mistral and is designed for enterprise use. + +The model is an upgrade to the original Mistral Large by offering +- Larger Context Window - 128k vs 32k +- Better performance on Math and Coding Tasks - 76.9% average accuracy vs 60.4% +- Increased multilingual performance - languages include: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi. + +With these features, Mistral Large excels at +- *Retrieval Augmented Generation (RAG)* - due to the larger context window +- *Function Calling* - this model has native function calling which allows integration with external tools and APIs. These calls can be made both in parallel or one after another in a sequential order. +- *Code Generation* - this model excels on Python, Java, TypeScript and C++ generation. + +### RAG Example using Mistral Large 2 + +In this example, we are using Mistral Large 2 to run a RAG pattern over a text document. The question is written in Korean and asks about the author's activities before college. + +It uses Cohere Embeddings Model to create embeddings of the text document as well as the question. For this sample, it uses the faiss Python package as a vector store. + +The prompt sent to the Mistral model includes both the questions and the retrieved chunks that are similar to the question. The Model then provides a natural language response. 
+
+```python
+pip install faiss-cpu
+```
+
+```python
+import requests
+import numpy as np
+import faiss
+import os
+
+from azure.ai.inference import ChatCompletionsClient
+from azure.ai.inference.models import SystemMessage, UserMessage
+from azure.core.credentials import AzureKeyCredential
+from azure.ai.inference import EmbeddingsClient
+
+endpoint = "https://models.inference.ai.azure.com"
+model_name = "Mistral-large"
+token = os.environ["GITHUB_TOKEN"]
+
+client = ChatCompletionsClient(
+    endpoint=endpoint,
+    credential=AzureKeyCredential(token),
+)
+
+response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')
+text = response.text
+
+chunk_size = 2048
+chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
+len(chunks)
+
+embed_model_name = "cohere-embed-v3-multilingual"
+
+embed_client = EmbeddingsClient(
+    endpoint=endpoint,
+    credential=AzureKeyCredential(token)
+)
+
+embed_response = embed_client.embed(
+    input=chunks,
+    model=embed_model_name
+)
+
+
+
+text_embeddings = []
+for item in embed_response.data:
+    length = len(item.embedding)
+    text_embeddings.append(item.embedding)
+text_embeddings = np.array(text_embeddings)
+
+
+d = text_embeddings.shape[1]
+index = faiss.IndexFlatL2(d)
+index.add(text_embeddings)
+
+question = "저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요??"
+
+question_embedding = embed_client.embed(
+    input=[question],
+    model=embed_model_name
+)
+
+question_embeddings = np.array(question_embedding.data[0].embedding)
+
+
+D, I = index.search(question_embeddings.reshape(1, -1), k=2) # distance, index
+retrieved_chunks = [chunks[i] for i in I.tolist()[0]]
+
+prompt = f"""
+Context information is below.
+---------------------
+{retrieved_chunks}
+---------------------
+Given the context information and not prior knowledge, answer the query.
+Query: {question}
+Answer:
+"""
+
+
+chat_response = client.complete(
+    messages=[
+        SystemMessage(content="You are a helpful assistant."),
+        UserMessage(content=prompt),
+    ],
+    temperature=1.0,
+    top_p=1.0,
+    max_tokens=1000,
+    model=model_name
+)
+
+print(chat_response.choices[0].message.content)
+```
+
+## Mistral Small
+Mistral Small is another model in the Mistral family of models under the premier/enterprise category. As the name implies, this model is a Small Language Model (SLM). The advantages of using Mistral Small are that it is:
+- Cost saving compared to Mistral LLMs like Mistral Large and NeMo - an 80% price drop
+- Low latency - faster responses compared to Mistral's LLMs
+- Flexible - can be deployed across different environments with fewer restrictions on required resources.
+
+
+Mistral Small is great for:
+- Text-based tasks such as summarization, sentiment analysis and translation.
+- Applications that make frequent requests, due to its cost effectiveness
+- Low-latency code tasks like code review and code suggestions
+
+## Comparing Mistral Small and Mistral Large
+
+To show the differences in latency between Mistral Small and Mistral Large, run the cells below.
+
+You should see a difference of 3-5 seconds in response times. Also note the response lengths and style for the same prompt. 
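+
+The two cells below print only the model output, so the latency difference has to be eyeballed. If you want to measure it instead, a minimal sketch along these lines can time each call; the `timed_complete` helper is a hypothetical addition (standard-library `time` plus the same client setup as the cells below), not part of the lesson's code:
+
+```python
+import os
+import time
+
+from azure.ai.inference import ChatCompletionsClient
+from azure.ai.inference.models import SystemMessage, UserMessage
+from azure.core.credentials import AzureKeyCredential
+
+client = ChatCompletionsClient(
+    endpoint="https://models.inference.ai.azure.com",
+    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
+)
+
+def timed_complete(model_name: str, prompt: str) -> float:
+    """Send one chat completion and return the elapsed wall-clock seconds."""
+    start = time.perf_counter()
+    client.complete(
+        messages=[
+            SystemMessage(content="You are a helpful coding assistant."),
+            UserMessage(content=prompt),
+        ],
+        model=model_name,
+    )
+    return time.perf_counter() - start
+
+prompt = "Can you write a Python function for the fizz buzz test?"
+for name in ("Mistral-small", "Mistral-large"):
+    print(f"{name}: {timed_complete(name, prompt):.1f}s")
+```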
+
+```python
+
+import os 
+endpoint = "https://models.inference.ai.azure.com"
+model_name = "Mistral-small"
+token = os.environ["GITHUB_TOKEN"]
+
+client = ChatCompletionsClient(
+    endpoint=endpoint,
+    credential=AzureKeyCredential(token),
+)
+
+response = client.complete(
+    messages=[
+        SystemMessage(content="You are a helpful coding assistant."),
+        UserMessage(content="Can you write a Python function for the fizz buzz test?"),
+    ],
+    temperature=1.0,
+    top_p=1.0,
+    max_tokens=1000,
+    model=model_name
+)
+
+print(response.choices[0].message.content)
+
+```
+
+```python
+
+import os
+from azure.ai.inference import ChatCompletionsClient
+from azure.ai.inference.models import SystemMessage, UserMessage
+from azure.core.credentials import AzureKeyCredential
+
+endpoint = "https://models.inference.ai.azure.com"
+model_name = "Mistral-large"
+token = os.environ["GITHUB_TOKEN"]
+
+client = ChatCompletionsClient(
+    endpoint=endpoint,
+    credential=AzureKeyCredential(token),
+)
+
+response = client.complete(
+    messages=[
+        SystemMessage(content="You are a helpful coding assistant."),
+        UserMessage(content="Can you write a Python function for the fizz buzz test?"),
+    ],
+    temperature=1.0,
+    top_p=1.0,
+    max_tokens=1000,
+    model=model_name
+)
+
+print(response.choices[0].message.content)
+
+```
+
+## Mistral NeMo
+
+Compared to the other two models discussed in this lesson, Mistral NeMo is the only free model with an Apache 2.0 license.
+
+It is viewed as an upgrade to Mistral 7B, Mistral's earlier open-source LLM.
+
+Some other features of the NeMo model are:
+
+- *More efficient tokenization:* This model uses the Tekken tokenizer over the more commonly used tiktoken. This allows for better performance over more languages and code.
+
+- *Finetuning:* The base model is available for finetuning. This allows for more flexibility for use-cases where finetuning may be needed.
+
+- *Native Function Calling* - Like Mistral Large, this model has been trained on function calling. This makes it one of the first open-source models to do so.
+
+
+### Comparing Tokenizers
+
+In this sample, we will look at how Mistral NeMo handles tokenization compared to Mistral Large.
+
+Both samples take the same prompt, but you should see that NeMo returns fewer tokens than Mistral Large.
+
+```bash
+pip install mistral-common
+```
+
+```python
+# Import needed packages:
+from mistral_common.protocol.instruct.messages import (
+    UserMessage,
+)
+from mistral_common.protocol.instruct.request import ChatCompletionRequest
+from mistral_common.protocol.instruct.tool_calls import (
+    Function,
+    Tool,
+)
+from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
+
+# Load Mistral tokenizer
+
+model_name = "open-mistral-nemo"
+
+tokenizer = MistralTokenizer.from_model(model_name)
+
+# Tokenize a list of messages
+tokenized = tokenizer.encode_chat_completion(
+    ChatCompletionRequest(
+        tools=[
+            Tool(
+                function=Function(
+                    name="get_current_weather",
+                    description="Get the current weather",
+                    parameters={
+                        "type": "object",
+                        "properties": {
+                            "location": {
+                                "type": "string",
+                                "description": "The city and state, e.g. San Francisco, CA",
+                            },
+                            "format": {
+                                "type": "string",
+                                "enum": ["celsius", "fahrenheit"],
+                                "description": "The temperature unit to use. 
Infer this from the users location.", + }, + }, + "required": ["location", "format"], + }, + ) + ) + ], + messages=[ + UserMessage(content="What's the weather like today in Paris"), + ], + model=model_name, + ) +) +tokens, text = tokenized.tokens, tokenized.text + +# Count the number of tokens +print(len(tokens)) +``` + +```python +# Import needed packages: +from mistral_common.protocol.instruct.messages import ( + UserMessage, +) +from mistral_common.protocol.instruct.request import ChatCompletionRequest +from mistral_common.protocol.instruct.tool_calls import ( + Function, + Tool, +) +from mistral_common.tokens.tokenizers.mistral import MistralTokenizer + +# Load Mistral tokenizer + +model_name = "mistral-large-latest" + +tokenizer = MistralTokenizer.from_model(model_name) + +# Tokenize a list of messages +tokenized = tokenizer.encode_chat_completion( + ChatCompletionRequest( + tools=[ + Tool( + function=Function( + name="get_current_weather", + description="Get the current weather", + parameters={ + "type": "object", + "properties": { + "location": { + "type": "string", + "description": "The city and state, e.g. San Francisco, CA", + }, + "format": { + "type": "string", + "enum": ["celsius", "fahrenheit"], + "description": "The temperature unit to use. Infer this from the users location.", + }, + }, + "required": ["location", "format"], + }, + ) + ) + ], + messages=[ + UserMessage(content="What's the weather like today in Paris"), + ], + model=model_name, + ) +) +tokens, text = tokenized.tokens, tokenized.text + +# Count the number of tokens +print(len(tokens)) +``` + +## Learning does not stop here, continue the Journey + +After completing this lesson, check out our [Generative AI Learning collection](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) to continue leveling up your Generative AI knowledge! \ No newline at end of file diff --git a/README.md b/README.md index f1242fa79..b4f4e9f87 100644 --- a/README.md +++ b/README.md @@ -82,11 +82,15 @@ Do you have suggestions or found spelling or code errors? 
[Raise an issue](https
 | 16 | [Open Source Models and Hugging Face](./16-open-source-models/README.md?WT.mc_id=academic-105485-koreyst) | **Build:** An application using open source models available on Hugging Face | [Video](https://aka.ms/gen-ai-lesson16-gh?WT.mc_id=academic-105485-koreyst) | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
 | 17 | [AI Agents](./17-ai-agents/README.md?WT.mc_id=academic-105485-koreyst) | **Build:** An application using an AI Agent Framework | [Video](https://aka.ms/gen-ai-lesson17-gh?WT.mc_id=academic-105485-koreyst) | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
 | 18 | [Fine-Tuning LLMs](./18-fine-tuning/README.md?WT.mc_id=academic-105485-koreyst) | **Learn:** The what, why and how of fine-tuning LLMs | [Video](https://aka.ms/gen-ai-lesson18-gh?WT.mc_id=academic-105485-koreyst) | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
+| 19 | [Building with SLMs](./19-slm/README.md?WT.mc_id=academic-105485-koreyst) | **Learn:** The benefits of building with Small Language Models | Video Coming Soon | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
+| 20 | [Building with Mistral Models](./20-mistral/README.md?WT.mc_id=academic-105485-koreyst) | **Learn:** The features and differences of the Mistral Family Models | Video Coming Soon | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
 
 ### 🌟 Special thanks
 
 Special thanks to [**John Aziz**](https://www.linkedin.com/in/john0isaac/) for creating all of the GitHub Actions and workflows
 
+Thanks also to [**Bernhard Merkle**](https://www.linkedin.com/in/bernhard-merkle-738b73/) for making key contributions to each lesson to improve the learner and code experience.
+
 ## 🎒 Other Courses
 
 Our team produces other courses! Check out: