JohnSnowLabs · ArshaanNazir · Nov 9, 2023 · Nov 7, 2023
diff --git a/demo/tutorials/llm_notebooks/AI21_QA_Summarization_Testing_Notebook.ipynb b/demo/tutorials/llm_notebooks/AI21_QA_Summarization_Testing_Notebook.ipynb
@@ -162,60 +162,30 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 4,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "p_5nO14bvTzt",
-        "outputId": "cee6c5f4-6f32-4f72-e9db-440a410b59c7"
-      },
+      "execution_count": null,
+      "metadata": {},
       "outputs": [],
       "source": [
-        "harness = Harness(task=\"question-answering\", model={\"model\": \"j2-jumbo-instruct\", \"hub\":\"ai21\"}, data={\"data_source\": 'BoolQ-test-tiny'})"
+        "harness = Harness(\n",
+        "                  task=\"question-answering\", \n",
+        "                  model={\"model\": \"j2-jumbo-instruct\", \"hub\":\"ai21\"}, \n",
+        "                  data={\"data_source\" :\"BoolQ\",\n",
+        "                        \"split\":\"test-tiny\"}\n",
+        "                  )"
       ]
     },
     {
-      "attachments": {},
       "cell_type": "markdown",
-      "metadata": {
-        "id": "jWPAw9q0PwD1"
-      },
+      "metadata": {},
       "source": [
-        "We have specified task as QA, hub as AI21 and model as `j2-jumbo-instruct`.\n",
-        "\n",
-        "For dataset we used `BoolQ-test-tiny` which includes 50 lines from BoolQ-test. Other available datasets are:\n",
-        "\n",
-        "#### BoolQ\n",
-        "* `BoolQ-test-tiny`\n",
-        "* `BoolQ-test`\n",
-        "* `BoolQ-combined`\n",
-        "#### NQ-open\n",
-        "* `NQ-open-test`\n",
-        "* `NQ-open-combined`\n",
-        "* `NQ-open-test-tiny`\n",
-        "#### TruthfulQA\n",
-        "* `TruthfulQA-combined`\n",
-        "* `TruthfulQA-test`\n",
-        "* `TruthfulQA-tiny`\n",
-        "#### MMLU\n",
-        "* `MMLU-test`\n",
-        "* `MMLU-test-tiny`\n",
-        "#### OpenBookQA\n",
-        "* `OpenBookQA-test`\n",
-        "* `OpenBookQA-test-tiny`\n",
-        "#### QUAC\n",
-        "* `Quac-test`\n",
-        "* `Quac-test-tiny`\n",
-        "#### NarrativeQA\n",
-        "* `NarrativeQA-test`\n",
-        "* `NarrativeQA-test-tiny`\n",
-        "#### HellaSwag\n",
-        "* `HellaSwag-test`\n",
-        "* `HellaSwag-test-tiny`\n",
-        "#### BBQ\n",
-        "* `BBQ-test`\n",
-        "* `BBQ-test-tiny`"
+        "We have specified task as QA, hub as AI21 and model as `j2-jumbo-instruct`."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "For the dataset, we used `BoolQ` with the `test-tiny` split, which includes 50 samples. Other available datasets are listed under [Benchmark Datasets](https://langtest.org/docs/pages/docs/data#question-answering)."
       ]
     },
     {
@@ -1135,17 +1105,16 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 17,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "oDh3Zaa9EDfZ",
-        "outputId": "10443ac6-8c92-4e86-ef4e-7050962c4255"
-      },
+      "execution_count": null,
+      "metadata": {},
       "outputs": [],
       "source": [
-        "harness = Harness(task=\"question-answering\", model={\"model\": \"j2-jumbo-instruct\", \"hub\": \"ai21\"}, data={\"data_source\": 'NQ-open-test-tiny'})"
+        "harness = Harness(\n",
+        "                  task=\"question-answering\", \n",
+        "                  model={\"model\": \"j2-jumbo-instruct\", \"hub\": \"ai21\"}, \n",
+        "                  data={\"data_source\" :\"NQ-open\",\n",
+        "                        \"split\":\"test-tiny\"}\n",
+        "                  )"
       ]
     },
     {
@@ -1814,11 +1783,16 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 10,
+      "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
-        "harness = Harness(task=\"summarization\", model={\"model\": \"j2-jumbo-instruct\", \"hub\": \"ai21\"}, data={\"data_source\": 'XSum-test-tiny'})"
+        "harness = Harness(\n",
+        "                  task=\"summarization\", \n",
+        "                  model={\"model\": \"j2-jumbo-instruct\", \"hub\": \"ai21\"},\n",
+        "                  data={\"data_source\" :\"XSum\",\n",
+        "                        \"split\":\"test-tiny\"}\n",
+        "                  )"
       ]
     },
     {
@@ -1829,10 +1803,7 @@
         "We have specified task as summarization, hub as AI21 and model as `j2-jumbo-instruct`.\n",
         "\n",
         "\n",
-        "For dataset we used XSum-test-tiny which includes 50 lines from XSum-test. Available datasets for summarization are:\n",
-        "\n",
-        "* `XSum-test`\n",
-        "* `XSum-test-tiny`"
+        "For the dataset, we used `XSum` with the `test-tiny` split, which includes 50 samples. Other available datasets are listed under [Benchmark Datasets](https://langtest.org/docs/pages/docs/data#summarization)."
       ]
     },
     {
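
Review note: the hunks above replace combined names such as `XSum-test-tiny` with a separate `data_source` and `split`. For anyone migrating their own notebooks, the renaming can be sketched as below. This is only an illustration: `to_data_config` and `KNOWN_SPLITS` are hypothetical names that are not part of langtest, and the suffix list is an assumption taken from the dataset names removed in this diff.

```python
# Hypothetical helper (not part of langtest): turn a legacy combined dataset
# name into the {"data_source": ..., "split": ...} dict used in this PR.

# Assumed split suffixes, taken from the legacy names removed in the diff.
KNOWN_SPLITS = ("test-tiny", "combined", "test")


def to_data_config(legacy_name: str) -> dict:
    """Split a legacy name like 'NQ-open-test-tiny' into source and split."""
    # Try the longest suffix first so 'test-tiny' is matched before 'test'.
    for split in sorted(KNOWN_SPLITS, key=len, reverse=True):
        suffix = "-" + split
        if legacy_name.endswith(suffix):
            return {"data_source": legacy_name[: -len(suffix)], "split": split}
    # Names that do not follow the pattern need manual handling.
    raise ValueError(f"Unrecognized legacy dataset name: {legacy_name!r}")


if __name__ == "__main__":
    for name in ("BoolQ-test-tiny", "NQ-open-combined", "XSum-test"):
        print(name, "->", to_data_config(name))
```

Legacy names that do not end in one of the assumed suffixes (for example `TruthfulQA-tiny`) raise a `ValueError` instead of being guessed at.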

diff --git a/demo/tutorials/llm_notebooks/Azure_OpenAI_QA_Summarization_Testing_Notebook.ipynb b/demo/tutorials/llm_notebooks/Azure_OpenAI_QA_Summarization_Testing_Notebook.ipynb
@@ -162,17 +162,16 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 4,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "p_5nO14bvTzt",
-        "outputId": "cee6c5f4-6f32-4f72-e9db-440a410b59c7"
-      },
+      "execution_count": null,
+      "metadata": {},
       "outputs": [],
       "source": [
-        "harness = Harness(task=\"question-answering\", model={\"model\": \"text-davinci-003\", \"hub\":\"azure-openai\"} data={\"data_source\": 'BoolQ-test-tiny'})"
+        "harness = Harness(\n",
+        "                  task=\"question-answering\", \n",
+        "                  model={\"model\": \"text-davinci-003\",\"hub\":\"azure-openai\"}, \n",
+        "                  data={\"data_source\" :\"BoolQ\",\n",
+        "                        \"split\":\"test-tiny\"}\n",
+        "                  )"
       ]
     },
     {
@@ -184,38 +183,7 @@
       "source": [
         "We have specified task as QA, hub as OpenAI and model as text-davinci-003, text-davinci-002 whatever model available from azure openai services.\n",
         "\n",
-        "For dataset we used `BoolQ-test-tiny` which includes 50 lines from BoolQ-test. Other available datasets are:\n",
-        "\n",
-        "#### BoolQ\n",
-        "* `BoolQ-test-tiny`\n",
-        "* `BoolQ-test`\n",
-        "* `BoolQ-combined`\n",
-        "#### NQ-open\n",
-        "* `NQ-open-test`\n",
-        "* `NQ-open-combined`\n",
-        "* `NQ-open-test-tiny`\n",
-        "#### TruthfulQA\n",
-        "* `TruthfulQA-combined`\n",
-        "* `TruthfulQA-test`\n",
-        "* `TruthfulQA-tiny`\n",
-        "#### MMLU\n",
-        "* `MMLU-test`\n",
-        "* `MMLU-test-tiny`\n",
-        "#### OpenBookQA\n",
-        "* `OpenBookQA-test`\n",
-        "* `OpenBookQA-test-tiny`\n",
-        "#### QUAC\n",
-        "* `Quac-test`\n",
-        "* `Quac-test-tiny`\n",
-        "#### NarrativeQA\n",
-        "* `NarrativeQA-test`\n",
-        "* `NarrativeQA-test-tiny`\n",
-        "#### HellaSwag\n",
-        "* `HellaSwag-test`\n",
-        "* `HellaSwag-test-tiny`\n",
-        "#### BBQ\n",
-        "* `BBQ-test`\n",
-        "* `BBQ-test-tiny`"
+        "For the dataset, we used `BoolQ` with the `test-tiny` split, which includes 50 samples. Other available datasets are listed under [Benchmark Datasets](https://langtest.org/docs/pages/docs/data#question-answering)."
       ]
     },
     {
@@ -1120,18 +1088,16 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 14,
-      "metadata": {
-        "colab": {
-          "base_uri": "https://localhost:8080/"
-        },
-        "id": "oDh3Zaa9EDfZ",
-        "outputId": "10443ac6-8c92-4e86-ef4e-7050962c4255"
-      },
+      "execution_count": null,
+      "metadata": {},
       "outputs": [],
       "source": [
-        "harness = Harness(task=\"question-answering\", model={\"model\": \"text-davinci-003\",\"hub\":\"azure-openai\"} data={\"data_source\": \n",
-        "'NQ-open-test-tiny'})"
+        "harness = Harness(\n",
+        "                  task=\"question-answering\", \n",
+        "                  model={\"model\": \"text-davinci-003\",\"hub\":\"azure-openai\"}, \n",
+        "                  data={\"data_source\" :\"NQ-open\",\n",
+        "                        \"split\":\"test-tiny\"}\n",
+        "                  )"
       ]
     },
     {
@@ -1802,12 +1768,16 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 10,
+      "execution_count": null,
       "metadata": {},
       "outputs": [],
       "source": [
-        "harness = Harness(task='summarization',model={\"model\": 'text-davinci-003', \"hub\": \"azure-openai\"}, data={\"data_source\": \n",
-        "'XSum-test-tiny'})"
+        "harness = Harness(\n",
+        "                  task=\"summarization\", \n",
+        "                  model={\"model\": \"text-davinci-003\",\"hub\":\"azure-openai\"}, \n",
+        "                  data={\"data_source\" :\"XSum\",\n",
+        "                        \"split\":\"test-tiny\"}\n",
+        "                  )"
       ]
     },
     {
@@ -1817,10 +1787,8 @@
       "source": [
         "We have specified task as Summarization, hub as Azure-OpenAI and model as text-davinci-003, text-davinci-002 whatever model available from azure openai services.\n",
         "\n",
-        "For dataset we used XSum-test-tiny which includes 50 lines from XSum-test. Available datasets for summarization are:\n",
         "\n",
-        "* `XSum-test`\n",
-        "* `XSum-test-tiny`"
+        "For the dataset, we used `XSum` with the `test-tiny` split, which includes 50 samples. Other available datasets are listed under [Benchmark Datasets](https://langtest.org/docs/pages/docs/data#summarization)."
       ]
     },
     {

diff --git a/demo/tutorials/llm_notebooks/Clinical_Tests.ipynb b/demo/tutorials/llm_notebooks/Clinical_Tests.ipynb
@@ -59,7 +59,7 @@
       "source": [
         "import os\n",
         "\n",
-        "os.environ[\"OPENAI_API_KEY\"] = <ADD OPEN-AI-KEY>\n"
+        "os.environ[\"OPENAI_API_KEY\"] = \"<ADD OPEN-AI-KEY>\""
       ]
     },
     {
@@ -127,6 +127,19 @@
         "\n"
       ]
     },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### **Dataset** : **Clinical**\n",
+        "\n",
+        "**Data Splits**\n",
+        "\n",
+        "- `Medical-files` \n",
+        "- `Gastroenterology-files`\n",
+        "- `Oromaxillofacial-files`"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {
@@ -173,7 +186,9 @@
       ],
       "source": [
         "model = {\"model\": \"text-davinci-003\", \"hub\": \"openai\"}\n",
-        "data = {\"data_source\": \"Medical-files\"}\n",
+        "\n",
+        "data = {\"data_source\": \"Clinical\", \"split\":\"Medical-files\"}\n",
+        "\n",
         "harness = Harness(task=\"clinical-tests\", model=model, data=data)"
       ]
     },
@@ -2619,7 +2634,11 @@
         }
       ],
       "source": [
-        "harness = Harness(task=\"clinical-tests\",model={\"model\": \"text-davinci-003\", \"hub\": \"openai\"},data = {\"data_source\": \"Gastroenterology-files\"})"
+        "model = {\"model\": \"text-davinci-003\", \"hub\": \"openai\"}\n",
+        "\n",
+        "data = {\"data_source\": \"Clinical\", \"split\":\"Gastroenterology-files\"}\n",
+        "\n",
+        "harness = Harness(task=\"clinical-tests\", model=model, data=data)"
       ]
     },
     {
@@ -4981,7 +5000,11 @@
         }
       ],
       "source": [
-        "harness = Harness(task=\"clinical-tests\", model={\"model\": \"text-davinci-003\", \"hub\": \"openai\"},data = {\"data_source\": \"Oromaxillofacial-files\"})"
+        "model = {\"model\": \"text-davinci-003\", \"hub\": \"openai\"}\n",
+        "\n",
+        "data = {\"data_source\": \"Clinical\", \"split\":\"Oromaxillofacial-files\"}\n",
+        "\n",
+        "harness = Harness(task=\"clinical-tests\", model=model, data=data)"
       ]
     },
     {