Merge pull request #413 from microsoft/staging

Staging
microsoft · Sep 18, 2019 · 8fb28e0
2 parents b44c655 + cb60519
Showing 19 changed files with 94 additions and 660 deletions.
diff --git a/README.md b/README.md
@@ -85,7 +85,9 @@ The following is a list of related repositories that we like and think are usefu
 
 
 ## Build Status
-| Build Type | Branch | Status |  | Branch | Status |
-| --- | --- | --- | --- | --- | --- |
-| **Linux CPU** | master | [![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/cpu_integration_tests_linux?branchName=master)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=50&branchName=master) | | staging | [![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/cpu_integration_tests_linux?branchName=staging)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=50&branchName=staging) |
-| **Linux GPU** | master | [![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/gpu_integration_tests_linux?branchName=master)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=51&branchName=master) | | staging | [![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/gpu_integration_tests_linux?branchName=staging)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=51&branchName=staging) |
+| Build | Branch | Status |
+| --- | --- | --- |
+| **Linux CPU** | master | [![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/cpu_integration_tests_linux?branchName=master)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=50&branchName=master) |
+| **Linux CPU** | staging | [![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/cpu_integration_tests_linux?branchName=staging)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=50&branchName=staging) |
+| **Linux GPU** | master | [![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/gpu_integration_tests_linux?branchName=master)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=51&branchName=master) |
+| **Linux GPU** | staging | [![Build Status](https://dev.azure.com/best-practices/nlp/_apis/build/status/gpu_integration_tests_linux?branchName=staging)](https://dev.azure.com/best-practices/nlp/_build/latest?definitionId=51&branchName=staging) |
diff --git a/examples/README.md b/examples/README.md
@@ -2,7 +2,6 @@
 
 This folder contains examples and best practices, written in Jupyter notebooks, for building Natural Language Processing systems for the following scenarios.
 
-
 |Category|Applications|Methods|Languages|
 |---| ------------------------ | ------------------- |---|
 |[Text Classification](text_classification)|Topic Classification|BERT, XLNet|en, hi, ar|
@@ -14,3 +13,11 @@ This folder contains examples and best practices, written in Jupyter notebooks,
 |[Annotation](annotation)|Text Annotation|Doccano||
 |[Model Explainability](model_explainability)|DNN Layer Explanation|DUUDNM (Guan et al.)|
 
+## Data/Telemetry
+The Azure Machine Learning notebooks collect browser usage data and send it to Microsoft to help improve our products and services. Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement).
+
+To opt out of tracking, please go to the raw `.ipynb` files and remove the following line of code (the URL will be slightly different depending on the file):
+
+```sh
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/text_classification/tc_bert_azureml.png)"
+```
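
Review note: the opt-out described in the README hunk above is manual, one file at a time. A minimal sketch of automating it follows, assuming the pixel always appears as a source line containing both `![Impressions]` and `PixelServer`, as it does in the notebooks touched by this commit. The names `strip_impression_pixels` and `strip_file` are hypothetical and not part of the repository.

```python
import json
from pathlib import Path

# Substrings taken from the tracking line quoted in the README hunk above.
IMAGE_TAG = "![Impressions]"
PIXEL_MARKER = "PixelServer"


def strip_impression_pixels(notebook):
    """Return a copy of a notebook dict without source lines that embed the pixel."""
    cleaned_cells = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", [])
        # The notebook format allows "source" to be one string or a list of strings.
        lines = source.splitlines(keepends=True) if isinstance(source, str) else list(source)
        kept = [ln for ln in lines if not (IMAGE_TAG in ln and PIXEL_MARKER in ln)]
        if len(kept) == len(lines):
            cleaned_cells.append(cell)  # nothing removed, keep the cell untouched
        elif kept or cell.get("cell_type") != "markdown":
            cleaned_cells.append({**cell, "source": kept})
        # A markdown cell that held only the pixel is dropped entirely.
    return {**notebook, "cells": cleaned_cells}


def strip_file(path):
    """Rewrite one .ipynb file in place (hypothetical helper)."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cleaned = strip_impression_pixels(notebook)
    path.write_text(json.dumps(cleaned, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")


# In-memory example using the URL quoted in the README hunk.
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "![Impressions](https://PixelServer20190423114238.azurewebsites.net"
                "/api/impressions/nlp/examples/text_classification/tc_bert_azureml.png)"
            ],
        },
    ],
    "nbformat": 4,
    "nbformat_minor": 2,
}
cleaned = strip_impression_pixels(sample)
```

To apply it to a checkout, something like `for p in Path("examples").rglob("*.ipynb"): strip_file(p)` would cover every notebook under `examples/`. Review the resulting diff before committing, since re-serializing the JSON can change whitespace.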
diff --git a/examples/entailment/entailment_xnli_bert_azureml.ipynb b/examples/entailment/entailment_xnli_bert_azureml.ipynb
@@ -14,6 +14,13 @@
     "\n",
     "**Note: To learn how to do pre-training on your own, please reference the [AzureML-BERT repo](https://github.com/microsoft/AzureML-BERT) created by Microsoft.**"
    ]
+  },
+    {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/entailment/entailment_xnli_bert_azureml.png)"
+   ]
   },
   {
    "cell_type": "code",

diff --git a/examples/question_answering/bidaf_aml_deep_dive.ipynb b/examples/question_answering/bidaf_aml_deep_dive.ipynb
@@ -15,6 +15,13 @@
    "source": [
     "# BiDAF Model Deep Dive on AzureML"
    ]
+  },
+    {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/question_answering/bidaf_aml_deep_dive.png)"
+   ]
   },
   {
    "cell_type": "markdown",

diff --git a/examples/question_answering/pretrained-BERT-SQuAD-deep-dive-aml.ipynb b/examples/question_answering/pretrained-BERT-SQuAD-deep-dive-aml.ipynb
@@ -16,6 +16,13 @@
     "# Question Answering: Fine-Tune BERT on AzureML (PyTorch)\n",
     "**BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding** [\\[1\\]](#References)"
    ]
+  },
+    {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/question_answering/pretrained_BERT_SQuAD_deep_dive_aml.png)"
+   ]
   },
   {
    "cell_type": "markdown",

diff --git a/examples/question_answering/question_answering_system_bidaf_quickstart.ipynb b/examples/question_answering/question_answering_system_bidaf_quickstart.ipynb
@@ -15,6 +15,13 @@
     "), [BiDAF](https://www.semanticscholar.org/paper/Bidirectional-Attention-Flow-for-Machine-Seo-Kembhavi/007ab5528b3bd310a80d553cccad4b78dc496b02\n",
     "), using Azure Container Instances ([ACI](https://azure.microsoft.com/en-us/services/container-instances/))."
    ]
+  },
+    {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/question_answering/bidaf_quickstart.png)"
+   ]
   },
   {
    "cell_type": "markdown",

diff --git a/examples/sentence_similarity/automl_local_deployment_aci.ipynb b/examples/sentence_similarity/automl_local_deployment_aci.ipynb
@@ -15,6 +15,13 @@
    "source": [
     "# Local Automated Machine Learning Model with ACI Deployment for Predicting Sentence Similarity"
    ]
+  },
+    {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/sentence_similarity/automl_local_deployment_aci.png)"
+   ]
   },
   {
    "cell_type": "markdown",

diff --git a/examples/sentence_similarity/automl_with_pipelines_deployment_aks.ipynb b/examples/sentence_similarity/automl_with_pipelines_deployment_aks.ipynb
@@ -15,6 +15,13 @@
    "source": [
     "# AzureML Pipeline, AutoML, AKS Deployment for Sentence Similarity"
    ]
+  },
+    {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/sentence_similarity/automl_with_pipelines_deployment_aks.png)"
+   ]
   },
   {
    "cell_type": "markdown",

diff --git a/examples/sentence_similarity/bert_senteval.ipynb b/examples/sentence_similarity/bert_senteval.ipynb
@@ -6,6 +6,13 @@
    "source": [
     "# Parallel Experimentation with BERT on AzureML"
    ]
+  },
+    {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/sentence_similarity/bert_senteval.png)"
+   ]
   },
   {
    "cell_type": "markdown",

diff --git a/examples/sentence_similarity/gensen_aml_deep_dive.ipynb b/examples/sentence_similarity/gensen_aml_deep_dive.ipynb
@@ -16,6 +16,13 @@
     "# Training GenSen on AzureML with SNLI Dataset\n",
     "**GenSen: Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning** [\\[1\\]](#References)"
    ]
+  },
+    {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/sentence_similarity/gensen_aml_deep_dive.png)"
+   ]
   },
   {
    "cell_type": "markdown",

diff --git a/examples/text_classification/tc_bert_azureml.ipynb b/examples/text_classification/tc_bert_azureml.ipynb
@@ -11,6 +11,13 @@
     "# Text Classification of MultiNLI Sentences using BERT with Azure ML Pipelines"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/nlp/examples/text_classification/tc_bert_azureml.png)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

diff --git a/examples/text_classification/tc_mnli_bert.ipynb b/examples/text_classification/tc_mnli_bert.ipynb
@@ -60,8 +60,7 @@
     "import torch\n",
     "import torch.nn as nn\n",
     "\n",
-    "from utils_nlp.dataset.multinli import load_pandas_df\n",
-    "from utils_nlp.eval.classification import eval_classification\n",
+    "from utils_nlp.dataset.multinli import load_pandas_df\n",    
     "from utils_nlp.models.bert.sequence_classification import BERTSequenceClassifier\n",
     "from utils_nlp.models.bert.common import Language, Tokenizer\n",
     "from utils_nlp.common.timer import Timer"

diff --git a/tests/integration/test_notebooks_text_classification.py b/tests/integration/test_notebooks_text_classification.py
@@ -49,18 +49,19 @@ def test_tc_dac_bert_ar(notebooks, tmp):
             NUM_GPUS=1,
             DATA_FOLDER=tmp,
             BERT_CACHE_DIR=tmp,
-            BATCH_SIZE=32,
+            MAX_LEN=175,
+            BATCH_SIZE=16,
             NUM_EPOCHS=1,
             TRAIN_SIZE=0.8,
-            NUM_ROWS=15000,
+            NUM_ROWS=8000,
             RANDOM_STATE=0,
         ),
     )
     result = sb.read_notebook(OUTPUT_NOTEBOOK).scraps.data_dict
-    assert pytest.approx(result["accuracy"], 0.93, abs=ABS_TOL)
-    assert pytest.approx(result["precision"], 0.91, abs=ABS_TOL)
-    assert pytest.approx(result["recall"], 0.91, abs=ABS_TOL)
-    assert pytest.approx(result["f1"], 0.91, abs=ABS_TOL)
+    assert pytest.approx(result["accuracy"], 0.871, abs=ABS_TOL)
+    assert pytest.approx(result["precision"], 0.865, abs=ABS_TOL)
+    assert pytest.approx(result["recall"], 0.852, abs=ABS_TOL)
+    assert pytest.approx(result["f1"], 0.845, abs=ABS_TOL)
 
 
 @pytest.mark.gpu
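
Review note: the assertions in the hunk above, both the removed and the added ones, pass the measured metric as the first argument of `pytest.approx`. That function's signature is `approx(expected, rel=None, abs=None, nan_ok=False)`, so `0.871` is taken as a relative tolerance and the resulting object is never compared against anything. Depending on the pytest version, asserting on the bare object either always passes or raises, but it does not check the metric. The conventional form is `assert result["accuracy"] == pytest.approx(0.871, abs=ABS_TOL)`. The sketch below shows the intended check with the standard library's `math.isclose` swapped in for `pytest.approx`, so it runs without pytest. The helper name, the tolerance value, and the measured numbers are assumptions for illustration; the real `ABS_TOL` is defined elsewhere in the test module.

```python
import math

# Hypothetical tolerance; the real ABS_TOL is defined elsewhere in the test module.
ABS_TOL = 0.1


def metrics_outside_tolerance(result, expected, abs_tol=ABS_TOL):
    """Return the names of metrics whose measured value is not within abs_tol of the target."""
    return [
        name
        for name, target in expected.items()
        if not math.isclose(result[name], target, rel_tol=0.0, abs_tol=abs_tol)
    ]


# Expected values copied from the updated test; the measured values are made up.
expected = {"accuracy": 0.871, "precision": 0.865, "recall": 0.852, "f1": 0.845}
measured = {"accuracy": 0.88, "precision": 0.86, "recall": 0.85, "f1": 0.60}

failures = metrics_outside_tolerance(measured, expected)
```

With these made-up numbers only `f1` falls outside the tolerance, which is the kind of regression the current assertions would not catch.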

diff --git a/tools/repo_metrics/README.md b/tools/repo_metrics/README.md
diff --git a/tools/repo_metrics/__init__.py b/tools/repo_metrics/__init__.py
diff --git a/tools/repo_metrics/config_template.py b/tools/repo_metrics/config_template.py