The NLU.DevOps CLI tool includes a sub-command that allows you to compare the results returned from an NLU model via the `test` command with the expected results.

Run the following command:

```bash
dotnet nlu compare -e utterances.json -a results.json
```
The `utterances.json` argument is the path to the "expected" utterances file, usually the same file path you supplied to the `test` command. The `results.json` argument is the path to the output utterances from a `test` command (see Testing an NLU model for more details). The two files must contain the same number of utterances in the exact same order, which will be the case if you supply the same `utterances.json` to the `compare` command as you supplied to `test`.
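As a rough sketch, a single-utterance pair might look like the following, assuming the generic utterances schema described in Training an NLU model (the `genre` entity type and the `matchIndex` property shown here are illustrative assumptions, not required fields). An entry in `utterances.json`:

```json
[
  {
    "text": "listen to hip hop",
    "intent": "PlayMusic",
    "entities": [
      { "entityType": "genre", "matchText": "hip hop", "matchIndex": 0 }
    ]
  }
]
```

And the corresponding entry in `results.json` produced by the `test` command, which `compare` pairs up by position:

```json
[
  {
    "text": "listen to hip hop",
    "intent": "None",
    "entities": []
  }
]
```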
The `compare` sub-command will generate sensitivity and specificity test results for the text, intents, and entities in the two files. That is, it will identify true positives, true negatives, false positives, and false negatives for text, intents, entities, and entity values.
For example, if you use the training cases supplied in Training an NLU model and the test cases supplied in Testing an NLU model on LUIS, you will recall that one resulting intent was incorrectly labeled with the "None" intent. In this case, the `compare` command will generate passing tests (either true positive or true negative) for all text, intents, and entities, except for the one mismatched case. The mismatched case will generate a single failing test result, labeled as a false negative intent. Here is the specific output:
```
Test Discovery
Start time: 2018-12-12 21:51:23Z
End time: 2018-12-12 21:51:24Z
Duration: 0.735 seconds

Errors, Failures and Warnings
1) Failed : NLU.DevOps.ModelPerformance.Tests.Tests.FalseNegativeIntent('PlayMusic', 'listen to hip hop')
Actual intent is 'None', expected 'PlayMusic'
at NLU.DevOps.ModelPerformance.Tests.Tests.Fail(String because) in c:\src\NLU.DevOps\src\NLU.DevOps.ModelPerformance.Tests\Tests.cs:line 22

Run Settings
Number of Test Workers: 4
Work Directory: C:\src\sandbox\nlu-demo
Internal Trace: Off

Test Run Summary
Overall result: Failed
Test Count: 18, Passed: 17, Failed: 1, Warnings: 0, Inconclusive: 0, Skipped: 0
Failed Tests - Failures: 1, Errors: 0, Invalid: 0
Start time: 2018-12-12 21:51:24Z
End time: 2018-12-12 21:51:24Z
Duration: 0.224 seconds
```
Currently, we do not have any way of overriding the string comparison logic used when comparing expected text with actual text in the utterances. For now, the text is compared using `StringComparison.OrdinalIgnoreCase` after all whitespace is normalized and punctuation is removed.
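For instance, under these rules a hypothetical expected and actual text pair like the following would be treated as equal; the wrapper object is purely illustrative, not an actual file format:

```json
{
  "expectedText": "Listen to hip hop!",
  "actualText": "listen to hip hop"
}
```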
Currently, we do not have any way of overriding the string comparison logic used when comparing expected entity text with actual entity text in the utterances. For now, the entity text is compared using `StringComparison.OrdinalIgnoreCase` after all whitespace is normalized and punctuation is removed. If the NLU provider does not specify the `matchText` in the actual entity, as is the case for Lex and Dialogflow, the `entityValue` is used to find a matching entity with the same value in either the `matchText` or `entityValue` of the expected entity.
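As a sketch of that fallback, an actual entity returned without `matchText` could still match an expected entity whose `matchText` or `entityValue` carries the same value; the wrapper keys and the `genre` entity type below are illustrative assumptions:

```json
{
  "expected": { "entityType": "genre", "matchText": "hip hop", "matchIndex": 0 },
  "actual": { "entityType": "genre", "entityValue": "hip hop" }
}
```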
The generic utterances model includes an `entityValue` property, which is the semantic or canonical form of the entity. Oftentimes it is enough to know that the NLU model identified a match for the entity text in the utterance, so we created a separate test case type that compares the expected entity value with the actual entity value, in cases where this property is expected. We assert that the actual entity value contains the JSON subtree specified in the expected entity value.
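For example, an expected entity value like the one below would pass against the actual entity value shown with it, because the actual value contains the expected JSON subtree; the wrapper keys and property names are hypothetical:

```json
{
  "expectedEntityValue": { "city": "Seattle" },
  "actualEntityValue": { "city": "Seattle", "state": "WA" }
}
```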
The `compare` command by default generates NUnit test output. You can also specify the `--metadata` flag to generate JSON output, which is currently used by the `publishNLUResults` input for the NLUTest Azure DevOps task. If the option is used, the CLI command will write a `metadata.json` file containing a JSON array of confusion matrix results and a `statistics.json` file containing a JSON object with counts for confusion matrix results, including results sliced by intent and entity type. The files will be written to the folder provided in the `--output` option, or to the current working directory if not provided. See the example output for `metadata.json` and `statistics.json` for more details about the schema.
The `compare` command accepts the following options:

- Expected utterances (`-e`): The path to the expected labeled utterances. Usually, this is the same file you supplied to the `test` command.
- Actual utterances (`-a`): The path to the result utterances from the `test` command. Be sure to use the `--output` option when running the `test` command.
- Output folder (`--output`): (Optional) The path to write the NUnit test results. If not provided, the current working directory is used.
- Test name prefix: (Optional) A prefix for the test case names, for cases where you may want to publish multiple test runs with different options (e.g., simultaneously testing text utterances and speech).
- Metadata (`--metadata`): (Optional) Specifies whether confusion matrix metadata should be generated in addition to the NUnit test output. See Generating JSON test metadata for more details.