Matcher

Check token patterns for spaCy's rule-based `Matcher` against a text. The API returns the matches in the text, as well as the individual tokens and whether each one is part of a match. For a usage example, see the Rule-based Matcher Explorer demo.

Installation

```bash
pip install -r requirements.txt
python app.py
```

API

GET `/models`

Get the available models as an object that maps each model name to its human-readable name.

Example response

```json
{
    "en_core_web_sm": "English - en_core_web_sm (v2.0.0)"
}
```
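A client will usually turn this object into options for a model picker. The sketch below assumes a browser-style `fetch` (also global in Node 18+), and the helper names `fetchModels` and `modelOptions` are hypothetical, not part of this API. It sorts the entries by model name.

```javascript
// Hypothetical helper: fetch the model map from GET /models.
function fetchModels() {
    return fetch('/models').then(res => {
        if (!res.ok) throw new Error(`GET /models failed: ${res.status}`);
        return res.json();
    });
}

// Turn { modelName: humanReadableName } into a list of { value, label }
// pairs sorted by model name, e.g. for populating a <select> element.
function modelOptions(models) {
    return Object.entries(models)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([value, label]) => ({ value, label }));
}

// Using the example response above:
const options = modelOptions({ "en_core_web_sm": "English - en_core_web_sm (v2.0.0)" });
console.log(options[0].value); // en_core_web_sm
```

Keeping the model name as `value` means the selected option can be sent unchanged as the `model` field of `POST /match`.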

POST `/match`

Match a token pattern against a text and return the matches and the tokens.

Example request

```json
{
    "text": "A match is a tool for starting a fire. Typically, modern matches are made of small wooden sticks or stiff paper. ",
    "model": "en_core_web_sm",
    "pattern": [
        {
            "POS": "ADJ",
            "OP": "?"
        },
        {
            "LEMMA": "match",
            "POS": "NOUN"
        },
        {
            "LEMMA": "be"
        }
    ]
}
```

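| Name | Type | Description |
| --- | --- | --- |
| `text` | string | The text to match on. |
| `model` | string | Name of the spaCy model used to process the text, such as a key returned by `GET /models`. The model provides the tokenization and the token attributes the pattern refers to (for example `POS` and `LEMMA`). |
| `pattern` | list | The token pattern to match. Each object in the list describes one token and maps token attributes to the values they must have. The optional `OP` key is a quantifier: `"?"` makes the token optional. |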
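To make the pattern semantics concrete, here is a toy matcher. It is not spaCy's implementation: it supports only exact attribute equality and the `"?"` operator, and the token attributes are hand-written assumptions rather than real model output. The function names are hypothetical.

```javascript
// Toy token-pattern matcher for illustration only (not spaCy's algorithm).
// Supports exact attribute equality and the optional operator OP: "?".

// True if `token` has every attribute value listed in `spec` (OP is ignored).
function tokenMatches(token, spec) {
    return Object.entries(spec).every(([attr, value]) => attr === 'OP' || token[attr] === value);
}

// Longest match of pattern[p..] starting at token index i.
// Returns the exclusive end index, or -1 if nothing matches.
function matchAt(tokens, pattern, i, p = 0) {
    if (p === pattern.length) return i;
    const spec = pattern[p];
    let best = -1;
    if (i < tokens.length && tokenMatches(tokens[i], spec)) {
        best = matchAt(tokens, pattern, i + 1, p + 1);
    }
    if (spec.OP === '?') {
        // Optional element: also try skipping it.
        best = Math.max(best, matchAt(tokens, pattern, i, p + 1));
    }
    return best;
}

// Every [start, end) token range, one per start index, with a non-empty match.
function findMatches(tokens, pattern) {
    const results = [];
    for (let i = 0; i < tokens.length; i++) {
        const end = matchAt(tokens, pattern, i);
        if (end > i) results.push([i, end]);
    }
    return results;
}

// Hand-annotated tokens (assumed attributes, not real model output).
const tokens = [
    { ORTH: 'modern', POS: 'ADJ', LEMMA: 'modern' },
    { ORTH: 'matches', POS: 'NOUN', LEMMA: 'match' },
    { ORTH: 'are', POS: 'VERB', LEMMA: 'be' },
    { ORTH: 'made', POS: 'VERB', LEMMA: 'make' }
];
// The pattern from the example request.
const pattern = [
    { POS: 'ADJ', OP: '?' },
    { LEMMA: 'match', POS: 'NOUN' },
    { LEMMA: 'be' }
];
console.log(JSON.stringify(findMatches(tokens, pattern))); // [[0,3],[1,3]]
```

Because the adjective is optional, the toy reports both "modern matches are" and the overlapping "matches are". The example response below lists only the longer span, so do not read this toy's overlap behaviour as a description of the API.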

Example response

```json
{
    "matches": [
        { "start": 2, "end": 10, "label": "MATCH" },
        { "start": 50, "end": 68, "label": "MATCH" }
    ],
    "tokens": [
        { "start": 0, "end": 1, "label": "TOKEN" },
        { "start": 2, "end": 7, "label": "MATCH" },
        { "start": 8, "end": 10, "label": "MATCH" },
        { "start": 11, "end": 12, "label": "TOKEN" },
        { "start": 13, "end": 17, "label": "TOKEN" },
        { "start": 18, "end": 21, "label": "TOKEN" },
        { "start": 22, "end": 30, "label": "TOKEN" },
        { "start": 31, "end": 32, "label": "TOKEN" },
        { "start": 33, "end": 37, "label": "TOKEN" },
        { "start": 37, "end": 38, "label": "TOKEN" },
        { "start": 39, "end": 48, "label": "TOKEN" },
        { "start": 48, "end": 49, "label": "TOKEN" },
        { "start": 50, "end": 56, "label": "MATCH" },
        { "start": 57, "end": 64, "label": "MATCH" },
        { "start": 65, "end": 68, "label": "MATCH" },
        { "start": 69, "end": 73, "label": "TOKEN" },
        { "start": 74, "end": 76, "label": "TOKEN" },
        { "start": 77, "end": 82, "label": "TOKEN" },
        { "start": 83, "end": 89, "label": "TOKEN" },
        { "start": 90, "end": 96, "label": "TOKEN" },
        { "start": 97, "end": 99, "label": "TOKEN" },
        { "start": 100, "end": 105, "label": "TOKEN" },
        { "start": 106, "end": 111, "label": "TOKEN" },
        { "start": 111, "end": 112, "label": "TOKEN" }
    ]
}
```

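| Name | Type | Description |
| --- | --- | --- |
| `matches` | list | The matched spans in the text. |
| `tokens` | list | Every token in the text, in order, labelled according to whether it is part of a match. |
| `start` | number | Character offset at which the match or token starts. |
| `end` | number | Character offset at which the match or token ends. The offset is exclusive: it points to the first character after the span. |
| `label` | string | Always `"MATCH"` in `matches`. In `tokens`, `"MATCH"` if the token is part of a match, otherwise `"TOKEN"`. |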
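Because `start` and `end` are character offsets with `end` exclusive, each span maps directly onto `String.prototype.slice`. A minimal sketch with a hypothetical `matchedText` helper, using the example request text and the two matches from the example response:

```javascript
// Hypothetical helper: return the substring covered by each span.
// `end` is exclusive, which is exactly what String.prototype.slice expects.
function matchedText(text, spans) {
    return spans.map(({ start, end }) => text.slice(start, end));
}

const text = "A match is a tool for starting a fire. Typically, modern matches are made of small wooden sticks or stiff paper. ";
const matches = [
    { start: 2, end: 10, label: "MATCH" },
    { start: 50, end: 68, label: "MATCH" }
];
console.log(JSON.stringify(matchedText(text, matches))); // ["match is","modern matches are"]
```

The same helper works on the `tokens` list, since tokens use the same offset convention.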

Usage Example (JavaScript)

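```javascript
// Send a text, model name and token pattern to POST /match.
function getMatches(text, model, pattern) {
    const options = {
        method: 'POST',
        headers: { 'Accept': 'application/json', 'Content-Type': 'application/json' },
        credentials: 'same-origin',
        body: JSON.stringify({ text, model, pattern })
    };
    // Return the promise so callers can use the result and handle errors
    return fetch('/match', options)
        .then(res => {
            if (!res.ok) throw new Error(`POST /match failed: ${res.status}`);
            return res.json();
        })
        .then(({ tokens, matches }) => {
            console.log(tokens, matches);
            return { tokens, matches };
        });
}
```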
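To display a result, the `tokens` list can be walked in order, copying the original text between tokens (the whitespace) and wrapping tokens labelled `"MATCH"` in a `<mark>` element. This is a sketch with hypothetical helpers `escapeHtml` and `renderHighlighted`; it returns an HTML string and escapes the text so that user input is not interpreted as markup.

```javascript
// Escape characters that are significant in HTML.
function escapeHtml(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;').replace(/'/g, '&#39;');
}

// Hypothetical helper: rebuild the text from the `tokens` returned by
// POST /match, wrapping each token labelled "MATCH" in <mark>.
function renderHighlighted(text, tokens) {
    let html = '';
    let pos = 0;
    for (const { start, end, label } of tokens) {
        html += escapeHtml(text.slice(pos, start)); // whitespace between tokens
        const tokenHtml = escapeHtml(text.slice(start, end));
        html += label === 'MATCH' ? `<mark>${tokenHtml}</mark>` : tokenHtml;
        pos = end;
    }
    return html + escapeHtml(text.slice(pos)); // any trailing text
}

// The first three tokens of the example response:
const sample = renderHighlighted('A match is', [
    { start: 0, end: 1, label: 'TOKEN' },
    { start: 2, end: 7, label: 'MATCH' },
    { start: 8, end: 10, label: 'MATCH' }
]);
console.log(sample); // A <mark>match</mark> <mark>is</mark>
```

Marking each matched token separately keeps the output aligned with the `tokens` list; marking whole spans from `matches` instead would be an equally valid design.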
