Gregor Leban edited this page Jun 23, 2023 · 2 revisions

Keywords used in a search can be treated in several ways. These options are called search modes, and this page describes the differences between them. There are three search modes: phrase, exact and simple.

Phrase search mode (default)

When you use QueryArticles or QueryArticlesIter and specify one or more keywords in your search, you are by default using the phrase search mode. This means that when you search using, for example,

query = QueryArticlesIter(keywords = QueryItems.OR(["Apple iPhone", "Microsoft Store"]))

you will receive articles that mention the phrase "Apple iPhone" (together, as consecutive words) or the phrase "Microsoft Store". Articles that mention Apple and iPhone individually, not as a phrase, will not be returned.
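To make the difference between a phrase match and individual word matches concrete, here is a minimal sketch in plain Python. It does not call Event Registry and is not how the service is implemented; the helper names `tokenize` and `contains_phrase`, the lowercasing, and the word-splitting rule are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative rule only)."""
    return re.findall(r"\w+", text.lower())

def contains_phrase(text, phrase):
    """Return True if the words of `phrase` appear consecutively in `text`."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

articles = [
    "The new Apple iPhone goes on sale today.",      # words are consecutive
    "Apple said the iPhone will ship next month.",   # words are separated
]
for art in articles:
    print(contains_phrase(art, "Apple iPhone"), "-", art)
```

Only the first text counts as a phrase match, which mirrors the behaviour described above for the phrase search mode.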

The same kind of query can also be specified using the advanced query language. In that case, the query would look like this:

q = {
    "$query": {
        "$or": [
            { "keyword": "Apple iPhone" },
            { "keyword": "Microsoft Store" }
        ]
    }
}
query = QueryArticlesIter.initWithComplexQuery(q)

Here the $or operator combines two objects, each of which holds one keyword condition.
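If the list of phrases is generated by code, the same structure can be built programmatically. The sketch below only constructs the dictionary shown above; `or_keyword_query` is a hypothetical helper name, not part of the SDK, and the result would still have to be passed to QueryArticlesIter.initWithComplexQuery.

```python
import json

def or_keyword_query(phrases):
    """Build an advanced-query dict with one keyword condition per phrase, joined by $or."""
    return {"$query": {"$or": [{"keyword": phrase} for phrase in phrases]}}

q = or_keyword_query(["Apple iPhone", "Microsoft Store"])
print(json.dumps(q, indent=4))
```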

Exact search mode

In some cases you may want to express a more complicated keyword condition in a single string. For those cases, you can use the exact search mode. The same search can then be specified like this:

query = QueryArticlesIter(
    keywords = "Apple iPhone OR Microsoft Store",
    keywordSearchMode = "exact")

Here we have provided both phrases in a single string, with an OR between them. In exact search mode, all consecutive words that are not AND, OR, NEXT, NEAR or NOT are treated as parts of one phrase. If you want to search for articles that may mention Apple and iPhone in different parts of the article, the query can be modified like this:

query = QueryArticlesIter(
    keywords = "Apple AND iPhone OR Microsoft Store",
    keywordSearchMode = "exact")

By using AND, we now request that the articles mention both Apple and iPhone, but not necessarily as a phrase.
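The rule that consecutive non-operator words form one phrase can be illustrated with a small tokenizer. This is a sketch of the rule as described on this page, not Event Registry's parser; `split_exact_query` is a hypothetical name, and the sketch assumes that operators are written in upper case.

```python
import re

# Operator tokens as described above; NEAR and NEXT may carry a /X distance.
OPERATOR = re.compile(r"(AND|OR|NOT)|(NEAR|NEXT)(/\d+)?")

def split_exact_query(query):
    """Split an exact-mode keyword string into phrases, operators and parentheses."""
    parts, phrase = [], []
    for token in re.findall(r"[()]|[^\s()]+", query):
        if token in ("(", ")") or OPERATOR.fullmatch(token):
            if phrase:  # close the phrase collected so far
                parts.append(" ".join(phrase))
                phrase = []
            parts.append(token)
        else:
            phrase.append(token)
    if phrase:
        parts.append(" ".join(phrase))
    return parts

print(split_exact_query("Apple iPhone OR Microsoft Store"))
print(split_exact_query("Apple AND iPhone OR Microsoft Store"))
```

In the first query "Apple iPhone" stays together as one phrase, while in the second the AND splits it into two separate keywords.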

If you'd like to use the advanced query language, you can specify that you want to use exact search mode like this:

q = {
    "$query": {
        "keyword": "Apple iPhone OR Microsoft Store",
        "keywordSearchMode": "exact"
    }
}
query = QueryArticlesIter.initWithComplexQuery(q)

Operators NEXT/X and NEAR/X

A key benefit of the exact search mode is that it supports two additional operators, NEXT/X and NEAR/X, where X is a number set by the user.

In many use cases, we want to find articles where two keywords are mentioned close together, possibly in the same sentence. Closeness often implies that the words are related to each other. If we are interested in what Siemens is doing in terms of sustainability, ecology or renewable energy, we could specify the keyword parameter as:

"Siemens NEAR/15 (sustainability OR ecology OR renewable energy)"

In this case, the resulting articles would mention Siemens and at least one of the three keywords at most 15 words before or after Siemens. Instead of 15 you can of course use any other number.

Alternatively, if the order of words is important, you can use the NEXT/X operator. If NEXT were used in the previous example, the only returned articles would be those where Siemens is mentioned first and one of those three keywords is mentioned at most 15 words later.
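The difference between the two operators can be sketched locally in plain Python. The functions `near` and `next_within` below are hypothetical helpers, limited to single-word terms, and they assume that distance is counted as the difference in word positions; how Event Registry counts distance internally is not specified here.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def positions(words, term):
    """Return all indices at which the single word `term` occurs."""
    return [i for i, w in enumerate(words) if w == term.lower()]

def near(text, a, b, x):
    """True if `a` and `b` occur at most `x` words apart, in either order."""
    words = tokenize(text)
    return any(abs(i - j) <= x for i in positions(words, a) for j in positions(words, b))

def next_within(text, a, b, x):
    """True if `b` occurs after `a`, at most `x` words later."""
    words = tokenize(text)
    return any(0 < j - i <= x for i in positions(words, a) for j in positions(words, b))

first = "Siemens announced a new sustainability programme for its factories."
second = "A sustainability report was published by Siemens."

print(near(first, "Siemens", "sustainability", 15), next_within(first, "Siemens", "sustainability", 15))
print(near(second, "Siemens", "sustainability", 15), next_within(second, "Siemens", "sustainability", 15))
```

Both texts satisfy the NEAR condition, but only the first satisfies NEXT, because in the second text Siemens appears after the keyword.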

Operator precedence

Because you can use multiple operators in a single search, it is important to understand which operators take precedence over others. The precedence of operators, from highest to lowest, is:

NEAR/X, NEXT/X > NOT > AND > OR

Specifying the keyword parameter like this:

"Donald Trump NEAR/10 tariff OR recession AND China NOT Mexico"

is therefore equivalent to this query:

(Donald Trump NEAR/10 tariff) OR (recession AND (China NOT Mexico))

To force different precedence you should group items using parentheses. More desirable results for the above query would likely be obtained by specifying it as such:

Donald Trump NEAR/10 (tariff OR recession) AND China NOT Mexico

In this case, tariff OR recession is evaluated first, and the results are then restricted to articles where one of those two words appears close to the phrase Donald Trump.
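To check how a keyword string groups under these precedence rules before sending it, the grouping can be reproduced with a small precedence-climbing parser. This is a sketch based only on the ordering given above, not Event Registry's parser. The name `parenthesize` is hypothetical, and the sketch assumes well-formed input, upper-case operators, and left-to-right grouping for operators of equal precedence.

```python
import re

PRECEDENCE = {"OR": 1, "AND": 2, "NOT": 3, "NEAR": 4, "NEXT": 4}
OP_RE = re.compile(r"(AND|OR|NOT)|(NEAR|NEXT)/\d+")

def precedence(token):
    """Return the binding strength of an operator token, or None for other tokens."""
    m = OP_RE.fullmatch(token)
    return PRECEDENCE[m.group(1) or m.group(2)] if m else None

def tokenize(query):
    """Split into phrases, operators and parentheses; consecutive words form one phrase."""
    tokens, phrase = [], []
    for raw in re.findall(r"[()]|[^\s()]+", query):
        if raw in ("(", ")") or precedence(raw) is not None:
            if phrase:
                tokens.append(" ".join(phrase))
                phrase = []
            tokens.append(raw)
        else:
            phrase.append(raw)
    if phrase:
        tokens.append(" ".join(phrase))
    return tokens

def parenthesize(query):
    """Return the query with explicit parentheses showing how the operators group."""
    tokens = tokenize(query)
    pos = 0

    def parse_primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            expr = parse_expr(1)
            pos += 1  # skip the closing ")"
            return expr
        return token

    def parse_expr(min_prec):
        nonlocal pos
        left = parse_primary()
        while pos < len(tokens) and (precedence(tokens[pos]) or 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            right = parse_expr(precedence(op) + 1)  # left-associative grouping
            left = f"({left} {op} {right})"
        return left

    result = parse_expr(1)
    # Every composite expression is wrapped, so the outermost pair can be dropped.
    return result[1:-1] if result.startswith("(") else result

print(parenthesize("Donald Trump NEAR/10 tariff OR recession AND China NOT Mexico"))
print(parenthesize("Donald Trump NEAR/10 (tariff OR recession) AND China NOT Mexico"))
```

For the first query the sketch reproduces the grouping given above; for the second it shows that NEAR/10 now applies to the parenthesized group.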

Simple search mode

In some use cases you also have a more "Google-like" search task, where you have a bunch of keywords and you want to find results that best match the query. In this case, you don't require that all the keywords are mentioned in the text or that they appear in any particular order - you simply want results that mention as many of those keywords and as many times as possible.

An example of a search using simple search mode would be:

q = QueryArticlesIter(
    keywords = "tesla self driving car number of fatal accidents",
    keywordSearchMode = "simple")

Alternatively, using the advanced query language, you would specify the same query like this:

q = {
    "$query": {
        "keyword": "tesla self driving car number of fatal accidents",
        "keywordSearchMode": "simple"
    }
}
query = QueryArticlesIter.initWithComplexQuery(q)

When using the simple search mode, make sure that you sort results by relevance (rel) so that the results that best match the provided keywords are returned first. If you want to find recent content, add the dateStart parameter to limit the results to the latest news.

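# q is the QueryArticlesIter object created above; er is an EventRegistry instance created elsewhere
for art in q.execQuery(er, sortBy = "rel", maxItems = 500):
    print(art)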
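The idea of "as many keywords, as many times as possible" can be illustrated with a toy scoring function. The relevance formula that Event Registry actually uses is not described on this page, so the sketch below is only an assumption for illustration: the hypothetical `relevance` function ranks texts first by the number of distinct keywords they mention and then by the total number of mentions.

```python
import re
from collections import Counter

def relevance(text, keywords):
    """Score a text as (distinct keywords mentioned, total keyword mentions)."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    hits = [counts[k] for k in set(keywords.lower().split())]
    return (sum(1 for h in hits if h), sum(hits))

docs = [
    "Tesla reports on self driving car accidents.",
    "A car show opened downtown.",
    "Stock markets closed higher today.",
]
keywords = "tesla self driving car number of fatal accidents"

# Highest score first, similar in spirit to sorting by relevance.
ranked = sorted(docs, key=lambda d: relevance(d, keywords), reverse=True)
for doc in ranked:
    print(relevance(doc, keywords), doc)
```

No text has to contain every keyword; texts that mention more of them simply rank higher.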