Searching for events

Gregor Leban edited this page Apr 10, 2017 · 27 revisions

Event Registry provides two classes for searching for events: QueryEvents and QueryEventsIter. Both classes find events using a combination of search conditions.

The QueryEventsIter class returns an iterator that makes it easy to iterate over all events that match the search conditions. Alternatively, the QueryEvents class can be used to obtain a broader range of information about the matching events. With QueryEvents, the results can include not only the list of events, but also a timeline of the matching events, their distribution into predefined categories, a list of top concepts in the matching events, etc.

The returned information about events follows the Event data model.

QueryEventsIter

Example of usage

Before describing the class, here is a simple full example that prints the list of all events related to Barack Obama, from the latest to the oldest:

from eventregistry import *
er = EventRegistry(apiKey = YOUR_API_KEY)
q = QueryEventsIter(conceptUri = er.getConceptUri("Obama"))
for event in q.execQuery(er, sortBy = "date"):
    print(event)

Constructor

QueryEventsIter is derived from QueryEvents. Its constructor accepts the following arguments:

QueryEventsIter(keywords = "",
        conceptUri = [],
        sourceUri = [],
        locationUri = [],
        categoryUri = [],
        lang = [],
        dateStart = "",
        dateEnd = "",
        minArticlesInEvent = 0,
        maxArticlesInEvent = sys.maxint,
        dateMentionStart = "",
        dateMentionEnd = "",
        ignoreKeywords = "",
        ignoreConceptUri = [],
        ignoreLocationUri = [],
        ignoreSourceUri = [],
        ignoreCategoryUri = [],
        ignoreLang = [],
        categoryIncludeSub = True,
        ignoreCategoryIncludeSub = True)

Parameters for which you don't specify a value are ignored. For the query to be valid (that is, executable by Event Registry), it must have at least one positive condition; conditions that start with ignore* do not count as positive conditions. The arguments have the following meaning:

  • keywords: find events where articles mention all the specified keywords. In case of multiple keywords, separate them with space. Example: "apple iphone".
  • conceptUri: find events where the concept with the specified concept URI is important. A single concept URI can be provided as a string, while multiple concept URIs can be provided as a list of strings. If multiple concept URIs are provided, resulting events have to be about all of them. To obtain a concept URI based on a (partial) concept label use EventRegistry.getConceptUri().
  • sourceUri: find events that contain one or more articles written by the news source sourceUri. If multiple sources are provided, resulting events have to contain articles from all provided sources. The source URI for a given (partial) news source name can be obtained using EventRegistry.getNewsSourceUri().
  • locationUri: find events that occurred at a particular location. A location URI can refer to either a city or a country. If multiple locations are provided, resulting events have to match any of the locations. The location URI for a given name can be obtained using EventRegistry.getLocationUri().
  • categoryUri: find events that are assigned to a particular category. If multiple categories are provided, resulting events have to be assigned to any of the categories. A category URI can be obtained based on a (partial) category name using EventRegistry.getCategoryUri().
  • lang: find events for which we found articles in the specified language. If more than one language is specified, resulting events have to be reported in any of the languages.
  • dateStart: find events that occurred on or after dateStart. The date should be provided as a string in YYYY-MM-DD format, or as a datetime.date or datetime.datetime object.
  • dateEnd: find events that occurred on or before dateEnd. The date should be provided as a string in YYYY-MM-DD format, or as a datetime.date or datetime.datetime object.
  • minArticlesInEvent: find events that have been reported in at least minArticlesInEvent articles (regardless of language).
  • maxArticlesInEvent: find events that have not been reported in more than maxArticlesInEvent articles (regardless of language).
  • dateMentionStart: find events where articles explicitly mention a date that is equal or greater than dateMentionStart.
  • dateMentionEnd: find events where articles explicitly mention a date that is lower or equal to dateMentionEnd.
  • ignoreKeywords: ignore events where articles about the event mention all provided keywords.
  • ignoreConceptUri: ignore events that are about all provided concepts.
  • ignoreLang: ignore events that are reported in any of the provided languages.
  • ignoreLocationUri: ignore events that occurred in any of the provided locations. A location can be a city or a place.
  • ignoreSourceUri: ignore events that have articles written by all specified news sources.
  • categoryIncludeSub: when a category is specified using categoryUri, whether all of its subcategories should also be included.
  • ignoreCategoryIncludeSub: when a category is specified using ignoreCategoryUri, whether all of its subcategories should also be included.
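The rule about positive conditions can be checked before a query is sent. The following is a minimal sketch, not part of the eventregistry package: has_positive_condition is a hypothetical helper, and the list of parameter names treated as positive is an assumption based on the constructor arguments above (minArticlesInEvent and maxArticlesInEvent are left out because the page does not say whether they count).

```python
# Hypothetical helper; the set of "positive" parameter names is an assumption.
POSITIVE_PARAMS = (
    "keywords", "conceptUri", "sourceUri", "locationUri", "categoryUri",
    "lang", "dateStart", "dateEnd", "dateMentionStart", "dateMentionEnd",
)

def has_positive_condition(params):
    """Return True if at least one non-ignore* search condition has a value."""
    for name in POSITIVE_PARAMS:
        # Empty strings, empty lists and None all mean "not specified"
        if params.get(name):
            return True
    return False

only_negative = {"ignoreLang": ["deu"], "ignoreKeywords": "football"}
with_positive = {"keywords": "apple iphone", "ignoreLang": ["deu"]}

print(has_positive_condition(only_negative))  # False
print(has_positive_condition(with_positive))  # True
```

A dictionary that passes this check could then be unpacked into the constructor as keyword arguments.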

Methods

The class QueryEventsIter has two main methods: count() and execQuery().

count(er) simply returns the number of events that match the specified conditions. er is the instance of the EventRegistry class.

The execQuery method has the following signature:

execQuery(er,
        sortBy = "rel",
        sortByAsc = False,
        returnInfo = ReturnInfo(),
        eventBatchSize = 200)

  • er: instance of the EventRegistry class.
  • sortBy: sets the order in which the resulting events are sorted before they are returned. Options: date (by event date), rel (relevance to the query), size (number of articles), socialScore (amount of shares in social media).
  • sortByAsc: whether the results should be sorted in ascending order.
  • returnInfo: sets the properties of the various types of data that are returned (events, concepts, categories, ...). See details.
  • eventBatchSize: determines the size of the batches in which the events are downloaded.
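To illustrate what batch-wise downloading means for an iterator, here is a minimal sketch of the general pattern. It is not the library's implementation: iterate_in_batches and fake_fetch are hypothetical names, and the data source is a plain in-memory list standing in for the remote service.

```python
def iterate_in_batches(fetch_page, batch_size=200):
    """Yield items one at a time while fetching them in pages of batch_size.

    fetch_page(page, count) is a hypothetical callable that returns a list
    with at most `count` items for the given 1-based page.
    """
    page = 1
    while True:
        batch = fetch_page(page, batch_size)
        for item in batch:
            yield item
        # A short (or empty) page means there is nothing left to download
        if len(batch) < batch_size:
            break
        page += 1

# Stand-in data source with 450 fake event identifiers
fake_events = ["evt-%d" % i for i in range(450)]
calls = []

def fake_fetch(page, count):
    calls.append(page)  # record which pages were requested
    start = (page - 1) * count
    return fake_events[start:start + count]

events = list(iterate_in_batches(fake_fetch, batch_size=200))
print(len(events), calls)  # 450 [1, 2, 3]
```

A larger batch size means fewer requests, while a smaller one returns the first items sooner.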

QueryEvents

Example of usage

Before describing the QueryEvents() class and the event details that can be requested, let's look at an example of how it can be used:

from eventregistry import *
er = EventRegistry(apiKey = YOUR_API_KEY)
q = QueryEvents()
# get events related to Barack Obama
q.addConcept(er.getConceptUri("Obama"))
# that have been reported also by BBC
q.addNewsSource(er.getNewsSourceUri("bbc"))
# return top 5 locations and organizations mentioned the most in these events
q.addRequestedResult(RequestEventsConceptAggr(conceptCount = 5,
    returnInfo = ReturnInfo(conceptInfo = ConceptInfoFlags(type = ["org", "loc"]))))
# execute the query
res = er.execQuery(q)

Constructor

The QueryEvents constructor accepts the following arguments:

QueryEvents(keywords = "",
        conceptUri = [],
        sourceUri = [],
        locationUri = [],
        categoryUri = [],
        lang = [],
        dateStart = "",
        dateEnd = "",
        minArticlesInEvent = 0,
        maxArticlesInEvent = sys.maxint,
        dateMentionStart = "",
        dateMentionEnd = "",
        ignoreKeywords = "",
        ignoreConceptUri = [],
        ignoreLocationUri = [],
        ignoreSourceUri = [],
        ignoreCategoryUri = [],
        ignoreLang = [],
        categoryIncludeSub = True,
        ignoreCategoryIncludeSub = True)

Parameters for which you don't specify a value are ignored. For the query to be valid (that is, executable by Event Registry), it must have at least one positive condition; conditions that start with ignore* do not count as positive conditions. The arguments have the following meaning:

  • keywords: find events where articles mention all the specified keywords. In case of multiple keywords, separate them with space. Example: "apple iphone".
  • conceptUri: find events where the concept with the specified concept URI is important. A single concept URI can be provided as a string, while multiple concept URIs can be provided as a list of strings. If multiple concept URIs are provided, resulting events have to be about all of them. To obtain a concept URI based on a (partial) concept label use EventRegistry.getConceptUri().
  • sourceUri: find events that contain one or more articles written by the news source sourceUri. If multiple sources are provided, resulting events have to contain articles from all provided sources. The source URI for a given (partial) news source name can be obtained using EventRegistry.getNewsSourceUri().
  • locationUri: find events that occurred at a particular location. A location URI can refer to either a city or a country. If multiple locations are provided, resulting events have to match any of the locations. The location URI for a given name can be obtained using EventRegistry.getLocationUri().
  • categoryUri: find events that are assigned to a particular category. If multiple categories are provided, resulting events have to be assigned to any of the categories. A category URI can be obtained based on a (partial) category name using EventRegistry.getCategoryUri().
  • lang: find events for which we found articles in the specified language. If more than one language is specified, resulting events have to be reported in any of the languages.
  • dateStart: find events that occurred on or after dateStart. The date should be provided as a string in YYYY-MM-DD format, or as a datetime.date or datetime.datetime object.
  • dateEnd: find events that occurred on or before dateEnd. The date should be provided as a string in YYYY-MM-DD format, or as a datetime.date or datetime.datetime object.
  • minArticlesInEvent: find events that have been reported in at least minArticlesInEvent articles (regardless of language).
  • maxArticlesInEvent: find events that have not been reported in more than maxArticlesInEvent articles (regardless of language).
  • dateMentionStart: find events where articles explicitly mention a date that is equal or greater than dateMentionStart.
  • dateMentionEnd: find events where articles explicitly mention a date that is lower or equal to dateMentionEnd.
  • ignoreKeywords: ignore events where articles about the event mention all provided keywords.
  • ignoreConceptUri: ignore events that are about all provided concepts.
  • ignoreLang: ignore events that are reported in any of the provided languages.
  • ignoreLocationUri: ignore events that occurred in any of the provided locations. A location can be a city or a place.
  • ignoreSourceUri: ignore events that have articles written by all specified news sources.
  • categoryIncludeSub: when a category is specified using categoryUri, whether all of its subcategories should also be included.
  • ignoreCategoryIncludeSub: when a category is specified using ignoreCategoryUri, whether all of its subcategories should also be included.
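Since the date parameters accept either a string or a date object from the standard datetime module, it can help to normalize values before building a query. The sketch below assumes nothing about the library itself; to_date_string is a hypothetical helper that only uses the standard library.

```python
import datetime

def to_date_string(value):
    """Return a YYYY-MM-DD string for a str, datetime.date or datetime.datetime."""
    # datetime.datetime is a subclass of datetime.date, so one check covers both
    if isinstance(value, datetime.date):
        return value.strftime("%Y-%m-%d")
    # Parse strings to validate them; raises ValueError if the format is wrong
    parsed = datetime.datetime.strptime(value, "%Y-%m-%d")
    return parsed.strftime("%Y-%m-%d")

print(to_date_string(datetime.date(2017, 4, 10)))             # 2017-04-10
print(to_date_string(datetime.datetime(2017, 4, 10, 8, 30)))  # 2017-04-10
print(to_date_string("2017-04-10"))                           # 2017-04-10
```

The resulting strings could be passed as dateStart, dateEnd, dateMentionStart or dateMentionEnd.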

Methods

The QueryEvents class provides additional methods for specifying the query. In addition to the constructor arguments, conditions can be added using methods such as addConcept(conceptUri), addLocation(locationUri), addCategory(categoryUri), addNewsSource(sourceUri), addKeyword(keyword) and setDateLimit(startDate, endDate).

setEventUriList(uriList) is a special method where you can specify the set of event URIs that you want to use as the result. In this case, all query conditions are ignored and this set is used as the resulting set. All the return information about the events will be based on this set of events.

Returned information

When the query is executed, a set of events will match the specified criteria. You still need to specify what information about these events should be returned. Do you want to get the list of matching events? Maybe just the timeline of when they happened? Maybe information about where they happened?

The information to be returned about the matching events is determined by calling the addRequestedResult method. addRequestedResult accepts as an argument an instance of a class derived from RequestEvents. Below are the classes that can be specified in the addRequestedResult calls:

RequestEventsInfo

RequestEventsInfo(page = 1,
        count = 20,
        sortBy = "date", sortByAsc = False,
        returnInfo = ReturnInfo())

RequestEventsInfo class provides detailed information about the resulting events.

  • page: determines the page of the results to return (starting from 1).
  • count: determines the number of events to return. At most 200 events can be returned per call.
  • sortBy: sets the order in which the resulting events are sorted before they are returned. Options: date (by event date), rel (relevance to the query), size (number of articles), socialScore (amount of shares in social media).
  • sortByAsc: whether the results should be sorted in ascending order.
  • returnInfo: sets the properties of the various types of data that are returned (events, concepts, categories, ...). See details.
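Because a single call returns at most 200 events, retrieving a larger result set means requesting several pages. The following sketch only does the arithmetic. page_plan is a hypothetical helper, and how the total number of matching events is obtained is left open.

```python
import math

MAX_EVENTS_PER_CALL = 200  # limit stated above for the count parameter

def page_plan(total_events, count=MAX_EVENTS_PER_CALL):
    """Return the (page, count) pairs needed to cover total_events results."""
    count = min(count, MAX_EVENTS_PER_CALL)
    pages = int(math.ceil(total_events / float(count)))
    # Pages are numbered from 1, matching the page parameter
    return [(page, count) for page in range(1, pages + 1)]

print(page_plan(450))  # [(1, 200), (2, 200), (3, 200)]
```

Each pair corresponds to the page and count arguments of one RequestEventsInfo instance.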

RequestEventsUriList

RequestEventsUriList returns a simple list of event URIs that match the criteria. Useful if you wish to obtain the full list in a single query.

RequestEventsTimeAggr

RequestEventsTimeAggr computes how the resulting events are distributed over time.

RequestEventsKeywordAggr

RequestEventsKeywordAggr returns the keywords that best summarize the resulting events.

RequestEventsLocAggr

RequestEventsLocAggr returns the information about the locations of the resulting events.

RequestEventsLocTimeAggr

RequestEventsLocTimeAggr provides combined details about the location and time of the resulting events.

RequestEventsConceptGraph

RequestEventsConceptGraph(conceptCount = 25,
        linkCount = 50,
        eventsSampleSize = 500,
        returnInfo = ReturnInfo())

RequestEventsConceptGraph returns a graph of concepts. Concepts are connected if they frequently occur in the same events.

  • conceptCount: number of top concepts (nodes) to return.
  • linkCount: number of edges in the graph.
  • eventsSampleSize: the size of the sample of events on which the graph is computed.
  • returnInfo: the details about the types of return data to include. See details.
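To make the idea of a concept graph concrete, here is a small sketch of how co-occurrence edges could be derived from events. It illustrates the general technique only and is not how Event Registry computes its graph: concept_graph is a hypothetical function, events are represented as plain lists of concept labels, and edge weight is simply the number of shared events.

```python
from collections import Counter
from itertools import combinations

def concept_graph(events, concept_count=25, link_count=50):
    """Build a co-occurrence graph from events given as lists of concept labels."""
    concept_freq = Counter()
    for concepts in events:
        # Count each concept at most once per event
        concept_freq.update(sorted(set(concepts)))
    # Keep only the most frequent concepts as nodes
    nodes = [c for c, _ in concept_freq.most_common(concept_count)]
    node_set = set(nodes)

    pair_freq = Counter()
    for concepts in events:
        present = sorted(set(concepts) & node_set)
        # Every unordered pair of concepts in the same event co-occurs once
        pair_freq.update(combinations(present, 2))
    # Keep the most frequent pairs as edges: ((concept_a, concept_b), weight)
    edges = pair_freq.most_common(link_count)
    return nodes, edges

sample_events = [
    ["Obama", "White House", "Congress"],
    ["Obama", "White House"],
    ["Obama", "Congress"],
    ["Berlin", "Merkel"],
]
nodes, edges = concept_graph(sample_events, concept_count=3, link_count=2)
print(nodes)
print(edges)
```

In this sample the three most frequent concepts become nodes, and the two pairs that share two events each become edges.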

RequestEventsConceptMatrix

RequestEventsConceptMatrix(conceptCount = 25,
        measure = "pmi",
        eventsSampleSize = 500,
        returnInfo = ReturnInfo())

RequestEventsConceptMatrix computes a matrix of concepts and their dependencies. For individual concept pairs it returns how frequently they co-occur in the resulting events and how "surprising" this is, based on the frequency of individual concepts.

  • measure: the measure to be used for computing the "surprise factor". Options: pmi (pointwise mutual information), pairTfIdf (pair frequency * IDF of individual concepts), chiSquare.
  • eventsSampleSize: the size of the sample of events on which the matrix is computed.
  • returnInfo: the details about the types of return data to include. See details.
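As a worked example of the "surprise factor", the sketch below computes pointwise mutual information from event counts using the textbook definition, log of p(a, b) / (p(a) * p(b)). The log base and any smoothing that Event Registry applies are not stated on this page, so base 2 and no smoothing are assumptions, and pmi is a hypothetical function name.

```python
import math

def pmi(pair_count, count_a, count_b, total_events):
    """Pointwise mutual information of two concepts, from event counts.

    Assumes pair_count > 0; a pair that never co-occurs has no finite PMI.
    """
    p_ab = pair_count / float(total_events)
    p_a = count_a / float(total_events)
    p_b = count_b / float(total_events)
    return math.log(p_ab / (p_a * p_b), 2)

# Out of 1000 events: A appears in 100, B in 50, both together in 25
score = pmi(pair_count=25, count_a=100, count_b=50, total_events=1000)
print(round(score, 3))  # 2.322
```

If the two concepts were independent they would be expected to share 5 events, giving a score of 0. Sharing 25 events is five times that, hence log2(5).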

RequestEventsConceptTrends

RequestEventsConceptTrends(conceptCount = 10,
        returnInfo = ReturnInfo())

RequestEventsConceptTrends provides a list of the most popular concepts in the results and how they trend daily over time.

  • conceptCount: number of top concepts to return.

RequestEventsSourceAggr

RequestEventsSourceAggr(sourceCount = 30,
        returnInfo = ReturnInfo())

RequestEventsSourceAggr provides a list of top news sources that have written the most articles in the resulting events.

  • sourceCount: number of top news sources to return.
  • returnInfo: the details about the types of return data to include. See details.

RequestEventsDateMentionAggr

RequestEventsDateMentionAggr provides information about the dates that have been found mentioned in the resulting events.

RequestEventsEventClusters

RequestEventsEventClusters(keywordCount = 30,
        maxEventsToCluster = 10000,
        returnInfo = ReturnInfo())

RequestEventsEventClusters clusters the resulting events based on the event concepts. The resulting events are clustered repeatedly using 2-means clustering in order to provide a hierarchical view of the data.

  • keywordCount: sets the number of keywords that will be returned for each cluster on each level.
  • maxEventsToCluster: determines the maximum number of events that will be clustered.

RequestEventsCategoryAggr

RequestEventsCategoryAggr returns information about the categories that the resulting events belong to.

Advanced query language

For some users, simply providing a list of concepts, keywords, sources etc. is not sufficient and a more complex way of specifying a query is required. For such purposes we provide a query language where conditions are specified in a JSON object that resembles the query language used by MongoDB. The grammar for the language is as follows:

ComplexEventQuery ::=
{
	"include": CombinedQuery | BaseQuery,
	"exclude": null | CombinedQuery | BaseQuery
}

CombinedQuery ::=
{
        "$or": [ CombinedQuery | BaseQuery, ... ]
}

CombinedQuery ::=
{
	"$and": [ CombinedQuery | BaseQuery, ...	]
}

BaseQuery ::=
{
	"conceptUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
	"keyword": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
	"categoryUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
	"sourceUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
	"sourceLocationUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
	"locationUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
	"lang": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
	"startDate": null | string,
	"endDate": null | string,
	"dateMention": null | [string, ...],
	"minArticlesInEvent": null | int,
	"maxArticlesInEvent": null | int
}

Using this language you can specify a query that is not possible to express using the constructor parameters in QueryEvents or QueryEventsIter. Here are some examples of queries and what they would return:

A query that returns the list of events that are about Twitter or have been covered by TechCrunch or Ars Technica:

{
        "include": {
                "$or": [
                        { "conceptUri": "http://en.wikipedia.org/wiki/Twitter" },
                        {
                                "sourceUri": {
                                        "$or": ["techcrunch.com", "arstechnica.com"]
                                }
                        }
                ]
        }
}

A query that finds events that happened in London or Berlin and were not reported in Arabic or Spanish language:

{
        "include": {
                "locationUri": {
                        "$or": [
                                "http://en.wikipedia.org/wiki/London",
                                "http://en.wikipedia.org/wiki/Berlin"
                        ]
                }
        },
        "exclude": {
                "lang": {
                        "$or": ["ara", "spa"]
                }
        }
}
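Writing such objects by hand is error-prone, so it can be convenient to assemble them as Python dictionaries and serialize them with the standard json module. The sketch below rebuilds the London/Berlin query in that way. any_of and complex_query are hypothetical helper names, not part of the eventregistry package.

```python
import json

def any_of(values):
    """Wrap values in the "$or" form used by the grammar."""
    return {"$or": list(values)}

def complex_query(include, exclude=None):
    """Assemble the top-level object; "exclude" is omitted when not given."""
    query = {"include": include}
    if exclude is not None:
        query["exclude"] = exclude
    return query

london_or_berlin = complex_query(
    include={"locationUri": any_of([
        "http://en.wikipedia.org/wiki/London",
        "http://en.wikipedia.org/wiki/Berlin",
    ])},
    exclude={"lang": any_of(["ara", "spa"])},
)

# json.dumps guarantees correctly quoted, well-formed JSON
query_json = json.dumps(london_or_berlin, indent=2, sort_keys=True)
print(query_json)
```

Building the query as a dictionary first also makes it easy to add or remove conditions programmatically before serializing.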

Depending on your preference, you can build such JSONs for these complex queries yourself or you can use the associated classes such as ComplexEventQuery(), CombinedQuery() and BaseQuery(). Below is an example where we search for events that are either about Donald Trump or are in the Politics category, but did not occur in February 2017 or mention Barack Obama:

from eventregistry import *
er = EventRegistry()
# resolve the labels to URIs
trumpUri = er.getConceptUri("Trump")
obamaUri = er.getConceptUri("Obama")
politicsUri = er.getCategoryUri("politics")
cq = ComplexEventQuery(
        # events about Donald Trump or in the Politics category
        includeQuery = CombinedQuery.OR([
                BaseQuery(conceptUri = trumpUri),
                BaseQuery(categoryUri = politicsUri)
        ]),
        # but not events from February 2017 or events that mention Barack Obama
        excludeQuery = CombinedQuery.OR([
                BaseQuery(dateStart = "2017-02-01", dateEnd = "2017-02-28"),
                BaseQuery(conceptUri = obamaUri)
        ]))
q = QueryEvents.initWithComplexQuery(cq)
q.setRequestedResult(RequestEventsInfo())
res = er.execQuery(q)

If you've built the JSON query yourself, you can also use it like this:

from eventregistry import *
er = EventRegistry()
q = QueryEvents.initWithComplexQuery("{ 'include': { ... } }")
q.setRequestedResult(RequestEventsInfo())
res = er.execQuery(q)

In this case you need to make sure you're providing a valid query in the JSON.
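One way to catch mistakes early is to parse the string locally before handing it over. The following is a minimal sketch that checks only the top-level structure from the grammar above and does not validate the nested conditions. check_complex_query is a hypothetical helper. Note that strict JSON requires double-quoted strings, so a query written with single quotes is rejected by the parser.

```python
import json

def check_complex_query(text):
    """Parse a complex query string and run a few basic structural checks."""
    query = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(query, dict):
        raise ValueError("query must be a JSON object")
    unknown = set(query) - {"include", "exclude"}
    if unknown:
        raise ValueError("unexpected top-level keys: %s" % sorted(unknown))
    if not isinstance(query.get("include"), dict):
        raise ValueError('"include" must be a JSON object')
    return query

good = '{"include": {"keyword": "apple iphone"}}'
bad = "{'include': {'keyword': 'apple iphone'}}"  # single quotes are not valid JSON

print(check_complex_query(good))
try:
    check_complex_query(bad)
except ValueError as err:
    print("rejected:", err)
```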