Searching for events
To search for events in Event Registry we provide two classes: QueryEvents and QueryEventsIter. Both classes can be used to find events using various types of search conditions.
The class QueryEventsIter is meant to obtain an iterator that makes it easy to iterate over all events that match the search conditions. Alternatively, the QueryEvents class can be used to obtain a broader range of information about the matching events in various forms. With QueryEvents, the returned results can also be a distribution of the matching events over time, a distribution of the matching events into predefined categories, a list of top concepts in the matching events, etc.
The returned information about events follows the Event data model.
Example of usage
Before describing the class, here is a simple full example that prints the list of all events related to Barack Obama, from the latest to the oldest:
from eventregistry import *
er = EventRegistry(apiKey = YOUR_API_KEY)
q = QueryEventsIter(conceptUri = er.getConceptUri("Obama"))
# iterate over the matching events, sorted by date (descending by default)
for event in q.execQuery(er, sortBy = "date"):
    print(event)
Constructor
QueryEventsIter is a class derived from QueryEvents. Its constructor accepts the following arguments:
QueryEventsIter(keywords = None,
    conceptUri = None,
    categoryUri = None,
    sourceUri = None,
    sourceLocationUri = None,
    sourceGroupUri = None,
    authorUri = None,
    locationUri = None,
    lang = None,
    dateStart = None,
    dateEnd = None,
    minArticlesInEvent = 0,
    maxArticlesInEvent = sys.maxint,
    dateMentionStart = None,
    dateMentionEnd = None,
    ignoreKeywords = None,
    ignoreConceptUri = None,
    ignoreCategoryUri = None,
    ignoreSourceUri = None,
    ignoreSourceLocationUri = None,
    ignoreSourceGroupUri = None,
    ignoreAuthorUri = None,
    ignoreLocationUri = None,
    ignoreLang = None,
    keywordsLoc = "body",
    ignoreKeywordsLoc = "body",
    requestedResult = None)
The parameters for which you don't specify a value will be ignored. For the query to be valid (that is, executable by Event Registry), it has to have at least one positive condition (conditions that start with ignore* do not count as positive conditions). The meaning of the arguments is as follows:
- keywords: find events where articles mention the specified keywords. A single keyword/phrase can be provided as a string, multiple keywords/phrases can be provided as a list of strings. Use QueryItems.AND() if all provided keywords/phrases should be mentioned, or QueryItems.OR() if any of the keywords/phrases should be mentioned.
- conceptUri: find events where the concept with the specified concept URI is important. A single concept URI can be provided as a string, multiple concept URIs can be provided as a list of strings. Use QueryItems.AND() if all provided concepts should be mentioned, or QueryItems.OR() if any of the concepts should be mentioned. To obtain a concept URI based on a (partial) concept label use EventRegistry.getConceptUri().
- categoryUri: find events that are assigned to a particular category. A single category URI can be provided as a string, multiple category URIs can be provided as a list of strings. Use QueryItems.AND() if all provided categories should be mentioned, or QueryItems.OR() if any of the categories should be mentioned. A category URI can be obtained based on a (partial) category name using EventRegistry.getCategoryUri().
- sourceUri: find events that contain one or more articles that have been written by the news source sourceUri. If multiple sources should be considered use QueryItems.OR() if any of the sources should report about the event, or QueryItems.AND() if all of them need to report about that event. The source URI for a given (partial) news source name can be obtained using EventRegistry.getNewsSourceUri().
- sourceLocationUri: find events that were reported by news sources located in the given geographic location. If multiple source locations are provided, put them into a list inside QueryItems.OR(). A location URI can either be a city or a country. The location URI for a given (partial) name can be obtained using EventRegistry.getLocationUri().
- sourceGroupUri: find events that were reported by news sources that are assigned to the predefined source group(s). If multiple source groups are provided, put them into a list inside QueryItems.OR(). The source group URI for a given name can be obtained using EventRegistry.getSourceGroupUri().
- authorUri: find events that contain articles that were written by a specific author. If multiple authors should be considered use QueryItems.OR() if any of the authors should report about the event, or QueryItems.AND() if all of them need to report about that event. To obtain the author URI based on a (partial) author name and potentially source domain name use EventRegistry.getAuthorUri().
- locationUri: find events that occurred at a particular location. A location URI can either be a city or a country. If multiple locations are provided, resulting events have to match any of the locations. The location URI for a given name can be obtained using EventRegistry.getLocationUri().
- lang: find events for which we found articles in the specified language. If more than one language is specified, resulting events have to be reported in any of the languages. See supported languages for the list of language codes to use.
- dateStart: find events that occurred on or after dateStart. The date should be provided in YYYY-MM-DD format, or as a datetime.date or datetime.datetime instance.
- dateEnd: find events that occurred before or on dateEnd. The date should be provided in YYYY-MM-DD format, or as a datetime.date or datetime.datetime instance.
- minArticlesInEvent: find events that have been reported in at least minArticlesInEvent articles (regardless of language).
- maxArticlesInEvent: find events that have not been reported in more than maxArticlesInEvent articles (regardless of language).
- dateMentionStart: find events where articles explicitly mention a date that is equal to or later than dateMentionStart.
- dateMentionEnd: find events where articles explicitly mention a date that is earlier than or equal to dateMentionEnd.
- ignoreKeywords: ignore events where articles about the event mention the provided keywords. Specify the value as a string or as a list in QueryItems.OR() or QueryItems.AND().
- ignoreConceptUri: ignore events that are about all provided concepts. Specify the value as a string or as a list in QueryItems.OR() or QueryItems.AND().
- ignoreCategoryUri: ignore events that are about the provided set of categories. Specify the value as a string or as a list in QueryItems.OR() or QueryItems.AND().
- ignoreSourceUri: ignore events that have articles which have been reported by the specified list of news sources. Specify the value as a string or as a list in QueryItems.OR().
- ignoreSourceLocationUri: ignore events that have articles which have been reported by news sources located at the specified geographic location(s). Specify the value as a string or as a list in QueryItems.OR().
- ignoreSourceGroupUri: ignore events that have articles which have been reported by the news sources assigned to the specified source groups. Specify the value as a string or as a list in QueryItems.OR().
- ignoreAuthorUri: ignore events that contain articles that were reported by the specified author(s). Specify the value as a string or as a list in QueryItems.OR().
- ignoreLang: ignore events that are reported in any of the provided languages. Specify the value as a string or as a list in QueryItems.OR(). See supported languages for the list of language codes to use.
- ignoreLocationUri: ignore events that occurred in any of the provided locations. A location can be a city or a place. Specify the value as a string or as a list in QueryItems.OR().
- keywordsLoc: where to look when searching using the keywords provided by the keywords parameter. "body" (default), "title", or "body,title".
- ignoreKeywordsLoc: where to look when searching using the keywords provided by the ignoreKeywords parameter. "body" (default), "title", or "body,title".
- requestedResult: the information that should be returned as the result of the query. If None, then by default we set RequestEventsInfo().
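The date arguments accept either a YYYY-MM-DD string or a date object. As a minimal sketch, a value could be normalized to the string form before it is passed as dateStart or dateEnd. The helper name to_query_date is hypothetical and not part of the eventregistry package; it uses only the standard library.

```python
import datetime

def to_query_date(value):
    """Return value as a YYYY-MM-DD string (hypothetical helper, not part of eventregistry)."""
    # datetime.datetime is a subclass of datetime.date, so it has to be tested first
    if isinstance(value, datetime.datetime):
        return value.date().isoformat()
    if isinstance(value, datetime.date):
        return value.isoformat()
    if isinstance(value, str):
        # strptime raises ValueError if the string is not a valid year-month-day date
        return datetime.datetime.strptime(value, "%Y-%m-%d").date().isoformat()
    raise TypeError("expected a YYYY-MM-DD string, a date or a datetime")

print(to_query_date(datetime.date(2017, 2, 1)))
print(to_query_date(datetime.datetime(2017, 2, 28, 13, 45)))
```

The returned string can then be used wherever a date condition is expected, for example as the value of dateStart.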
When two or more parameters are specified in the constructor, the results will be computed so that all conditions are met. For example, if you specify QueryEventsIter(keywords = "Barack Obama", conceptUri = "http://en.wikipedia.org/wiki/White_House") then the resulting events will contain articles that mention the phrase Barack Obama and have White House as an important concept.
Creating QueryEventsIter using static methods
The QueryEventsIter class can also be initialized in the following way:
QueryEventsIter.initWithComplexQuery() is a static method that can be used to create a complex query based on the advanced query language. You can call the method by providing an instance of the ComplexEventQuery class. Alternatively, you can also call the method with a Python dict or a string containing the JSON object matching the language (see the examples).
Methods
The class QueryEventsIter has two main methods: count() and execQuery().
count(er) simply returns the number of events that match the specified conditions. er is the instance of the EventRegistry class.
The execQuery() method has the following format:
execQuery(er,
sortBy = "rel",
sortByAsc = False,
returnInfo = ReturnInfo(),
maxItems = -1)
- er: instance of the EventRegistry class.
- sortBy: sets the order in which the resulting events are sorted before returning. Options: date (by event date), rel (relevance to the query), size (number of articles), socialScore (amount of shares in social media).
- sortByAsc: should the results be sorted in ascending order.
- returnInfo: sets the properties of various types of data that is returned (events, concepts, categories, ...). See details.
- maxItems: max number of events to return by the iterator. Use the default (-1) to simply return all the events.
Example of usage
Before describing the QueryEvents class and the event details that can be requested, let's look at an example of how it can be used:
from eventregistry import *
er = EventRegistry(apiKey = YOUR_API_KEY)
q = QueryEvents(
conceptUri = er.getConceptUri("Obama"), # get events related to Barack Obama
sourceUri = er.getNewsSourceUri("bbc")) # that have been reported also by BBC
# return top 5 locations and organizations mentioned the most in these events
q.setRequestedResult(RequestEventsConceptAggr(conceptCount = 5,
returnInfo = ReturnInfo(conceptInfo = ConceptInfoFlags(type = ["org", "loc"]))))
# execute the query
res = er.execQuery(q)
Constructor
The QueryEvents constructor accepts the same arguments as QueryEventsIter.
Creating QueryEvents using static methods
The QueryEvents class can also be initialized in two other ways:
QueryEvents.initWithEventUriWgtList() is a static method that can be used to specify the set of event URIs and their weights that you want to use as the result. In this case, no query conditions are used and this set is used as the resulting set. All the return information about the events will be based on this set of events.
QueryEvents.initWithComplexQuery() is another static method that can be used to create a complex query based on the advanced query language. You can call the method by providing an instance of the ComplexEventQuery class. Alternatively, you can also call the method with a Python dict or a string containing the JSON object matching the language (see the examples).
When executing the query, there will be a set of events that will match the specified criteria. What information about these events is to be returned however still needs to be determined. Do you want to get the list of matching events? Maybe just the timeline when they happened? Maybe information where they happened?
The information to be returned about the matching events is determined by calling the setRequestedResult() method. setRequestedResult() accepts as an argument an instance of a class that has RequestEvents as its base class. Customers subscribed to a plan can additionally use the addRequestedResult() method, which accepts the same arguments. The only difference between the methods is that by calling addRequestedResult() multiple times on a QueryEvents instance, you can retrieve multiple results with a single query. Free users are only allowed one requested result per call and should just use the setRequestedResult() method. Below are the classes that can be specified in the addRequestedResult() and setRequestedResult() calls:
RequestEventsInfo
RequestEventsInfo(page = 1,
count = 20,
sortBy = "date", sortByAsc = False,
returnInfo = ReturnInfo())
The RequestEventsInfo class provides detailed information about the resulting events.
- page: determines the page of the results to return (starting from 1).
- count: determines the number of events to return. The maximum number of events that can be returned per call is 50.
- sortBy: sets the order in which the resulting events are sorted before returning. Options: date (by event date), rel (relevance to the query), size (number of articles), socialScore (amount of shares in social media).
- sortByAsc: should the results be sorted in ascending order.
- returnInfo: sets the properties of various types of data that is returned (events, concepts, categories, ...). See details.
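Because a single call returns at most 50 events, larger result sets have to be fetched page by page. The sketch below shows only the paging logic. fetch_page is a hypothetical callable standing in for whatever code executes the query with a given page and count, and the stub used here returns fake data rather than calling Event Registry.

```python
def iterate_pages(fetch_page, count=50):
    """Yield items from successive pages until a page comes back short or empty."""
    page = 1  # pages are numbered starting from 1
    while True:
        items = fetch_page(page, count)
        for item in items:
            yield item
        if len(items) < count:
            break  # a short page means there is nothing left to fetch
        page += 1

def fake_fetch_page(page, count):
    # stand-in for a real query: pretends that 120 events match
    total = 120
    start = (page - 1) * count
    return list(range(start, min(start + count, total)))

events = list(iterate_pages(fake_fetch_page, count=50))
print(len(events))
```

When all matching events are needed, QueryEventsIter is likely the simpler choice, since it is designed for iterating over the full result set.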
RequestEventsUriWgtList
RequestEventsUriWgtList(page = 1,
count = 50000,
sortBy = "rel", sortByAsc = False)
RequestEventsUriWgtList returns a simple list of event URIs and their weights that match the criteria. Useful if you wish to obtain the full list of event URIs in a single query.
- page: determines the page of the results to return (starting from 1).
- count: determines the number of events to return. The maximum number of events that can be returned per call is 100,000.
- sortBy: sets the order in which the resulting events are sorted before returning. Options: date (by event date), rel (relevance to the query), size (number of articles), socialScore (amount of shares in social media).
- sortByAsc: should the results be sorted in ascending order.
RequestEventsTimeAggr
RequestEventsTimeAggr()
RequestEventsTimeAggr computes how the resulting events are distributed over time. The constructor does not accept any arguments.
RequestEventsKeywordAggr
RequestEventsKeywordAggr(lang = "eng")
RequestEventsKeywordAggr returns the keywords that best summarize the content of the resulting events.
RequestEventsLocAggr
RequestEventsLocAggr(eventsSampleSize = 100000,
returnInfo = ReturnInfo())
RequestEventsLocAggr returns information about the geographic locations where the resulting events happened.
- eventsSampleSize: on what sample of results should the aggregate be computed (at most 100,000).
- returnInfo: sets the properties of various types of data that is returned (events, concepts, categories, ...). See details.
RequestEventsLocTimeAggr
RequestEventsLocTimeAggr(eventsSampleSize = 100000,
returnInfo = ReturnInfo())
RequestEventsLocTimeAggr provides combined details about the location and time of the resulting events.
- eventsSampleSize: sample of events to use to compute the location aggregate (at most 100,000).
- returnInfo: what details (about locations) should be included in the returned information.
RequestEventsConceptGraph
RequestEventsConceptGraph(conceptCount = 50,
linkCount = 150,
eventsSampleSize = 50000,
returnInfo = ReturnInfo())
RequestEventsConceptGraph returns a graph of concepts. Concepts are connected if they frequently occur in the same events.
- conceptCount: number of top concepts (nodes) to return (at most 1,000).
- linkCount: number of edges in the graph (at most 2,000).
- eventsSampleSize: on what sample of events should the graph be computed (at most 50,000).
- returnInfo: the details about the types of returned data to include. See details.
RequestEventsConceptMatrix
RequestEventsConceptMatrix(conceptCount = 25,
measure = "pmi",
eventsSampleSize = 50000,
returnInfo = ReturnInfo())
RequestEventsConceptMatrix computes a matrix of concepts and their dependencies. For individual concept pairs it returns how frequently they co-occur in the resulting events and how surprising this is, based on the frequency of the individual concepts.
- measure: the measure to be used for computing the "surprise factor". Options: pmi (pointwise mutual information), pairTfIdf (pair frequency * IDF of individual concepts), chiSquare.
- eventsSampleSize: on what sample of events should the aggregate be computed (at most 100,000).
- returnInfo: the details about the types of returned data to include. See details.
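To give a feel for what a "surprise factor" measures, here is the textbook definition of pointwise mutual information applied to event counts. This is a sketch of the standard formula only: how Event Registry computes its pmi measure internally (log base, smoothing, sampling) is not stated here, and the counts below are made up for illustration.

```python
import math

def pmi(pair_count, count_a, count_b, total_events):
    """Textbook PMI: log2( P(a, b) / (P(a) * P(b)) ), estimated from counts."""
    p_ab = pair_count / total_events
    p_a = count_a / total_events
    p_b = count_b / total_events
    return math.log2(p_ab / (p_a * p_b))

# made-up counts: 1000 events, concept A in 100, concept B in 50, both in 20
surprising = pmi(20, 100, 50, 1000)
# here the pair co-occurs about as often as chance predicts, so PMI is close to 0
unsurprising = pmi(5, 100, 50, 1000)
print(surprising, unsurprising)
```

A clearly positive value means the two concepts appear together more often than their individual frequencies would suggest.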
RequestEventsConceptTrends
RequestEventsConceptTrends(
conceptUris = None,
conceptCount = 10,
returnInfo = ReturnInfo())
RequestEventsConceptTrends provides a list of the most popular concepts in the results and how they trend daily over time.
- conceptUris: list of concept URIs for which to return trending information. If None, then the top concepts will be computed automatically.
- conceptCount: number of top concepts to return.
- returnInfo: the details about the returned concepts to include. See details.
RequestEventsSourceAggr
RequestEventsSourceAggr(sourceCount = 30,
eventsSampleSize = 50000,
returnInfo = ReturnInfo())
RequestEventsSourceAggr provides a list of top news sources that have written the most articles in the resulting events.
- sourceCount: number of top news sources to return.
- eventsSampleSize: on what sample of events should the aggregate be computed (at most 100,000).
- returnInfo: the details about the types of returned data to include. See details.
RequestEventsDateMentionAggr
RequestEventsDateMentionAggr(minDaysApart = 0,
minDateMentionCount = 5,
eventsSampleSize = 50000)
RequestEventsDateMentionAggr provides information about the dates that have been found mentioned in the resulting events.
- minDaysApart: ignore events that don't have a date that is more than this number of days apart from the tested event.
- minDateMentionCount: report only dates that are mentioned at least this number of times.
- eventsSampleSize: on what sample of results should the aggregate be computed (at most 100,000).
RequestEventsEventClusters
RequestEventsEventClusters(keywordCount = 30,
maxEventsToCluster = 10000,
returnInfo = ReturnInfo())
RequestEventsEventClusters clusters the resulting events based on the event concepts. The resulting events are clustered repeatedly using 2-means clustering in order to provide a hierarchical view of the events split into smaller parts.
- keywordCount: sets the number of keywords that will be returned for each cluster on each level.
- maxEventsToCluster: determines the maximum number of events that will be clustered.
- returnInfo: the details about the types of returned data to include. See details.
RequestEventsCategoryAggr
RequestEventsCategoryAggr(returnInfo = ReturnInfo())
RequestEventsCategoryAggr returns information about the categories that the resulting events are about.
- returnInfo: the details about the returned categories to include. See details.
RequestEventsRecentActivity
RequestEventsRecentActivity(maxEventCount = 50,
updatesAfterTm = None,
updatesAfterMinsAgo = None,
updatesUntilTm = None,
updatesUntilMinsAgo = None,
mandatoryLocation = True,
lang = None,
minAvgCosSim = 0,
returnInfo = ReturnInfo())
RequestEventsRecentActivity is used to get the events that match the particular set of search conditions and were identified/updated in Event Registry after the specified time.
- maxEventCount: maximum number of events to return (max 2000). If more than 50 events are requested, a correspondingly higher number of tokens will be used with a single call.
- updatesAfterTm: starting time after which the resulting events should be identified/updated in Event Registry. Specify a datetime instance or a string in format 'YYYY-MM-DDTHH:MM:SS.SSSS' that represents time in ISO format. When making consecutive calls, you can use the value currTime returned from a previous call.
- updatesAfterMinsAgo: instead of specifying updatesAfterTm you can also simply ask to get content that was identified/updated within the given number of minutes before now. You can use this if you are calling the API in regular time intervals.
- updatesUntilTm: ending time before which the resulting events should be created/updated in Event Registry. Specify a datetime instance or a string in format 'YYYY-MM-DDTHH:MM:SS.SSSS' that represents time in ISO format.
- updatesUntilMinsAgo: instead of specifying updatesUntilTm you can also simply ask to get events that were created/updated before this number of minutes ago.
- mandatoryLocation: should we limit events to only those for which we were able to identify their geographical location?
- lang: limit resulting events to the ones that are reported in the selected language(s).
- minAvgCosSim: what should be the minimum similarity of the articles in the returned events? A higher value will only return events with articles with similar content. Valid values are between 0 and 1.
- returnInfo: the details about the types of returned data to include. See details.
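The updatesAfterTm and updatesUntilTm values can be given as strings in the 'YYYY-MM-DDTHH:MM:SS.SSSS' pattern. Below is a minimal standard-library sketch that produces such a string for a moment some minutes in the past. The helper name minutes_ago_iso is hypothetical, and using UTC is an assumption, since the time zone expected by the service is not stated above.

```python
from datetime import datetime, timedelta, timezone

def minutes_ago_iso(minutes, now=None):
    """Return the time `minutes` before `now` as 'YYYY-MM-DDTHH:MM:SS.SSSS' (UTC assumed)."""
    if now is None:
        now = datetime.now(timezone.utc)
    moment = now - timedelta(minutes=minutes)
    # keep four fractional digits to match the pattern given in the text
    fraction = "%04d" % (moment.microsecond // 100)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + fraction

print(minutes_ago_iso(10))
```

If the API is polled at regular intervals, passing updatesAfterMinsAgo directly avoids building the timestamp at all.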
For many users, simply providing a list of concepts, keywords, sources etc. is not sufficient and a more complex way of specifying a query is required. For such purposes we provide a query language where conditions are specified in a JSON object that resembles the query language used by MongoDB. The grammar for the language is as follows:
ComplexEventQuery ::=
{
"$query": CombinedQuery | BaseQuery,
}
CombinedQuery ::=
{
"$or": [ CombinedQuery | BaseQuery, ... ],
"$not": null | CombinedQuery | BaseQuery
}
CombinedQuery ::=
{
"$and": [ CombinedQuery | BaseQuery, ... ],
"$not": null | CombinedQuery | BaseQuery
}
BaseQuery ::=
{
"conceptUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"keyword": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"categoryUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"lang": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"sourceUri": null | string | { "$or": [ string, ... ]},
"sourceLocationUri": null | string | { "$or": [ string, ... ]},
"sourceGroupUri": null | string | { "$or": [ string, ... ]},
"authorUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"locationUri": null | string | { "$or": [ string, ... ]},
"dateStart": null | string,
"dateEnd": null | string,
"dateMention": null | [string, ...],
"keywordLoc": null | "body" | "title" | "title,body",
"minArticlesInEvent": null | int,
"maxArticlesInEvent": null | int,
"$not": null | CombinedQuery | BaseQuery
}
Each complex event query needs to be a JSON object that has a $query key. The $query key must contain another JSON object that should be parsable as a CombinedQuery or a BaseQuery. A CombinedQuery can be used to specify a list of conditions, where all ($and) or any ($or) conditions should hold. The CombinedQuery can also contain a $not key containing another CombinedQuery or BaseQuery defining the results that should be excluded from the results computed by the $and or $or conditions. The BaseQuery represents a JSON object with actual conditions to search for. These (positive) conditions can include concepts, keywords, categories, sources, authors, etc. to search for. If multiple conditions are specified, for example, a conceptUri as well as a sourceUri, then results will have to match all the conditions. The BaseQuery can also contain the $not key specifying results to exclude from the results matching the positive conditions of the BaseQuery. A BaseQuery containing only the $not key is not a valid query (since it has no positive conditions).
Using this language you can specify queries that are not possible to express using the constructor parameters in QueryEvents or QueryEventsIter.
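The rules above can be checked mechanically before a query is sent. The following is a rough sketch of such a check, written against the grammar as given here. validate_complex_query and its helper are hypothetical names, not part of the eventregistry package, and the check covers structure only (it does not verify URIs, dates or value types). Treating every BaseQuery key other than keywordLoc and $not as a positive condition is an assumption.

```python
# keys assumed to count as positive conditions in a BaseQuery (taken from the grammar)
POSITIVE_KEYS = {
    "conceptUri", "keyword", "categoryUri", "lang", "sourceUri",
    "sourceLocationUri", "sourceGroupUri", "authorUri", "locationUri",
    "dateStart", "dateEnd", "dateMention",
    "minArticlesInEvent", "maxArticlesInEvent",
}

def _is_valid(node):
    if not isinstance(node, dict):
        return False
    # an optional $not must itself be a valid CombinedQuery or BaseQuery
    excluded = node.get("$not")
    if excluded is not None and not _is_valid(excluded):
        return False
    # CombinedQuery: exactly one of $and / $or, holding a non-empty list of sub-queries
    combos = [key for key in ("$and", "$or") if key in node]
    if combos:
        if len(combos) != 1:
            return False
        parts = node[combos[0]]
        return isinstance(parts, list) and len(parts) > 0 and all(_is_valid(p) for p in parts)
    # BaseQuery: needs at least one positive condition with a non-null value
    return any(node.get(key) is not None for key in POSITIVE_KEYS)

def validate_complex_query(query):
    """Return True if query has a $query key holding a structurally valid query."""
    return isinstance(query, dict) and "$query" in query and _is_valid(query["$query"])

only_negative = {"$query": {"$not": {"lang": "ara"}}}
with_positive = {"$query": {"conceptUri": "http://en.wikipedia.org/wiki/Twitter",
                            "$not": {"lang": "ara"}}}
print(validate_complex_query(only_negative), validate_complex_query(with_positive))
```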
Here are some examples of queries and what they would return:
A query that would return the list of events that are about Twitter or have been covered by both Techcrunch and Arstechnica:
{
"$query": {
"$or": [
{ "conceptUri": "http://en.wikipedia.org/wiki/Twitter" },
{ "sourceUri": { "$and": ["techcrunch.com", "arstechnica.com"] } }
]
}
}
A query that finds events that happened in London or Berlin and were not reported in Arabic or Spanish language:
{
    "$query": {
        "locationUri": {
            "$or": [
                "http://en.wikipedia.org/wiki/London",
                "http://en.wikipedia.org/wiki/Berlin"
            ]
        },
        "$not": {
            "lang": {
                "$or": ["ara", "spa"]
            }
        }
    }
}
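Since initWithComplexQuery() also accepts a string containing the JSON object, one option is to build the query as a Python dict and serialize it with the standard json module, which guarantees well-formed JSON. A small sketch of that approach follows. The helper any_of is a hypothetical convenience, not part of the eventregistry package.

```python
import json

def any_of(values):
    """Wrap a list of values in the {"$or": [...]} form used by the grammar."""
    return {"$or": list(values)}

# events in London or Berlin that were not reported in Arabic or Spanish
query = {
    "$query": {
        "locationUri": any_of([
            "http://en.wikipedia.org/wiki/London",
            "http://en.wikipedia.org/wiki/Berlin",
        ]),
        "$not": {"lang": any_of(["ara", "spa"])},
    }
}

# serialize to a JSON string; the dict itself could be passed instead
query_json = json.dumps(query, indent=2)
print(query_json)
```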
Depending on your preference, you can build the JSON for these complex queries yourself or you can use the associated classes such as ComplexEventQuery(), CombinedQuery() and BaseQuery(). Below is an example where we search for events that are either about Donald Trump or are in the Politics category, but did not occur in February 2017 or mention Barack Obama:
from eventregistry import *
er = EventRegistry()
trumpUri = er.getConceptUri("Trump")
obamaUri = er.getConceptUri("Obama")
politicsUri = er.getCategoryUri("politics")
cq = ComplexEventQuery(
    CombinedQuery.OR(
        [
            # positive conditions: about Trump or in the politics category
            BaseQuery(conceptUri = trumpUri),
            BaseQuery(categoryUri = politicsUri)
        ],
        # exclude events from February 2017 and events about Obama
        exclude = CombinedQuery.OR([
            BaseQuery(dateStart = "2017-02-01", dateEnd = "2017-02-28"),
            BaseQuery(conceptUri = obamaUri)])
    ))
q = QueryEvents.initWithComplexQuery(cq)
q.setRequestedResult(RequestEventsInfo())
res = er.execQuery(q)
If you've built the JSON query yourself, you can also use it like this:
from eventregistry import *
er = EventRegistry()
q = QueryEvents.initWithComplexQuery("{ '$query': { ... } }")
q.setRequestedResult(RequestEventsInfo())
res = er.execQuery(q)
In this case you need to make sure you're providing a valid query in the JSON.
Similarly, if you would like to use QueryEventsIter to quickly iterate over the results, you can also use the initWithComplexQuery() method like this:
from eventregistry import *
er = EventRegistry()
q = QueryEventsIter.initWithComplexQuery("{ '$query': { ... } }")
for event in q.execQuery(er):
    print(event)