Releases: EventRegistry/event-registry-python

Added semanticSimilarity, extractArticleInfo analytics endpoint, many backend changes

29 Jun 06:38

Added

  • Text analytics: added Analytics.semanticSimilarity API call. It can be used to determine how semantically related two documents are. The documents can be in the same or different languages.
  • Text analytics: added Analytics.extractArticleInfo API call. It provides functionality to extract article title, body, date, author and other information from the given URL.
  • dataType parameter when searching for articles. Event Registry is now separating collected content by data type. The possible values for data type are "news", "pr" (for PR content) and "blogs" (we will start indexing and providing blog content shortly). The dataType parameter can be set in the QueryArticles and QueryArticlesIter classes as well as in the EventRegistry.getNewsSourceUri and EventRegistry.suggestNewsSources.
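
A minimal sketch of how the two new Analytics calls might be used together. It assumes an `analytics` object created as `Analytics(er)` from an authenticated `EventRegistry` instance and that both methods accept their inputs positionally. The helper name `compare_and_extract` is hypothetical, and the responses are returned untouched because their exact shape is not described in these notes.

```python
def compare_and_extract(analytics, text1, text2, url):
    """Call the two analytics endpoints added in this release.

    `analytics` is assumed to be an eventregistry.Analytics instance.
    The raw responses are returned unchanged.
    """
    # How semantically related are the two documents (same or different languages)?
    similarity = analytics.semanticSimilarity(text1, text2)
    # Title, body, date, author and other information extracted from the URL.
    article_info = analytics.extractArticleInfo(url)
    return similarity, article_info


# Usage (requires the eventregistry package and a valid API key):
# from eventregistry import EventRegistry, Analytics
# er = EventRegistry(apiKey=YOUR_API_KEY)
# similarity, info = compare_and_extract(
#     Analytics(er), "first text", "second text", "https://example.com/article")
```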

Changed

  • changed params in the GetCounts and GetCountsEx classes.
  • EventRegistry.suggestNewsSources() and EventRegistry.getNewsSourceUri() now also accept a dataType parameter, which defaults to ["news", "pr"]. It determines what kinds of data sources to include in the generated suggestions.
  • QueryArticles and QueryArticlesIter classes now support an additional parameter dataType that determines what type of data should be returned. By default, the value is news. For now it can also be pr or an array with both values.
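
As an illustration of the dataType values described above, the following sketch validates a dataType argument before it is handed to a query class. The helper names `normalize_data_type` and `article_query_kwargs` are hypothetical and not part of the library; only the parameter names `keywords` and `dataType` and the three values come from these notes.

```python
# Values listed in the release notes; "blogs" is announced but not yet indexed.
ALLOWED_DATA_TYPES = ("news", "pr", "blogs")


def normalize_data_type(data_type="news"):
    """Return dataType as a list of validated strings."""
    values = [data_type] if isinstance(data_type, str) else list(data_type)
    unknown = [v for v in values if v not in ALLOWED_DATA_TYPES]
    if unknown:
        raise ValueError("unsupported dataType value(s): %r" % (unknown,))
    return values


def article_query_kwargs(keywords, data_type="news"):
    """Build keyword arguments that could be passed to QueryArticles."""
    return {"keywords": keywords, "dataType": normalize_data_type(data_type)}


# Usage (requires the eventregistry package):
# from eventregistry import QueryArticles
# q = QueryArticles(**article_query_kwargs("solar energy", ["news", "pr"]))
```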

Removed

  • Removed the RequestEventArticleUris, RequestArticlesUriList, RequestEventsUriList due to backend changes. Use the equivalent *UriWgt* version of the classes.
  • Removed RequestArticlesUrlList class since it is not supported anymore.
  • Removed QueryArticles.addRequestedResult(), QueryEvents.addRequestedResult(), QueryArticle.addRequestedResult(), QueryEvent.addRequestedResult(), and Query.clearRequestedResults(). As before, only a single result type can be requested per call, so these methods served no purpose. Use the setRequestedResult() methods instead.
  • Data model change: We removed the id property from different returned data objects. Although the documentation clearly stated that the property is for internal use only, users commonly used the property, which caused potential issues.

Added text analytics class; extraction of article links and sentiment

01 Dec 07:58

Added

  • Added a class Analytics that can be used to semantically annotate a document, categorize the document into a predefined taxonomy of categories, or detect the language of a text. In the future, more analytics methods will be added to this class. NOTE: the functionality is currently in BETA. The API calls or the provided outputs may change in the future.
  • Added property links into the output of the article format. It contains the list of URLs extracted from the article body (not from the whole HTML but just the part containing the body).
  • Added sentiment to the news articles. The sentiment property is added by default to the output format for the article. It can be null if the property is not set.

Removed

  • Removed the flag details from all the *InfoFlags that had it (ArticleInfoFlag, SourceInfoFlag, etc.). All the properties provided previously by this property are provided anyway using the other flags.
  • Removed the flag flags from all the *InfoFlags. The flag represents some internal properties that are not publicly useful.

flags for allowed use of archive, token usage monitoring, ...

14 Oct 17:55

Added

  • Added a flag allowUseOfArchive to EventRegistry constructor. The flag determines if queries made by that EventRegistry instance can use the archive data (data since Jan 2014) or just the recent data (last 31 days of content). Queries made on the archive use more of your data plan tokens so if you just want to use the recent content, make sure that you set the flag to False. Note that archive data can be accessed only by paid subscribers.
  • Added EventRegistry.printLastReqStats() which prints to console some stats regarding the latest executed request. It prints whether the archive was used in the query, the number of tokens used by the request, etc.
  • Added a parameter allowUseOfArchive to the EventRegistry.execQuery() method. It can be used to override the flag about the use of archive that was set when constructing the Event Registry class.
  • Added version checking on startup. If your installed version of the module is older than the latest released version, a warning is printed.
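
The archive-related additions above can be combined roughly as follows. This is a sketch, not the library's recommended pattern: the helper name `run_query` and its `use_archive` argument are hypothetical, while `allowUseOfArchive`, `execQuery()` and `printLastReqStats()` are the names given in these notes.

```python
def run_query(er, query, use_archive=None):
    """Execute a query and print the token/archive stats for it.

    `er` is assumed to be an EventRegistry instance. When `use_archive`
    is None, the flag given to the EventRegistry constructor applies;
    otherwise it is overridden for this single call.
    """
    if use_archive is None:
        result = er.execQuery(query)
    else:
        result = er.execQuery(query, allowUseOfArchive=use_archive)
    # Prints whether the archive was used, tokens spent, etc.
    er.printLastReqStats()
    return result


# Usage (requires the eventregistry package and a valid API key):
# from eventregistry import EventRegistry, QueryArticles
# er = EventRegistry(apiKey=YOUR_API_KEY, allowUseOfArchive=False)  # recent data only
# recent = run_query(er, QueryArticles(keywords="elections"))
# older = run_query(er, QueryArticles(keywords="elections"), use_archive=True)
```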

Changed

  • Changed the maximum number of articles and events that can be returned per search. The maximum number of returned articles can be 100 and the number of events can be 50.
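
Given these per-search limits, retrieving a larger result set means requesting several pages. The arithmetic can be sketched as below; the constant and function names are hypothetical, and the idea that every page is requested with the same count (and the last page trimmed by the caller) is an assumption about how paging is used, not something stated in these notes.

```python
import math

# Per-search limits stated in this release.
MAX_ARTICLES_PER_CALL = 100
MAX_EVENTS_PER_CALL = 50


def clamp_count(requested, per_call_limit):
    """Limit a requested item count to what one search can return."""
    return max(0, min(requested, per_call_limit))


def pages_needed(total_wanted, per_call_limit):
    """Number of full-size pages to request to cover total_wanted items."""
    if total_wanted < 0 or per_call_limit <= 0:
        raise ValueError("total_wanted must be >= 0 and per_call_limit > 0")
    return math.ceil(total_wanted / per_call_limit)
```

For example, 250 articles require three searches of up to 100 articles each, while 250 events require five searches of up to 50 events each.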

Removed

  • Removed the query parameters categoryIncludeSub and ignoreCategoryIncludeSub. Both flags are now always set to true and cannot be changed.
  • Removed parameter maxItems from QueryArticlesIter.execQuery() and QueryEventIter.execQuery(). The iterator will always cache the maximum number of items that can be returned with a single query.

Fixed

  • When using the article and event iterators, the iterators now automatically know if the archive should be used when downloading different pages matching the search results.

search in title vs body, extraction of video links, find places and sources at a location, ...

21 Aug 14:25

Added

  • QueryArticles and QueryArticlesIter now support an additional constructor argument keywordsLoc, which allows users to specify where the keywords provided using keywords should occur. The default is body (the keywords should be mentioned in the body of the article); other valid options are title (should be mentioned in the article's title) or title,body (should be mentioned anywhere in the article).
  • QueryArticles and QueryArticlesIter: just as keywordsLoc determines the location for keywords, an ignoreKeywordsLoc parameter can be specified to determine the location of the keywords to ignore, which are set with the ignoreKeywords parameter.
  • When using the advanced query language, you can now also specify keywordLoc parameter in the BaseQuery.
  • added EventRegistry.suggestLocationsAtCoordinate() method which returns geographic places near the given geo locations
  • added EventRegistry.suggestSourcesAtCoordinate() method which returns the list of news sources that are close to the given geographic location
  • added EventRegistry.suggestSourcesAtPlace() method that can return a list of news sources that we are crawling at the specified place or country. The input argument has to be a location URI obtained by calling EventRegistry.getLocationUri().
  • added EventRegistry.getUrl() method which for a given query object returns the url that can be used to make a direct HTTP request.
  • added videos property to Article data model. When one or more videos were identified in an article you can retrieve them by setting video=True flag in ArticleInfoFlags.
  • added category weights to articles. Our models currently produce weights for each of the categories associated with an article. The weights are in range 1 to 100. The weights were present even before, but their value was always 100.
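
To illustrate the keyword location options described above, this sketch builds and validates the constructor arguments for a keyword search. The helper name `keyword_search_kwargs` is hypothetical, and the default of body for ignoreKeywordsLoc is an assumption (the notes only state the default for keywordsLoc).

```python
# Location values listed in the release notes.
VALID_KEYWORD_LOCATIONS = ("body", "title", "title,body")


def keyword_search_kwargs(keywords, keywords_loc="body",
                          ignore_keywords=None, ignore_keywords_loc="body"):
    """Build keyword arguments that could be passed to QueryArticles."""
    for loc in (keywords_loc, ignore_keywords_loc):
        if loc not in VALID_KEYWORD_LOCATIONS:
            raise ValueError("invalid keyword location: %r" % (loc,))
    kwargs = {"keywords": keywords, "keywordsLoc": keywords_loc}
    # Only send the ignore-related parameters when there is something to ignore.
    if ignore_keywords is not None:
        kwargs["ignoreKeywords"] = ignore_keywords
        kwargs["ignoreKeywordsLoc"] = ignore_keywords_loc
    return kwargs


# Usage (requires the eventregistry package):
# from eventregistry import QueryArticles
# q = QueryArticles(**keyword_search_kwargs("Tesla", "title", ignore_keywords="recall"))
```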

Changed

  • When querying for articles, we now by default return full article body. Previously we returned 300 characters by default.
  • ArticleMapper.getArticleUri() now returns None or string, no longer a list. We no longer store multiple versions of the articles with the same url.
  • we've changed the order of parameters in ArticleInfoFlags. In case you didn't set parameter values by name, then check if it matches the desired properties. The change was done to reflect importance and usability of individual parameters.

Removed

  • EventRegistry.getArticleUris() no longer accepts parameter includeAllVersions.

Support for search by source location, source groups; sorting by alexa rankings

12 Jun 17:57

Added

  • QueryArticles and QueryEvents: When creating an instance of the class using a parameter that is a list (such as conceptUri, categoryUri, ...) you can (should) now provide the list using the QueryItems.AND() or QueryItems.OR() methods to explicitly define whether Boolean AND or OR should be used between the multiple items. If just a list is provided instead, a warning will be displayed in the console output. If a single value is used for the parameter, it is still perfectly ok to provide it directly as string.
  • QueryArticles and QueryEvents: Added two new supported parameters, sourceLocationUri and sourceGroupUri. Parameter sourceLocationUri can be used to specify a location URI (obtained with EventRegistry.getLocationUri) in order to restrict the search to news sources from a specific geographic location. The locations used can be cities or countries. sourceGroupUri can be used to restrict the search to news sources that belong to a manually curated list of news sources (such as top business related sources, top entertainment sources, ...). See the next item for how to find the values for this parameter.
  • EventRegistry class. Added methods suggestSourceGroups() and getSourceGroupUri() that can be used to get the list of news source groups that match a given name/uri (suggestSourceGroups()) or the single top suggestion (getSourceGroupUri()). Source groups can be used to find or filter content from a specific set of publishers.
  • when querying a list of articles, valid sortBy values are now also sourceAlexaGlobalRank (global rank of the news source) and sourceAlexaCountryRank (country rank of the news source).
  • SourceInfoFlags flag image was added which, if True, adds image and thumbImage fields to the returned source information.
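
A short sketch of how a list of concept URIs might be wrapped in QueryItems.AND() or QueryItems.OR() as described above. The helper name `concepts_query` and its `er_lib` argument (the imported eventregistry module, passed in explicitly) are hypothetical conveniences; `QueryArticles`, `QueryItems` and `conceptUri` are the names used in these notes.

```python
def concepts_query(er_lib, concept_uris, match_all=True):
    """Build an article query over several concepts.

    `er_lib` is assumed to be the imported eventregistry module.
    match_all=True requires every concept (Boolean AND);
    match_all=False accepts any of them (Boolean OR).
    """
    combine = er_lib.QueryItems.AND if match_all else er_lib.QueryItems.OR
    return er_lib.QueryArticles(conceptUri=combine(concept_uris))


# Usage (requires the eventregistry package and a valid API key):
# import eventregistry
# er = eventregistry.EventRegistry(apiKey=YOUR_API_KEY)
# uris = [er.getConceptUri("Apple"), er.getConceptUri("Google")]
# q = concepts_query(eventregistry, uris, match_all=False)
```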

Changed

  • QueryArticles and QueryEvents: Default values for parameters conceptUri, categoryUri and other parameters that accept lists were changed from [] to None to reflect the preference for using QueryItems class when specifying an array of values.
  • QueryArticles and QueryEvents: changed method setArticleUriList() to a static method initWithArticleUriList() to avoid mistakenly creating an instance with query parameters and additionally calling setArticleUriList().
  • QueryArticles and QueryEvents: method initWithComplexQuery() now accepts also query as a string value, not only instances of ComplexArticleQuery and ComplexEventQuery.
  • SourceInfoFlags flag importance was changed to ranking since now we return multiple rankings for the source
  • SourceInfoFlags flag tags was changed to sourceGroups since term tags was too generic.
  • Articles data model has changed: socialScore property is now named shares to better represent the content. The returned object can now include also shares on Google Plus, Pinterest, LinkedIn. The name of the parameter socialScore in ArticleInfoFlags was also changed to shares.
  • Source data model has changed: importance property was changed to an object ranking containing multiple indicators of source importance.

Deprecated

  • when sorting articles, the sortBy value sourceImportance is now deprecated. Use the value sourceImportanceRank instead. It is equivalent to the reversed value of sourceImportance, so make sure to also negate your existing sortByAsc value. The parameter was changed to make it comparable to the added sorting options sourceAlexaGlobalRank and sourceAlexaCountryRank, which also represent rankings (lower value means better value).
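
The migration described above amounts to renaming the sort key and flipping the sort direction. A minimal sketch, using a hypothetical helper name `migrate_sort`:

```python
def migrate_sort(sort_by, sort_by_asc):
    """Translate deprecated sort settings to their replacement.

    sourceImportanceRank is the reversed value of sourceImportance,
    so the ascending flag has to be negated to keep the same order.
    Other sortBy values are returned unchanged.
    """
    if sort_by == "sourceImportance":
        return "sourceImportanceRank", not sort_by_asc
    return sort_by, sort_by_asc
```

For example, a query that previously sorted by sourceImportance with sortByAsc set to False (most important sources first) would now sort by sourceImportanceRank with sortByAsc set to True.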

Removed

  • QueryArticles and QueryEvents: removed the conceptOper parameter. Its functionality is now replaced by providing the array of values inside QueryItems.AND() or QueryItems.OR().
  • QueryArticles and QueryEvents: removed the utility methods addConcept(), addLocation(), addCategory(), addNewsSource(), addKeyword(), setDateLimit(), setDateMentionLimit(). The values of these parameters should be set when initializing the object. The methods were removed because users were calling the static method initWithComplexQuery() and then additionally calling these methods, which had no effect on the results.

Support for complex queries

11 Apr 21:36

For power users we have added a query language that can use nested query objects and AND and OR operators on all query items.

All details about the query language are described on our documentation page:

Query language for events

Query language for articles

Python 3 support & Iterator bug fixes

22 Mar 08:56
  • This release adds support for iterators for Python 3.
  • Fixed a bug that occurred when iterating over a large set of results.

Iterators & required use of apiKey

06 Mar 07:34

In this release we introduce two major changes. The first change is the possibility of using iterators to iterate over search results containing events and articles.
Details and an example of the iterator can be read on the blog post: http://blog.eventregistry.org/2017/03/05/simplifying-the-data-access-with-iterators/
as well as in the documentation:
https://github.com/EventRegistry/event-registry-python/wiki/Searching-for-events#queryeventsiter
https://github.com/EventRegistry/event-registry-python/wiki/Searching-for-articles#queryarticlesiter
https://github.com/EventRegistry/event-registry-python/wiki/Get-event-information#queryeventarticlesiter
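
A rough sketch of consuming one of these iterators, assuming the iterator classes expose an `execQuery(er)` method that yields one result at a time, as the linked documentation describes. The helper name `first_n` is hypothetical; `itertools.islice` is used so that iteration stops after the requested number of items.

```python
from itertools import islice


def first_n(iter_query, er, n):
    """Collect at most n results from an iterator-style query.

    `iter_query` is assumed to be a QueryArticlesIter or QueryEventsIter
    instance and `er` an EventRegistry instance.
    """
    return list(islice(iter_query.execQuery(er), n))


# Usage (requires the eventregistry package and a valid API key):
# from eventregistry import EventRegistry, QueryArticlesIter
# er = EventRegistry(apiKey=YOUR_API_KEY)
# for article in first_n(QueryArticlesIter(keywords="Brexit"), er, 20):
#     print(article["title"])
```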

The other significant change is that we have removed the EventRegistry.login() method. Users should now authenticate using their API key. You can specify your API key when you create an EventRegistry instance:

er = EventRegistry(apiKey = YOUR_API_KEY)

If you don't know how to obtain your API key, please check the documentation: https://github.com/EventRegistry/event-registry-python/wiki/EventRegistry-class#authorization
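
One way to avoid hard-coding the key is to read it from an environment variable before constructing the instance. This is only a sketch: the helper `load_api_key` and the variable name `EVENT_REGISTRY_API_KEY` are hypothetical and not something the library itself defines.

```python
import os


def load_api_key(env=None, var_name="EVENT_REGISTRY_API_KEY"):
    """Read the API key from the environment (or from a dict passed as env)."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError("Set %s to your Event Registry API key" % var_name)
    return key


# Usage (requires the eventregistry package):
# from eventregistry import EventRegistry
# er = EventRegistry(apiKey=load_api_key())
```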