## [v8.4]() (2018-08-24)

**Added**
- added `EventRegistry.getUsageInfo()` method, which returns the number of used tokens and the total number of available tokens for the given user. The existing methods `EventRegistry.getRemainingAvailableRequests()` and `EventRegistry.getDailyAvailableRequests()` are still there, but their values are only valid after at least one request has been made.
- added searching of articles and events based on article authors. You can now provide the `authorUri` parameter when creating `QueryArticles` and `QueryEvents` instances.
- added author-related methods to the `EventRegistry` class: `EventRegistry.suggestAuthors()` to obtain the URIs of authors for a given (partial) name, and `EventRegistry.getAuthorUri()` to obtain a single author URI for a given (partial) name.
- added ability to search articles and events by authors. The `QueryArticles` and `QueryEvents` constructors now also accept an `authorUri` parameter that limits the results to articles/events by those authors. Use `QueryOper.AND()` or `QueryOper.OR()` to specify multiple authors in the same query.
- BETA: added a filter that returns only articles written by sources within a certain ranking range. Specify it by setting the `startSourceRankPercentile` and `endSourceRankPercentile` parameters when creating the `QueryArticles` instance. The default value is 0 for `startSourceRankPercentile` and 100 for `endSourceRankPercentile`. The values cannot be arbitrary numbers between 0 and 100; each has to be divisible by 10. Setting `startSourceRankPercentile` to 0 and `endSourceRankPercentile` to 20 returns only articles from the top-ranked news sources (according to [Alexa site ranking](https://www.alexa.com/siteinfo)), which amount to *approximately 20%* of all matching content. Note: 20 percentiles do not represent 20% of all top sources. The value identifies the subset of news sources that generate approximately 20% of our collected news content. The reason for this choice is that the top-ranked 10% of news sources write about 30% of all news content, and our choice normalizes this effect. This feature may change in the future.
- `QueryEventArticlesIter` is now able to return only a subset of the articles assigned to an event. You can use the same filters as with the `QueryArticles` constructor, specifying them when constructing the `QueryEventArticlesIter` instance. The same kind of filtering is also possible with the `RequestEventArticles()` class.
- added some parameters and changed default values in some of the result types to reflect the backend changes.
- added optional parameter `proxyUrl` to `Analytics.extractArticleInfo()`. It can be used to download article info through a proxy that you provide (to avoid potential GDPR issues). The `proxyUrl` should be in the format `{schema}://{username}:{pass}@{proxy url/ip}`.

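The constraint on the BETA source-rank filter (values divisible by 10, between 0 and 100) can be checked before a query is built. The following is only a sketch: `check_source_rank_percentiles` is a hypothetical helper that is not part of the `eventregistry` package, and the requirement that the start value be smaller than the end value is an assumption that the changelog does not state.

```python
def check_source_rank_percentiles(start=0, end=100):
    """Validate a (start, end) percentile pair and return it as keyword arguments.

    Hypothetical helper for illustration; not part of the eventregistry package.
    """
    for name, value in (("startSourceRankPercentile", start),
                        ("endSourceRankPercentile", end)):
        # the changelog requires a number divisible by 10 in the range 0..100
        if not isinstance(value, int) or not 0 <= value <= 100 or value % 10 != 0:
            raise ValueError("%s must be a multiple of 10 between 0 and 100, got %r" % (name, value))
    # Assumption (not stated in the changelog): the range must not be empty
    if start >= end:
        raise ValueError("startSourceRankPercentile must be smaller than endSourceRankPercentile")
    return {"startSourceRankPercentile": start, "endSourceRankPercentile": end}


# sources that together produce roughly the top 20% of collected content
top_sources = check_source_rank_percentiles(0, 20)
print(top_sources)
```

The returned dict could then be expanded into the `QueryArticles` constructor together with the other query conditions, for example `QueryArticles(..., **top_sources)`.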
EventRegistry · Aug 24, 2018 · e09a66c
1 parent 30e3bad
Showing 12 changed files with 486 additions and 105 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,17 @@
 # Change Log
 
+## [v8.4]() (2018-08-24)
+
+**Added**
+- added `EventRegistry.getUsageInfo()` method, which returns the number of used tokens and the total number of available tokens for the given user. The existing methods `EventRegistry.getRemainingAvailableRequests()` and `EventRegistry.getDailyAvailableRequests()` are still there, but their values are only valid after at least one request has been made.
+- added searching of articles and events based on article authors. You can now provide `authorUri` parameter when creating the `QueryArticles` and `QueryEvents` instances.
+- added author related methods to `EventRegistry` class: `EventRegistry.suggestAuthors()` to obtain uris of authors for given (partial) name and `EventRegistry.getAuthorUri()` to obtain a single author uri for the given (partial) name.
+- added ability to search articles and events by authors. `QueryArticles` and `QueryEvents` constructors now also accept `authorUri` parameter that can be used to limit the results to articles/events by those authors. Use `QueryOper.AND()` or `QueryOper.OR()` to specify multiple authors in the same query.
+- BETA: added a filter that returns only articles written by sources within a certain ranking range. Specify it by setting the `startSourceRankPercentile` and `endSourceRankPercentile` parameters when creating the `QueryArticles` instance. The default value is 0 for `startSourceRankPercentile` and 100 for `endSourceRankPercentile`. The values cannot be arbitrary numbers between 0 and 100; each has to be divisible by 10. Setting `startSourceRankPercentile` to 0 and `endSourceRankPercentile` to 20 returns only articles from the top-ranked news sources (according to [Alexa site ranking](https://www.alexa.com/siteinfo)), which amount to *approximately 20%* of all matching content. Note: 20 percentiles do not represent 20% of all top sources. The value identifies the subset of news sources that generate approximately 20% of our collected news content. The reason for this choice is that the top-ranked 10% of news sources write about 30% of all news content, and our choice normalizes this effect. This feature may change in the future.
+- `QueryEventArticlesIter` is now able to return only a subset of articles assigned to an event. You can use the same filters as with the `QueryArticles` constructor and you can specify them when constructing the instance of `QueryEventArticlesIter`. The same kind of filtering is also possible if you want to use the `RequestEventArticles()` class instead.
+- added some parameters and changed default values in some of the result types to reflect the backend changes.
+- added optional parameter `proxyUrl` to `Analytics.extractArticleInfo()`. It can be used to download article info through a proxy that you provide (to avoid potential GDPR issues). The `proxyUrl` should be in format `{schema}://{username}:{pass}@{proxy url/ip}`.
+
 ## [v8.3.1]() (2018-08-12)
 
 **Updated**

diff --git a/eventregistry/Analytics.py b/eventregistry/Analytics.py
@@ -75,13 +75,18 @@ def detectLanguage(self, text):
         return self._er.jsonRequestAnalytics("/api/v1/detectLanguage", { "text": text })
 
 
-    def extractArticleInfo(self, url):
+    def extractArticleInfo(self, url, proxyUrl = None):
         """
         extract all available information about an article available at url `url`. Returned information will include
         article title, body, authors, links in the articles, ...
+        @param url: article url to extract article information from
+        @param proxyUrl: proxy that should be used for downloading article information. format: {schema}://{username}:{pass}@{proxy url/ip}
         @returns: dict
         """
-        return self._er.jsonRequestAnalytics("/api/v1/extractArticleInfo", { "url": url })
+        params = { "url": url }
+        if proxyUrl:
+            params["proxyUrl"] = proxyUrl
+        return self._er.jsonRequestAnalytics("/api/v1/extractArticleInfo", params)
 
 
     def ner(self, text):
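
The `proxyUrl` format `{schema}://{username}:{pass}@{proxy url/ip}` and the optional-parameter handling in the hunk above can be sketched without the library. Both function names below are hypothetical, and percent-encoding the credentials is an assumption about what the proxy and the backend accept; it is not something the diff specifies.

```python
from urllib.parse import quote


def build_proxy_url(schema, username, password, host):
    """Assemble a proxy URL in the {schema}://{username}:{pass}@{proxy url/ip} format.

    Hypothetical helper. Credentials are percent-encoded so that characters such as
    "@" or ":" in a password do not break the URL (assumption: the proxy accepts this).
    """
    return "%s://%s:%s@%s" % (schema, quote(username, safe=""), quote(password, safe=""), host)


def build_extract_params(url, proxyUrl=None):
    """Mirror the request-parameter logic of extractArticleInfo() in the diff."""
    params = {"url": url}
    # the proxy is only sent when the caller provides one
    if proxyUrl:
        params["proxyUrl"] = proxyUrl
    return params


proxy = build_proxy_url("http", "user", "p@ss", "203.0.113.5:8080")
print(build_extract_params("https://example.com/article", proxy))
print(build_extract_params("https://example.com/article"))
```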

diff --git a/eventregistry/Base.py b/eventregistry/Base.py
@@ -196,27 +196,6 @@ def _getQueryParams(self):
         return dict(self.queryParams)
 
 
-
-class Query(QueryParamsBase):
-    def __init__(self):
-        QueryParamsBase.__init__(self)
-        self.resultTypeList = []
-
-
-    def _getQueryParams(self):
-        """encode the request."""
-        allParams = {}
-        if len(self.resultTypeList) == 0:
-            raise ValueError("The query does not have any result type specified. No sense in performing such a query")
-        allParams.update(self.queryParams)
-        for request in self.resultTypeList:
-            allParams.update(request.__dict__)
-        # all requests in resultTypeList have "resultType" so each call to .update() overrides the previous one
-        # since we want to store them all we have to add them here:
-        allParams["resultType"] = [request.__dict__["resultType"] for request in self.resultTypeList]
-        return allParams
-
-
     def _setQueryArrVal(self, value, propName, propOperName, defaultOperName):
         """
         parse the value "value" and use it to set the property propName and the operator with name propOperName
@@ -251,4 +230,27 @@ def _setQueryArrVal(self, value, propName, propOperName, defaultOperName):
 
         # there should be no other valid types
         else:
-            assert False, "Parameter '%s' was of unsupported type. It should either be None, a string or an instance of QueryItems" % (propName)
+            assert False, "Parameter '%s' was of unsupported type. It should either be None, a string or an instance of QueryItems" % (propName)
+
+
+
+class Query(QueryParamsBase):
+    def __init__(self):
+        QueryParamsBase.__init__(self)
+        self.resultTypeList = []
+
+
+    def _getQueryParams(self):
+        """encode the request."""
+        allParams = {}
+        if len(self.resultTypeList) == 0:
+            raise ValueError("The query does not have any result type specified. No sense in performing such a query")
+        allParams.update(self.queryParams)
+        for request in self.resultTypeList:
+            allParams.update(request.__dict__)
+        # all requests in resultTypeList have "resultType" so each call to .update() overrides the previous one
+        # since we want to store them all we have to add them here:
+        allParams["resultType"] = [request.__dict__["resultType"] for request in self.resultTypeList]
+        return allParams
+
+
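
The comment in `Query._getQueryParams()` notes that every requested result type carries its own `resultType` key, so successive `dict.update()` calls overwrite it. A standalone sketch of that merge is shown here; `FakeRequest` and the parameter names `articlesCount` and `uriWgtListCount` are stand-ins chosen for illustration, not the library's actual request classes or fields.

```python
class FakeRequest(object):
    """Stand-in for a requested result type; only stores its parameters as attributes."""
    def __init__(self, resultType, **params):
        self.resultType = resultType
        self.__dict__.update(params)


def merge_query_params(queryParams, resultTypeList):
    """Replicate the merging logic of Query._getQueryParams() on plain inputs."""
    if len(resultTypeList) == 0:
        raise ValueError("The query does not have any result type specified.")
    allParams = {}
    allParams.update(queryParams)
    for request in resultTypeList:
        # each update overwrites "resultType" with the latest request's value
        allParams.update(request.__dict__)
    # restore the full list of requested result types
    allParams["resultType"] = [request.__dict__["resultType"] for request in resultTypeList]
    return allParams


requests = [FakeRequest("articles", articlesCount=20),
            FakeRequest("uriWgtList", uriWgtListCount=1000)]
merged = merge_query_params({"keyword": "Apple"}, requests)
print(merged["resultType"])
```

Without the final assignment, `merged["resultType"]` would hold only the value from the last request in the list.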
diff --git a/eventregistry/EventRegistry.py b/eventregistry/EventRegistry.py
@@ -141,15 +141,20 @@ def printConsole(self, text):
 
 
     def getRemainingAvailableRequests(self):
-        """get the number of requests that are still available for the user today"""
+        """get the number of requests that are still available for the user today. Information is only accessible after you make some query."""
         return self._remainingAvailableRequests
 
 
     def getDailyAvailableRequests(self):
-        """get the total number of requests that the user can make in a day"""
+        """get the total number of requests that the user can make in a day. Information is only accessible after you make some query."""
         return self._dailyAvailableRequests
 
 
+    def getUsageInfo(self):
+        """return the number of used and total available tokens. Can be used at any time (also before making queries)"""
+        return self.jsonRequest("/api/v1/usage", { "apiKey": self._apiKey })
+
+
     def getUrl(self, query):
         """
         return the url that can be used to get the content that matches the query
@@ -349,7 +354,7 @@ def suggestConcepts(self, prefix, sources = ["concepts"], lang = "eng", conceptL
         params = { "prefix": prefix, "source": sources, "lang": lang, "conceptLang": conceptLang, "page": page, "count": count}
         params.update(returnInfo.getParams())
         params.update(kwargs)
-        return self.jsonRequest("/json/suggestConcepts", params)
+        return self.jsonRequest("/json/suggestConceptsFast", params)
 
 
     def suggestCategories(self, prefix, page = 1, count = 20, returnInfo = ReturnInfo(), **kwargs):
@@ -364,7 +369,7 @@ def suggestCategories(self, prefix, page = 1, count = 20, returnInfo = ReturnInf
         params = { "prefix": prefix, "page": page, "count": count }
         params.update(returnInfo.getParams())
         params.update(kwargs)
-        return self.jsonRequest("/json/suggestCategories", params)
+        return self.jsonRequest("/json/suggestCategoriesFast", params)
 
 
     def suggestNewsSources(self, prefix, dataType = ["news", "pr", "blog"], page = 1, count = 20, **kwargs):
@@ -378,7 +383,7 @@ def suggestNewsSources(self, prefix, dataType = ["news", "pr", "blog"], page = 1
         assert page > 0, "page parameter should be above 0"
         params = {"prefix": prefix, "dataType": dataType, "page": page, "count": count}
         params.update(kwargs)
-        return self.jsonRequest("/json/suggestSources", params)
+        return self.jsonRequest("/json/suggestSourcesFast", params)
 
 
     def suggestSourceGroups(self, prefix, page = 1, count = 20, **kwargs):
@@ -413,7 +418,7 @@ def suggestLocations(self, prefix, sources = ["place", "country"], lang = "eng",
             assert len(sortByDistanceTo) == 2, "The sortByDistanceTo should contain two float numbers"
             params["closeToLat"] = sortByDistanceTo[0]
             params["closeToLon"] = sortByDistanceTo[1]
-        return self.jsonRequest("/json/suggestLocations", params)
+        return self.jsonRequest("/json/suggestLocationsFast", params)
 
 
     def suggestLocationsAtCoordinate(self, latitude, longitude, radiusKm, limitToCities = False, lang = "eng", count = 20, ignoreNonWiki = True, returnInfo = ReturnInfo(), **kwargs):
@@ -433,7 +438,7 @@ def suggestLocationsAtCoordinate(self, latitude, longitude, radiusKm, limitToCit
         params = { "action": "getLocationsAtCoordinate", "lat": latitude, "lon": longitude, "radius": radiusKm, "limitToCities": limitToCities, "count": count, "lang": lang }
         params.update(returnInfo.getParams())
         params.update(kwargs)
-        return self.jsonRequest("/json/suggestLocations", params)
+        return self.jsonRequest("/json/suggestLocationsFast", params)
 
 
     def suggestSourcesAtCoordinate(self, latitude, longitude, radiusKm, count = 20, **kwargs):
@@ -448,7 +453,7 @@ def suggestSourcesAtCoordinate(self, latitude, longitude, radiusKm, count = 20,
         assert isinstance(longitude, (int, float)), "The 'longitude' should be a number"
         params = {"action": "getSourcesAtCoordinate", "lat": latitude, "lon": longitude, "radius": radiusKm, "count": count}
         params.update(kwargs)
-        return self.jsonRequest("/json/suggestSources", params)
+        return self.jsonRequest("/json/suggestSourcesFast", params)
 
 
     def suggestSourcesAtPlace(self, conceptUri, dataType = "news", page = 1, count = 20, **kwargs):
@@ -461,7 +466,21 @@ def suggestSourcesAtPlace(self, conceptUri, dataType = "news", page = 1, count =
         """
         params = {"action": "getSourcesAtPlace", "conceptUri": conceptUri, "page": page, "count": count, "dataType": dataType}
         params.update(kwargs)
-        return self.jsonRequest("/json/suggestSources", params)
+        return self.jsonRequest("/json/suggestSourcesFast", params)
+
+
+    def suggestAuthors(self, prefix, page = 1, count = 20, **kwargs):
+        """
+        return a list of authors that match the prefix
+        @param prefix: input text that should be contained in the author name and source url
+        @param page: page of results
+        @param count: number of returned suggestions
+        """
+        assert page > 0, "page parameter should be above 0"
+        params = {"prefix": prefix, "page": page, "count": count}
+        params.update(kwargs)
+        return self.jsonRequest("/json/suggestAuthorsFast", params)
+
 
 
     def suggestConceptClasses(self, prefix, lang = "eng", conceptLang = "eng", source = ["dbpedia", "custom"], page = 1, count = 20, returnInfo = ReturnInfo(), **kwargs):
@@ -552,6 +571,13 @@ def getNewsSourceUri(self, sourceName, dataType = ["news", "pr", "blog"]):
         return None
 
 
+    def getSourceUri(self, sourceName, dataType=["news", "pr", "blog"]):
+        """
+        alternative (shorter) name for the method getNewsSourceUri()
+        """
+        return self.getNewsSourceUri(sourceName, dataType)
+
+
     def getSourceGroupUri(self, sourceGroupName):
         """
         return the URI of the source group that best matches the name
@@ -600,6 +626,18 @@ def getCustomConceptUri(self, label, lang = "eng"):
         return None
 
 
+    def getAuthorUri(self, authorName):
+        """
+        return the author uri that is the best match for the given author name (and potentially source url)
+        if there are multiple matches for the given author name, they are sorted based on the number of articles they have written (from most to least frequent)
+        @param authorName: partial or full name of the author, potentially also containing the source url (e.g. "george brown nytimes")
+        """
+        matches = self.suggestAuthors(authorName)
+        if matches != None and isinstance(matches, list) and len(matches) > 0 and "uri" in matches[0]:
+            return matches[0]["uri"]
+        return None
+
+
     @staticmethod
     def getUriFromUriWgt(uriWgtList):
         """
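
The selection rule in `getAuthorUri()` (take the `uri` of the first suggestion, otherwise return `None`) can be exercised on its own. In this sketch `pick_first_author_uri` is a hypothetical function, and the sample suggestions, including their `uri` and `name` values, are made up for illustration; the real response shape of `suggestAuthors()` is only assumed to be a list of dicts with a `uri` key, as the diff implies.

```python
def pick_first_author_uri(matches):
    """Return the uri of the first suggestion, or None if there is no usable match."""
    if matches is not None and isinstance(matches, list) and len(matches) > 0 and "uri" in matches[0]:
        return matches[0]["uri"]
    return None


# made-up suggestions; per the docstring they arrive sorted from most to least articles written
sample_matches = [
    {"uri": "george_brown@example-news.com", "name": "George Brown"},
    {"uri": "george_brownlee@example-daily.com", "name": "George Brownlee"},
]

print(pick_first_author_uri(sample_matches))
print(pick_first_author_uri([]))
print(pick_first_author_uri({"error": "invalid request"}))
```

Because only the first suggestion is used, a caller who needs a specific author among several namesakes would probably want to inspect the full `suggestAuthors()` result instead.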

diff --git a/eventregistry/Query.py b/eventregistry/Query.py
@@ -33,22 +33,24 @@ def __init__(self,
                  dateEnd = None,
                  dateMention = None,
                  sourceLocationUri = None,
-                 sourceGroupUri = None,
+                 sourceGroupUri=None,
+                 authorUri = None,
                  keywordLoc = "body",
                  minMaxArticlesInEvent = None,
                  exclude = None):
         """
-        @param keyword: keyword(s) to query. Either None, string or QueryItems
-        @param conceptUri: concept(s) to query. Either None, string or QueryItems
-        @param sourceUri: source(s) to query. Either None, string or QueryItems
-        @param locationUri: location(s) to query. Either None, string or QueryItems
-        @param categoryUri: categories to query. Either None, string or QueryItems
-        @param lang: language(s) to query. Either None, string or QueryItems
+        @param keyword: keyword(s) to query. Either None, string or QueryItems instance
+        @param conceptUri: concept(s) to query. Either None, string or QueryItems instance
+        @param sourceUri: source(s) to query. Either None, string or QueryItems instance
+        @param locationUri: location(s) to query. Either None, string or QueryItems instance
+        @param categoryUri: categories to query. Either None, string or QueryItems instance
+        @param lang: language(s) to query. Either None, string or QueryItems instance
         @param dateStart: starting date. Either None, string or date or datetime
         @param dateEnd: ending date. Either None, string or date or datetime
         @param dateMention: search by mentioned dates - Either None, string or date or datetime or a list of these types
         @param sourceLocationUri: find content generated by news sources at the specified geographic location - can be a city URI or a country URI. Multiple items can be provided using a list
         @param sourceGroupUri: a single or multiple source group URIs. A source group is a group of news sources, commonly defined based on common topic or importance
+        @param authorUri: author(s) to query. Either None, string or QueryItems instance
         @param keywordLoc: where should we look when searching using the keywords provided by "keyword" parameter. "body" (default), "title", or "body,title"
         @param minMaxArticlesInEvent: a tuple containing the minimum and maximum number of articles that should be in the resulting events. Parameter relevant only if querying events
         @param exclude: a instance of BaseQuery, CombinedQuery or None. Used to filter out results matching the other criteria specified in this query
@@ -78,6 +80,8 @@ def __init__(self,
 
         self._setQueryArrVal("sourceLocationUri", sourceLocationUri)
         self._setQueryArrVal("sourceGroupUri", sourceGroupUri)
+        self._setQueryArrVal("authorUri", authorUri)
+
         if keywordLoc != "body":
             self._queryObj["keywordLoc"] = keywordLoc