From 485a6b868cdf077cceea11cc34a67689838f8993 Mon Sep 17 00:00:00 2001
From: Karl Broman
Date: Mon, 22 Jan 2024 12:46:47 -0600
Subject: [PATCH] Fix vignette for change to arxiv_cats dataset (Issue #60)

---
 DESCRIPTION         |  2 +-
 NEWS.md             |  9 +++++++++
 inst/doc/aRxiv.html | 47 +++++++++++++++++++++++++++------------------
 vignettes/aRxiv.Rmd | 11 +++++++++--
 4 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index f03d63c..67c9837 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: aRxiv
 Title: Interface to the arXiv API
-Version: 0.8
+Version: 0.9.1
 Date: 2024-01-22
 Authors@R: c(person("Karthik", "Ram", role="aut", email="karthik.ram@gmail.com",
              comment=c(ORCID = "0000-0002-0233-1757")),
diff --git a/NEWS.md b/NEWS.md
index adc6145..d3cb9a9 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,12 @@
+aRxiv 0.9.1
+-----------
+
+### BUG FIXES
+
+* Small revision to aRxiv vignette to deal with the change in the
+  structure of the `arxiv_cats` dataset.
+
+
 aRxiv 0.8
 ---------
 
diff --git a/inst/doc/aRxiv.html b/inst/doc/aRxiv.html
index 634f1b6..c052d2d 100644
--- a/inst/doc/aRxiv.html
+++ b/inst/doc/aRxiv.html
@@ -487,18 +487,27 @@

Search terms

Subject classifications

arXiv has a set of 155 subject classifications, searchable with the prefix cat:. The aRxiv package contains a dataset
-arxiv_cats containing the abbreviations and descriptions.
-Here are the statistics categories.

-
arxiv_cats[grep('^stat', arxiv_cats$abbreviation),]
-
## [1] category          field             subfield          short_description long_description 
-## <0 rows> (or 0-length row.names)
+arxiv_cats containing the categories, short and long
+descriptions, as well as field (and, for Physics, subfield). Here are
+the column names.

+
colnames(arxiv_cats)
+
## [1] "category"          "field"             "subfield"          "short_description" "long_description"
+

Here are the statistics categories.

+
arxiv_cats[arxiv_cats$field=="Statistics", c("category", "short_description")]
+
##     category short_description
+## 150  stat.AP      Applications
+## 151  stat.CO       Computation
+## 152  stat.ME       Methodology
+## 153  stat.ML  Machine Learning
+## 154  stat.OT  Other Statistics
+## 155  stat.TH Statistics Theory
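
The zero-row output removed by this hunk follows from base R semantics, not from anything specific to aRxiv. The sketch below reproduces it on a toy data frame. The name `cats` and its three rows are illustrative stand-ins, not the real `arxiv_cats` dataset, and the `"Mathematics"` field value is an assumption.

```r
# Toy stand-in for arxiv_cats using the new column names.
# The rows are illustrative only; they are not the real dataset.
cats <- data.frame(
  category = c("stat.AP", "stat.CO", "math.AG"),
  field = c("Statistics", "Statistics", "Mathematics"),
  short_description = c("Applications", "Computation", "Algebraic Geometry"),
  stringsAsFactors = FALSE
)

# Old vignette code: there is no `abbreviation` column any more, so `$`
# returns NULL, grep() on NULL returns integer(0), and the subset is empty.
old <- cats[grep("^stat", cats$abbreviation), ]
nrow(old)

# New vignette code: select on the `field` column instead.
new <- cats[cats$field == "Statistics", c("category", "short_description")]
new$category
```

Because `$` on a missing column yields `NULL` rather than an error, the old chunk kept running and silently printed an empty table.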

To search these categories, you need to include either the full term or use the * wildcard.

-
arxiv_count('cat:stat')
+
arxiv_count('cat:stat')
## [1] 0
-
arxiv_count('cat:stat.AP')
+
arxiv_count('cat:stat.AP')
## [1] 17577
-
arxiv_count('cat:stat*')
+
arxiv_count('cat:stat*')
## [1] 114647
@@ -513,17 +522,17 @@

Dates and ranges of dates

2007-10-18 12:25:34. You can use * for a wildcard for the times. For example, to get all manuscripts with initial submission on 2007-10-18:

-
arxiv_count('submittedDate:20071018*')
+
arxiv_count('submittedDate:20071018*')
## [1] 196

But you can’t use the wildcard within the dates.

-
arxiv_count('submittedDate:2007*')
+
arxiv_count('submittedDate:2007*')
## [1] 0

To get a count of all manuscripts with original submission in 2007, use a date range, like [from_date TO to_date]. (If you give a partial date, it’s treated as the earliest date/time that matches, and the range appears to be up to but not including the second date/time.)

-
arxiv_count('submittedDate:[2007 TO 2008]')
+
arxiv_count('submittedDate:[2007 TO 2008]')
## [1] 55749
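
As a minimal sketch of the date-range form above, the helper below builds the query string for one calendar year. `year_range_query` is a hypothetical name and is not part of aRxiv. It relies on the vignette's observation that a partial date is treated as the earliest matching date/time and that the range appears to exclude the second endpoint.

```r
# Hypothetical helper (not part of aRxiv): build a submittedDate range
# query that covers one calendar year, in the [from_date TO to_date] form.
year_range_query <- function(year) {
  year <- as.integer(year)
  sprintf("submittedDate:[%d TO %d]", year, year + 1L)
}

query_2007 <- year_range_query(2007)
query_2007

# Passing the string to arxiv_count() requires network access:
# arxiv_count(query_2007)
```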
@@ -531,8 +540,8 @@

Dates and ranges of dates

Search results

The output of arxiv_search() is a data frame with the following columns.

-
res <- arxiv_search('au:"Peter Hall"')
-names(res)
+
res <- arxiv_search('au:"Peter Hall"')
+names(res)
##  [1] "id"               "submitted"        "updated"          "title"            "abstract"        
 ##  [6] "authors"          "affiliations"     "link_abstract"    "link_pdf"         "link_doi"        
 ## [11] "comment"          "journal_ref"      "doi"              "primary_category" "categories"
@@ -551,9 +560,9 @@

Search results

Classification System (e.g., F.2.2). These are not searchable with cat: but are searchable with a general search.
-
arxiv_count("cat:14J60")
+
arxiv_count("cat:14J60")
## [1] 0
-
arxiv_count("14J60")
+
arxiv_count("14J60")
## [1] 870
@@ -567,9 +576,9 @@

Sorting results

the order in id_list.

Here’s an example, to sort the results by the date the manuscripts were last updated, in descending order.

-
res <- arxiv_search('au:"Peter Hall" AND ti:deconvolution',
-                    sort_by="updated", ascending=FALSE)
-res$updated
+
res <- arxiv_search('au:"Peter Hall" AND ti:deconvolution',
+                    sort_by="updated", ascending=FALSE)
+res$updated
## [1] "2010-03-01 11:33:37" "2008-10-27 14:27:52" "2008-04-04 12:19:05" "2007-10-18 12:25:34"
@@ -605,7 +614,7 @@

Limit time between search requests

period for the delay configurable with the R option "aRxiv_delay" (in seconds). The default is 3 seconds.

To reduce the delay to 1 second, use:

-
options(aRxiv_delay=1)
+
options(aRxiv_delay=1)

Don’t do searches in parallel (e.g., via the parallel package). You may be locked out from the arXiv API.
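
If the shorter delay is only wanted for part of a session, one possible pattern is to keep the value returned by `options()` and restore it afterwards. This sketch uses only base R. The variable names `old_opts` and `delay_in_effect` are illustrative.

```r
# options() returns the previous values invisibly, so they can be restored.
old_opts <- options(aRxiv_delay = 1)

# The value searches would see while the shorter delay is in effect.
delay_in_effect <- getOption("aRxiv_delay")
delay_in_effect

# ... run arxiv_search() / arxiv_count() calls here, one at a time ...

# Restore whatever was set before (unset if it had no value).
options(old_opts)
```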

diff --git a/vignettes/aRxiv.Rmd b/vignettes/aRxiv.Rmd
index 32aa781..75d79dd 100644
--- a/vignettes/aRxiv.Rmd
+++ b/vignettes/aRxiv.Rmd
@@ -185,11 +185,18 @@ arxiv_count('au:"P Hall"')
 
 arXiv has a set of `r nrow(arxiv_cats)` subject classifications,
 searchable with the prefix `cat:`. The aRxiv package contains a
-dataset `arxiv_cats` containing the abbreviations and descriptions.
+dataset `arxiv_cats` containing the categories, short and long
+descriptions, as well as field (and, for Physics, subfield).
+Here are the column names.
+
+```{r arxiv_cats_colnames}
+colnames(arxiv_cats)
+```
+
 Here are the statistics categories.
 
 ```{r arxiv_cats}
-arxiv_cats[grep('^stat', arxiv_cats$abbreviation),]
+arxiv_cats[arxiv_cats$field=="Statistics", c("category", "short_description")]
 ```
 
 To search these categories, you need to include either the full term