diff --git a/_script/knit_site.R b/_script/knit_site.R index 25e9cf2..43a4a56 100644 --- a/_script/knit_site.R +++ b/_script/knit_site.R @@ -54,4 +54,5 @@ setwd(path) do_knit(option, quiet=TRUE) setwd(oldpath) warnings() +Sys.sleep(1) diff --git a/_theme/templates/footer.html b/_theme/templates/footer.html index 73a8ce8..a1d66ac 100644 --- a/_theme/templates/footer.html +++ b/_theme/templates/footer.html @@ -4,8 +4,9 @@

-© 2023-2024, the authors   —   source +© 2024

-{% endblock %} – \ No newline at end of file +{% endblock %} – + diff --git a/makesite.bat b/makesite.bat index b455546..c96902a 100644 --- a/makesite.bat +++ b/makesite.bat @@ -1,8 +1,8 @@ @ECHO OFF +rm -r build\html Rscript.exe --vanilla _script\knit_site.R clean Rscript.exe --vanilla _script\copy_reports.R -rm -r build\html make html diff --git a/source/_R/data.rmd b/source/_R/aggregated.rmd similarity index 73% rename from source/_R/data.rmd rename to source/_R/aggregated.rmd index f80a0c4..66b24b7 100644 --- a/source/_R/data.rmd +++ b/source/_R/aggregated.rmd @@ -5,7 +5,6 @@ editor_options: chunk_output_type: console --- - ```{r setup, include=FALSE} carob_path = "../../../carob/" @@ -53,7 +52,9 @@ x$Group <- kableExtra::cell_spec(x$Group, "html", link=xurl) cdate <- format(Sys.time(), "%e %B %Y") ``` -We aggregate [standardized](done.html) agricultural research data by groups that have similar variables. Below is a table with the current groups and the number of original datasets and records in each group. The groups that end on "_trials" have multi-location variety trial data. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html). +We aggregate [standardized](done.html) agricultural research data into groups of datasets that were collected for similar purposes and that share similar variables. These groups help us organize our work, but they are not mutually exclusive. For example, the "fertilizer" group aggregates experiments and surveys with crop yield and fertilizer application data. The "agronomy", "survey", and "varieties" groups have a different emphasis, but they may also contain fertilizer application data. Likewise, the "varieties" group is about comparing crop varieties, but variety names are also reported in the "fertilizer" group. You may therefore want to combine data from multiple groups. 
Maize and wheat variety data have their own groups because of their large volume and because they use some unique terms. + +The table below shows the current groups and the number of original datasets and records in each group. Groups whose names end in "_trials" contain multi-location variety trial data. We also show these numbers for the subset of datasets that have a Creative Commons (CC) [license](licenses.html). As of `r cdate`, we have processed `r sum(x$Datasets)` original [data sets](done.html) containing a total of `r format(sum(x$Records), big.mark=",")` records. @@ -67,6 +68,7 @@ kable(x, "html", escape = FALSE) |> kable_classic(full_width = FALSE) |> kable_styling(bootstrap_options = c("striped", "hover")) ``` +
Here is a map with all locations for which we have at least one observation. diff --git a/source/_R/contributors.rmd b/source/_R/contributors.rmd index 6b84750..61c68d8 100644 --- a/source/_R/contributors.rmd +++ b/source/_R/contributors.rmd @@ -24,52 +24,53 @@ groups <- gsub("_metadata.csv$", "", basename(f)) groups <- gsub("^carob_", "", groups) d <- lapply(f, read.csv) +d <- do.call(carobiner::bindr, d) + get_tab <- function(d, var) { - p <- lapply(1:length(d), \(i) { - n <- d[[i]][[var]] - n <- unlist(strsplit(n, ";")) - data.frame(id=i, p=trimws(n)) - } - ) - p <- do.call(rbind, p) + n <- d[[var]] + n <- strsplit(n, ";") + n <- lapply(seq_along(n), \(i) { + if (length(n[[i]]) == 0) return(NULL) + data.frame(id=i, p=trimws(unlist(n[[i]]))) + }) + p <- do.call(rbind, n) tab <- as.data.frame(table(p$p)) colnames(tab) <- c("name", "datasets") tab } - pp <- sapply(1:length(d), \(i) { - d[[i]][["authors"]] - } - ) |> unlist() +#pp <- sapply(1:length(d), \(i) { +# d[[i]][["authors"]] +# } +# ) |> unlist() get_authors <- function(d) { - p <- lapply(1:length(d), \(i) { - n <- d[[i]][["authors"]] - n <- unlist(strsplit(n, ";")) - data.frame(id=i, p=trimws(n)) - } - ) - p <- do.call(rbind, p) + n <- d[["authors"]] + n <- strsplit(n, ";") + n <- lapply(seq_along(n), \(i) { + if (length(n[[i]]) == 0) return(NULL) + data.frame(id=i, p=trimws(unlist(n[[i]]))) + }) + p <- do.call(rbind, n) p$p <- gsub(",", ", ", p$p) p$p <- gsub(", ", ", ", p$p) p$p[grep("applicable", p$p, TRUE)] <- "" p$p <- gsub("Andrew J. McDonald", "McDonald, Andrew", p$p) p$p <- gsub("Sherpa R. Sonam", "Sherpa, Sonam", p$p) p$p <- gsub("Abubakar H.Inuwa", "Abubakar H. Inuwa", p$p) - p$p <- gsub("Wortmann, Charles$", "Wortmann, Charles S.", p$p) + p$p[grepl("Wortmann", p$p)] <- "Wortmann, Charles S." 
p$p <- gsub("Wiredu Alexanda Nimo", "Wiredu, Alexanda Nimo", p$p) p$p <- gsub("Bolo, Peter$", "Bolo, Peter Omondi", p$p) - nms1 <- c("Sherpa Sonam", "Poonia Shishpal", "Kumar Sunil", "Sharma Sachin", "Ajay Anurag", "Wu William", "Singh Balwinder", "McDonald Andrew", "Hood-Nowotny Rebbeca", "Majaliwa Jackson", "Tumuhairwe John-Baptist", "Quispe Katherine", "Okoth John", "Kyei-Boahen Stephen") nms2 <- gsub(" ", ", ", nms1) for (i in 1:length(nms1)) p$p <- gsub(nms1[i], nms2[i], p$p) i <- grepl(",|ICRAF|ABC|ICRISAT|SARI|IITA|ISRIC|CIMMYT|TLC|University|IWIN|ARI|Program|ILRI|Davis", p$p) - n <- strsplit(p$p[!i], " \\s*(?=[^ ]+$)", perl=TRUE) - p$p[!i] <- sapply(n, \(i) paste0(rev(i), collapse=", ")) + n <- strsplit(p$p[!i], " \\s*(?=[^ ]+$)", perl=TRUE) + p$p[!i] <- sapply(n, \(i) paste0(rev(i), collapse=", ")) p$p <- gsub("-, ", "", p$p) # "-, RHoMIS" p$p <- gsub("Anurag, Ajay", "Ajay, Anurag", p$p) p$p <- gsub("Sila, Andrew Musili", "Sila, Andrew", p$p) @@ -78,10 +79,14 @@ get_authors <- function(d) { p$p <- gsub("Tor Gunnar", "Tor-Gunnar", p$p) p$p <- gsub("Winowiecki, Leigh$", "Winowiecki, Leigh Ann", p$p) p$p <- gsub("Balemi, T$", "Balemi, Tesfaye", p$p) + p$p <- gsub("))", ")", p$p) + p$p[p$p == "World Agroforestry (ICRAF)"] <- "World Agroforestry Center (ICRAF)" + + p <- p[!grepl("ABC|ILRI|CIMMYT|ICRISAT|IITA|ICRAF|University of California", p$p), ] tab <- as.data.frame(table(p$p)) colnames(tab) <- c("name", "datasets") - tab[tab$name != "", ] + tab[tab$name != "", ] } @@ -89,6 +94,9 @@ cartab <- get_tab(d, "carob_contributor") autab <- get_authors(d) intab <- get_tab(d, "data_institute") +inst <- carobiner::accepted_values("institute") +intab <- merge(intab, inst, by="name", all.x=TRUE) +intab$name <- paste0('', intab$name, "") ``` diff --git a/source/_R/todo.rmd b/source/_R/todo.rmd index 9099ea3..1b45055 100644 --- a/source/_R/todo.rmd +++ b/source/_R/todo.rmd @@ -9,7 +9,7 @@ editor_options: carob_path = "../../../carob/" ``` -Below is our to-do list. 
Feel free to browse the list and pick a dataset you want to Carobize. You can also use the [Gardian](https://gardian.bigdata.cgiar.org) search engine to discover new datasets. Since there can be a delay in updating what is shown here, before you start working on a dataset, you should check with `carobiner::on_github` if it has already been done (and has been added to the github repo). +Browse our to-do list below to pick a dataset to Carobize. You can also use the [Gardian](https://gardian.bigdata.cgiar.org) search engine to discover new datasets. Because this list may lag behind the latest work, use `carobiner::on_github` to check whether a dataset has already been done (and added to the GitHub repository) before you start working on it.
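The revised `get_tab()` and `get_authors()` in contributors.rmd split each semicolon-separated field, keep one row per dataset, and count how many datasets mention each name. A minimal base-R sketch of that technique follows; `meta` and `count_names()` are hypothetical names and the records are made up, so treat it as an illustration under those assumptions rather than the project's code. It also shows why trimming whitespace after splitting matters.

```r
# Hypothetical metadata: one row per dataset, with a
# semicolon-separated "authors" field (values are made up).
meta <- data.frame(
  uri = c("ds1", "ds2", "ds3"),
  authors = c("Smith, John; Doe, Jane", "Doe, Jane", NA)
)

# Hypothetical helper modelled on get_tab(): count, for each name,
# the number of datasets that list it.
count_names <- function(d, var) {
  parts <- strsplit(as.character(d[[var]]), ";")
  rows <- lapply(seq_along(parts), function(i) {
    # trimws() turns " Doe, Jane" into "Doe, Jane", so both
    # spellings are counted as the same name
    x <- trimws(parts[[i]])
    # drop missing and empty entries (e.g. a dataset with no authors)
    x <- x[!is.na(x) & x != ""]
    if (length(x) == 0) return(NULL)
    data.frame(id = i, name = x)
  })
  # rbind() skips the NULL entries returned for empty rows
  p <- do.call(rbind, rows)
  tab <- as.data.frame(table(p$name), stringsAsFactors = FALSE)
  colnames(tab) <- c("name", "datasets")
  tab
}

count_names(meta, "authors")
```

With these made-up records, "Doe, Jane" is counted in two datasets and "Smith, John" in one; without `trimws()`, " Doe, Jane" would be reported as a separate name.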
diff --git a/source/aggregated.rst b/source/aggregated.rst index 4f1452b..15a7c26 100644 --- a/source/aggregated.rst +++ b/source/aggregated.rst @@ -1,7 +1,21 @@ +.. raw:: html + +
+ Aggregated data =============== .. raw:: html - :file: _R/data.html +
+
+ + +.. raw:: html + :file: _R/aggregated.html + + +.. raw:: html + +
diff --git a/source/compile.rst b/source/compile.rst index f5003ee..09e1ac2 100644 --- a/source/compile.rst +++ b/source/compile.rst @@ -1,6 +1,17 @@ +.. raw:: html + +
+ Compile ======= +.. raw:: html + +
+
+ + + You can compile Carob data yourself if you have basic familiarity with the *git* and *R* software. 1. Install software @@ -27,3 +38,7 @@ You can compile Carob data yourself if you have basic familiarity with the *git* Once in a while, to **update** to the latest version, you can do ``git pull`` and then run the commands described under #3 again. It is also good to regularly install the latest version of "carobiner" with ``remotes::install_github("reagro/carobiner")``. + +.. raw:: html + +
diff --git a/source/conf.py b/source/conf.py index c56fa57..30543a2 100644 --- a/source/conf.py +++ b/source/conf.py @@ -167,7 +167,7 @@ def setup(app): #html_split_index = False # If true, links to the reST sources are added to the pages. -#html_show_sourcelink = True +html_show_sourcelink = False # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. html_show_sphinx = False diff --git a/source/contribute.rst b/source/contribute.rst index d29acb9..105ff9c 100644 --- a/source/contribute.rst +++ b/source/contribute.rst @@ -1,6 +1,17 @@ +.. raw:: html + +
+ + Contribute ========== +.. raw:: html + +
+
+ + *Carob* is an open-source community project to standardize and aggregate agricultural research data. Anyone is invited to contribute to *Carob* by contributing `R` scripts for datasets of interest, or by improving existing scripts. All scripts are available on `github `_ (in the ``scripts`` folder). @@ -20,3 +31,7 @@ To contribute you can follow these steps If this procedure is too complicated we can also work with you in other ways. You can always drop us an email at carob.data@gmail.com, or raise an `issue `_ + +.. raw:: html + +
diff --git a/source/contributors.rst b/source/contributors.rst index 34ecdee..5b6665a 100644 --- a/source/contributors.rst +++ b/source/contributors.rst @@ -1,8 +1,22 @@ +.. raw:: html + +
+ Contributors ============ +.. raw:: html + +
+
+ + Here we lists the contributors of Carob scripts and of the contributors of the data that we have processed. Names are followed by the number of scripts/datasets. .. raw:: html :file: _R/contributors.html + +.. raw:: html + +
diff --git a/source/done.rst b/source/done.rst index 39e450c..f394197 100644 --- a/source/done.rst +++ b/source/done.rst @@ -1,7 +1,20 @@ +.. raw:: html + +
+ Standardized data ================= +.. raw:: html + +
+
+ + .. raw:: html :file: _R/done.html +.. raw:: html + +
diff --git a/source/download.rst b/source/download.rst index ca32d71..3e61a21 100644 --- a/source/download.rst +++ b/source/download.rst @@ -1,6 +1,16 @@ +.. raw:: html + +
+ Download ======== +.. raw:: html + +
+
+ + From this page you can download aggregated data. Currently only the "fertilizer" and "maize_trials" groups are available here, but more will follow. Note that here you can only download data that has a Creative Commons `license `_. To get the other aggregated datasets, you can download and process the sources yourself. See below for instructions. @@ -80,3 +90,7 @@ Note that the data available here are new. They represent our first attempt to s + +.. raw:: html + +
diff --git a/source/index.rst b/source/index.rst index 15157f8..ce18ed2 100644 --- a/source/index.rst +++ b/source/index.rst @@ -1,18 +1,24 @@ .. intro +.. raw:: html + +
+ Carob ===== +.. raw:: html + +
+
.. image:: /_static/carob.png :width: 150 :alt: CAROB logo - :target: https://github.com/reagro/carob :align: left -*Carob* is the open-source, collaborative and community based *Extract, Transform, and Load* `(ETL) framework supported by CGIAR `_ to facilitate agricultural research. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing `_! +*Carob* produces large standardized data sets to facilitate agricultural research. It is an open-source, collaborative, community-based *Extract, Transform, and Load* `(ETL) project supported by CGIAR `_. All data transformations are done with *R* scripts, which makes it easy to improve the standardization process as needs arise and to correct mistakes. Please consider `contributing `_! -| Contact: carob.data@gmail.com @@ -27,6 +33,11 @@ Contact: carob.data@gmail.com :target: https://ucdavis.edu +.. raw:: html +
+ + .. toctree:: :hidden: :maxdepth: 3 diff --git a/source/introduction.rst b/source/introduction.rst index 26d7493..26519d0 100644 --- a/source/introduction.rst +++ b/source/introduction.rst @@ -1,6 +1,15 @@ +.. raw:: html + +
+ Introduction ============ +.. raw:: html + +
+
+ *Carob* is a community project that uses a collaborative and open-source approach to standardize agricultural research data from experiments and surveys. We produce (1) scripts that standardize open research data and (2) aggregated data sets that can be used in research and development. We follow the `terminag `__ standard and use the `carobiner `__ *R* package to check for compliance, and to compile the data. @@ -15,3 +24,6 @@ We also hope that by using the `terminag `__ *Carob* is the *Extract, Transform, and Load* `(ETL) framework supported by CGIAR `_ to support predictive agronomy analytics. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing `_! +.. raw:: html + +
diff --git a/source/standard.rst b/source/standard.rst index cca638d..54981cf 100644 --- a/source/standard.rst +++ b/source/standard.rst @@ -1,11 +1,28 @@ +.. raw:: html + +
+ The standard ============ +.. raw:: html + +
+
+ *Carob* uses the **terminag** standard that defines a controlled vocabulary of variable names, their units, and acceptable (ranges of) values. The *terminag* standard can be used "stand-alone" for your own data, and as part of the data compilation done through the Carob project. The standard is defined in a number of tables that are available on the `termiag `__ github site and via the R package `carobiner `__. + .. raw:: html :file: _R/standard.html + +.. raw:: html + +
+ + + diff --git a/source/todo.rst b/source/todo.rst index 896a445..ecd1094 100644 --- a/source/todo.rst +++ b/source/todo.rst @@ -1,7 +1,22 @@ +.. raw:: html + +
+ To-do list ========== +.. raw:: html + +
+
+ + .. raw:: html :file: _R/todo.html + +.. raw:: html + +
+