no title

reagro · Jul 13, 2024 · 9972cd5
1 parent 3ab3060
commit 9972cd5
Showing 17 changed files with 186 additions and 34 deletions.
diff --git a/_script/knit_site.R b/_script/knit_site.R
@@ -54,4 +54,5 @@ setwd(path)
 do_knit(option, quiet=TRUE)
 setwd(oldpath)
 warnings()
+Sys.sleep(1)
 
diff --git a/_theme/templates/footer.html b/_theme/templates/footer.html
@@ -4,8 +4,9 @@
 
 <p style="text-align:right;">
 <small>
-© 2023-2024, the authors &nbsp;&nbsp;&mdash;&nbsp;&nbsp; <a href="https://github.com/reagro/carob-data">source</a>
+© 2024
 </small>
 </p>
 
-{% endblock %} –
+{% endblock %} –
+
diff --git a/makesite.bat b/makesite.bat
@@ -1,8 +1,8 @@
 @ECHO OFF
 
+rm -r build\html
 Rscript.exe --vanilla _script\knit_site.R clean
 Rscript.exe --vanilla _script\copy_reports.R 
-rm -r build\html
 make html
 
 
diff --git a/source/_R/data.rmd → source/_R/aggregated.rmd b/source/_R/data.rmd → source/_R/aggregated.rmd
@@ -5,7 +5,6 @@ editor_options:
   chunk_output_type: console
 ---
 
-
 ```{r setup, include=FALSE}
 carob_path = "../../../carob/"
 
@@ -53,7 +52,9 @@ x$Group <- kableExtra::cell_spec(x$Group, "html", link=xurl)
 cdate <- format(Sys.time(), "%e %B %Y")
 ```
 
-We aggregate [standardized](done.html) agricultural research data by groups that have similar variables. Below is a table with the current groups and the number of original datasets and records in each group. The groups that end on "_trials" have multi-location variety trial data. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html). 
+We aggregate [standardized](done.html) agricultural research data by groups of data collected for similar goals and with similar variables. These groups make it easier for us to organize our work but it is important to note that they are not mutually exclusive. For example the "fertilizer" group aggregates experiments and surveys with crop yield and fertilizer application data. While the emphasis of the "agronomy", "survey", and "varieties" data is different, they may also contain fertilizer application data. Likewise, the "varieties" data are about comparing crop varieties, but variety names are also reported in the "fertilizer" group. This means that you may want to consider using data from multiple groups. The maize and wheat varieties have their own groups because of the large amount of data in these groups, and because they have some unique terms.
+
+The table below shows the current groups and the number of original datasets and records in each group. The groups that end on "_trials" have multi-location variety trial data. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html). 
 
 As of `r  cdate`, we have processed `r sum(x$Datasets)` original [data sets](done.html) containing a total of `r format(sum(x$Records), big.mark=",")` records. 
 
@@ -67,6 +68,7 @@ kable(x, "html", escape = FALSE) |> kable_classic(full_width = FALSE) |>
   kable_styling(bootstrap_options = c("striped", "hover")) 
 ```
 
+
 <br>
 
 Here is a map with all locations for which we have at least one observation.

diff --git a/source/_R/contributors.rmd b/source/_R/contributors.rmd
@@ -24,52 +24,53 @@ groups <- gsub("_metadata.csv$", "", basename(f))
 groups <- gsub("^carob_", "", groups)
 
 d <- lapply(f, read.csv)
+d <- do.call(carobiner::bindr, d)
+
 
 get_tab <- function(d, var) {
-	p <- lapply(1:length(d), \(i) {
-				n <- d[[i]][[var]]
-				n <- unlist(strsplit(n, ";"))
-				data.frame(id=i, p=trimws(n))
-			}
-		)
-	p <- do.call(rbind, p)
+	n <- d[[var]]
+	n <- strsplit(n, ";")
+	n <- lapply(1:length(n), \(i) {
+		if (length(n[[i]])== 0) return(NULL)
+		data.frame(id=i, p=trimws(unlist(n[[i]])))
+	})
+	p <- do.call(rbind, n)
 	tab <- as.data.frame(table(p$p))
 	colnames(tab) <- c("name", "datasets")
 	tab 
 }
 
-	pp <- sapply(1:length(d), \(i) {
-				d[[i]][["authors"]]
-			}
-		) |> unlist()
+#pp <- sapply(1:length(d), \(i) {
+#			d[[i]][["authors"]]
+#			}
+#		) |> unlist()
 
 
 get_authors <- function(d) {
-	p <- lapply(1:length(d), \(i) {
-				n <- d[[i]][["authors"]]
-				n <- unlist(strsplit(n, ";"))
-				data.frame(id=i, p=trimws(n))
-			}
-		)
-	p <- do.call(rbind, p)
+	n <- d[["authors"]]
+	n <- strsplit(n, ";")
+	n <- lapply(1:length(n), \(i) {
+		if (length(n[[i]])== 0) return(NULL)
+		data.frame(id=i, p=unlist(n[[i]]))
+	})
+	p <- do.call(rbind, n)
 	p$p <- gsub(",", ", ", p$p)
 	p$p <- gsub(",  ", ", ", p$p)
 	p$p[grep("applicable", p$p, TRUE)] <- ""
 	p$p <- gsub("Andrew J. McDonald", "McDonald, Andrew", p$p) 
 	p$p <- gsub("Sherpa R. Sonam", "Sherpa, Sonam", p$p) 
 	p$p <- gsub("Abubakar H.Inuwa", "Abubakar H. Inuwa", p$p) 
-	p$p <- gsub("Wortmann, Charles$", "Wortmann, Charles S.", p$p) 
+	p$p[grepl("Wortmann", p$p)] <- "Wortmann, Charles S."
 	p$p <- gsub("Wiredu Alexanda Nimo", "Wiredu, Alexanda Nimo", p$p) 
 	p$p <- gsub("Bolo, Peter$", "Bolo, Peter Omondi", p$p) 
-	  
 
 	nms1 <- c("Sherpa Sonam", "Poonia Shishpal", "Kumar Sunil", "Sharma Sachin", "Ajay Anurag", "Wu William", "Singh Balwinder", "McDonald Andrew", "Hood-Nowotny Rebbeca", "Majaliwa Jackson", "Tumuhairwe John-Baptist", "Quispe Katherine", "Okoth John",  "Kyei-Boahen Stephen")
 	nms2 <- gsub(" ", ", ", nms1)
 	for (i in 1:length(nms1)) p$p <- gsub(nms1[i], nms2[i], p$p)
 
 	i <- grepl(",|ICRAF|ABC|ICRISAT|SARI|IITA|ISRIC|CIMMYT|TLC|University|IWIN|ARI|Program|ILRI|Davis", p$p)
-  n <- strsplit(p$p[!i], " \\s*(?=[^ ]+$)", perl=TRUE)
-  p$p[!i] <- sapply(n, \(i) paste0(rev(i), collapse=", "))
+	n <- strsplit(p$p[!i], " \\s*(?=[^ ]+$)", perl=TRUE)
+	p$p[!i] <- sapply(n, \(i) paste0(rev(i), collapse=", "))
 	p$p <- gsub("-, ", "", p$p) # "-, RHoMIS"
 	p$p <- gsub("Anurag, Ajay", "Ajay, Anurag", p$p)
 	p$p <- gsub("Sila, Andrew Musili", "Sila, Andrew", p$p) 
@@ -78,17 +79,24 @@ get_authors <- function(d) {
 	p$p <- gsub("Tor Gunnar", "Tor-Gunnar", p$p) 
 	p$p <- gsub("Winowiecki, Leigh$", "Winowiecki, Leigh Ann", p$p) 
 	p$p <- gsub("Balemi, T$", "Balemi, Tesfaye", p$p)
+	p$p <- gsub("))", ")", p$p)
+	p$p[p$p == "World Agroforestry (ICRAF)"] <- "World Agroforestry Center (ICRAF)"
+	
+	p <- p[!grepl("ABC|ILRI|CIMMYT|ICRISAT|IITA|ICRAF|University of California", p$p), ]
 	
 	tab <- as.data.frame(table(p$p))
 	colnames(tab) <- c("name", "datasets")
-  tab[tab$name != "", ]
+	tab[tab$name != "", ]
 }
 
 
 cartab <- get_tab(d, "carob_contributor")
 autab <- get_authors(d)
 
 intab <- get_tab(d, "data_institute")
+inst <- carobiner::accepted_values("institute")
+intab <- merge(intab, inst, by="name", all.x=TRUE)
+intab$name <- paste0('<a href="https://', intab$URL, '">', intab$name, "</a>")
 
 ```
 

diff --git a/source/_R/todo.rmd b/source/_R/todo.rmd
@@ -9,7 +9,7 @@ editor_options:
 carob_path = "../../../carob/"
 ```
 
-Below is our to-do list. Feel free to browse the list and pick a dataset you want to Carobize. You can also use the [Gardian](https://gardian.bigdata.cgiar.org) search engine to discover new datasets. Since there can be a delay in updating what is shown here, before you start working on a dataset, you should check with `carobiner::on_github` if it has already been done (and has been added to the github repo). 
+Browse our to-do list below to pick a dataset you can Carobize. You can also use the [Gardian](https://gardian.bigdata.cgiar.org) search engine to discover new datasets. Since there can be a delay in updating what is shown here, before you start working on a dataset, you should check with `carobiner::on_github` if it has already been done (and has been added to the github repo). 
 
 </br>
 

diff --git a/source/aggregated.rst b/source/aggregated.rst
@@ -1,7 +1,21 @@
+.. raw:: html
+
+   <div style="visibility: hidden;">
+
 Aggregated data
 ===============
 
 .. raw:: html
-   :file: _R/data.html
 
+   </div>
+   <div style="visibility: visible;">
+
+
+.. raw:: html
+   :file: _R/aggregated.html
+
+
+.. raw:: html
+
+   </div>
 
diff --git a/source/compile.rst b/source/compile.rst
@@ -1,6 +1,17 @@
+.. raw:: html
+
+   <div style="visibility: hidden;">
+
 Compile
 =======
 
+.. raw:: html
+
+   </div>
+   <div style="visibility: visible;">
+
+
+
 You can compile Carob data yourself if you have basic familiarity with the *git* and *R* software. 
 
 1. Install software
@@ -27,3 +38,7 @@ You can compile Carob data yourself if you have basic familiarity with the *git*
 
   Once in a while, to **update** to the latest version, you can do ``git pull`` and then run the commands described under #3 again. It is also good to regularly install the latest version of "carobiner" with ``remotes::install_github("reagro/carobiner")``. 
 
+
+.. raw:: html
+
+   </div>
diff --git a/source/conf.py b/source/conf.py
@@ -167,7 +167,7 @@ def setup(app):
 #html_split_index = False
 
 # If true, links to the reST sources are added to the pages.
-#html_show_sourcelink = True
+html_show_sourcelink = False
 
 # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
 html_show_sphinx = False

diff --git a/source/contribute.rst b/source/contribute.rst
@@ -1,6 +1,17 @@
+.. raw:: html
+
+   <div style="visibility: hidden;">
+
+
 Contribute
 ==========
 
+.. raw:: html
+
+   </div>
+   <div style="visibility: visible;">
+
+
 *Carob* is an open-source community project to standardize and aggregate agricultural research data.
 
 Anyone is invited to contribute to *Carob* by contributing `R` scripts for datasets of interest, or by improving existing scripts. All scripts are available on `github <https://github.com/reagro/carob/>`_ (in the ``scripts`` folder).
@@ -20,3 +31,7 @@ To contribute you can follow these steps
 
 If this procedure is too complicated we can also work with you in other ways. You can always drop us an email at [email protected], or raise an `issue <https://github.com/reagro/carob/issues>`_
 
+
+.. raw:: html
+
+   </div>
diff --git a/source/contributors.rst b/source/contributors.rst
@@ -1,8 +1,22 @@
+.. raw:: html
+
+   <div style="visibility: hidden;">
+
 Contributors
 ============
 
+.. raw:: html
+
+   </div>
+   <div style="visibility: visible;">
+
+
 Here we lists the contributors of Carob scripts and of the contributors of the data that we have processed. Names are followed by the number of scripts/datasets.
 
 .. raw:: html
    :file: _R/contributors.html
 
+
+.. raw:: html
+
+   </div>
diff --git a/source/done.rst b/source/done.rst
@@ -1,7 +1,20 @@
+.. raw:: html
+
+   <div style="visibility: hidden;">
+
 Standardized data
 =================
 
+.. raw:: html
+
+   </div>
+   <div style="visibility: visible;">
+
+
 .. raw:: html
    :file: _R/done.html
 
 
+.. raw:: html
+
+   </div>
diff --git a/source/download.rst b/source/download.rst
@@ -1,6 +1,16 @@
+.. raw:: html
+
+   <div style="visibility: hidden;">
+
 Download
 ========
 
+.. raw:: html
+
+   </div>
+   <div style="visibility: visible;">
+
+
 From this page you can download aggregated data. Currently only the "fertilizer" and "maize_trials" groups are available here, but more will follow. Note that here you can only download data that has a Creative Commons `license <licenses.html>`_. 
 
 To get the other aggregated datasets, you can download and process the sources yourself. See below for instructions.
@@ -80,3 +90,7 @@ Note that the data available here are new. They represent our first attempt to s
     </embed>
 
 
+
+.. raw:: html
+
+   </div>
diff --git a/source/index.rst b/source/index.rst
@@ -1,18 +1,24 @@
 .. intro 
 
+.. raw:: html
+
+   <div style="visibility: hidden;">
+
 Carob
 =====
 
+.. raw:: html
+
+   </div>
+   <div style="visibility: visible;">
 
 .. image:: /_static/carob.png
   :width: 150
   :alt: CAROB logo
-  :target: https://github.com/reagro/carob
   :align: left
 
-*Carob* is the open-source, collaborative and community based *Extract, Transform, and Load* `(ETL) framework supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_ to facilitate agricultural research. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_! 
+*Carob* produces large standardized data sets to facilitate agricultural research. It is an open-source, collaborative and community based *Extract, Transform, and Load* `(ETL) project supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_! 
 
-|
 
 Contact: [email protected]
 
@@ -27,6 +33,11 @@ Contact: [email protected]
   :target: https://ucdavis.edu
 
 
+.. raw:: html
+
+   </div>
+
+
 .. toctree::
 	:hidden:
 	:maxdepth: 3

diff --git a/source/introduction.rst b/source/introduction.rst
@@ -1,6 +1,15 @@
+.. raw:: html
+
+   <div style="visibility: hidden;">
+
 Introduction
 ============
 
+.. raw:: html
+
+   </div>
+   <div style="visibility: visible;">
+
 *Carob* is a community project that uses a collaborative and open-source approach to standardize agricultural research data from experiments and surveys. We produce (1) scripts that standardize open research data and (2) aggregated data sets that can be used in research and development.
 
 We follow the `terminag <https://github.com/reagro/terminag>`__ standard and use the `carobiner <https://github.com/reagro/carobiner>`__ *R* package to check for compliance, and to compile the data.
@@ -15,3 +24,6 @@ We also hope that by using the `terminag <https://github.com/reagro/terminag>`__
 
 *Carob* is the *Extract, Transform, and Load* `(ETL) framework supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_ to support predictive agronomy analytics. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_! 
 
+.. raw:: html
+
+   </div>