no title
rhijmans committed Jul 13, 2024
1 parent 3ab3060 commit 9972cd5
Showing 17 changed files with 186 additions and 34 deletions.
1 change: 1 addition & 0 deletions _script/knit_site.R
@@ -54,4 +54,5 @@ setwd(path)
do_knit(option, quiet=TRUE)
setwd(oldpath)
warnings()
Sys.sleep(1)

5 changes: 3 additions & 2 deletions _theme/templates/footer.html
@@ -4,8 +4,9 @@

<p style="text-align:right;">
<small>
© 2023-2024, the authors &nbsp;&nbsp;&mdash;&nbsp;&nbsp; <a href="https://github.com/reagro/carob-data">source</a>
© 2024
</small>
</p>

{% endblock %} –
{% endblock %} –

2 changes: 1 addition & 1 deletion makesite.bat
@@ -1,8 +1,8 @@
@ECHO OFF

rm -r build\html
Rscript.exe --vanilla _script\knit_site.R clean
Rscript.exe --vanilla _script\copy_reports.R
rm -r build\html
make html


6 changes: 4 additions & 2 deletions source/_R/data.rmd → source/_R/aggregated.rmd
@@ -5,7 +5,6 @@ editor_options:
chunk_output_type: console
---


```{r setup, include=FALSE}
carob_path = "../../../carob/"
@@ -53,7 +52,9 @@ x$Group <- kableExtra::cell_spec(x$Group, "html", link=xurl)
cdate <- format(Sys.time(), "%e %B %Y")
```

We aggregate [standardized](done.html) agricultural research data by groups that have similar variables. Below is a table with the current groups and the number of original datasets and records in each group. The groups that end on "_trials" have multi-location variety trial data. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html).
We aggregate [standardized](done.html) agricultural research data by groups of data collected for similar goals and with similar variables. These groups make it easier for us to organize our work but it is important to note that they are not mutually exclusive. For example the "fertilizer" group aggregates experiments and surveys with crop yield and fertilizer application data. While the emphasis of the "agronomy", "survey", and "varieties" data is different, they may also contain fertilizer application data. Likewise, the "varieties" data are about comparing crop varieties, but variety names are also reported in the "fertilizer" group. This means that you may want to consider using data from multiple groups. The maize and wheat varieties have their own groups because of the large amount of data in these groups, and because they have some unique terms.

The table below shows the current groups and the number of original datasets and records in each group. The groups that end on "_trials" have multi-location variety trial data. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html).

As of `r cdate`, we have processed `r sum(x$Datasets)` original [data sets](done.html) containing a total of `r format(sum(x$Records), big.mark=",")` records.

@@ -67,6 +68,7 @@ kable(x, "html", escape = FALSE) |> kable_classic(full_width = FALSE) |>
kable_styling(bootstrap_options = c("striped", "hover"))
```


<br>

Here is a map with all locations for which we have at least one observation.
54 changes: 31 additions & 23 deletions source/_R/contributors.rmd
@@ -24,52 +24,53 @@ groups <- gsub("_metadata.csv$", "", basename(f))
groups <- gsub("^carob_", "", groups)
d <- lapply(f, read.csv)
d <- do.call(carobiner::bindr, d)
get_tab <- function(d, var) {
p <- lapply(1:length(d), \(i) {
n <- d[[i]][[var]]
n <- unlist(strsplit(n, ";"))
data.frame(id=i, p=trimws(n))
}
)
p <- do.call(rbind, p)
n <- d[[var]]
n <- strsplit(n, ";")
n <- lapply(1:length(n), \(i) {
if (length(n[[i]])== 0) return(NULL)
data.frame(id=i, p=trimws(unlist(n[[i]])))
})
p <- do.call(rbind, n)
tab <- as.data.frame(table(p$p))
colnames(tab) <- c("name", "datasets")
tab
}
pp <- sapply(1:length(d), \(i) {
d[[i]][["authors"]]
}
) |> unlist()
#pp <- sapply(1:length(d), \(i) {
# d[[i]][["authors"]]
# }
# ) |> unlist()
get_authors <- function(d) {
p <- lapply(1:length(d), \(i) {
n <- d[[i]][["authors"]]
n <- unlist(strsplit(n, ";"))
data.frame(id=i, p=trimws(n))
}
)
p <- do.call(rbind, p)
n <- d[["authors"]]
n <- strsplit(n, ";")
n <- lapply(1:length(n), \(i) {
if (length(n[[i]])== 0) return(NULL)
data.frame(id=i, p=unlist(n[[i]]))
})
p <- do.call(rbind, n)
p$p <- gsub(",", ", ", p$p)
p$p <- gsub(", ", ", ", p$p)
p$p[grep("applicable", p$p, TRUE)] <- ""
p$p <- gsub("Andrew J. McDonald", "McDonald, Andrew", p$p)
p$p <- gsub("Sherpa R. Sonam", "Sherpa, Sonam", p$p)
p$p <- gsub("Abubakar H.Inuwa", "Abubakar H. Inuwa", p$p)
p$p <- gsub("Wortmann, Charles$", "Wortmann, Charles S.", p$p)
p$p[grepl("Wortmann", p$p)] <- "Wortmann, Charles S."
p$p <- gsub("Wiredu Alexanda Nimo", "Wiredu, Alexanda Nimo", p$p)
p$p <- gsub("Bolo, Peter$", "Bolo, Peter Omondi", p$p)
nms1 <- c("Sherpa Sonam", "Poonia Shishpal", "Kumar Sunil", "Sharma Sachin", "Ajay Anurag", "Wu William", "Singh Balwinder", "McDonald Andrew", "Hood-Nowotny Rebbeca", "Majaliwa Jackson", "Tumuhairwe John-Baptist", "Quispe Katherine", "Okoth John", "Kyei-Boahen Stephen")
nms2 <- gsub(" ", ", ", nms1)
for (i in 1:length(nms1)) p$p <- gsub(nms1[i], nms2[i], p$p)
i <- grepl(",|ICRAF|ABC|ICRISAT|SARI|IITA|ISRIC|CIMMYT|TLC|University|IWIN|ARI|Program|ILRI|Davis", p$p)
n <- strsplit(p$p[!i], " \\s*(?=[^ ]+$)", perl=TRUE)
p$p[!i] <- sapply(n, \(i) paste0(rev(i), collapse=", "))
n <- strsplit(p$p[!i], " \\s*(?=[^ ]+$)", perl=TRUE)
p$p[!i] <- sapply(n, \(i) paste0(rev(i), collapse=", "))
p$p <- gsub("-, ", "", p$p) # "-, RHoMIS"
p$p <- gsub("Anurag, Ajay", "Ajay, Anurag", p$p)
p$p <- gsub("Sila, Andrew Musili", "Sila, Andrew", p$p)
@@ -78,17 +79,24 @@ get_authors <- function(d) {
p$p <- gsub("Tor Gunnar", "Tor-Gunnar", p$p)
p$p <- gsub("Winowiecki, Leigh$", "Winowiecki, Leigh Ann", p$p)
p$p <- gsub("Balemi, T$", "Balemi, Tesfaye", p$p)
p$p <- gsub("))", ")", p$p)
p$p[p$p == "World Agroforestry (ICRAF)"] <- "World Agroforestry Center (ICRAF)"
p <- p[!grepl("ABC|ILRI|CIMMYT|ICRISAT|IITA|ICRAF|University of California", p$p), ]
tab <- as.data.frame(table(p$p))
colnames(tab) <- c("name", "datasets")
tab[tab$name != "", ]
tab[tab$name != "", ]
}
cartab <- get_tab(d, "carob_contributor")
autab <- get_authors(d)
intab <- get_tab(d, "data_institute")
inst <- carobiner::accepted_values("institute")
intab <- merge(intab, inst, by="name", all.x=TRUE)
intab$name <- paste0('<a href="https://', intab$URL, '">', intab$name, "</a>")
```
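
As a rough illustration of the reworked `get_tab` logic in this hunk (split a semicolon-separated column, skip empty entries, then count how often each name occurs), here is a minimal sketch on made-up data. The data frame `toy`, the names in it, and the helper name `count_names` are hypothetical and not part of the repository; the real `d` is assembled from the metadata CSV files with `carobiner::bindr`.

```r
# Hypothetical stand-in for the bound metadata table `d`.
toy <- data.frame(
  carob_contributor = c("Ann Lee; Bo Chan", "", "Ann Lee"),
  stringsAsFactors = FALSE
)

# Same approach as the revised get_tab: split once on ";", then build
# one small data.frame per record, skipping records with nothing to split.
count_names <- function(d, var) {
  parts <- strsplit(d[[var]], ";")
  rows <- lapply(seq_along(parts), function(i) {
    # strsplit("") returns character(0); skip those records
    if (length(parts[[i]]) == 0) return(NULL)
    data.frame(id = i, p = trimws(parts[[i]]))
  })
  p <- do.call(rbind, rows)        # NULL entries are dropped by rbind
  tab <- as.data.frame(table(p$p)) # one row per distinct name
  colnames(tab) <- c("name", "datasets")
  tab
}

tab <- count_names(toy, "carob_contributor")
print(tab)
```

The early `return(NULL)` matters because an empty string splits into a zero-length vector, and `data.frame(id = i, p = character(0))` would fail with mismatched row counts.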

2 changes: 1 addition & 1 deletion source/_R/todo.rmd
@@ -9,7 +9,7 @@ editor_options:
carob_path = "../../../carob/"
```

Below is our to-do list. Feel free to browse the list and pick a dataset you want to Carobize. You can also use the [Gardian](https://gardian.bigdata.cgiar.org) search engine to discover new datasets. Since there can be a delay in updating what is shown here, before you start working on a dataset, you should check with `carobiner::on_github` if it has already been done (and has been added to the github repo).
Browse our to-do list below to pick a dataset you can Carobize. You can also use the [Gardian](https://gardian.bigdata.cgiar.org) search engine to discover new datasets. Since there can be a delay in updating what is shown here, before you start working on a dataset, you should check with `carobiner::on_github` if it has already been done (and has been added to the github repo).

</br>

16 changes: 15 additions & 1 deletion source/aggregated.rst
@@ -1,7 +1,21 @@
.. raw:: html

<div style="visibility: hidden;">

Aggregated data
===============

.. raw:: html
:file: _R/data.html

</div>
<div style="visibility: visible;">


.. raw:: html
:file: _R/aggregated.html


.. raw:: html

</div>

15 changes: 15 additions & 0 deletions source/compile.rst
@@ -1,6 +1,17 @@
.. raw:: html

<div style="visibility: hidden;">

Compile
=======

.. raw:: html

</div>
<div style="visibility: visible;">



You can compile Carob data yourself if you have basic familiarity with the *git* and *R* software.

1. Install software
@@ -27,3 +38,7 @@ You can compile Carob data yourself if you have basic familiarity with the *git*

Once in a while, to **update** to the latest version, you can do ``git pull`` and then run the commands described under #3 again. It is also good to regularly install the latest version of "carobiner" with ``remotes::install_github("reagro/carobiner")``.


.. raw:: html

</div>
2 changes: 1 addition & 1 deletion source/conf.py
@@ -167,7 +167,7 @@ def setup(app):
#html_split_index = False

# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
html_show_sourcelink = False

# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
html_show_sphinx = False
15 changes: 15 additions & 0 deletions source/contribute.rst
@@ -1,6 +1,17 @@
.. raw:: html

<div style="visibility: hidden;">


Contribute
==========

.. raw:: html

</div>
<div style="visibility: visible;">


*Carob* is an open-source community project to standardize and aggregate agricultural research data.

Anyone is invited to contribute to *Carob* by contributing `R` scripts for datasets of interest, or by improving existing scripts. All scripts are available on `github <https://github.com/reagro/carob/>`_ (in the ``scripts`` folder).
@@ -20,3 +31,7 @@ To contribute you can follow these steps

If this procedure is too complicated we can also work with you in other ways. You can always drop us an email at [email protected], or raise an `issue <https://github.com/reagro/carob/issues>`_


.. raw:: html

</div>
14 changes: 14 additions & 0 deletions source/contributors.rst
@@ -1,8 +1,22 @@
.. raw:: html

<div style="visibility: hidden;">

Contributors
============

.. raw:: html

</div>
<div style="visibility: visible;">


Here we list the contributors of Carob scripts and the contributors of the data that we have processed. Names are followed by the number of scripts/datasets.

.. raw:: html
:file: _R/contributors.html


.. raw:: html

</div>
13 changes: 13 additions & 0 deletions source/done.rst
@@ -1,7 +1,20 @@
.. raw:: html

<div style="visibility: hidden;">

Standardized data
=================

.. raw:: html

</div>
<div style="visibility: visible;">


.. raw:: html
:file: _R/done.html


.. raw:: html

</div>
14 changes: 14 additions & 0 deletions source/download.rst
@@ -1,6 +1,16 @@
.. raw:: html

<div style="visibility: hidden;">

Download
========

.. raw:: html

</div>
<div style="visibility: visible;">


From this page you can download aggregated data. Currently only the "fertilizer" and "maize_trials" groups are available here, but more will follow. Note that here you can only download data that has a Creative Commons `license <licenses.html>`_.

To get the other aggregated datasets, you can download and process the sources yourself. See below for instructions.
@@ -80,3 +90,7 @@ Note that the data available here are new. They represent our first attempt to s
</embed>



.. raw:: html

</div>
17 changes: 14 additions & 3 deletions source/index.rst
@@ -1,18 +1,24 @@
.. intro
.. raw:: html

<div style="visibility: hidden;">

Carob
=====

.. raw:: html

</div>
<div style="visibility: visible;">

.. image:: /_static/carob.png
:width: 150
:alt: CAROB logo
:target: https://github.com/reagro/carob
:align: left

*Carob* is the open-source, collaborative and community based *Extract, Transform, and Load* `(ETL) framework supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_ to facilitate agricultural research. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_!
*Carob* produces large standardized data sets to facilitate agricultural research. It is an open-source, collaborative and community based *Extract, Transform, and Load* `(ETL) project supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_!

|

Contact: [email protected]

@@ -27,6 +33,11 @@ Contact: [email protected]
:target: https://ucdavis.edu


.. raw:: html

</div>


.. toctree::
:hidden:
:maxdepth: 3
12 changes: 12 additions & 0 deletions source/introduction.rst
@@ -1,6 +1,15 @@
.. raw:: html

<div style="visibility: hidden;">

Introduction
============

.. raw:: html

</div>
<div style="visibility: visible;">

*Carob* is a community project that uses a collaborative and open-source approach to standardize agricultural research data from experiments and surveys. We produce (1) scripts that standardize open research data and (2) aggregated data sets that can be used in research and development.

We follow the `terminag <https://github.com/reagro/terminag>`__ standard and use the `carobiner <https://github.com/reagro/carobiner>`__ *R* package to check for compliance, and to compile the data.
@@ -15,3 +24,6 @@ We also hope that by using the `terminag <https://github.com/reagro/terminag>`__

*Carob* is the *Extract, Transform, and Load* `(ETL) framework supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_ to support predictive agronomy analytics. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_!

.. raw:: html

</div>