m
rhijmans committed Jul 15, 2024
1 parent 0290e84 commit 625dcf6
Showing 5 changed files with 11 additions and 11 deletions.
6 changes: 3 additions & 3 deletions source/_R/aggregated.rmd
@@ -56,7 +56,7 @@ x$Group <- kableExtra::cell_spec(x$Group, "html", link=xurl)
cdate <- format(Sys.time(), "%e %B %Y")
```

-We aggregate agricultural research data by groups of data collected for similar goals and with similar variables. The table below shows the current groups and the number of original datasets and records in each group. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html). As of `r cdate`, we have processed `r sum(x$Datasets)` original [data sets](done.html) containing a total of `r format(sum(x$Records), big.mark=",")` records.
+We aggregate agricultural research **data by groups** of data collected for similar goals and with similar variables. The table below shows the current groups and the number of original datasets and records in each group. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html). As of `r cdate`, we have processed `r sum(x$Datasets)` original [data sets](done.html) containing a total of `r format(sum(x$Records), big.mark=",")` records.

<br>

@@ -82,12 +82,12 @@ w <- project(w, "+proj=hatano")
g <- graticule(60, 30, "+proj=hatano")
v <- project(v, "+proj=hatano")
-plot(g, col=gray(.95), background="azure", lwd=1.5, mar=c(.4, .2, 0, 0))
+plot(g, col=gray(.95), background="azure", lwd=1.5, mar=c(.4, .2, 0, 0), lab.cex=.4)
plot(w, add=TRUE, col="light gray", border="white", lwd=1.5)
points(v, col="pink", cex=.5)
points(v, col="red", cex=.25)
lines(w, col="dark gray", lwd=1)
-text(-14000000, -9500000, paste("Carob locations (", cdate, ")"), xpd=TRUE, cex=.5)
+halo(-12000000, -6500000, paste("Carob locations\n ", cdate), xpd=TRUE, cex=.6)
```
</br>
</br>
2 changes: 1 addition & 1 deletion source/contribute.rst
@@ -12,7 +12,7 @@ Contribute
<div style="visibility: visible;">


-*Carob* is an open-source community project to standardize and aggregate agricultural research data. You are invited to contribute to *Carob* by contributing `R` scripts for datasets of interest, or by improving existing scripts. All scripts are available on `github <https://github.com/reagro/carob/>`_ (in the ``scripts`` folder).
+*Carob* is an open-source community project to standardize and aggregate agricultural research data. You are invited to **contribute** to *Carob* by contributing `R` scripts for datasets of interest, or by improving existing scripts. All scripts are available on `github <https://github.com/reagro/carob/>`_ (in the ``scripts`` folder).

To contribute you can follow these steps

2 changes: 1 addition & 1 deletion source/contributors.rst
@@ -11,7 +11,7 @@ Contributors
<div style="visibility: visible;">


-Here are the contributors of Carob scripts, and of the data that we have processed. Names are followed by the number of scripts or datasets.
+We thank the **contributors** of Carob scripts, and the providers of the data that they have standardized. Names are followed by the number of scripts or datasets.

.. raw:: html
:file: _R/contributors.html
10 changes: 5 additions & 5 deletions source/introduction.rst
@@ -10,17 +10,17 @@ Introduction
</div>
<div style="visibility: visible;">

-*Carob* is a community project that uses a collaborative and open-source approach to standardize agricultural research data from experiments and surveys. We produce (1) scripts that standardize open research data and (2) aggregated data sets that can be used in research and development.
+*Carob* is a community project that uses a collaborative and open-source approach to **standardize and aggregate** open agricultural research data from experiments and surveys. The goal is to facilitate the further use of these data in research and development.

-We follow the `terminag <https://github.com/reagro/terminag>`__ standard and use the `carobiner <https://github.com/reagro/carobiner>`__ *R* package to check for compliance, and to compile the data. The data that we have compiled so far are described `here <data.html>`_. You can download some of the compiled data from this site; and you can also use the scripts to generate all the data `yourself <compile.html>`__.
+The project used *R* scripts to standardize individual data sets. We follow the `terminag <https://github.com/reagro/terminag>`__ standard and use the `carobiner <https://github.com/reagro/carobiner>`__ *R* package to check for compliance, and to compile the data. The data that we have compiled so far are described `here <data.html>`_. You can download some of the standardized data from this site; and you can also use the scripts to generate all the data `yourself <compile.html>`__.

-There now is a substantial amount of raw primary research data available, especially from the `CGIAR <https://gardian.bigdata.cgiar.org>`_ international agricultural research centers. This provides ample opportunity to combines these data to address important additional research questions (here is `an example <https://www.nature.com/articles/s43016-021-00370-1>`_). Unfortunately, it is very time consuming to re-use research data. This is because, with a few exceptions, each dataset is organized differently. Datasets have their own set of variable names, accepted values, units, and file structures. Even two files *within* a dataset may have discrepancies. Moreover, the published data is often incomplete and needs to be augmented with information gleaned from publications. Most datasets also have mistakes, especially in the location data and spelling. These mistakes can often be corrected (or removed), but doing that can be very time consuming.
+There now is a substantial amount of raw primary research data available, especially from the `CGIAR <https://gardian.bigdata.cgiar.org>`_ international agricultural research centers. This provides ample opportunity to combines these data to address important additional research questions (here is `an example <https://www.nature.com/articles/s43016-021-00370-1>`_). Unfortunately, it can be very difficult and time consuming to re-use existing research data. This is because, with a few exceptions, each dataset is organized differently. Datasets have their own set of variable names, accepted values, units, and file structures. Even two files *within* a dataset may have discrepancies. Moreover, the published data is often incomplete and needs to be augmented with information gleaned from publications. Most datasets also have mistakes, especially in the location data and spelling. These mistakes can often be corrected (or removed), but doing that can be very time consuming.

*Carob* scripts solve this problem and make research data accessible for reuse.

-We also hope that by using the `terminag <https://github.com/reagro/terminag>`__ standard; and the tools to check datasets for being compliant (implemented in the *R* package "carobiner" available on `github <https://github.com/reagro/carobiner>`_), researchers will be able to improve the quality of the datasets that they create. This would make their own research more efficient and effective. Their work would also have more visibility and impact, as more people would work with the data once they are published.
+We also hope that by using the `terminag <https://github.com/reagro/terminag>`__ standard; and the tools to check datasets for being compliant (implemented in the *R* package "carobiner" available on `github <https://github.com/reagro/carobiner>`_), researchers will be able to improve the quality of the new datasets that they create. First and foremost, because this would make their own research more efficient and effective. Their work would also have more visibility and impact, as more people would work with the data once they are published.

-*Carob* is the *Extract, Transform, and Load* `(ETL) framework supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_ to support predictive agronomy analytics. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_!
+*Carob* is the *Extract, Transform, and Load* `(ETL) framework supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_ to support predictive agronomy analytics (machine learning). All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_!

.. raw:: html

2 changes: 1 addition & 1 deletion source/standard.rst
@@ -10,7 +10,7 @@ Standard
</div>
<div style="visibility: visible;">

-*Carob* uses the **terminag** standard that defines a controlled vocabulary of variable names, their units, and acceptable (ranges of) values.
+*Carob* uses the *terminag* standard** that defines a controlled vocabulary of variable names, their units, and acceptable (ranges of) values.

The *terminag* standard can be used "stand-alone" for your own data, and as part of the data compilation done through the Carob project.
