expansion
rhijmans committed Feb 24, 2024
1 parent 13d0cd4 commit dfc8a6c
Showing 14 changed files with 83 additions and 38 deletions.
2 changes: 1 addition & 1 deletion _theme/templates/footer.html
@@ -4,7 +4,7 @@

<p style="text-align:right;">
<small>
© 2023, the authors &nbsp;&nbsp;&mdash;&nbsp;&nbsp; <a href="https://github.com/reagro/carob-data">source</a>
© 2023-2024, the authors &nbsp;&nbsp;&mdash;&nbsp;&nbsp; <a href="https://github.com/reagro/carob-data">source</a>
</small>
</p>

2 changes: 1 addition & 1 deletion makesite.bat
@@ -1,7 +1,7 @@
@ECHO OFF

Rscript.exe --vanilla _script\knit_site.R clean
make html
Rscript.exe --vanilla _script\copy_reports.R
make html


8 changes: 6 additions & 2 deletions source/_R/todo.rmd
@@ -10,19 +10,23 @@ carob_path = "../../../carob/"
library(kableExtra)
```

This is our to-do list. Feel free to browse the list and pick a dataset you want to Carobize.
Below is our to-do list. Feel free to browse the list and pick a dataset you want to Carobize.
Or use `Gardian <https://gardian.bigdata.cgiar.org>`_ to discover new datasets.


|
|

```{r todo, echo=FALSE}
carobiner:::update_todo(carob_path)
#carobiner:::update_todo(carob_path)
ftodo <- file.path(carob_path, "todo", "to-do.csv")
x <- read.csv(ftodo)
x <- x[, c("uri", "title", "crop", "country", "provider")]
uri <- gsub("https://doi.org/", "doi:", x$uri)
uri <- gsub("https://hdl.handle.net/", "hdl:", uri)
x$uri <- paste0('<a href="', x$uri, '">', uri,'</a>')
x$title <- paste0(x$title, ". ", x$provider, ". ", x$uri)
x$provider <- x$uri <- NULL
DT::datatable(x, escape=FALSE, rownames=FALSE)
```
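
The link-building lines in the chunk above can be tried in isolation. What follows is a minimal sketch in base R, assuming a made-up two-row data frame in place of ``to-do.csv``; the titles, providers, and identifiers are placeholders, not real records. It also passes ``fixed = TRUE`` to ``gsub``, which the chunk itself does not do.

```r
# Hypothetical stand-in for the relevant columns of to-do.csv
x <- data.frame(
  uri = c("https://doi.org/10.1234/example", "https://hdl.handle.net/12345/678"),
  title = c("Example trial A", "Example survey B"),
  provider = c("Provider A", "Provider B")
)

# Shorten the resolver URLs to compact "doi:" / "hdl:" labels.
# fixed = TRUE (an addition here) matches the pattern as a literal string.
label <- gsub("https://doi.org/", "doi:", x$uri, fixed = TRUE)
label <- gsub("https://hdl.handle.net/", "hdl:", label, fixed = TRUE)

# Wrap each short label in a link that still points at the full URL
link <- paste0('<a href="', x$uri, '">', label, '</a>')

# Append provider and link to the title, then drop the now redundant columns
x$title <- paste0(x$title, ". ", x$provider, ". ", link)
x$provider <- x$uri <- NULL
print(x$title)
```

With ``fixed = TRUE`` the dots in the URLs are not read as regular-expression wildcards. The result is a single ``title`` column that carries the provider and a clickable identifier, which is why the table has to be rendered with ``escape=FALSE``.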

7 changes: 7 additions & 0 deletions source/aggregated.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Aggregated data
===============

.. raw:: html
   :file: _R/data.html


12 changes: 12 additions & 0 deletions source/compile.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
Compile
=======

To compile Carob data you can follow these steps:

1. **fork** the *Carob* `repo <https://github.com/reagro/carob/>`_ to your github account.
2. **clone** the forked repo to your computer.
3. **install** `R` package "carobiner" with ``remotes::install_github("reagro/carobiner")``
4. In the command line, go to the repo and run "build.bat" (or "build.sh" on Linux systems)
5. Use the files in the "data/compiled" folder


9 changes: 4 additions & 5 deletions source/contribute.rst
Original file line number Diff line number Diff line change
@@ -1,14 +1,13 @@
Contribute
==========

*Carob* is an open-source community project that standardizes and aggregates agricultural research data.
*Carob* is an open-source community project to standardize and aggregate agricultural research data.

Anyone is invited to contribute to *Carob* by writing an `R` script for a dataset of interest, or by improving an existing script. All scripts are available on the `github <https://github.com/reagro/carob/>`_ site (in the ``scripts`` folder).
Anyone is invited to contribute to *Carob* by contributing `R` scripts for datasets of interest, or by improving existing scripts. All scripts are available on the `github <https://github.com/reagro/carob/>`_ site (in the ``scripts`` folder).

A great place to discover new data sets is the `Gardian <https://gardian.bigdata.cgiar.org>`_ website.
Also see our `to-do list <todo.html>`_ for ideas (and check our `done list <done.html>`_ to make sure you do not replicate what has already been done).
A great place to discover new data sets is the `Gardian <https://gardian.bigdata.cgiar.org>`_ search engine. You can also look at our `to-do list <todo.html>`_ for ideas (and do check our `done list <done.html>`_ to make sure you do not work on a dataset that has already been processed).

The best approach to contributing is to follow these steps
To contribute you can follow these steps

1. **fork** the *Carob* `repo <https://github.com/reagro/carob/>`_ to your github account.
2. **clone** the forked repo to your computer.
8 changes: 8 additions & 0 deletions source/contributors.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
Contributors
============

Here is a table with the names of *Carob* contributors and the number of datasets that they have contributed.

.. raw:: html
   :file: _R/contributors.html

12 changes: 0 additions & 12 deletions source/data.rst

This file was deleted.

6 changes: 2 additions & 4 deletions source/done.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,5 @@
:orphan:

Datasets
========
Standardized data
=================

.. raw:: html
   :file: _R/done.html
6 changes: 4 additions & 2 deletions source/download.rst
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
Download
========

Compile and download a *Carob* dataset. Currently only the "Response to fertilizer" group is made available here, but more will follow. Note that here you can only download data that has a CC `license <licenses.html>`_. To compile all available data, and get the latest updates, you can download and process the sources yourself. See the *Carob* github `repo <https://github.com/reagro/carob/>`_ for instructions.
From this page you can download aggregated data. Currently only the "Response to fertilizer" group is made available here, but more will follow. Note that here you can only download data that has a Creative Commons `license <licenses.html>`_.

Note that the data here are a rough first attempt. It will likely have some errors. We are aiming to have a set of much cleaner and better documented datasets available by the end of 2023.
To get the other aggregated datasets, you can download and process the sources yourself. See `this page <compile.html>`_ for instructions.

Note that the data available here are new. They represent our first attempt to standardize widely variable data. There will likely be some errors from the original data that remain, or errors that we have introduced. We are aiming to provide cleaner and better documented datasets during the course of 2024.

.. raw:: html

18 changes: 9 additions & 9 deletions source/index.rst
@@ -3,15 +3,8 @@
Carob
=====

The *Carob* project cleans and transforms agricultural research data from experiments and surveys into a standard format, and aggregates individual data sets into larger databases that can be used in further research. The data that we have compiled so far is described `here <data.html>`_. You can `download <download.html>`_ compiled data from this site, or generate them yourself using the *Carob* `scripts <https://github.com/reagro/carob>`_.

This is an open-source, collaborative, community project to which you can `contribute <contribute.html>`_! All data transformations are done with *R* scripts, making it easy to enhance the workflows as needs arise, and to correct mistakes.

There now is a substantial amount of raw primary research data available, especially from the `CGIAR <https://gardian.bigdata.cgiar.org>`_ international agricultural research centers. This provides ample opportunity to combine these data to address important additional research questions (here is `an example <https://www.nature.com/articles/s43016-021-00370-1>`_). Unfortunately, it is very time consuming to re-use research data. This is because, with a few exceptions, each dataset is organized differently. Datasets have their own set of variable names, accepted values, and file structures. Even two files *within* a dataset may have discrepancies. Moreover, the published data is often incomplete and needs to be augmented with information gleaned from publications. Most datasets also have mistakes, especially in the location data and spelling. These mistakes can often be corrected (or removed), but doing that can be very time consuming.

This is the problem that *Carob* aims to solve. *Carob* makes it much easier to reuse raw research data. Once a script has been written to standardize a dataset, these data can be readily used by others as well. Or you can use a script and expand it, for example, to include additional variables, without having to start from scratch.

We also hope that by using the *Carob standards* and the tools to check datasets for being compliant (implemented in the *R* package "carobiner"), researchers will be able to improve the quality of the datasets that they create. This would make their own research more efficient and effective. Their work would also have more visibility and impact, as more people would work with the data once they are published.
*Carob* is the open-source, collaborative and community based *Extract, Transform, and Load* `(ETL) framework supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_ to support agricultural research. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_!


.. image:: /_static/carob.png
@@ -28,6 +21,13 @@ We also hope that by using the *Carob standards* and the tools to check datasets
   :hidden:
   :maxdepth: 3

   data
   introduction
   standard
   done
   aggregated
   download
   compile
   contribute
   contributors
   todo

20 changes: 20 additions & 0 deletions source/introduction.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
Introduction
============

*Carob* is an open-source, collaborative and community project that provides

- A data standard for agricultural research data from experiments and surveys
- Software to check for compliance with this standard
- Scripts to transform open research data into a standard format
- Aggregated data sets that can be used in research and development

The data that we have compiled so far is described `here <data.html>`_. You can `download <download.html>`_ compiled data from this site, or generate them yourself using the *Carob* `scripts <https://github.com/reagro/carob>`_.

There now is a substantial amount of raw primary research data available, especially from the `CGIAR <https://gardian.bigdata.cgiar.org>`_ international agricultural research centers. This provides ample opportunity to combine these data to address important additional research questions (here is `an example <https://www.nature.com/articles/s43016-021-00370-1>`_). Unfortunately, it is very time consuming to re-use research data. This is because, with a few exceptions, each dataset is organized differently. Datasets have their own set of variable names, accepted values, units, and file structures. Even two files *within* a dataset may have discrepancies. Moreover, the published data is often incomplete and needs to be augmented with information gleaned from publications. Most datasets also have mistakes, especially in the location data and spelling. These mistakes can often be corrected (or removed), but doing that can be very time consuming.

*Carob* scripts solve this problem and make research data accessible for reuse.

We also hope that by using the *Carob standards* and the tools to check datasets for being compliant (implemented in the *R* package "carobiner" available on `github <https://github.com/reagro/carobiner>`_), researchers will be able to improve the quality of the datasets that they create. This would make their own research more efficient and effective. Their work would also have more visibility and impact, as more people would work with the data once they are published.

*Carob* is the *Extract, Transform, and Load* `(ETL) framework supported by CGIAR <https://www.cgiar.org/initiative/excellence-in-agronomy/>`_ to support predictive agronomy analytics. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing <contribute.html>`_!

9 changes: 9 additions & 0 deletions source/standard.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
The standard
============

The *Carob* standard defines a controlled vocabulary of variable names, their units, and acceptable (ranges of) values.

The standard can be used "stand-alone" for your own data, and as part of the data compilation done through the Carob project.

The standard is defined in a number of tables that are available on the github site and via the R package carobiner. The standard is not an ontology in the sense that, with very few exceptions, we do not semantically relate variables to each other.
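
To illustrate what a controlled vocabulary of names, units, and value ranges makes possible, here is a minimal sketch in base R. The vocabulary table, its column names, the ranges, and the ``check_against_vocab`` helper are all hypothetical; they are not the actual *Carob* tables or the carobiner API.

```r
# Hypothetical vocabulary: one row per variable, with its unit and valid range
vocab <- data.frame(
  name = c("yield", "N_fertilizer"),
  unit = c("kg/ha", "kg/ha"),
  min  = c(0, 0),
  max  = c(30000, 600)
)

# Report variable names not in the vocabulary and variables with
# values outside their allowed range (missing values are ignored)
check_against_vocab <- function(d, vocab) {
  unknown <- setdiff(names(d), vocab$name)
  out_of_range <- character(0)
  for (v in intersect(names(d), vocab$name)) {
    rng <- vocab[vocab$name == v, ]
    bad <- !is.na(d[[v]]) & (d[[v]] < rng$min | d[[v]] > rng$max)
    if (any(bad)) out_of_range <- c(out_of_range, v)
  }
  list(unknown = unknown, out_of_range = out_of_range)
}

# Made-up records: "grain_yld" is not a vocabulary term, one yield is negative
d <- data.frame(yield = c(2500, -10), N_fertilizer = c(60, 120), grain_yld = c(1, 2))
res <- check_against_vocab(d, vocab)
print(res)
```

A check of this kind only compares names and ranges. It does not relate variables to each other, which matches the point above that the standard is a vocabulary and not an ontology.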

2 changes: 0 additions & 2 deletions source/todo.rst
Original file line number Diff line number Diff line change
@@ -1,5 +1,3 @@
:orphan:

To-do list
==========

