diff --git a/source/_R/aggregated.rmd b/source/_R/aggregated.rmd
index 84caf76..3954f9f 100644
--- a/source/_R/aggregated.rmd
+++ b/source/_R/aggregated.rmd
@@ -56,7 +56,7 @@ x$Group <- kableExtra::cell_spec(x$Group, "html", link=xurl)
 cdate <- format(Sys.time(), "%e %B %Y")
 ```
-We aggregate agricultural research data by groups of data collected for similar goals and with similar variables. The table below shows the current groups and the number of original datasets and records in each group. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html). As of `r cdate`, we have processed `r sum(x$Datasets)` original [data sets](done.html) containing a total of `r format(sum(x$Records), big.mark=",")` records.
+We aggregate agricultural research **data by groups** of data collected for similar goals and with similar variables. The table below shows the current groups and the number of original datasets and records in each group. We also show these numbers for the datasets that have a Creative Commons (CC) [license](licenses.html). As of `r cdate`, we have processed `r sum(x$Datasets)` original [data sets](done.html) containing a total of `r format(sum(x$Records), big.mark=",")` records.
@@ -82,12 +82,12 @@ w <- project(w, "+proj=hatano")
 g <- graticule(60, 30, "+proj=hatano")
 v <- project(v, "+proj=hatano")
-plot(g, col=gray(.95), background="azure", lwd=1.5, mar=c(.4, .2, 0, 0))
+plot(g, col=gray(.95), background="azure", lwd=1.5, mar=c(.4, .2, 0, 0), lab.cex=.4)
 plot(w, add=TRUE, col="light gray", border="white", lwd=1.5)
 points(v, col="pink", cex=.5)
 points(v, col="red", cex=.25)
 lines(w, col="dark gray", lwd=1)
-text(-14000000, -9500000, paste("Carob locations (", cdate, ")"), xpd=TRUE, cex=.5)
+halo(-12000000, -6500000, paste("Carob locations\n ", cdate), xpd=TRUE, cex=.6)
 ```

diff --git a/source/contribute.rst b/source/contribute.rst
index b488363..e53eb3c 100644
--- a/source/contribute.rst
+++ b/source/contribute.rst
@@ -12,7 +12,7 @@ Contribute
-*Carob* is an open-source community project to standardize and aggregate agricultural research data. You are invited to contribute to *Carob* by contributing `R` scripts for datasets of interest, or by improving existing scripts. All scripts are available on `github `_ (in the ``scripts`` folder).
+*Carob* is an open-source community project to standardize and aggregate agricultural research data. You are invited to **contribute** to *Carob* by contributing `R` scripts for datasets of interest, or by improving existing scripts. All scripts are available on `github `_ (in the ``scripts`` folder).
 To contribute you can follow these steps
diff --git a/source/contributors.rst b/source/contributors.rst
index 38bcfa8..511d0a1 100644
--- a/source/contributors.rst
+++ b/source/contributors.rst
@@ -11,7 +11,7 @@ Contributors
-Here are the contributors of Carob scripts, and of the data that we have processed. Names are followed by the number of scripts or datasets.
+We thank the **contributors** of Carob scripts, and the providers of the data that they have standardized. Names are followed by the number of scripts or datasets.
 .. raw:: html
    :file: _R/contributors.html
diff --git a/source/introduction.rst b/source/introduction.rst
index e1b9bc5..81fbd5f 100644
--- a/source/introduction.rst
+++ b/source/introduction.rst
@@ -10,17 +10,17 @@ Introduction
-*Carob* is a community project that uses a collaborative and open-source approach to standardize agricultural research data from experiments and surveys. We produce (1) scripts that standardize open research data and (2) aggregated data sets that can be used in research and development.
+*Carob* is a community project that uses a collaborative and open-source approach to **standardize and aggregate** open agricultural research data from experiments and surveys. The goal is to facilitate the further use of these data in research and development.
-We follow the `terminag `__ standard and use the `carobiner `__ *R* package to check for compliance, and to compile the data. The data that we have compiled so far are described `here `_. You can download some of the compiled data from this site; and you can also use the scripts to generate all the data `yourself `__.
+The project used *R* scripts to standardize individual data sets. We follow the `terminag `__ standard and use the `carobiner `__ *R* package to check for compliance, and to compile the data. The data that we have compiled so far are described `here `_. You can download some of the standardized data from this site; and you can also use the scripts to generate all the data `yourself `__.
-There now is a substantial amount of raw primary research data available, especially from the `CGIAR `_ international agricultural research centers. This provides ample opportunity to combines these data to address important additional research questions (here is `an example `_). Unfortunately, it is very time consuming to re-use research data. This is because, with a few exceptions, each dataset is organized differently. Datasets have their own set of variable names, accepted values, units, and file structures. Even two files *within* a dataset may have discrepancies. Moreover, the published data is often incomplete and needs to be augmented with information gleaned from publications. Most datasets also have mistakes, especially in the location data and spelling. These mistakes can often be corrected (or removed), but doing that can be very time consuming.
+There now is a substantial amount of raw primary research data available, especially from the `CGIAR `_ international agricultural research centers. This provides ample opportunity to combines these data to address important additional research questions (here is `an example `_). Unfortunately, it can be very difficult and time consuming to re-use existing research data. This is because, with a few exceptions, each dataset is organized differently. Datasets have their own set of variable names, accepted values, units, and file structures. Even two files *within* a dataset may have discrepancies. Moreover, the published data is often incomplete and needs to be augmented with information gleaned from publications. Most datasets also have mistakes, especially in the location data and spelling. These mistakes can often be corrected (or removed), but doing that can be very time consuming. *Carob* scripts solve this problem and make research data accessible for reuse.
-We also hope that by using the `terminag `__ standard; and the tools to check datasets for being compliant (implemented in the *R* package "carobiner" available on `github `_), researchers will be able to improve the quality of the datasets that they create. This would make their own research more efficient and effective. Their work would also have more visibility and impact, as more people would work with the data once they are published.
+We also hope that by using the `terminag `__ standard; and the tools to check datasets for being compliant (implemented in the *R* package "carobiner" available on `github `_), researchers will be able to improve the quality of the new datasets that they create. First and foremost, because this would make their own research more efficient and effective. Their work would also have more visibility and impact, as more people would work with the data once they are published.
-*Carob* is the *Extract, Transform, and Load* `(ETL) framework supported by CGIAR `_ to support predictive agronomy analytics. All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing `_!
+*Carob* is the *Extract, Transform, and Load* `(ETL) framework supported by CGIAR `_ to support predictive agronomy analytics (machine learning). All data transformations are done with *R* scripts, making it easy to enhance the standardization process as needs arise, and to correct mistakes. Please consider `contributing `_!
 .. raw:: html
diff --git a/source/standard.rst b/source/standard.rst
index d60301e..d8fe4eb 100644
--- a/source/standard.rst
+++ b/source/standard.rst
@@ -10,7 +10,7 @@ Standard
-*Carob* uses the **terminag** standard that defines a controlled vocabulary of variable names, their units, and acceptable (ranges of) values.
+*Carob* uses the *terminag* **standard** that defines a controlled vocabulary of variable names, their units, and acceptable (ranges of) values.
 The *terminag* standard can be used "stand-alone" for your own data, and as part of the data compilation done through the Carob project.