Add example of a reshape to wide format
paulrougieux committed Jan 10, 2022
1 parent 3f0dd3a commit 6f19562
Showing 1 changed file with 33 additions and 1 deletion.
34 changes: 33 additions & 1 deletion FAOSTAT/vignettes/FAOSTAT.Rnw
@@ -189,7 +189,7 @@ In order to access to an indicator in FAOSTAT using the API, three pieces of inf

The \code{getFAOtoSYB} function is a wrapper around \code{getFAO} for batch downloading data. It supports error recovery and stores the status of the download. The function also splits the downloaded data into entity-level data and regional aggregates, saving time for the user. Query results from \code{FAOsearch} can also be used.

In some cases multiple China are provided. In the FAOSTAT database for example, the trade domain provides data on China mainland (faostat country code = 41), Taiwan (faostat country code = 214) and China plus Taiwan (faostat country code = 357). In some other datasets it is also possible to find China plus Taiwan plus Macao plus Hong Kong (faostat country code = 351). The \code{CHMT} function avoids double counting if multiple China are detected by removing the more aggregated entities if detected. The default in \code{getFAOtoSYB} is to use \code{CHMT} when possible. It is important to perform this check before the aggregation step in order to avoid duble counting. This means that not necessarely this operation needs to be done at the data collection stage. This can be done also at a later stage using the \code{FAOcheck} function (or the \code{CHMT} function directly).
Some datasets report China under several overlapping entities. In the FAOSTAT trade domain, for example, data are provided for China mainland (faostat country code = 41), Taiwan (faostat country code = 214) and China plus Taiwan (faostat country code = 357). Some other datasets also contain China plus Taiwan plus Macao plus Hong Kong (faostat country code = 351). When several of these entities are detected, the \code{CHMT} function avoids double counting by removing the more aggregated ones. By default, \code{getFAOtoSYB} uses \code{CHMT} when possible. This check must be performed before the aggregation step in order to avoid double counting, but it does not have to be done at the data collection stage: it can also be done later using the \code{FAOcheck} function (or the \code{CHMT} function directly).
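
To make the double counting issue concrete, the following minimal sketch uses base R only. It is not the implementation of \code{CHMT}: the data frame, its values and the country with code 100 are invented for illustration, and the check is done once for the whole table rather than per year.

<<chmt-sketch, eval=FALSE>>=
# Toy trade data; all values are invented for illustration only
trade <- data.frame(
  FAOST_CODE = c(41, 214, 357, 100),  # 100 stands for some other country
  Year = 2010,
  value = c(80, 20, 100, 50)
)
# A naive total counts China twice: once through codes 41 and 214,
# and once more through the aggregate code 357
sum(trade$value)  # 250

# Drop the aggregated entities when China mainland is reported separately
aggregates <- c(357, 351)
trade_clean <- trade
if (41 %in% trade$FAOST_CODE) {
  trade_clean <- trade[!trade$FAOST_CODE %in% aggregates, ]
}
sum(trade_clean$value)  # 150
@

The second total is the one that a later aggregation step should see, which is why the check has to come before aggregation.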

<<FAO-check, eval=FALSE>>=
FAOchecked.df = FAOcheck(var = FAOquery.df$varName, year = "Year",
@@ -282,6 +282,38 @@ Given the lack of an internationally recognized standard which incorporates all
merged.df = mergeSYB(FAOchecked.df, WB.lst$entity, outCode = "FAOST_CODE")
@

\section{Reshape data to the wide ``non-normalized'' format}

The dataset locations returned by \code{FAOsearch()} point to the ``normalized''
version of the data. The ``normalized'' data format is a long format, well
suited to analysis in the tidy-data mindset described by Hadley Wickham in
\url{https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html}.

\begin{quote}
In tidy data:
\begin{itemize}
\item Every column is a variable.
\item Every row is an observation.
\item Every cell is a single value.
\end{itemize}
\end{quote}

If you want the data in wide format, you can reshape it with:

<<FAO-search, eval=FALSE>>=
library(FAOSTAT)
library(tidyr)
# Reuse the data folder created above (no warning if it already exists)
data_folder <- "data_raw"
dir.create(data_folder, showWarnings = FALSE)
# Load food balance data
fbs <- get_faostat_bulk(code = "FBS", data_folder = data_folder)
# Reshape to wide format: one column per year
fbs_wide <- pivot_wider(fbs, names_from = year, values_from = value)
@
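
The effect of the reshape can be checked on a small table that does not require any download. The sketch below is only an illustration: the column names \code{area}, \code{item} and \code{flag} and all values are hypothetical, and whether the FBS dataset contains a year-specific column such as \code{flag} is an assumption to verify with \code{names(fbs)}. It shows that such a column, if left among the identifiers, prevents rows from collapsing, and that the \code{id\_cols} argument of \code{pivot\_wider} can be used to name the identifiers explicitly.

<<reshape-sketch, eval=FALSE>>=
library(tidyr)
# Toy long ("normalized") table; names and values are hypothetical
long <- data.frame(
  area  = c("A", "A", "B", "B"),
  item  = "Wheat",
  year  = c(2018, 2019, 2018, 2019),
  value = c(10, 12, 20, 21),
  flag  = c("E", "A", "A", "A")
)
# By default every column other than year and value identifies a row,
# so rows that differ only by flag are kept apart
wide_sparse <- pivot_wider(long, names_from = year, values_from = value)
nrow(wide_sparse)  # 3

# Naming the identifier columns gives one row per area and item
wide <- pivot_wider(long, id_cols = c(area, item),
                    names_from = year, values_from = value)
nrow(wide)  # 2
names(wide)  # "area" "item" "2018" "2019"
@

Columns left out of \code{id\_cols}, \code{names\_from} and \code{values\_from} are dropped from the result, so this choice trades the flag information for a compact table.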


\section{Scale data to basic unit}

Warning: this section needs to be updated. Contributions and pull requests are welcome at \url{https://gitlab.com/paulrougieux/faostatpackage/}.