Data flow

To help you follow the data, this section shows the overall data flow and the I/O steps involved at each stage.

Main Route:
XLSForm > Kobo Dashboard > Raw Survey Data XLS > R Scripts


Sub Route 1:
R Scripts > Bivariate Statistics Table > PG Database for API use > Crosstabs Visualization API
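The bivariate statistics table in Sub Route 1 is essentially a crosstab of two questions. Our scripts are in R; the minimal Python sketch below is only an illustration. It assumes each survey response is a dict keyed by XLSForm `name` and that the table is stored in long format, one row per observed answer pair, so that it loads into a single PG table. The function `crosstab_long`, the column names (`var_x`, `value_x`, `var_y`, `value_y`, `count`) and the sample variable names are hypothetical, not an agreed contract.

```python
from collections import Counter


def crosstab_long(rows, var_x, var_y):
    """Count each (var_x answer, var_y answer) pair, skipping missing answers.

    Returns long-format records, one per observed pair, sorted by the pair.
    Column names are hypothetical placeholders.
    """
    counts = Counter(
        (r[var_x], r[var_y])
        for r in rows
        # The filter runs before the tuple is built, so missing keys are safe.
        if r.get(var_x) is not None and r.get(var_y) is not None
    )
    return [
        {"var_x": var_x, "value_x": vx, "var_y": var_y, "value_y": vy, "count": n}
        for (vx, vy), n in sorted(counts.items())
    ]
```

Long format keeps the table shape fixed no matter how many options a question has; a wide layout (one column per option) is the usual alternative.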


Sub Route 2:
R Scripts > Univariate Statistics Table > PG Database for API use > Univariate Analysis Visualization API (similar to ODP)
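The univariate statistics table in Sub Route 2 can start as a simple frequency table per question. A minimal Python sketch under the same assumptions (responses as dicts keyed by XLSForm `name`); `frequency_table` and its output keys are hypothetical.

```python
from collections import Counter


def frequency_table(rows, var):
    """Count answers to one question and express them as percentages.

    The denominator is the number of respondents who answered `var`
    (missing or None answers are excluded).
    """
    answered = [r[var] for r in rows if r.get(var) is not None]
    counts = Counter(answered)
    total = len(answered)
    return [
        {"variable": var, "value": value, "count": n,
         "percent": round(100 * n / total, 1)}
        # most_common() sorts by count; ties keep first-seen order.
        for value, n in counts.most_common()
    ]
```

Excluding non-respondents from the denominator matters for branched questions that not everyone sees. Multi-select answers, which XLSForm stores as space-separated choice names, would need splitting into individual options before counting.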

Notes:

  1. We need to carefully design the input and output tables/JSONs/CSVs at different stages.
  2. Different people are involved at different stages. We need to synchronize our variable names and data formats efficiently.
  3. Ideally, there should be no manual steps; in practice, we want to minimize them.
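One way to act on these notes is to write the agreed table layouts down once, in code, and check every exported table against them before it is loaded into PG. A minimal sketch; the `CONTRACT` table names and columns are hypothetical placeholders for whatever we finalize.

```python
import csv
import io

# Hypothetical contract: agreed column order for each table handed from
# the R scripts to the PG database. Names are illustrative only.
CONTRACT = {
    "univariate_stats": ["variable", "value", "count", "percent"],
    "bivariate_stats": ["var_x", "value_x", "var_y", "value_y", "count"],
}


def check_contract(table, rows):
    """Return a list of human-readable problems; an empty list means the rows conform."""
    expected = set(CONTRACT[table])
    problems = []
    for i, row in enumerate(rows):
        missing = expected - set(row)
        extra = set(row) - expected
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected {sorted(extra)}")
    return problems


def to_csv(table, rows):
    """Serialize rows as CSV with columns in contract order.

    csv.DictWriter raises ValueError if a row has keys outside the contract.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=CONTRACT[table])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```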

Data processing

This section lists all the data-processing steps I followed, so that we can reuse them in future projects.

Data-related things to do at the XLSForm stage

  1. Rename variables: Pay careful attention to the names you assign to questions and option variables throughout your XLSForm (see the XLSForm documentation).

  2. Tip 1: Use a naming system that makes variables easy to call in R. When designing your own conventions, consider whether others will find them easy to understand as well.

     For example, in our workers' questionnaire, we used prefixes (i_ for impact, p_ for preparedness, m_ for metadata, o_ for outlook, b_ for baseline). We also grouped similar columns with shared name parts (e.g., _econ_ for questions about the economic effects of COVID-19, _lvlhd_ for effects on workers' livelihoods). Finally, we tried to keep names short (like shrt_names). However, we prioritized readability and specificity over name length (good name: p_econ_hhd_items_pre_covid; bad name: p_e_h_pc).

  3. Tip 2: Don't fret too much; this is an iterative process. Note, however, that once the survey goes live, you should not change variable names. Until then, you have time to experiment and settle on a usable, legible naming convention.

  4. Tip 3: In the XLSForm, the columns you need to work with are:

    1. the "name" column in the "survey" sheet
    2. the "list_name" column in the "choices" sheet
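A naming convention pays off when it can be checked and used mechanically. The sketch below groups XLSForm `name` values by the theme prefixes listed above and flags names that do not follow the convention. `PREFIXES`, `NAME_PATTERN` and `group_by_theme` are hypothetical helpers, and the pattern (one known prefix letter, then lowercase words joined by underscores) is an assumed reading of our convention. In R, the same prefixes are what make selections such as dplyr's `starts_with("p_")` convenient.

```python
import re
from collections import defaultdict

# Theme prefixes from our naming convention.
PREFIXES = {
    "i": "impact",
    "p": "preparedness",
    "m": "metadata",
    "o": "outlook",
    "b": "baseline",
}

# Assumed rule: a single prefix letter, then lowercase words joined by "_".
NAME_PATTERN = re.compile(r"^([a-z])_[a-z0-9]+(?:_[a-z0-9]+)*$")


def group_by_theme(names):
    """Group variable names by theme; collect names that break the convention."""
    groups = defaultdict(list)
    invalid = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match and match.group(1) in PREFIXES:
            groups[PREFIXES[match.group(1)]].append(name)
        else:
            invalid.append(name)
    return dict(groups), invalid
```

Note that p_e_h_pc passes this check: a regular expression can enforce the prefix structure, but readability still needs a human reviewer.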

Data-related things to do at the Kobo Dashboard stage

  1. Use appropriate download settings, matching the ones we use. Note that the "Group Separator", though disabled, is set to "__" (two underscores). This is important and must be followed.

  2. Not much else: Bhawak uploads the XLSForm and deploys the survey here.
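One likely reason the double underscore matters is that our variable names already use single underscores, so a single "_" separator would be ambiguous. The minimal sketch below assumes exported headers take the form `parent__name` wherever Kobo applies the separator (for example, if group names are included in headers). `split_header` and `leaf_names` are hypothetical helpers.

```python
GROUP_SEP = "__"  # the "Group Separator" set in our Kobo download settings


def split_header(header, sep=GROUP_SEP):
    """Split an exported header at its last separator into (parent, name).

    parent is "" when the header contains no separator.
    """
    # rpartition returns ("", "", header) when sep is absent.
    parent, _, name = header.rpartition(sep)
    return parent, name


def leaf_names(headers, sep=GROUP_SEP):
    """Map each header to its bare variable name, refusing duplicate names."""
    mapping, seen = {}, {}
    for header in headers:
        name = split_header(header, sep)[1]
        if name in seen:
            raise ValueError(f"{header!r} and {seen[name]!r} both map to {name!r}")
        seen[name] = header
        mapping[header] = name
    return mapping
```

With "_" as the separator, a header such as grp_econ_p_econ_hhd would give no way to tell where the group name ends.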

Data-related things to do at the Raw Survey Data XLS stage

Nothing. The idea is to leave the raw data file untouched: everything must be done either before or after this stage.

Data-related things to do at the R Scripts stage

  1. ✔️ Generate tables for API use:
    a. univariate statistics table
    b. bivariate statistics table
    c. ...other?
    d. ❌ Is this list exhaustive?
  2. ✔️ Finalize variable names for API use (i.e., finalize the data format contract). Setting the keys, values, and labels here can simplify integration with Django.
  3. Isolate single-select and multi-select variables: you have to go through the questionnaire and do this question by question.
  4. ❓ Decide what to do with branched variables.
  5. ❓ Map variable names to their respective labels in English and Nepali.
  6. More:
    • write reusable, general functions
    • simplify code
    • use comments
    • name things properly
    • etc.
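The select-type split and the label mapping could probably be derived from the XLSForm itself rather than by hand. In XLSForm, select questions have a `type` such as `select_one yes_no` or `select_multiple reasons`, and translated labels live in columns named like `label::English (en)`. The minimal sketch below assumes the "survey" and "choices" sheets have already been read into lists of dicts and that our form uses exactly the label column headers shown. `LABEL_COLUMNS`, `classify_selects`, `choice_labels` and the sample names are hypothetical.

```python
# Assumed label column headers, following XLSForm's "label::Language (code)" pattern.
LABEL_COLUMNS = {"en": "label::English (en)", "ne": "label::Nepali (ne)"}


def classify_selects(survey_rows):
    """Split select questions by type, mapping question name -> list_name.

    survey_rows: dicts with "type" and "name" keys from the "survey" sheet.
    Non-select rows (text, begin group, ...) are ignored.
    """
    result = {"select_one": {}, "select_multiple": {}}
    for row in survey_rows:
        # e.g. "select_one yes_no" -> ["select_one", "yes_no"]
        parts = (row.get("type") or "").split()
        if len(parts) >= 2 and parts[0] in result:
            result[parts[0]][row["name"]] = parts[1]
    return result


def choice_labels(choice_rows, lang):
    """Map (list_name, choice name) -> label in the requested language ("" if absent)."""
    column = LABEL_COLUMNS[lang]
    return {(r["list_name"], r["name"]): r.get(column, "") for r in choice_rows}
```

Keying labels by (list_name, choice name) keeps them unambiguous when two choice lists reuse the same option name, such as "other".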

Where are the files/content?

  1. XLSForm can be downloaded from the Kobo dashboard.
  2. Raw Survey Data XLS can also be downloaded from the Kobo dashboard. Be sure to use the download settings described above.
  3. Workers' survey form: https://ee.humanitarianresponse.info/x/RGuatGl6