pre-harmonization data wrangling (purgatory!) #5

kellyspeare · 2025-01-28T01:04:45Z

Some datasets present data in a different format from what we want, so they need some wrangling before they can be harmonized. For these datasets we downloaded the data, wrangled them into the desired format with a short script, and then uploaded the wrangled dataset to the Google Drive. The script used to wrangle those data goes in the "basement". <- Actually, not going to put it in the basement. Going to put it in a different folder called "purgatory". Will decide tomorrow.

lter-arc_alaska_toolik_1999_mammals_plants.csv -- this dataset has biomass data on several plant parts (i.e., blade, sheath, inflor) for each species. We want total plant biomass, so I summed those parts to get the total biomass of each species.
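A minimal sketch of that summing step, in case it helps whoever picks this up. The column names (`species`, `part`, `biomass`) and the example values are made up for illustration; the real file's column names will differ, and the grouping probably also needs plot or treatment columns, which can be passed through `group_keys`.

```python
from collections import defaultdict

def total_biomass(rows, group_keys=("species",), value_key="biomass"):
    """Sum biomass across plant parts within each group (hypothetical column names)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        totals[key] += float(row[value_key])
    # One output row per group, with the part-level rows collapsed into a total
    return [dict(zip(group_keys, key), **{value_key: total})
            for key, total in totals.items()]

# Made-up example rows, not values from the real dataset
rows = [
    {"species": "sp_a", "part": "blade", "biomass": "1.5"},
    {"species": "sp_a", "part": "sheath", "biomass": "0.5"},
    {"species": "sp_a", "part": "inflor", "biomass": "0.25"},
    {"species": "sp_b", "part": "blade", "biomass": "2.0"},
]
summed = total_biomass(rows)
```

With real data the rows could come from `csv.DictReader`, which yields dictionaries of this shape.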

hillarykrumbholz · 2025-01-28T18:33:35Z

See "Project 2" in Boatyard

need to contact the authors to get more information about this dataset

Data from Arctic LTER (2006lgdhbmcn.csv)

https://portal.edirepository.org/nis/metadataviewer?packageid=knb-lter-arc.10025.6

This dataset looks like a mix of long and wide format. The main issue is the B#Q# columns, which hold the dry weight of the samples collected for each block and quadrat. From the EDI metadata:

oven dried weight of sample collected in a quadrat. (Code of variable name: B for block and the number following the B is the block number. The Q stands for quadrat with the number following the Q being the quadrat number)

There are 3 blocks and 4 quads within each block.
example:
B1Q1
B1Q2
B1Q3
B1Q4
B2Q1......

The block and quadrat information in these column names needs to be teased apart before the dataset can be harmonized.
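One way the B#Q# columns could be pulled apart is to reshape them into long format, with one row per block and quadrat. This is only a sketch under assumptions: the `B<number>Q<number>` naming comes from the EDI metadata quoted above, but the output column names (`block`, `quadrat`, `dry_weight`) and the example values are hypothetical.

```python
import re

# Matches column names such as B1Q1 or B3Q4 (block number, quadrat number)
BQ_PATTERN = re.compile(r"^B(\d+)Q(\d+)$")

def melt_block_quadrat(rows):
    """Turn each B#Q# column into its own row with block and quadrat columns."""
    long_rows = []
    for row in rows:
        # Every column that is not a B#Q# column is carried along unchanged
        id_cols = {k: v for k, v in row.items() if not BQ_PATTERN.match(k)}
        for col, value in row.items():
            match = BQ_PATTERN.match(col)
            if match is None:
                continue
            long_rows.append({
                **id_cols,
                "block": int(match.group(1)),
                "quadrat": int(match.group(2)),
                "dry_weight": value,
            })
    return long_rows

# Made-up example row, not values from the real dataset
wide = [{"Species": "sp_a", "B1Q1": "0.4", "B1Q2": "0.6", "B2Q1": "1.1"}]
long_format = melt_block_quadrat(wide)
```

This does not address the species versus functional group mix in the Species column; that would still need a separate decision.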

Additionally, the Species column is a mix of species and functional groups. See #3 (comment).

JamieMcDevittIrwin · 2025-02-13T19:22:05Z

NICK - Don't fix this one - Sally has this dataset in GEx and will put it in.

KLEE VEG HITS JUNE- 2024.xls
(sent to JMI by Tyler Cloverdale), https://www.tandfonline.com/doi/abs/10.1080/10220119.1997.9647929

This dataset is an Excel file where each treatment x site combination is its own sheet.

Treatment: O, C, W, WC, MW, MWC
O = all large herbivores excluded, C = cattle allowed, W = wildlife allowed (I think this refers to large mammalian herbivores?), M = megaherbivores allowed

Site: N, C, S
N= North, C= Central, S= South

These sheets have extra header rows that will need to be removed, and we will need to combine all of the treatment x site sheets into one file.

Still trying to understand the sampling within each sheet (trap station and type of hits). Emailed Tyler to see if he knows this.

kellyspeare · 2025-02-14T00:06:41Z

Data from Harvard Forest LTER

The dataset has individual observations of trees in rows (i.e., each row represents one tree seedling). We need to summarize the data by site, treatment, and species, which will give counts of tree seedlings of each species for each treatment and site. Columns "height.class" and "browsed" can be discarded.
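A rough sketch of that summary, counting one seedling per row. The `height.class` and `browsed` columns are named in this comment; the grouping column names (`site`, `treatment`, `species`) and the example values are assumptions and should be swapped for whatever the real file uses.

```python
from collections import Counter

def count_seedlings(rows, keys=("site", "treatment", "species")):
    """Count rows (one seedling each) per site x treatment x species."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    # Columns outside `keys` (e.g. height.class, browsed) are dropped here
    return [dict(zip(keys, key), count=n) for key, n in sorted(counts.items())]

# Made-up example rows, not values from the real dataset
seedlings = [
    {"site": "s1", "treatment": "control", "species": "acru", "height.class": "1", "browsed": "n"},
    {"site": "s1", "treatment": "control", "species": "acru", "height.class": "2", "browsed": "y"},
    {"site": "s1", "treatment": "exclosure", "species": "acru", "height.class": "1", "browsed": "n"},
    {"site": "s1", "treatment": "control", "species": "quru", "height.class": "1", "browsed": "n"},
]
seedling_counts = count_seedlings(seedlings)
```

Note that combinations with zero seedlings do not appear in the output; if explicit zeros are needed they would have to be filled in afterwards.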

Dataset in purgatory with file name: hf174-06-tree-seedlings-2010.csv

--> source: lter-harvard_newengland_plantcover_2008_2019_moose_treeseedling

kellyspeare · 2025-02-14T02:03:56Z

Data from Zaneveld et al. 2016

The data file includes plot means and standard errors in alternating columns (mean, standard error, mean, standard error, and so on). The columns that give standard errors need to be removed.
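If the columns really do alternate strictly, dropping the standard errors can be done by position. The sketch below assumes a hypothetical layout of one leading identifier column followed by mean/SE pairs; the header names and values are made up, and the real file should be checked before relying on position alone.

```python
def drop_se_columns(table, n_id_cols=1):
    """Keep identifier columns plus every other value column (the means).

    Assumes the value columns alternate mean, SE, mean, SE, ...
    """
    kept = []
    for row in table:
        values = row[n_id_cols:]
        # values[0::2] picks positions 0, 2, 4, ... i.e. the mean columns
        kept.append(row[:n_id_cols] + values[0::2])
    return kept

# Made-up example table (header row first), not values from the real dataset
table = [
    ["plot", "algae_mean", "algae_se", "coral_mean", "coral_se"],
    ["p1", "10", "1", "20", "2"],
]
means_only = drop_se_columns(table)
```

Selecting columns by name would be safer if the headers reliably mark which columns are standard errors.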

Dataset in purgatory with file name: 41467_2016_BFncomms11833_MOESM1571_ESM
--> source: burkepile_florida_herbvr_2009-2012_fish_benthic.csv

SallyKoerner · 2025-02-14T15:35:50Z

Data in purgatory called boer-ca-n4.csv

Output file should be called gex_boer-ca-n4_grazing_year_grazers_plants.csv

The problem with this one is that it has 9 exp_name values (reserve_site in the spreadsheet), and each one has a different number of grazed and ungrazed plots (listed below as grazed/ungrazed).
CA_HASTINGS_12BOUC --- 7/8
CA_HASTINGS_13MENK --- 32/37
CA_HASTINGS_14KNOP --- 70/71
CA_HASTINGS_2GRIFF --- 8/26
CA_HASTINGS_3GRIFF --- 35/41
CA_HASTINGS_4GRIFF --- 19/19
CA_HASTINGS_5GRIFF --- 10/17
CA_HASTINGS_7MUICK --- 8/4
CA_HASTINGS_8MUICK --- 6/5

We need to randomly subset so that grazed and ungrazed have the same number of plots. So, for example, in 12BOUC one of the ungrazed plots needs to be dropped. In 13MENK, 5 ungrazed plots need to be dropped.
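A possible sketch of the random subsetting, done separately within each exp_name so that both treatments end up with the smaller of the two plot counts. The plot IDs are invented for the example (only the site names and counts come from the list above), and the fixed seed is an assumption made so the subset is reproducible.

```python
import random
from collections import defaultdict

def balance_plots(plots, seed=42):
    """Randomly drop plots so each treatment within a site has equal numbers.

    `plots` is a list of (exp_name, treatment, plot_id) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the same plots are kept on rerun
    by_site = defaultdict(lambda: defaultdict(list))
    for site, treatment, plot_id in plots:
        by_site[site][treatment].append(plot_id)

    kept = []
    for site, groups in by_site.items():
        # Target size is the smaller treatment group at this site
        n_keep = min(len(ids) for ids in groups.values())
        for treatment, ids in groups.items():
            for plot_id in sorted(rng.sample(ids, n_keep)):
                kept.append((site, treatment, plot_id))
    return kept

# Counts from the thread (grazed/ungrazed); the plot IDs themselves are made up
plots = (
    [("CA_HASTINGS_12BOUC", "grazed", f"g{i}") for i in range(7)]
    + [("CA_HASTINGS_12BOUC", "ungrazed", f"u{i}") for i in range(8)]
    + [("CA_HASTINGS_7MUICK", "grazed", f"g{i}") for i in range(8)]
    + [("CA_HASTINGS_7MUICK", "ungrazed", f"u{i}") for i in range(4)]
)
balanced = balance_plots(plots)
```

In this example 12BOUC ends up with 7 grazed and 7 ungrazed plots, and 7MUICK with 4 of each. A site with only one treatment present would keep all of its plots, so that case would need its own check.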

I have already uploaded the metadata folder with the stuff I have using the gex_boer-ca-n4_grazing_year_grazers_plants folder name

JamieMcDevittIrwin added the documentation label Jan 28, 2025

JamieMcDevittIrwin changed the title ~~datasets that need coarse wrangling to get data into necessary format~~ pre-harmonization: datasets that need coarse wrangling to get data into necessary format (purgatory!) Jan 28, 2025

JamieMcDevittIrwin changed the title ~~pre-harmonization: datasets that need coarse wrangling to get data into necessary format (purgatory!)~~ pre-harmonization data wrangling (purgatory!) Jan 28, 2025

kellyspeare closed this as completed Jan 28, 2025

hillarykrumbholz reopened this Jan 28, 2025

njlyon0 added a commit that referenced this issue Jan 30, 2025

feat (boatyard): purgatory dataset 4 repaired (see #5)

09df59d

njlyon0 added a commit that referenced this issue Jan 30, 2025

feat (boatyard): project 5 purgatory repair complete (see #5)

d275bc2

njlyon0 added a commit that referenced this issue Jan 30, 2025

feat (boatyard): 'project 6' purgatory files repaired (see #5)

154a03a

njlyon0 added the bug label Jan 30, 2025

njlyon0 added a commit that referenced this issue Jan 30, 2025

feat (boatyard): finished repairing 'project 7' (see #5)

401ebea

njlyon0 mentioned this issue Jan 30, 2025

[CAGED] Help with Harmonization Workflow lter/scicomp#34

Open

5 tasks
