pre-harmonization data wrangling (purgatory!) #5

Open
kellyspeare opened this issue Jan 28, 2025 · 15 comments
Labels: bug (Something isn't working), documentation (Improvements or additions to documentation)

Comments

@kellyspeare
Collaborator

kellyspeare commented Jan 28, 2025

Some datasets arrive in a format other than the one we need, so they require some wrangling before harmonization. For these datasets we downloaded the data, wrangled it into the desired format with a short script, and then uploaded the wrangled dataset to the Google Drive. The wrangling script was going to go in the "basement". <- Actually, not going to put it in the basement. Going to put it in a different folder called "purgatory". Will decide tomorrow.

lter-arc_alaska_toolik_1999_mammals_plants.csv -- this dataset has biomass data on several plant parts (i.e., blade, sheath, inflor) for each species. We want total plant biomass, so I summed the parts to get the total biomass of each species.
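For reference, the part-summing step could look roughly like the sketch below. It is not the actual purgatory script: it assumes one row per plot, species, and plant part, and the column names (`plot`, `species`, `part`, `biomass`) and all values are made up for illustration.

```python
from collections import defaultdict

# Made-up rows: one row per plot x species x plant part.
# Column names and values are assumptions, not the real Toolik headers.
rows = [
    {"plot": "1", "species": "sp_a", "part": "blade", "biomass": "12.5"},
    {"plot": "1", "species": "sp_a", "part": "sheath", "biomass": "3.0"},
    {"plot": "1", "species": "sp_a", "part": "inflor", "biomass": "0.5"},
    {"plot": "1", "species": "sp_b", "part": "blade", "biomass": "4.0"},
]

def total_biomass(rows, keys=("plot", "species")):
    """Sum biomass over plant parts within each combination of `keys`."""
    totals = defaultdict(float)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += float(row["biomass"])
    return dict(totals)

totals = total_biomass(rows)
```

If the real file stores each plant part in its own column instead, the same idea applies as a row-wise sum across those columns.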

JamieMcDevittIrwin added the documentation label Jan 28, 2025
JamieMcDevittIrwin changed the title from "datasets that need coarse wrangling to get data into necessary format" to "pre-harmonization: datasets that need coarse wrangling to get data into necessary format (purgatory!)" Jan 28, 2025

@hillarykrumbholz
Collaborator

hillarykrumbholz commented Jan 28, 2025

See "Project 2" in Boatyard

  • need to contact the authors to get more information about this dataset

Data from Arctic LTER (2006lgdhbmcn.csv)

This dataset appears to be a mix of long and wide format. The main issue is the B#Q# columns, which hold the dry weight of samples collected for each block and quad. From the EDI metadata:

oven dried weight of sample collected in a quadrat. (Code of variable name: B for block and the number following the B is the block number. The Q stands for quadrat with the number following the Q being the quadrat number)

There are 3 blocks and 4 quads within each block.
example:
B1Q1
B1Q2
B1Q3
B1Q4
B2Q1......

These columns need to be teased apart before the dataset can be harmonized.

Additionally, the Species column is a mix of species and functional groups. See #3 (comment).
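One way the B#Q# columns could be teased apart is sketched below: each wide row becomes one long row per block and quadrat. The output column names (`block`, `quadrat`, `dry_weight`) and the example values are assumptions for illustration; only the B#Q# naming pattern and the Species column come from the dataset description.

```python
import re

# Matches column names such as B1Q1 or B3Q4 (block number, quadrat number).
BQ_PATTERN = re.compile(r"^B(\d+)Q(\d+)$")

def melt_block_quad(rows):
    """Turn each wide row into one long row per B#Q# column."""
    long_rows = []
    for row in rows:
        # Every column that is not a B#Q# column is carried along unchanged.
        id_part = {k: v for k, v in row.items() if not BQ_PATTERN.match(k)}
        for col, value in row.items():
            match = BQ_PATTERN.match(col)
            if match is None:
                continue
            long_rows.append({
                **id_part,
                "block": int(match.group(1)),
                "quadrat": int(match.group(2)),
                "dry_weight": value,  # hypothetical output column name
            })
    return long_rows

# Made-up example row with only three of the twelve B#Q# columns.
wide = [{"Species": "sp_a", "B1Q1": "1.2", "B1Q2": "0.8", "B2Q1": "2.0"}]
long_rows = melt_block_quad(wide)
```

This only handles the wide-to-long reshaping; separating species from functional groups in the Species column is a different problem.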

JamieMcDevittIrwin changed the title from "pre-harmonization: datasets that need coarse wrangling to get data into necessary format (purgatory!)" to "pre-harmonization data wrangling (purgatory!)" Jan 28, 2025

@JamieMcDevittIrwin
Collaborator

JamieMcDevittIrwin commented Feb 13, 2025

NICK - Don't fix this one. Sally has this dataset in GEx and will put it in.

KLEE VEG HITS JUNE- 2024.xls
(sent to JMI by Tyler Cloverdale), https://www.tandfonline.com/doi/abs/10.1080/10220119.1997.9647929

This dataset is an Excel file in which each treatment x site combination is its own sheet.

Treatment: O, C, W, WC, MW, MWC
O= all large herbivores excluded, C= cattle allowed, W= wildlife allowed (I think this refers to large mammalian herbivores?), M= megaherbivores allowed

Site: N, C, S
N= North, C= Central, S= South

These sheets have extra header rows that will need to be removed, and we will need to combine all of the treatments into one file.

Still trying to understand the sampling within each sheet (trap station and type of hits). Emailed Tyler to see if he knows this.
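When the sheets are eventually combined, the treatment codes listed above could be expanded into one column per herbivore group, roughly as in the sketch below. The function name and output keys are hypothetical. Note that "C" means cattle as a treatment letter but Central as a site code, so the two need to be parsed separately.

```python
# Letter meanings are taken from the treatment and site lists in this comment.
TREATMENT_LETTERS = {"C": "cattle", "W": "wildlife", "M": "megaherbivores"}
SITES = {"N": "North", "C": "Central", "S": "South"}

def decode_treatment(code):
    """Return which herbivore groups are allowed for a code such as 'MWC'."""
    allowed = {name: False for name in TREATMENT_LETTERS.values()}
    if code == "O":  # O = all large herbivores excluded
        return allowed
    for letter in code:
        if letter not in TREATMENT_LETTERS:
            raise ValueError(f"unknown treatment letter {letter!r} in {code!r}")
        allowed[TREATMENT_LETTERS[letter]] = True
    return allowed

# Decode all six treatments named above.
decoded = {code: decode_treatment(code) for code in ("O", "C", "W", "WC", "MW", "MWC")}
```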

@kellyspeare
Collaborator Author

kellyspeare commented Feb 14, 2025

Data from Harvard Forest LTER

The dataset has individual observations of trees in rows (i.e., each row represents one tree seedling). We need to summarize the data by site, treatment, and species, which gives counts of tree seedlings of each species for each treatment and site. The "height.class" and "browsed" columns can be discarded.

Dataset in purgatory with file name: hf174-06-tree-seedlings-2010.csv

--> source: lter-harvard_newengland_plantcover_2008_2019_moose_treeseedling
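The summarizing step amounts to counting rows per site, treatment, and species. A minimal sketch follows, assuming the grouping columns are named `site`, `treatment`, and `species` (guesses, not checked against the file) and using made-up rows:

```python
from collections import Counter

# Made-up rows: one row per seedling. "height.class" and "browsed" are the
# columns named above; the other column names and all values are assumptions.
seedlings = [
    {"site": "s1", "treatment": "control", "species": "sp_a", "height.class": "1", "browsed": "0"},
    {"site": "s1", "treatment": "control", "species": "sp_a", "height.class": "2", "browsed": "1"},
    {"site": "s1", "treatment": "exclosure", "species": "sp_a", "height.class": "1", "browsed": "0"},
    {"site": "s1", "treatment": "control", "species": "sp_b", "height.class": "3", "browsed": "0"},
]

def count_seedlings(rows, keys=("site", "treatment", "species")):
    """Count rows (seedlings) per group; columns not in `keys` are ignored."""
    return Counter(tuple(row[k] for k in keys) for row in rows)

counts = count_seedlings(seedlings)
```

Because only the grouping columns are read, "height.class" and "browsed" drop out of the result without a separate step.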

@kellyspeare
Collaborator Author

Data from Zaneveld et al. 2016

The data file gives plot means and standard errors in alternating columns (mean, standard error, mean, standard error, ...). Need to remove the columns that give standard errors.

Dataset in purgatory with file name: 41467_2016_BFncomms11833_MOESM1571_ESM
--> source: burkepile_florida_herbvr_2009-2012_fish_benthic.csv
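If the columns really do alternate strictly, the standard-error columns can be dropped by position, as in the sketch below. It assumes a known number of leading identifier columns (`n_id_cols`, hypothetical) followed by mean/standard-error pairs; the header names and values are made up, so check the actual layout before relying on positions.

```python
def drop_se_columns(header, records, n_id_cols=1):
    """Keep the ID columns plus every other value column (the means)."""
    # After the ID columns, columns alternate mean, SE, mean, SE, ...
    keep = list(range(n_id_cols)) + list(range(n_id_cols, len(header), 2))
    new_header = [header[i] for i in keep]
    new_records = [[record[i] for i in keep] for record in records]
    return new_header, new_records

# Made-up header and rows for illustration only.
header = ["plot", "var1_mean", "var1_se", "var2_mean", "var2_se"]
records = [
    ["A", "10.0", "1.1", "5.0", "0.4"],
    ["B", "12.0", "0.9", "6.5", "0.7"],
]
means_header, means_records = drop_se_columns(header, records)
```

If the standard-error columns can be recognized by name instead, filtering on the header text would be more robust than filtering by position.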

@SallyKoerner
Collaborator

Data in purgatory called boer-ca-n4.csv

Output file should be called gex_boer-ca-n4_grazing_year_grazers_plants.csv

The problem with this one is that it has 9 exp_name values (reserve_site in the spreadsheet), and each one has a different number of grazed and ungrazed plots (listed below as grazed/ungrazed).
CA_HASTINGS_12BOUC --- 7/8
CA_HASTINGS_13MENK --- 32/37
CA_HASTINGS_14KNOP --- 70/71
CA_HASTINGS_2GRIFF --- 8/26
CA_HASTINGS_3GRIFF --- 35/41
CA_HASTINGS_4GRIFF --- 19/19
CA_HASTINGS_5GRIFF --- 10/17
CA_HASTINGS_7MUICK --- 8/4
CA_HASTINGS_8MUICK --- 6/5

We need to randomly subset so that grazed and ungrazed have the same number of plots. For example, in 12BOUC one of the ungrazed plots needs to be dropped. In 13MENK, 5 ungrazed plots need to be dropped.
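The random subsetting could be sketched as below. It subsets whole plots rather than rows, since each plot presumably has several species rows, and uses a fixed seed so the same plots are dropped on every run. Only `reserve_site` comes from the spreadsheet; the `treatment` and `plot` column names, the seed, and the example rows are assumptions.

```python
import random
from collections import defaultdict

def balance_plots(rows, seed=42):
    """Randomly keep equal numbers of grazed and ungrazed plots per reserve_site."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    plots = defaultdict(lambda: defaultdict(set))
    for row in rows:
        plots[row["reserve_site"]][row["treatment"]].add(row["plot"])
    kept = set()
    for site, by_treatment in plots.items():
        n_keep = min(len(by_treatment["grazed"]), len(by_treatment["ungrazed"]))
        for treatment in ("grazed", "ungrazed"):
            # Sort first so the draw does not depend on set ordering.
            chosen = rng.sample(sorted(by_treatment[treatment]), n_keep)
            kept.update((site, treatment, plot) for plot in chosen)
    return [r for r in rows
            if (r["reserve_site"], r["treatment"], r["plot"]) in kept]

# Made-up rows using the 12BOUC plot counts from the list above (7 grazed, 8 ungrazed).
rows = [
    {"reserve_site": "CA_HASTINGS_12BOUC", "treatment": trt,
     "plot": f"{trt}_{i}", "species": sp}
    for trt, n in (("grazed", 7), ("ungrazed", 8))
    for i in range(n)
    for sp in ("sp_a", "sp_b")
]
balanced = balance_plots(rows)
```

Recording the seed alongside the output file would let someone else reproduce exactly which plots were dropped.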

I have already uploaded the metadata folder with the stuff I have using the gex_boer-ca-n4_grazing_year_grazers_plants folder name
