pre-harmonization data wrangling (purgatory!) #5
See "Project 2" in Boatyard
Data from Arctic LTER (2006lgdhbmcn.csv). This dataset is a mix of long and wide format. The main issue is the B#Q# columns, which hold the dry weight of samples collected for a given block and quad. From EDI metadata:
There are 3 blocks and 4 quads within each block. These need to be teased apart before the dataset can be harmonized. Additionally, the Species column is a mix of species and functional groups. Look to #3 (comment).
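A minimal sketch of how the B#Q# columns could be teased apart into long format, using only the Python standard library. Only the B#Q# naming pattern and the Species column come from the dataset; the sample rows, the output names `block`, `quad`, and `dry_weight`, and the choice to skip empty cells are assumptions for illustration.

```python
import re

# Hypothetical rows: only the B#Q# naming pattern and the Species
# column come from the dataset; the values are made up.
rows = [
    {"Species": "sp_a", "B1Q1": "1.2", "B1Q2": "0.8", "B2Q1": "", "B2Q2": "2.0"},
    {"Species": "sp_b", "B1Q1": "0.5", "B1Q2": "", "B2Q1": "0.1", "B2Q2": "0.3"},
]

# Matches column names such as B1Q3 and captures block and quad numbers.
BQ_PATTERN = re.compile(r"^B(\d+)Q(\d+)$")


def bq_to_long(rows):
    """Turn one wide row per species into one long row per block and quad."""
    long_rows = []
    for row in rows:
        # Everything that is not a B#Q# column is carried along unchanged.
        id_cols = {k: v for k, v in row.items() if not BQ_PATTERN.match(k)}
        for col, value in row.items():
            match = BQ_PATTERN.match(col)
            if match is None or value == "":
                continue  # assumption: empty cells mean "no sample" and are skipped
            long_rows.append({
                **id_cols,
                "block": int(match.group(1)),
                "quad": int(match.group(2)),
                "dry_weight": float(value),
            })
    return long_rows


long_rows = bq_to_long(rows)
```

With the real file, `rows` would come from `csv.DictReader`. Whether empty cells should be dropped or kept as zeros depends on how the EDI metadata defines them.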
NICK - Don't fix this one. Sally has this dataset in GEx and will put it in.
KLEE VEG HITS JUNE- 2024.xls
This dataset is an Excel file where each treatment x site combination is a sheet in the file.
Treatment: O, C, W, WC, MW, MWC
Site: N, C, S
These sheets have extra header rows that will need to be removed, and we will need to combine all of the treatments into one file. Still trying to understand the sampling within each sheet (trap station and type of hits). Emailed Tyler to see if he knows this.
Data from Harvard Forest LTER. The dataset has individual observations of trees in rows (i.e., each row represents one tree seedling). We need to summarize the data by site, treatment, and species, which will give counts of tree seedlings of each species for each treatment and site. Columns "height.class" and "browsed" can be discarded. The dataset is in purgatory with file name hf174-06-tree-seedlings-2010.csv --> source: lter-harvard_newengland_plantcover_2008_2019_moose_treeseedling
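A rough sketch of the summarizing step, assuming the grouping columns are literally named `site`, `treatment`, and `species` (the real headers in the file may differ). Only "height.class" and "browsed" are column names taken from the description above; the sample rows are made up.

```python
from collections import Counter

# Hypothetical rows: one row per tree seedling.
seedlings = [
    {"site": "A", "treatment": "control", "species": "acer_rubrum", "height.class": "1", "browsed": "n"},
    {"site": "A", "treatment": "control", "species": "acer_rubrum", "height.class": "2", "browsed": "y"},
    {"site": "A", "treatment": "exclosure", "species": "acer_rubrum", "height.class": "1", "browsed": "n"},
    {"site": "A", "treatment": "control", "species": "pinus_strobus", "height.class": "1", "browsed": "n"},
]


def count_seedlings(rows):
    """Count seedlings per site, treatment, and species.

    "height.class" and "browsed" are simply never read, so they drop out.
    """
    counts = Counter((r["site"], r["treatment"], r["species"]) for r in rows)
    return [
        {"site": site, "treatment": treatment, "species": species, "count": n}
        for (site, treatment, species), n in sorted(counts.items())
    ]


summary = count_seedlings(seedlings)
```

Note that combinations with zero seedlings do not appear in the output; if true zeros are needed for harmonization they would have to be filled in separately.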
Data from Zaneveld et al. 2016. The data file includes plot means and standard errors in alternating columns (mean, standard error, mean, standard error, and so on). We need to remove the columns that give standard errors. The dataset is in purgatory with file name 41467_2016_BFncomms11833_MOESM1571_ESM
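One possible way to drop the standard error columns by position. This assumes the file starts with a fixed number of identifier columns followed by strictly alternating mean and standard error columns; the header names and `n_id_cols=1` below are hypothetical and should be checked against the actual file first.

```python
# Hypothetical header and rows; the real column names are not known here.
header = ["plot", "coral_mean", "coral_se", "algae_mean", "algae_se"]
rows = [
    ["p1", "10.0", "1.1", "4.0", "0.4"],
    ["p2", "12.5", "0.9", "3.5", "0.6"],
]


def drop_se_columns(header, rows, n_id_cols=1):
    """Keep identifier columns plus every other column after them.

    Assumes the layout is: n_id_cols identifier columns, then
    mean, SE, mean, SE, ... with the mean always first in each pair.
    """
    keep = list(range(n_id_cols)) + list(range(n_id_cols, len(header), 2))
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


means_header, means_rows = drop_se_columns(header, rows)
```

If the standard error columns turn out to share a recognizable name pattern, filtering on the header text would be safer than relying on position.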
The data file in purgatory is called boer-ca-n4.csv. The output file should be called gex_boer-ca-n4_grazing_year_grazers_plants.csv. The problem with this one is that it has 9 exp_name values (reserve_site in the spreadsheet), and each one has a different number of grazed and ungrazed plots. We need to randomly subset so that grazed and ungrazed have the same number of plots. For example, in 12BOUC one of the ungrazed plots needs to be dropped; in 13MENK, 5 ungrazed plots need to be dropped. I have already uploaded the metadata folder with what I have so far, using the gex_boer-ca-n4_grazing_year_grazers_plants folder name.
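A sketch of the random balancing step, assuming each plot can be listed as a (exp_name, treatment, plot ID) tuple with treatment coded as "grazed" or "ungrazed". The plot IDs and plot counts below are made up; only the site names 12BOUC and 13MENK come from the note above. A fixed seed is used so the subset is reproducible.

```python
import random
from collections import defaultdict

# Hypothetical plot list: (exp_name, treatment, plot_id).
plots = [
    ("12BOUC", "grazed", "g1"), ("12BOUC", "grazed", "g2"),
    ("12BOUC", "ungrazed", "u1"), ("12BOUC", "ungrazed", "u2"), ("12BOUC", "ungrazed", "u3"),
    ("13MENK", "grazed", "g1"),
    ("13MENK", "ungrazed", "u1"), ("13MENK", "ungrazed", "u2"), ("13MENK", "ungrazed", "u3"),
]


def balance_plots(plots, seed=42):
    """Randomly keep the same number of grazed and ungrazed plots per exp_name."""
    rng = random.Random(seed)  # fixed seed so the same plots are kept on every run
    by_site = defaultdict(lambda: defaultdict(list))
    for exp_name, treatment, plot_id in plots:
        by_site[exp_name][treatment].append(plot_id)

    kept = []
    for exp_name in sorted(by_site):
        groups = by_site[exp_name]
        # The smaller treatment group sets how many plots each group keeps.
        n = min(len(groups["grazed"]), len(groups["ungrazed"]))
        for treatment in ("grazed", "ungrazed"):
            for plot_id in sorted(rng.sample(groups[treatment], n)):
                kept.append((exp_name, treatment, plot_id))
    return kept


kept_plots = balance_plots(plots)
```

The data rows would then be filtered to the plots in `kept_plots`. It is worth recording the seed and the dropped plots in the metadata folder so the subset can be traced later.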
Some datasets present data in a different format from the one we want and need some wrangling first. For these datasets we downloaded the data, wrangled it into the desired format in a short script, and then uploaded the wrangled dataset to the Google Drive. The script used to wrangle those data goes in the "basement". (Actually, not going to put it in the basement. Going to put it in a different folder called "purgatory". Will decide tomorrow.)
lter-arc_alaska_toolik_1999_mammals_plants.csv: this dataset has biomass data on several plant parts (i.e., blade, sheath, inflor) for each species. We want total plant biomass, so I summed those to get the total biomass of each species.
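The summing step could look roughly like the sketch below. It assumes the data are in long format with one row per plot, species, and plant part; the column names `plot`, `species`, `part`, and `biomass` and all values are hypothetical, and only the part names blade, sheath, and inflor come from the note above.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per plot, species, and plant part.
records = [
    {"plot": "1", "species": "sp_a", "part": "blade", "biomass": 2.0},
    {"plot": "1", "species": "sp_a", "part": "sheath", "biomass": 1.5},
    {"plot": "1", "species": "sp_a", "part": "inflor", "biomass": 0.5},
    {"plot": "1", "species": "sp_b", "part": "blade", "biomass": 3.0},
]


def total_biomass(records):
    """Sum biomass over plant parts for each plot and species."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["plot"], r["species"])] += r["biomass"]
    return dict(totals)


totals = total_biomass(records)
```

If the file instead stores each plant part as its own column, the same total is just the row-wise sum of those columns.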