Revise R practical #35

Open
PBBlomquist opened this issue Sep 12, 2024 · 4 comments
@PBBlomquist

No description provided.

@PBBlomquist PBBlomquist added this to the CS: R practical milestone Sep 12, 2024
@PBBlomquist PBBlomquist self-assigned this Sep 12, 2024
@PBBlomquist

PBBlomquist commented Sep 12, 2024

Edits made for draft version. They include:

  • Objectives + scenario more immediately visible at the top
  • More narrative throughout
  • Most important details about case study at the top in a box. Other detail in original table removed (e.g. we don't need to give the title in the title and then again in a table). Info about authors/version/terms etc made shorter and put to the bottom
  • Instructions on using the case study removed - we'd like to put these on a different page
  • Restructured so that it makes more sense and has fewer headings/subheadings. Also removed a lot of verbose text
  • Made the 'optional' stuff not optional - it was about data quality checking and that should not be seen as optional
  • Removal of the statistical test (it had no explanation, was a bit random, and seemingly encouraged people to generate p-values without thinking about them)
  • Some visual changes, e.g. self-testing questions in a box

All in commit: 53beadb

@PBBlomquist

@alanahjansen

I've been moving the data for these case studies into the new package. I've realised that the six datasets loaded are actually just two files: one is a linelist, duplicated in different formats, and the other is an aggregate table, also duplicated in different formats.

I don't think we need to load duplicate files to make the point that import() can handle different file types. We just need the linelist as one filetype and the aggregate data as another filetype.
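A minimal sketch of that point, assuming the `rio` package (with its xlsx support installed). The file names and columns here are hypothetical stand-ins, not the practical's real data:

```r
# Sketch only: file names and columns are hypothetical, not the real case study data.
library(rio)

csv_path  <- file.path(tempdir(), "linelist.csv")
xlsx_path <- file.path(tempdir(), "aggregate.xlsx")

# Write two small stand-in files so the example runs on its own
export(data.frame(id = 1:3, sex = c("Female", "Male", "Female")), csv_path)
export(data.frame(region = c("A", "B"), cases = c(10, 4)), xlsx_path)

# The same call reads both files; rio chooses the reader from the file extension
linelist      <- import(csv_path)
aggregate_tab <- import(xlsx_path)
```

One file per type is enough to show that the call does not change with the format.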

Are you happy with me reducing the number of datasets in this practical from 6 to 2 (one csv and one xlsx)?

@alanahjansen

@PBBlomquist

As long as there are still two different file types to show the strength of import(), which you have clearly said above, I have no problem reducing the datasets down to two. I was wondering why there were so many to begin with.

Great job cleaning up the case study!

@PBBlomquist

Further edits made:

  • Clean up missing data (all "unknown" etc. changed to NA) so that it is more compatible with tbl_summary() percentages
  • Related to the previous point, added a caveat on how to treat missing data (UNK vs NA vs Missing) and the need to assess whether combining them is appropriate
  • Only two datasets imported
  • Clarify that we use tabyl() during exploration and tbl_summary() for presentation-ready tables
  • Make narrative more fun at the start of each section
  • Break up long code chunks and text to have more explanation in between
  • Remove geom_histogram()
  • Move column generation into the relevant sections (e.g. create a specific column just before making the figure that needs it; otherwise it looks really random in the cleaning section, OR we would need to indicate more of the plan/intended outputs earlier on, but that is redundant)
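
A minimal sketch of the missing-data cleanup described in the first two points, in base R. The column name `sex` and the placeholder codes are assumptions for illustration and may not match the practical's data:

```r
# Sketch only: the column name and the placeholder codes are hypothetical.
linelist <- data.frame(
  sex = c("Female", "Male", "Unknown", "UNK", NA, "Female"),
  stringsAsFactors = FALSE
)

# Codes to treat as missing; first check that combining them is appropriate,
# because "Unknown" (asked but not known) can differ from a blank (never recorded)
unknown_codes <- c("Unknown", "UNK", "Missing")

# Recode the placeholder strings to NA so they are handled as true missing values
linelist$sex[linelist$sex %in% unknown_codes] <- NA

# Quick exploratory count, including the NA category
table(linelist$sex, useNA = "ifany")
```

Keeping the codes in one named vector makes the decision about what counts as missing visible and easy to revisit.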

Still want to

  • Have a user review it
  • Change gender to sex

2 participants