
Group boxplot subarray #26

Merged: 3 commits merged Dec 11, 2023


amanda-hi
Contributor

Overview of Pull Request

Fixes #24

Main changes

  • Updated the internal plotting data frame to include feature columns from the ADAT, enabling the user to apply facet_wrap() to a figure created with boxplotSubarray(). No arguments or parameters were added to the function.
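A minimal base-R sketch of why carrying feature columns into the plotting data matters. It assumes a wide data frame with one row per subarray (sample), a hypothetical metadata column `SampleGroup`, analyte columns whose names start with `seq.`, and an illustrative value column named `RFU`; it is not SomaPlotr's actual internal code. Stacking the analytes into long format while repeating the metadata keeps `SampleGroup` available, so a facet such as `facet_wrap(~ SampleGroup)` can be added downstream.

```r
# Toy wide data: one row per subarray (sample). SampleGroup is a hypothetical
# feature (metadata) column; seq.* columns hold analyte measurements.
wide <- data.frame(
  SampleId    = c("S1", "S2", "S3"),
  SampleGroup = c("A", "A", "B"),
  seq.1234.56 = c(10, 12, 9),
  seq.7890.12 = c(200, 180, 220)
)

analytes <- grep("^seq\\.", names(wide), value = TRUE)

# Stack analyte values into long format (column by column), repeating the
# metadata so each value keeps the SampleId and SampleGroup of its row
plot_data <- data.frame(
  SampleId    = rep(wide$SampleId, times = length(analytes)),
  SampleGroup = rep(wide$SampleGroup, times = length(analytes)),
  Analyte     = rep(analytes, each = nrow(wide)),
  RFU         = unlist(wide[analytes], use.names = FALSE)
)

# Because SampleGroup survives the reshape, the boxplots can be faceted on it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(plot_data, ggplot2::aes(x = SampleId, y = log10(RFU))) +
    ggplot2::geom_boxplot() +
    ggplot2::facet_wrap(~ SampleGroup, scales = "free_x")
}
```

Had the metadata columns been dropped during the reshape, `facet_wrap(~ SampleGroup)` would have no variable to split on, which is the limitation the PR addresses.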

Change type

Please check the relevant box(es):

Choose reviewer(s)

Reviewer by Department

| Department | Reviewer | Change Type |
| --- | --- | --- |
| Bioinformatics | @amanda-hi | Code, bugs, features |
| | @stufield | bugs, features |
| Legal | @SLbmangum | LICENSE |
| Product | @kmurugesan14 | Documentation |
| Regulatory | @nmcnabbSL | Documentation |

- all_of() should be used when selecting columns via a character vector in dplyr::select()
- previously, a warning was thrown in boxplotSubarray() when features (SeqIds) were selected
  to generate the plotting dataset
- all_of() was added to the dplyr::select() call to resolve this warning
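A small sketch of the fix, assuming dplyr (1.0 or later) is installed; the data frame and the `feats` vector are made up for illustration. Passing a character vector held in a variable directly to `select()` is ambiguous (it could also mean a column literally named `feats`), and tidyselect flags that usage as deprecated. Wrapping the vector in `all_of()` states that it holds column names.

```r
# Assumes dplyr >= 1.0 is installed; data and column names are illustrative
df <- data.frame(SampleId = c("S1", "S2"), seq.1 = 1:2, seq.2 = 3:4)
feats <- c("seq.1", "seq.2")  # external character vector of column names

# dplyr::select(df, feats) still selects these columns, but tidyselect flags
# an external vector used this way as deprecated; all_of() states the intent
sub <- dplyr::select(df, dplyr::all_of(feats))

# all_of() is strict: a name that is not a column raises an error
res <- tryCatch(
  dplyr::select(df, dplyr::all_of(c(feats, "seq.3"))),
  error = function(e) "error"
)
```

If silently skipping absent names is preferable, `dplyr::any_of()` is the lenient counterpart.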

codecov-commenter commented Nov 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@23152e8).

❗ Current head f89b817 differs from pull request most recent head b09a292. Consider uploading reports for the commit b09a292 to get more accurate results.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #26   +/-   ##
=======================================
  Coverage        ?   42.89%           
=======================================
  Files           ?       29           
  Lines           ?     1119           
  Branches        ?        0           
=======================================
  Hits            ?      480           
  Misses          ?      639           
  Partials        ?        0           


amanda-hi force-pushed the group-boxplotSubarray branch from d27b9cb to 66b980d on November 29, 2023 21:11
Contributor

stufield left a comment


Nice! Just a few minor comments.

feats <- getAnalytes(.data)

if ( !all(reqd_cols %in% names(.data)) ) {
msng_idx <- which(!reqd_cols %in% names(.data))
Contributor


Here I would use msng <- setdiff(reqd_cols, names(.data)) ...

Contributor Author

amanda-hi, Nov 30, 2023


That's definitely a better option! I was overthinking this; setdiff() is much simpler.
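The reviewer's suggestion can be sketched as a small standalone check. `check_required_cols()` and the column names used here are hypothetical, chosen for illustration rather than taken from SomaPlotr. `setdiff()` returns the missing names directly, so there is no need to compute indices with `which()` and subset afterwards.

```r
# Hypothetical helper: stop with an informative message if any required
# columns are absent from `data`
check_required_cols <- function(data, reqd_cols) {
  msng <- setdiff(reqd_cols, names(data))  # names in reqd_cols but not in data
  if (length(msng) > 0L) {
    stop("Missing required column(s): ", paste(msng, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

df <- data.frame(SampleId = "S1", PlateId = "P1")
check_required_cols(df, c("SampleId", "PlateId"))  # passes silently

# Capture the error message for illustrative missing columns
err <- tryCatch(
  check_required_cols(df, c("SampleId", "SlideId", "Subarray")),
  error = conditionMessage
)
```

Here `err` holds the message naming both absent columns, which is friendlier than reporting positional indices.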

#'
#' @family boxplots
#' @param .data A `soma_data` or data frame object created via a call to
#' [read_adat()].
#' @param .data A `soma_data` or data frame object, created from a SomaScan
Contributor


Another argument-unification issue: what do we want to call .data? I can't remember why I named it this, but we might want to think about converging on an argument name for when a soma_adat object is specifically expected, even if a data.frame would ultimately suffice.

Contributor Author


Throughout SomaPlotr, the plotting functions that take a soma_adat or data.frame as input typically use either .data or data as the argument name. I was thinking of having them all converge on data, but that is pretty general and doesn't suggest a soma_adat specifically. Would adat be more appropriate? I think this change (unifying this argument name across the entire package) will be a separate commit, but I can add it to this PR.

Contributor

stufield, Nov 30, 2023


I agree, it's beyond the scope of this PR, but it's something to think about.
I would probably steer you towards data=, simply because 1) it's more common in the R universe, 2) it's used in ggplot2, and 3) we might want to avoid confusion with the rlang::.data pronoun, which is pretty common. Just some things to think about, though likely down the road when converging on and aligning argument names.

#' the term "subarray" is analogous to sample, and typically indicates a row
#' or sample in the data.
#' Plots the distribution of all analytes, stratified by subarray, as a
#' boxplot. These plots are intended to be used as a quality control (QC)
Contributor


Might want to consider rewording so this isn't confused with QC samples, which we commonly label "QC" in ADATs. I think there's a way to wordsmith this to avoid the potential double meaning.

Contributor Author


That's a good point, I'll try to make the statement about QC more specific so it doesn't cause confusion.

#' boxplot. These plots are intended to be used as a quality control (QC)
#' visualization tool for SomaScan data. In SomaScan (`soma_adat`) data
#' format, the term "subarray" is analogous to sample, and typically indicates
#' a row or sample in the data.
Contributor


Remove "or sample" since we already say that prior to the comma.

- updated documentation to describe required feature columns
- added vectors of required feature columns to utils.R
- incorporated feature columns into internal data object (plot_data), enabling
  the use of facet_wrap() on plots created with boxplotSubarray()
- added example of grouping boxplotSubarray() plots by a feature column
  to the documentation
- fixes SomaLogic#24
amanda-hi force-pushed the group-boxplotSubarray branch from 66b980d to b09a292 on December 4, 2023 19:01
amanda-hi merged commit e013c80 into SomaLogic:main on Dec 11, 2023
Development

Successfully merging this pull request may close these issues.

Add grouping capability to boxplotSubarray()
3 participants