Commit

DOC: updates to subsample doc to reflect use of ceil and filtering of samples below a sum of n
wasade committed May 2, 2024
1 parent 0c12df0 commit f3e4764
Showing 1 changed file with 9 additions and 6 deletions.
15 changes: 9 additions & 6 deletions biom/table.py
@@ -2914,7 +2914,8 @@ def subsample(self, n, axis='sample', by_id=False, with_replacement=False,
  with_replacement : boolean, optional
  If `False` (default), subsample without replacement. If `True`,
  resample with replacement via the multinomial distribution.
- Should not be `True` if `by_id` is `True`.
+ Should not be `True` if `by_id` is `True`. Important: If `True`,
+ samples with a sum below `n` are retained.
  seed : int, optional
  If provided, set the numpy random seed with this value
@@ -2931,14 +2932,16 @@ def subsample(self, n, axis='sample', by_id=False, with_replacement=False,
  Notes
  -----
- Subsampling is performed without replacement. If `n` is greater than
- the sum of a given vector, that vector is omitted from the result.
- Adapted from `skbio.math.subsample`, see biom-format/licenses for more
- information about scikit-bio.
+ If subsampling is performed without replacement, vectors with a sum
+ less than `n` are omitted from the result. This condition is not held
+ when operating with replacement.
  This code assumes absolute abundance if `by_id` is False.
+ If subsampling with replacement, `np.ceil` is applied prior to
+ calculating p-values to ensure that low-abundance features have a
+ chance to be sampled.
  Examples
  --------
  >>> import numpy as np
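
The behaviour described in the updated Notes can be illustrated with a small pure-Python sketch. This is not the biom implementation: `subsample_vector` is a hypothetical helper written for illustration, it uses the standard library instead of numpy, it assumes integer counts when sampling without replacement, and returning `None` for an all-zero vector is an assumption of the sketch rather than documented behaviour.

```python
import math
import random


def subsample_vector(counts, n, with_replacement=False, seed=None):
    """Subsample one count vector to depth n (illustrative sketch only).

    Returns a list of subsampled counts, or None when the vector is dropped.
    """
    rng = random.Random(seed)

    if with_replacement:
        # Round counts up before turning them into sampling weights,
        # mirroring the `np.ceil` step mentioned in the Notes.
        weights = [math.ceil(c) for c in counts]
        if sum(weights) == 0:
            # Assumption of this sketch: nothing to draw from, so drop it.
            return None
        # Vectors with a sum below n are retained in this mode.
        draws = rng.choices(range(len(counts)), weights=weights, k=n)
    else:
        # Without replacement, a vector whose sum is below n is omitted.
        if int(sum(counts)) < n:
            return None
        # Expand counts into one entry per observation, then draw n of them.
        pool = [i for i, c in enumerate(counts) for _ in range(int(c))]
        draws = rng.sample(pool, n)

    # Tally the drawn feature indices back into a count vector.
    result = [0] * len(counts)
    for i in draws:
        result[i] += 1
    return result
```

With this sketch, `subsample_vector([1, 2], 10)` returns `None` because the vector sums to less than `n`, while `subsample_vector([1, 2], 10, with_replacement=True)` returns a vector whose counts sum to 10.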