conditionally and persisitently rare taxa #98

Daenarys8 · 2024-11-14T10:28:06Z

ping #83

Signed-off-by: Daena Rys <[email protected]>

R/addMetrics.R

TuomasBorman · 2024-11-14T12:25:22Z

R/addMetrics.R

+        # Calculate metrics for each split subject/group
+        results_list <- lapply(split_data, function(sub_data) {
+            # Initialize a list to hold results for each metric
+            metric_results <- list()
+
+            # Loop through each metric and calculate it
+            for (metric in metrics) {
+                # Check if the metric is supported and calculate accordingly
+                if (metric == "CRT") {
+                    crt_results <- .calculateCRT(sub_data, thresholds[["CRT"]])
+                    metric_results[["CRT"]] <- crt_results
+                } else {
+                    stop(paste("Metric", metric, "is not supported yet."), 
+                         call. = FALSE)
+                }
+            }
+
+            # Add the calculated metrics to the metadata (colData or rowData)
+            for (metric in names(metric_results)) {
+                colData(sub_data)[[metric]] <- metric_results[[metric]]
+            }
+
+            return(sub_data)
+        })


It would be more efficient to:

Get matrix

Get the list which has vectors as elements. Each vector represent members of each group

Loop through this list

Select the group members from the matrix

Calculate values

Combine all the results and sort values based on sample names

Add to colData (in add* method)

I am not convinced the right location should be colData as the interest is taxa related.

True, I followed the code and did not think about it. As it these features are global (for whole dataset), colData is not correct place indeed

I think the output could be similar to getPrevalent and getRare, i.e., vector

Hmm.. yes. If we have a single subject/group with time series, then this is a vector over features (or rowData) - one measure of persistence or rarity for each feature and rowData would be a suitable location.

If we have multiple subjects/groups with time series, then this would be features x groups matrix.

At simplest it could be implemented for just a single-group case, and rest left for the user.

If we support multi-group (multi-subject) analyses, then that is a more general design question and linked to #30

I think one vector could be calculated for whole dataset for now. As you mentioned, the multi-group thing is more general question, and is affecting also the mia functions

Yes - note that this can only be calculated in the case where the whole data set has some continuous variable that extends all samples (like is the case when there is time dimension for a single subject). If we have multiple groups then this is not defined for the whole data set.

I might have missed something, but I cannot see direct linkage of time series to this concept of conditionally/persistent rare/abundant taxa. It seems that this is more general concept?

Based on the following article, I summarized the concept

Sizhong Yang, Matthias Winkel, Dirk Wagner, Susanne Liebner, Community structure of rare methanogenic archaea: insight from a single functional group, FEMS Microbiology Ecology, Volume 93, Issue 11, November 2017, fix126, https://doi.org/10.1093/femsec/fix126

Rare taxa:

Average relative abundance below 1% accross all samples

Abundant taxa:

Average relative abundance above 1% accross all samples

Conditionally rare taxa:

microbial taxa that are typically in low abundance in one locality but occasionally become prevalent in at least one other sample

CRT was assigned if an OTU exhibits a maximum relative abundance at least 100 times higher than its minimum value (max:min > 100)

Permanently rare taxa:

If the (max/min) ratio is less than 5, that is the abundance rarely fluctuates among samples, it will be grouped into the permanently rare taxa (PER)

Rare and abundant taxa can be already calculated by mia functions:

getRare(tse, assay.type = "relabundance", detection = 1/100)
getPrevalent(tse, assay.type = "relabundance", detection = 1/100)

I see that these conditionally rare and permanently rare taxa are very close to getRare() and getPrevalent(). As they do not necessarily deal with time series, should these methods be in mia instead: getConditionallyRare() & getPermanentlyRare()

Yep, I mixed it up, no time series needed.

CRT seems like a bit strange concept in the end. I am not sure if there should be much difference whether the minimum count is 1, 2, or 3 reads but this can make a huge difference in the max:min calculation. But yes, it might be relevant to recognize taxa that have vast variation. Variance would not capture all these cases, if the blooms cover only few samples. Shouldn't this be called conditionally abundant, rather than conditionally rare..?

PER: is the rarity also evaluated, or just max/min ratio? some taxa might be abundant and prevalent, while having a low max/min ratio. At least in principle. Then it would be misleading to call them PER. The interesting PER are taxa that might be prevalent but always stay under a certain threshold and never bloom.

We could add the methods as suggested and it could go to mia. If this could be made into a more generic index function (like addAlpha for samples), that could help to simplify maintenance and adding possible other measures later. Would that be feasible..?

On the other if this seem time-consuming to finalize, it could be left for later.

Signed-off-by: Daena Rys <[email protected]>

TuomasBorman · 2024-11-28T13:36:59Z

I read this article that Leo shared in the issue: https://journals.asm.org/doi/10.1128/mbio.01371-14

In this article, conditionally rare taxa is determined based on coefficient of bimodality. They determined CRT for each system from where multiple time points were collected.

How should we proceed with these? CRT is determined with two different ways.

Proposal:

In mia, introduce new funtions:

getConditionallyRare() and getPermanentlyRare().

These functions would return a vector of taxa (only those ones that are classified as CRT or PRT). Similar to getRare() and getPrevalent().

These would follow the implementation introduced in this article

Sizhong Yang, Matthias Winkel, Dirk Wagner, Susanne Liebner, Community structure of rare methanogenic archaea: insight from a single functional group, FEMS Microbiology Ecology, Volume 93, Issue 11, November 2017, fix126, https://doi.org/10.1093/femsec/fix126

Conditionally rare:

(max:min > 100), where max and min are maximum and minimum abundances

Permantently rare:

(max:min < 5)

In mia, implement getPrevelanceClass() function that returns a class for each taxa (prevalent, CRT, PRT or rare).

In miaTime, implement getBimodalityCoefficient(). For each system/group/patient and taxa, return bimodality coefficient.

The method would return either a vector length nrow or data.frame with dimensions nrow*N_system

This method would follow this implementation:

Shade A, Jones SE, Caporaso JGHandelsman J, Knight RFierer N, Gilbert JA2014.Conditionally Rare Taxa Disproportionately Contribute to Temporal Changes in Microbial Diversity. mBio5:10.1128/mbio.01371-14.https://doi.org/10.1128/mbio.01371-14

@antagomir

antagomir · 2024-11-29T16:21:25Z

There are different multi/bimodality indices available, here one implementation:
https://cran.r-project.org/web/packages/BimodalIndex/BimodalIndex.pdf

antagomir · 2024-11-29T16:29:08Z

Interesting, I had not paid attention to the multiple definitions.

Definitions in (1-2) are somehow arbitrary, and relatively easy calculate manually in whatever desirable way. I am hesitating if we need full methods for these in the end, before there is a clear need / more use cases. On the other hand this PR is already half-way..

Bimodalities are interesting, too. I'm good with that if it is quick to add. It could make sense to check what uni/bi/multimodality indices are available in R. There are a couple of different methods.

TuomasBorman · 2024-12-01T20:23:46Z

1-2) I think this PR is not suitable for implementing these since I think mia is the correct place. However, these should be quite easy to implement.

getConditionallyRare <- function(x, threshold = 100, ...){
    res <- .calculate_max_min()
    res <- names(res)[ res > threshold ]
    return(res)
}

getPermanentlyRare <- function(x, threshold = 5, ...){
    res <- .calculate_max_min()
    res <- names(res)[ res < threshold ]
    return(res)
}

This should be also fast to add especially if we decide to utilize these existing packages. It would make sense to use them but probably have to check what options there are (if there are some other package with broader scope, and we could use it elsewhere also)

antagomir · 2024-12-01T22:23:01Z

Seems good. Only question I have is whether the added benefit from having these will be higher than the maintenance burden as these are relatively specialized functions. I could have thought about it earlier. But if these are fast to add, then it would be good to just PR them to mia and close the case.

antagomir · 2024-12-01T22:23:31Z

Re: (3) I would do that when someone has need in a real project.

TuomasBorman · 2024-12-02T09:03:04Z

1-2) That is true. This is especially problematic if there are more than these two ways to define conditionally rare and persistently rare taxa. I don't know if it is common to determine these CRT and PRT, and are these the most common definitions.

However to point out, in the issue, there was one external person interested in this activity. If these are the most common definitions for CRT & PRT, it would be nice to add features that community needs.

I agree that this bimodality thing can be implemented in the future if needed

add crt fxn

676f868

Signed-off-by: Daena Rys <[email protected]>

TuomasBorman mentioned this pull request Nov 14, 2024

Conditionally & persistently rare taxa #83

Open

TuomasBorman requested changes Nov 14, 2024

View reviewed changes

Daenarys8 added 2 commits November 25, 2024 20:46

modify function

42ab277

Signed-off-by: Daena Rys <[email protected]>

modify function

f5297f2

Signed-off-by: Daena Rys <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

conditionally and persisitently rare taxa #98

conditionally and persisitently rare taxa #98

Daenarys8 commented Nov 14, 2024

TuomasBorman Nov 14, 2024

Daenarys8 Nov 25, 2024

TuomasBorman Nov 25, 2024

antagomir Nov 25, 2024

TuomasBorman Nov 25, 2024

antagomir Nov 25, 2024

TuomasBorman Nov 26, 2024

antagomir Nov 26, 2024

TuomasBorman commented Nov 28, 2024

antagomir commented Nov 29, 2024

antagomir commented Nov 29, 2024

TuomasBorman commented Dec 1, 2024

antagomir commented Dec 1, 2024

antagomir commented Dec 1, 2024

TuomasBorman commented Dec 2, 2024

conditionally and persisitently rare taxa #98

Are you sure you want to change the base?

conditionally and persisitently rare taxa #98

Conversation

Daenarys8 commented Nov 14, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TuomasBorman commented Nov 28, 2024

antagomir commented Nov 29, 2024

antagomir commented Nov 29, 2024

TuomasBorman commented Dec 1, 2024

antagomir commented Dec 1, 2024

antagomir commented Dec 1, 2024

TuomasBorman commented Dec 2, 2024