Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor getBaselineDivergence/getStepwiseDivergence #91

Merged
merged 42 commits into from
Nov 4, 2024
Merged

Conversation

Daenarys8
Copy link
Contributor

@Daenarys8 Daenarys8 commented Sep 16, 2024

Dependent on microbiome/mia#635

Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
@Daenarys8
Copy link
Contributor Author

Given that the calculation has changed, should we still support the baseline_sample parameter as either a vector or a SummarizedExperiment object?

@TuomasBorman
@antagomir

@TuomasBorman
Copy link
Collaborator

If I remember correctly addDivergence supports

  • Variable from colData
  • Numeric vector denoting reference sample
  • Character vector denoting reference samples for each sample
  • Character scalar denoting a reference sample

The same should/could be supported in miaTime (perhaps use the same function)? However, to make things simpler, we should prioritize the colData variable in the documentation, otherwise it might become too complex.

I think the only difference between addDivergence and addBaselineDivergence is that the latter adds the time difference?

addStepwiseDivergence is little bit more complex since we should specify the previous step/sample for each sample. But after that, the function is same as addDivergence (apart from adding time difference).

Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
@Daenarys8
Copy link
Contributor Author

@antagomir

Copy link
Collaborator

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is unnecessarily complex. We should aim to reuse existing functions whenever possible, as there is a lot of functionality here that is implemented only for this function, but it isn’t needed. Specifically, much of what this function does can already be handled by mia::addDivergence.

The only features missing from addDivergence that this function requires are:

a) Adding reference values to colData
b) Adding time difference results to colData

These could be integrated more simply without duplicating functionality.

addBaselineDivergence

  • (Optional) Set the first time point as the default reference.
    
  1. Calculate divergence values with `addDivergence`.
    
  2. Add time difference values to `colData`.
    

addStepwiseDivergence

  1. Add reference sample information to colData.
  2. Calculate divergence values with addDivergence.
  3. Add time difference values to colData.

Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
@Daenarys8
Copy link
Contributor Author

ndimred is longer used in getStepwiseDivertgence

Signed-off-by: Daena Rys <[email protected]>
Signed-off-by: Daena Rys <[email protected]>
Copy link
Collaborator

@TuomasBorman TuomasBorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the most recent code, I made some changes to minimize redundant code. The functionality of these functions are now ready, however, the documentation is not updated yet.

Moreover, we need to discuss about deprecation (do we need to deprecate or can we just remove and modify). Especially this is relevant in get* functions as they previously added values to colData but now they return only the values (as we have decided this naming scheme).

Moreover, reducedDim() is not now supported. If we want to support it, it needs to be incorporated to mia::getDivergence(). The idea of these miaTime divergence functions is only to add reference samples (and time difference) to colData, mia::getDivergence() does the calculation.

@TuomasBorman
Copy link
Collaborator

I think this PR is now ready. However, there is still one important thing to discuss. We have not deprecated the argument names. Moreover, previously get*Divergence functions returned TreeSE with calculated values in colData.

Now they get* functions return DF and add* return TreeSE. Because we will not remove the get* functions but just modify them to return DF, we cannot deprecate them.

Moreover, deprecating argument name causes extra work.

--> Can these arguments and functions be modified without deprecation? I think (but I do not know) that these functions are not that widely used yet, so we can modify them freely. Moreover, this package is not published yet and this has been more like a draft.

@antagomir

@antagomir
Copy link
Member

I agree that this can be changed now without deprecation, not yet in wide use.

@antagomir
Copy link
Member

Yes, these can be modified without deprecation.

Is this now ready to merge & close?

@antagomir
Copy link
Member

Remember to check & update the vignettes and unit tests as well if necessary

@TuomasBorman TuomasBorman linked an issue Nov 4, 2024 that may be closed by this pull request
@TuomasBorman TuomasBorman merged commit 4d2ec58 into devel Nov 4, 2024
3 checks passed
@TuomasBorman TuomasBorman deleted the bioc_mods branch November 4, 2024 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize getBaselineDivergence
3 participants