One sided RF distance #76
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -637,8 +637,8 @@ def sum_rfdistance_funcs(reference_dag: "HistoryDag"): | |
The reference DAG must have the same taxa as all the trees in the DAG on which these count | ||
functions are used. | ||
|
||
- The edge weight is computed using the expression 2 * N[c_e] - |T| where c_e is the clade under | ||
- the relevant edge, and |T| is the number of trees in the reference dag. This provide rooted RF | ||
+ The edge weight is computed using the expression |T| - 2 * N[c_e] where c_e is the clade under | ||
+ the relevant edge, and |T| is the number of trees in the reference dag. This provides rooted RF | ||
distances, meaning that the clade below each edge is used for RF distance computation. | ||
|
||
The weights are represented by an IntState object and are shifted by a constant K, | ||
|
@@ -682,6 +682,58 @@ def edge_func(n1, n2): | |
return kwargs | ||
|
||
|
||
def one_sided_rfdistance_funcs(reference_dag: "HistoryDag"): | ||
I think this looks great! I think the argument Also, we should make it very clear in the docstring whether the reference_history is expected to be multifurcating or not, when we're trying to detect resolutions of multifurcating trees.
||
"""Provides functions to compute the one sided RF distance to a reference tree. | ||
In other words, the number of clades in a tree that are not in the reference tree. | ||
|
||
Args: | ||
reference_dag: The reference DAG. The distance will be computed in relation | ||
to this DAG | ||
|
||
The reference DAG must have the same taxa as all the trees in the DAG on which these count | ||
functions are used. | ||
|
||
The edge weight is computed using the expression |T| - N[c_e] where c_e is the clade under | ||
the relevant edge, and |T| is the number of trees in the reference dag. This provides rooted RF | ||
distances, meaning that the clade below each edge is used for RF distance computation. | ||
|
||
The weights are represented by an IntState object. | ||
""" | ||
N = reference_dag.count_nodes(collapse=True) | ||
|
||
# Remove the UA node clade union from N | ||
try: | ||
N.pop(frozenset()) | ||
except KeyError: | ||
pass | ||
|
||
num_trees = reference_dag.count_histories() | ||
|
||
def make_intstate(n): | ||
return IntState(n, state=n) | ||
|
||
def edge_func(n1, n2): | ||
clade = n2.clade_union() | ||
if clade in N: | ||
weight = num_trees - (1 * N[n2.clade_union()]) | ||
else: | ||
# This clade's count should then just be 0: | ||
weight = num_trees | ||
return make_intstate(weight) | ||
|
||
kwargs = AddFuncDict( | ||
{ | ||
"start_func": lambda n: make_intstate(0), | ||
"edge_weight_func": edge_func, | ||
"accum_func": lambda wlist: make_intstate( | ||
sum(w.state for w in wlist) | ||
), # summation over edge weights | ||
}, | ||
name="one_sided_RF_rooted_sum", | ||
) | ||
return kwargs | ||
|
||
|
||
def make_rfdistance_countfuncs(ref_tree: "HistoryDag", rooted: bool = False): | ||
"""Provides functions to compute Robinson-Foulds (RF) distances of trees in | ||
a DAG, relative to a fixed reference tree. | ||
|
I think this should be called just `optimal_one_sided_rf_distance` since we're not taking a sum over anything... unless I'm misunderstanding!

I think the method returns something like the sum of one-sided RF distances rather than a single distance. I tried running it on some examples, and it returns a number that is bigger than the one returned by `optimal_rf_distance`.

Also, from skimming the code in `utils.one_sided_rfdistance_funcs` it looks pretty close to `utils.sum_rfdistance_funcs`
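
To make the "sum versus single distance" question in the comments above concrete, here is a minimal standalone sketch in plain Python. It does not use historydag. It assumes each tree is represented as a set of clades (frozensets of taxon names) and that `N[c]` counts the reference trees containing clade `c`, as the docstring describes. The helper names `one_sided_rf` and `summed_one_sided_rf` are hypothetical and exist only for this illustration.

```python
from collections import Counter


def one_sided_rf(query_clades, ref_clades):
    """Number of clades in the query tree that are absent from one reference tree."""
    return len(query_clades - ref_clades)


def summed_one_sided_rf(query_clades, reference_trees):
    """Sum per-clade weights |T| - N[c] over the clades of the query tree.

    N[c] is the number of reference trees containing clade c; Counter
    returns 0 for clades that appear in no reference tree.
    """
    num_trees = len(reference_trees)
    N = Counter(clade for tree in reference_trees for clade in tree)
    return sum(num_trees - N[clade] for clade in query_clades)


# Toy example on taxa a, b, c, d (internal clades only; leaf clades occur in
# every tree, so they would contribute a weight of 0).
ref1 = {frozenset("ab"), frozenset("abc"), frozenset("abcd")}
ref2 = {frozenset("ab"), frozenset("cd"), frozenset("abcd")}
query = {frozenset("cd"), frozenset("bcd"), frozenset("abcd")}

per_tree = [one_sided_rf(query, ref) for ref in (ref1, ref2)]
total = summed_one_sided_rf(query, [ref1, ref2])
print(per_tree, total)
```

Under these assumptions, the weight-based total equals the sum of the per-tree one-sided distances. With a single reference tree it reduces to one one-sided distance; with several it is a sum, which would be consistent with the larger values reported above.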