Non-MP Node Support #4
Comments
There is some precedent for your option 2, but how are you going to get these sub-parsimonious trees? My hope is that the fitch-staring that @willdumm is doing will allow us to understand these implicitly. Or can we have a neural network find the signal for us? 😎
For option 2, I'd like to do something (e.g. what William suggested in 1, or the fruits of my tree-staring project) to make the DAG include all the trees that we want to consider, so we can truncate the underlying distribution (like ...). I'd like to understand how this truncated distribution decomposes over DAG edges, and how it can be represented as a collection of downward-conditional edge probability distributions on the sets of edges descending from each node-clade of the DAG. For example, the distribution ...

In order to compute downward conditional distributions representing this conditional distribution on histories in ..., where in (2) the sum is over all such history fragments, and the dot represents a node whose probability we're trying to compute. The farthest-right equality in (2) decomposes a node's probability into a downward part and an upward part. Each of these can be computed in the DAG efficiently by dynamic programming. Note that (2) gives node support. The issue is what happens when the sum is restricted to history fragments which are in a DAG that does not contain all possible histories.

@williamhowardsnyder @marybarker at some point (I guess next week when I'm back in person, but this week could also work) I'd like to work through this with you. Mary, I know we thought about this a lot, and maybe you totally get it now, but it would be great for me to finally understand clearly. This is similar to stuff that's already been figured out for the sDAG, which is the more complicated case, since in the hDAG we're considering probabilities which decompose over edges, without marginalizing over possible parent/child sequences, etc.
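As a concrete sketch of that downward/upward decomposition, here is a toy dynamic program (hypothetical data structure and node names, not the project's hDAG API). It assumes each DAG node maps each of its child clades to the alternative child nodes that can fill that clade, that a history picks exactly one child per clade, and that histories are weighted uniformly; a node's support is then (sub-histories below it) × (completions above it) / (total histories):

```python
from functools import lru_cache

# Toy "history DAG": each node maps each of its child clades to the
# alternative child nodes that can fill that clade.  A history (tree in the
# DAG) picks exactly one child per clade, recursively.
DAG = {
    "root":   {"ABC": ("X_AB|C", "X_AC|B", "X_BC|A")},
    "X_AB|C": {"AB": ("AB_v1", "AB_v2"), "C": ("C",)},
    "X_AC|B": {"AC": ("AC_v1",),         "B": ("B",)},
    "X_BC|A": {"BC": ("BC_v1",),         "A": ("A",)},
    "AB_v1":  {"A": ("A",), "B": ("B",)},
    "AB_v2":  {"A": ("A",), "B": ("B",)},
    "AC_v1":  {"A": ("A",), "C": ("C",)},
    "BC_v1":  {"B": ("B",), "C": ("C",)},
    "A": {}, "B": {}, "C": {},
}

@lru_cache(maxsize=None)
def below(node):
    """Downward pass: number of sub-histories rooted at `node`."""
    count = 1
    for options in DAG[node].values():
        count *= sum(below(child) for child in options)
    return count

def topological_order(root="root"):
    """Parents before children (reversed DFS post-order)."""
    seen, post = set(), []
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for options in DAG[n].values():
            for child in options:
                visit(child)
        post.append(n)
    visit(root)
    return post[::-1]

def node_supports(root="root"):
    """Upward pass, then support = above * below / total for every node."""
    total = below(root)
    above = {n: 0 for n in DAG}
    above[root] = 1
    for parent in topological_order(root):
        for clade, options in DAG[parent].items():
            # number of ways to fill the parent's *other* clades
            others = 1
            for other_clade, other_options in DAG[parent].items():
                if other_clade != clade:
                    others *= sum(below(m) for m in other_options)
            for child in options:
                above[child] += above[parent] * others
    return {n: above[n] * below(n) / total for n in DAG}

print(node_supports())
# e.g. the (A,B)|C resolution is in 2 of the 4 histories -> support 0.5
```

Restricting the DAG to a subset of histories changes `total` and the `below`/`above` counts, which is exactly the truncation question raised above.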
Here's a brief summary of a more updated version of parsimony-weighted node support, based on our discussions this past week:

Notation

Given an edge ...

Tree probabilities that decompose over edges

Let ... Given this edge-weight function, we can then sample trees from the DAG with the probability distribution ...

Example: parsimony weighting

Let ...

Caveat

Note that a probability distribution on the DAG that is based on a numeric value that varies between trees does not ensure that we sample the highest-value trees with the greatest frequency, since the probability distribution is a weighted frequency distribution.
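Here is a minimal sketch of a tree distribution that decomposes over edges, with node support weighted by it. Since the edge-weight function above isn't reproduced here, the choice $w(e) = e^{-k \cdot \mathrm{cost}(e)}$ with per-edge parsimony costs is only an illustrative assumption, and the two trees are made up:

```python
import math

def tree_weight(edges, k=1.0):
    """Unnormalized tree probability: product of edge weights exp(-k * cost),
    so the distribution decomposes over edges."""
    return math.prod(math.exp(-k * cost) for _, _, cost in edges)

# Two hypothetical trees on leaves A, B, C, given as (parent, child, edge
# parsimony cost) triples; t1 has total cost 3, t2 has total cost 4.
trees = {
    "t1": {("root", "AB", 1), ("AB", "A", 0), ("AB", "B", 1), ("root", "C", 1)},
    "t2": {("root", "BC", 1), ("BC", "B", 0), ("BC", "C", 2), ("root", "A", 1)},
}

weights = {name: tree_weight(edges) for name, edges in trees.items()}
total = sum(weights.values())
probs = {name: w / total for name, w in weights.items()}

def node_support(node):
    """Weighted support: total probability of the trees containing `node`."""
    return sum(probs[name] for name, edges in trees.items()
               if any(node in (parent, child) for parent, child, _ in edges))

print(probs)               # t1 (cost 3) gets ~0.73, t2 (cost 4) gets ~0.27
print(node_support("AB"))  # ~0.73: the AB clade inherits t1's weight
```

The caveat above applies here as well: if the DAG held many distinct histories realizing the heavier topology, their combined probability could exceed the single lighter tree's, even though each of them individually is less parsimonious.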
Node Support Definition
Node support is an estimate of the confidence that a clade is in the true tree. Formally, it is the sum of the posterior probabilities of all trees in which the node appears. Let $v$ be our node of interest and let the set of trees that the node takes part in be $T_v = \{t \in T : v \in t\}$, where $T$ is the set of all trees with non-zero posterior. Then the support can be written as
$$Pr(v \mid D) = \sum_{t \in T_v} Pr( t \mid D)$$
Our current assumption is that all MP trees have uniform probability (we also assume the hDAG contains all MP trees or at least a good chunk of them). In that case,
$$Pr(t \mid D) = 1 / |T|,$$

where $T$ is the set of trees in the DAG. And so our support becomes

$$Pr(v \mid D) = |T_v| / |T|.$$

In other words, the support of $v$ is the proportion of trees in the DAG that contain our node $v$. The idea is that all MP trees are equally optimal candidates, so the posterior should reflect that.
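As a toy illustration of this definition, the sketch below represents each tree by the set of clades it contains and computes $|T_v| / |T|$ by direct enumeration (the trees are invented for the example):

```python
# Three hypothetical trees on leaves A, B, C, D, each represented by the set
# of non-trivial clades (frozensets of leaves) it contains.
trees = [
    {frozenset("AB"), frozenset("CD"), frozenset("ABCD")},   # ((A,B),(C,D))
    {frozenset("AB"), frozenset("ABC"), frozenset("ABCD")},  # (((A,B),C),D)
    {frozenset("AC"), frozenset("ACD"), frozenset("ABCD")},  # (((A,C),D),B)
]

def support(clade, trees):
    """Pr(v | D) = |T_v| / |T| under a uniform distribution on the trees."""
    return sum(1 for t in trees if clade in t) / len(trees)

print(support(frozenset("AB"), trees))  # 2/3: clade {A,B} is in two of three trees
print(support(frozenset("CD"), trees))  # 1/3
```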
However, there are two challenges we've identified with this approach:
Is there a way to redefine the posterior on trees in the DAG to account for these challenges? A couple of ideas that might be worth exploring in this benchmarking effort: