add individuals= option to treeSeqOutput() to output a sample #448

Open
bhaller opened this issue Jun 28, 2024 · 2 comments
Labels
enhancement trees related to tree-seq, tskit, etc.

Comments

@bhaller
Contributor

bhaller commented Jun 28, 2024

Seems like a lot of people want to be able to do this; it's an FAQ on slim-discuss. It can be done with killIndividuals() or similar techniques, but it'd be more graceful to have a way to just do it directly in the treeSeqOutput() call. No reason we can't do this easily, right @petrelharp?

@bhaller bhaller added enhancement trees related to tree-seq, tskit, etc. labels Jun 28, 2024
@petrelharp
Collaborator

Well, it's not totally straightforward: recall all the bookkeeping around which individuals are remembered, retained, etcetera. It would be straightforward to just simplify the output tables down to the requested individuals. However, recall that currently we make an extra copy of the tables for output purposes, but we'd like to not do that. I'm not very enthusiastic to do this because for most purposes it's better to output everyone and get multiple replicates out of the one output. Maybe we need an example of doing that somewhere?

@bhaller
Contributor Author

bhaller commented Jun 28, 2024

> Well, it's not totally straightforward: recall all the bookkeeping around which individuals are remembered, retained, etcetera.

Indeed, I recall that.

> It would be straightforward to just simplify the output tables down to the requested individuals. However, recall that currently we make an extra copy of the tables for output purposes, but we'd like to not do that.

Yes; well, we'd like to avoid making a copy, but haven't succeeded in that (I don't think we're even close to that goal, are we?); and even if we did succeed in that, we could still keep the current code path for use when a subset of individuals is specified. Doesn't seem like a major obstacle.

> I'm not very enthusiastic to do this because for most purposes it's better to output everyone and get multiple replicates out of the one output. Maybe we need an example of doing that somewhere?

An example of that does seem like a good idea. It's a new idea to me; I've never seen it done, and hadn't heard anyone even mention it until you brought it up in the recent slim-discuss thread. I'm a bit suspicious of it because of possible (likely?) correlations between the different replicates drawn from a single output; pseudoreplication issues seem possible, and hard to rule out. If you think it's a good technique, though, it should certainly be demonstrated somewhere. It does seem kind of orthogonal to the issue at hand.

To me, the basic fact is that people want to be able to do this, and they're hacking it in by the various techniques described in the slim-discuss thread; given that reality, it'd be better to provide them with a clean API that does what they want to do.
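For reference, the post-hoc approach discussed above (output everyone, then draw several samples from the one file) can be sketched in Python with tskit. This is a minimal sketch under assumptions, not the proposed `individuals=` option: the helper names `draw_replicate_samples` and `simplify_to_individuals` are hypothetical, and the caller is assumed to pass the tskit IDs of individuals they actually want to sample from (the individual table of a SLiM output can also hold remembered individuals that are no longer alive). The only tskit calls used are `TreeSequence.individual()` and `TreeSequence.simplify()`.

```python
import random


def draw_replicate_samples(individual_ids, sample_size, num_replicates, seed=None):
    """Draw non-overlapping random samples of individual IDs.

    Hypothetical helper: each replicate receives `sample_size` distinct IDs,
    and no ID is shared between replicates.
    """
    ids = list(individual_ids)
    needed = sample_size * num_replicates
    if needed > len(ids):
        raise ValueError("not enough individuals for disjoint replicates")
    rng = random.Random(seed)
    chosen = rng.sample(ids, needed)  # sampling without replacement
    return [
        chosen[k * sample_size:(k + 1) * sample_size]
        for k in range(num_replicates)
    ]


def simplify_to_individuals(ts, individual_ids):
    """Simplify a tskit.TreeSequence down to the nodes of the given individuals.

    Hypothetical helper: collects the node IDs (genomes) of each requested
    individual and passes them to ts.simplify() as the sample set.
    """
    nodes = [int(n) for i in individual_ids for n in ts.individual(i).nodes]
    return ts.simplify(samples=nodes)
```

With tskit installed, one would load the full output with `tskit.load()`, choose the candidate individual IDs, and call `simplify_to_individuals` once per replicate. Disjoint samples avoid reusing individuals, but the replicates still share one simulated genealogy, so the correlation concern raised above remains.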
