add individuals= option to treeSeqOutput() to output a sample #448

bhaller · 2024-06-28T07:43:13Z

Seems like a lot of people want to be able to do this; it's an FAQ on slim-discuss. It can be done with killIndividuals() or similar techniques, but it'd be more graceful to have a way to just do it directly in the treeSeqOutput() call. No reason we can't do this easily, right @petrelharp?


petrelharp · 2024-06-28T15:07:12Z

Well, it's not totally straightforward: recall all the bookkeeping around which individuals are remembered, retained, etcetera. It would be straightforward to just simplify the output tables down to the requested individuals. However, recall that currently we make an extra copy of the tables for output purposes, but we'd like to not do that. I'm not very enthusiastic to do this because for most purposes it's better to output everyone and get multiple replicates out of the one output. Maybe we need an example of doing that somewhere?
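For readers who want to see what "output everyone and get multiple replicates out of the one output" could look like after the fact, here is a minimal Python sketch. It assumes the full tree sequence has been loaded with tskit and that the caller already has a list of IDs for the individuals of interest; the function names `draw_replicate_samples` and `simplify_to_individuals` are hypothetical helpers, not part of SLiM or tskit. Note that a SLiM-produced tree sequence may contain remembered or retained individuals as well as living ones, so choosing which IDs to sample from is left to the caller.

```python
import random


def draw_replicate_samples(individual_ids, sample_size, num_replicates, seed=None):
    """Draw disjoint random samples of individual IDs, one list per replicate.

    Hypothetical helper: the replicates share no individuals, but they still
    share the genealogy of the single simulation they came from.
    """
    ids = list(individual_ids)
    if sample_size * num_replicates > len(ids):
        raise ValueError("not enough individuals for disjoint replicates")
    rng = random.Random(seed)
    rng.shuffle(ids)
    return [
        sorted(ids[k * sample_size:(k + 1) * sample_size])
        for k in range(num_replicates)
    ]


def simplify_to_individuals(ts, individual_ids):
    """Reduce a tree sequence to the history of the given individuals.

    `ts` is assumed to be a tskit.TreeSequence (for example, the result of
    tskit.load on a file written by treeSeqOutput()).
    """
    # Collect the node (genome) IDs belonging to each requested individual.
    nodes = [int(n) for i in individual_ids for n in ts.individual(i).nodes]
    # simplify() keeps only the history ancestral to the listed sample nodes.
    return ts.simplify(samples=nodes)
```

With a loaded tree sequence `ts` and a list `alive_ids` (both assumed to exist), `[simplify_to_individuals(ts, s) for s in draw_replicate_samples(alive_ids, 50, 10)]` would yield ten subsampled tree sequences from one full output. Whether those can be treated as independent replicates depends on the analysis.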

bhaller · 2024-06-28T15:17:26Z

> Well, it's not totally straightforward: recall all the bookkeeping around which individuals are remembered, retained, etcetera.

Indeed, I recall that.

> It would be straightforward to just simplify the output tables down to the requested individuals. However, recall that currently we make an extra copy of the tables for output purposes, but we'd like to not do that.

Yes; well, we'd like to avoid making a copy, but haven't succeeded in that (I don't think we're even close to that goal, are we?); and even if we did succeed in that, we could still keep the current code path for use when a subset of individuals is specified. Doesn't seem like a major obstacle.

> I'm not very enthusiastic to do this because for most purposes it's better to output everyone and get multiple replicates out of the one output. Maybe we need an example of doing that somewhere?

An example of that does seem like a good idea. It's a new idea to me; I've never seen it done, and hadn't heard anyone even mention it until you mentioned it in the recent slim-discuss thread. I'm a bit suspicious of it because of possible (likely?) correlations between the different replicates from a single output; pseudoreplication issues seem possible, and hard to rule out. But if you think it's a good technique, it should certainly be demonstrated somewhere. That said, it seems kind of orthogonal to the issue at hand.

To me, the basic fact is that people want to be able to do this, and they're hacking it in by the various techniques described in the slim-discuss thread; given that reality, it'd be better to provide them with a clean API that does what they want to do.

bhaller added the "enhancement" and "trees" (related to tree-seq, tskit, etc.) labels on Jun 28, 2024
