Need unit tests for calc...() popgen functions #474
This could take the form of a VCF file and an expected value. My SLiM test could simply load the VCF and check for a match (within reasonable numerical tolerance) to the expected value.
My recommendation is to not do expected values from theory (if that's what you meant); instead, compare to the value calculated independently, either by other software or by a separate, first-principles implementation. I don't want to take this on right now though - maybe a good student project?
OK. Why not expected values from theory?
Because that is so much more complicated: you have to worry about statistical power, how close is "close enough", etcetera. That sort of thing is good for validation, but not so good for unit tests (for one thing, you end up having to run a lot of simulations to make sure). What we do in tskit, for instance, is usually just pull up the definition of the thing, then code up some real simple implementation that doesn't worry about efficiency, and compare to that. msprime does have a whole
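For illustration, a naive first-principles implementation of pi in the spirit described above might look like the following Python sketch. The function name and the haplotype data are made up for the example; SLiM's actual functions live in Eidos/C++, so this is only the kind of independent reference implementation being suggested, not anything from the codebase.

```python
# Hypothetical first-principles sketch of nucleotide diversity (pi):
# the mean number of pairwise differences across all pairs of haplotypes,
# computed naively with no regard for efficiency.
from itertools import combinations

def pairwise_pi(haplotypes):
    """Mean pairwise differences between equal-length haplotype strings."""
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs)

# Four haplotypes, five sites; two segregating sites.
haps = ["AATGC", "AATGC", "ACTGC", "AATCC"]
print(pairwise_pi(haps))  # 1.0 (6 total differences over 6 pairs)
```

A unit test would then just assert that the library function matches this reference value within a small numerical tolerance.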
Aha, I see. Yes, there are certainly problems with doing statistical tests for validation. SLiM already does tons of them, though. But if a precise comparison to the "right answer" is possible, that's certainly better!
Sorry for not responding to this until now. I saw it at the time but knew I'd be rather busy for the rest of that term, and I was, so I left it alone. I agree that comparison to other calculated values makes sense and sounds relatively doable; I don't personally know what computing expected values from theory would entail.

I'm responding now because I did write a script with a simple sort of test. I did tests like it when writing the functions but never had a clean version to show until now. I'm attaching the script as a text file so GitHub lets me upload it, but it should really have a .slim suffix. It runs a simple neutral simulation with either a population contraction or expansion (one should be commented out at a time) and plots Tajima's D against time (in ticks). The basic qualitative expectation is that a contraction should increase Tajima's D and an expansion should decrease it. The change in population size occurs halfway along the x-axis in the plot, and the plot does seem to exhibit this pattern. Obviously there's also random variation in Tajima's D, so sometimes the change at the halfway point isn't obvious. I assume increasing the population size would reduce the variation, but the simulation takes about a minute as is, so I didn't want to push parameters too much. There are comments with specific parameters in the script.

Also, I ran it on the current release build, so I define calcTajimasD2 and calcPi2 explicitly in there, but I took them straight from slim_functions.cpp, so they should match what's supposed to be built-in, barring the current bug you noted. The above could also be used to plot calcPi and calcWattersonsTheta; the changes in Tajima's D should correspond to changes in these. Also, in neutrality with constant population size, both of those should fluctuate around

Regarding comparisons to inference programs: SLiM can output VCFs, so I assume those can be analyzed directly with something like VCFtools, which computes pi and Tajima's D.
So someone could just compute the stats within SLiM at the time a VCF is output, run VCFtools (or other programs) on that same VCF, and see if they are about the same, right?

For a different project I did some comparisons between software and used a dummy VCF I made matching a textbook example. I believe it was an alignment given in figure 3.1 of Matt Hahn's Molecular Population Genetics; I don't have the book on hand. The textbook does the pencil-and-paper pi and Watterson's theta calculations on that alignment, so I assume that would work as something with known mutations in known positions/frequencies. I could dig around for the dummy VCF.

Some of that I could probably do soon if it'd be of interest. I'd be happy to let someone else do it too, but I'm assuming no one's biting if the post has been up this long.
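To show the flavor of such a pencil-and-paper check, here is a tiny hand-verifiable example in Python. The alignment below is invented for illustration (it is NOT the actual data from Hahn's figure 3.1), but the formula is the standard Watterson estimator, theta_W = S / a_n, where S is the number of segregating sites and a_n is the harmonic number for n sequences.

```python
# Watterson's theta computed from first principles on a toy alignment.
# theta_W = S / a_n,  a_n = sum_{i=1}^{n-1} 1/i
def wattersons_theta(haplotypes):
    n = len(haplotypes)
    # A site is segregating if more than one allele appears in its column.
    S = sum(len(set(col)) > 1 for col in zip(*haplotypes))
    a_n = sum(1.0 / i for i in range(1, n))
    return S / a_n

# n = 4 sequences, S = 2 segregating sites (columns 2 and 4),
# a_n = 1 + 1/2 + 1/3 = 11/6, so theta_W = 2 / (11/6) = 12/11.
haps = ["AATGC", "AATGC", "ACTGC", "AATCC"]
print(wattersons_theta(haps))  # 1.0909...
```

The same by-hand value (12/11 here) is exactly the kind of known expected value a deterministic unit test could assert against.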
This sounds great, @npb596. I'm rather buried in work right now – I'm extending SLiM to support multiple chromosomes, which has turned out to be rather more work than I expected :-O. It would be wonderful if you could run with this issue! :-> I'd suggest two things:

(1) I ran your test script, and it does seem good. For a test like this, I think it is fine for it to take a fairly long time to run, so I would suggest making it bigger to cut down on the noise and make the pattern more clearly visible. If it takes, say, an hour or even eight hours to run, I think that's fine; it won't be run often, but when it is run, it should provide a high-quality, rigorous test where a problem is clearly visible. I will run it manually before each release, along with other manual tests that I use from time to time. Maybe the scale (pop size, if that is the right place to scale the test up) should be a defined constant at the top, with a comment about what scale is suggested to produce a good plot without taking weeks/months. :-> It'd be great if you added the other

(By the way, the graph I got looks like this: [attached plot] From the way the line is going upward, I'm guessing the model had not yet reached equilibrium when the bottleneck started? But the drop caused by the expansion is certainly very visible! I guess it would be good for the model to run to equilibrium first, so that the height can actually be compared to an expected value? I'm a bit surprised it hadn't equilibrated, though, since the pop size is 2000 and the expansion happens at tick 80000, which is 40*N. This is the kind of thing where I'm just out of my depth and not sure what to think. :-O)

(2) Then, on the other hand, it'd be great to have actual "unit tests" of these functions, meaning very small, fast, deterministic tests that run every time SLiM does a self-test. So these should take just a small fraction of a second. What you wrote about VCFtools and Matt Hahn's book seems helpful.
In the end, what we'd want is an input VCF file that is quite small (a couple of pages max, ideally smaller?), so that I can embed it right into SLiM's code as a string, and then the unit test can read the VCF in, use the

Anyhow, those are my ideas; please do whatever you think is good and have time to do. :-> I greatly appreciate the help, since this is an area where I don't really know what I'm doing, what constitutes a good test for these functions, how to calculate the expected values, etc. :-> Thanks again!
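As a sketch of the embed-the-VCF-as-a-string idea, here is a minimal example in Python (the real test would live in SLiM's C++ self-test code; the VCF content, sample names, and helper function here are all invented for illustration). It parses a tiny embedded VCF into per-genome 0/1 allele rows, from which pi or other statistics can then be computed independently.

```python
# Hypothetical minimal VCF embedded as a string: two diploid samples,
# two variant sites, phased GT fields only.
MINI_VCF = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ti0\ti1
1\t3\t.\tA\tT\t.\tPASS\t.\tGT\t0|1\t0|0
1\t7\t.\tG\tC\t.\tPASS\t.\tGT\t1|1\t0|1
"""

def vcf_to_genotypes(text):
    """Return one list per variant row, flattening each sample's GT
    into individual 0/1 alleles (phased or unphased)."""
    rows = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip meta lines and the column header
        fields = line.split("\t")
        alleles = []
        for gt in fields[9:]:  # sample columns start at index 9
            alleles.extend(int(a) for a in gt.replace("|", "/").split("/"))
        rows.append(alleles)
    return rows

print(vcf_to_genotypes(MINI_VCF))  # [[0, 1, 0, 0], [1, 1, 0, 1]]
```

Because the genotypes are fixed and known, the expected values of the calc...() functions on this input are fully deterministic, which is what makes it suitable for a fast self-test.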
The need for unit tests for SLiM's popgen functions has been underlined by another discovery of a bug with them (https://groups.google.com/g/slim-discuss/c/Yacfk9EIYeU/m/bc72wVUzBAAJ). I'm not sure how to test them, though. I suppose a test could construct a population with known mutations, placed into the genomes at known positions/frequencies, and then test that the value calculated by the function matches the expected value calculated independently, from first principles or by other software. If someone can supply me with a test scenario and an expected value, I can construct a corresponding SLiM test, but I don't have the knowledge necessary to come up with appropriate scenarios and expected values. These test scenarios wouldn't need to be large or complex; even a test with a genome of, say, ten base positions with, say, five mutations present and four diploid individuals (eight genomes) would be quite sufficient to test that the math and logic are correct, I would think. It would be good to have such tests for all of the calc...() functions. Perhaps @npb596 or @petrelharp or @philippmesser could help me with this?
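To illustrate what an independently calculated expected value could look like for a tiny fixed scenario, here is a first-principles Tajima's D in Python, using the standard formulas from Tajima's 1989 definition (a1, a2, b1, b2, c1, c2, e1, e2). The four-haplotype dataset is invented and smaller than the scenario described above; it is only meant to show that the expected value for such a toy input is a single deterministic number.

```python
import math
from itertools import combinations

def tajimas_d(haplotypes):
    """Tajima's D from first principles for equal-length haplotype strings,
    using the standard constants from Tajima (1989)."""
    n = len(haplotypes)
    # S: number of segregating sites (columns with more than one allele).
    S = sum(len(set(col)) > 1 for col in zip(*haplotypes))
    # pi: mean pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard constants.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Toy scenario: 4 genomes, 2 segregating sites, pi = 1, theta_W = 12/11.
haps = ["AATGC", "AATGC", "ACTGC", "AATCC"]
print(round(tajimas_d(haps), 4))  # -0.7099
```

A SLiM unit test could then construct the same four genomes with the same mutations and assert that calcTajimasD() agrees with this value within a small tolerance.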