Cluster solutions issues #20
The pellet survey project has 8 strata plus global-level multipliers. Cluster solutions is exactly the minke data, with the exception of an invented column containing categorical cluster size (groups of size 1, size 2 and more than size 2).
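For anyone reproducing the invented column, a minimal sketch of the binning described above. The function name and the category labels are hypothetical; they are not taken from the Distance project file.

```python
def cluster_size_category(size: int) -> str:
    """Map an observed cluster size to one of three categories.

    The labels "1", "2" and "3+" are illustrative only; the actual
    labels used in the Cluster solutions project are not shown here.
    """
    if size < 1:
        raise ValueError("cluster size must be a positive integer")
    if size == 1:
        return "1"
    if size == 2:
        return "2"
    return "3+"  # more than size 2


# Example: bin a handful of made-up observed cluster sizes.
observed_sizes = [1, 2, 5, 1, 3]
categories = [cluster_size_category(s) for s in observed_sizes]
```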
I'm now leaving to catch the evening bus for home; I can check back later.
The issue with the cluster data here was that the This should be fixed in 8820620. Although tests can run, the results are rather different:
(For example for the first analysis therein.) I think this is connected to #17. (Not sure what you mean by the "pellet survey".)
Pellet survey = dung survey. Line transect sampling of deer pellets -> density estimate of pellets -> decomposition and deposition rates (multipliers) -> deer density estimate.
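The chain above can be sketched in a few lines of Python. This assumes the commonly used dung-survey relation (animal density = pellet density divided by the product of deposition rate and mean time to decay). The numbers are made up for illustration and are not from the pellet survey project.

```python
def deer_density(pellet_density: float,
                 deposition_rate: float,
                 decay_days: float) -> float:
    """Convert a pellet-group density into an animal density.

    pellet_density  : pellet groups per unit area (from line transects)
    deposition_rate : pellet groups produced per animal per day (multiplier)
    decay_days      : mean number of days a pellet group persists (multiplier)

    Assumed relation: animals = pellets / (deposition_rate * decay_days).
    """
    if deposition_rate <= 0 or decay_days <= 0:
        raise ValueError("multipliers must be positive")
    return pellet_density / (deposition_rate * decay_days)


# Hypothetical values: 30000 pellet groups per km^2, 25 groups per deer
# per day, 120 days mean time to decay.
example = deer_density(30000.0, 25.0, 120.0)
```

With these invented inputs the two multipliers combine to 3000 pellet groups per deer, so the sketch gives 10 deer per km^2.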
Sorry to bounce around among projects (difference between home and office machines); here is a NOT cluster size example. Here are results of 3 analyses inside that project:
Hope to see the back of my marking tomorrow. Perhaps at that point, we might discuss what types of analyses CANNOT be resolved between DisWin and mrds (perhaps the cluster size issue falls therein). Then we can draft a list of pertinent DisWin projects to put through the grinder looking for specific problems; at the moment it is a scattergun approach. Perhaps it might be adequate to check up to density estimates, and ignore abundance estimates and their measures of precision.
53863b8 includes support for estimating cluster size in abundance and density estimates in the same ways as Distance for Windows. I think the remaining issues with the data set are down to stratification. This needs to be resolved by building a wrapper around
Getting easier to run projects through readdst so perhaps a clearer picture is emerging. Current install is indeed Through it, I have run CovarWhaleSim-solutions, and as you have previously seen, that project has 3 analyses that check out quite well, but last 4 analyses (
Moving on to cluster size estimation, project called
It looks to me after some investigation like the hour covariate isn't doing much (in There isn't an obvious relationship with observed distance (which we might expect?) and it seems to throw off the optimiser. It also looks like the values are all very similar. Could you try re-running all these analyses (in
Reran the I'm not sure what you mean about
More signals from D7B3 that there is nothing happening with that covariate.
Thanks for doing this! Hm. Okay, I'm going to leave this issue for now but open a separate issue at #23. My feeling is that this might be more of an optimisation-related issue.
I'm a little confused about one of the analyses in the cluster project:
Here I don't even get the number of parameters right. I'm unsure about where the extra parameters come from. The model definition is as follows:
@erex does this correspond to what you see in DISTANCE? Are you able to paste the log output (parameter estimates) from DISTANCE? Thanks!
Human description of this (admittedly perverse) analysis: "In the analysis “Post-stratified E(s)_strat f(0)_regr”, the detection function has been estimated separately in each cluster size stratum. The detection functions are different from each other - it looks like cluster sizes 3 and above are detected with certainty almost all the way out to 1.2nm." DisWin is fitting a 2-parameter hazard rate model to each of 3 strata (where strata are defined as "detections with cluster size=1, detections with cluster size 2 and detections with cluster size>=3"); that is how 6 parameters come to be. Log window contents for
Also notice that your element [3] of model_definitions (where the phantom The 6 parameter estimates you requested are: This third stratum is for "big" clusters that the detection function tries to fit with a flat hazard rate out to 1.2nm, and then falls off a cliff because no clusters of that size are seen beyond 1.2nm, hence the (non-meaningful) upper bound warning. As I said, this is a pretty pathological case; it is unlikely any user is going to want to perform this kind of analysis on their data.
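To make the parameter count and the "flat, then falls off a cliff" shape concrete, here is a minimal Python sketch. It is not mrds or DisWin code. It assumes the usual hazard-rate key function g(x) = 1 - exp(-(x/sigma)^(-b)), and the sigma and b values are invented for illustration; they are not the estimates from the log window.

```python
import math


def hazard_rate(x: float, sigma: float, b: float) -> float:
    """Hazard-rate detection function g(x) = 1 - exp(-(x / sigma) ** (-b))."""
    if x <= 0:
        return 1.0  # detection is certain on the line
    try:
        return 1.0 - math.exp(-((x / sigma) ** (-b)))
    except OverflowError:
        # (x / sigma) ** (-b) is huge very close to the line, so g(x) -> 1
        return 1.0


# One 2-parameter hazard-rate model (sigma, b) per cluster-size stratum.
strata = ["size = 1", "size = 2", "size >= 3"]
PARAMS_PER_STRATUM = 2
n_params = len(strata) * PARAMS_PER_STRATUM  # 3 strata x 2 = 6

# Invented parameters for the "big cluster" stratum: a large shape
# parameter b gives a flat shoulder out to about sigma, then a steep drop.
sigma_big, b_big = 1.2, 20.0
inside = hazard_rate(0.6, sigma_big, b_big)   # well inside 1.2 nm
beyond = hazard_rate(1.8, sigma_big, b_big)   # beyond 1.2 nm
```

With a shape parameter this large, `inside` is essentially 1 and `beyond` is essentially 0, which is the sort of fit that pushes an optimiser against its upper bound.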
Thanks for this thorough investigation Eric! My hunch was that this was what was happening. I don't know if the general case of "post-stratification by some covariate" is useful anyway. I think that should just consist of setting the Mismatch in log window and what is stored in the database in terms of the MCDS command language is very odd. @lenthomas do you have any ideas about this?
My initial instinct is that neither Distance nor readdst should bother translating post-stratified analyses. Seems to me like our time would be better spent with other details than that! I suggest just giving a warning and not converting those analyses? For RDistance, users can always set up data using R to do post-stratification. However, if you do want to implement it, fine by me. I do remember my SQL post-stratification code was a bit epic, to deal with post-stratification at the sample and observation layers...
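A rough sketch of the "warn and don't convert" behaviour suggested above, written in Python purely for illustration. The dictionary layout, the `post_stratified` flag and the function name are all hypothetical; none of this is readdst's actual interface.

```python
import warnings


def convertible_analyses(analyses):
    """Return the analyses that would be converted, warning about the rest.

    Each analysis is a dict with a "name" and an optional boolean
    "post_stratified" key (a hypothetical structure for this sketch).
    """
    kept = []
    for analysis in analyses:
        if analysis.get("post_stratified", False):
            warnings.warn(
                f"Skipping analysis {analysis['name']!r}: "
                "post-stratified analyses are not converted."
            )
            continue
        kept.append(analysis)
    return kept


# Example with made-up analysis names.
demo = [
    {"name": "Half-normal, no adjustments"},
    {"name": "Post-stratified by cluster size", "post_stratified": True},
]
```

Calling `convertible_analyses(demo)` would keep the first entry and emit one warning for the second, leaving the user free to set up the post-stratification by hand.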
@erex reports the following problem with the cluster example data in #19: