Update configuration around external_metrics.txt #26
Comments
Implemented in 5717b13:
|
all new features testable in 1.6 |
@cschu where is 1.6 located? |
minos-1.6 of course :-) |
I guess if a user specifies a metric in the external_metrics file e.g. then we should uncomment these metrics in the generated scoring yaml. Currently we add the value but leave the metric commented out. |
Yea, makes sense. Will those then contribute to transcript_score/protein_score downstream? If so, how? Currently,
|
ah nvm, we're talking multipliers |
I have this implemented, it should roll out together with the next updates unless you'd like to have it in a 1.6.1 now. |
We don't need to use these additional metrics, even if uncommented, in the transcript_score or protein_score, or add a new score for them. We do need all 4, i.e. nF1:1,aF1:2,eF1:3,jF1:4, to be set (same for tCov:X,qCov:Y). Someone can't exclude aF1 from this list, as we need it in the classification, so it always needs to be generated if the metric is present in the external file. Currently, if a user includes aln_tran nF1:1,eF1:3,jF1:4 (missing aF1), then we error prior to mikado configure. If a user includes just nF1:1, then we end up with #external.all_scallop_transcripts_nF1: {rescaling: max, use_raw: true, multiplier: nF1:1} and mikado configure errors. So either way we get an error (good). It's not 100% transparent for a user, but again the documentation can make clear what we expect / advise. |
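For illustration only, the "all keys must be present" rule described above could be checked up front along these lines. This is a rough sketch, not the code in minos: the function name `parse_keyed_multipliers`, the `REQUIRED_KEYS` table, and the assumption that aln_prot takes the same four keys as aln_tran are all hypothetical.

```python
# Hypothetical table of the keys each metric class must provide.
# Assumption: aln_prot requires the same four keys as aln_tran.
REQUIRED_KEYS = {
    "aln_tran": {"nF1", "aF1", "eF1", "jF1"},
    "aln_prot": {"nF1", "aF1", "eF1", "jF1"},
    "seq_prot": {"tCov", "qCov"},
}


def parse_keyed_multipliers(metric_class, spec):
    """Parse a spec such as 'nF1:1,aF1:2,eF1:3,jF1:4' into {key: multiplier}.

    Raises ValueError if an item is malformed or if the set of keys does
    not exactly match what the metric class requires.
    """
    pairs = {}
    for item in spec.split(","):
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"expected 'key:value', got {item!r}")
        pairs[key.strip()] = float(value)

    required = REQUIRED_KEYS[metric_class]
    if set(pairs) != required:
        missing = sorted(required - set(pairs))
        unexpected = sorted(set(pairs) - required)
        raise ValueError(
            f"{metric_class}: missing keys {missing}, unexpected keys {unexpected}"
        )
    return pairs
```

With a check like this, both of the cases above (aF1 left out, or only nF1:1 given) would fail at the same point with a message that names the missing keys.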
That's fine, thanks |
@cschu
Currently in the external_metrics.txt file for each metric_name_prefix a single value can be specified for the multiplier and not_fragmentary_min_value.
However, these translate into multiple metrics, e.g.:
for metric class aln_tran, aln_prot we have *_aln_nF1 *_aln_jF1 *_aln_eF1 and *aln_aF1
for metric class seq_prot we have *_qCov and *_tCov
This means that the user has to use the same value for each of these metrics. In practice I have been specifying different values, in particular for *_qCov and *_tCov, which requires me to manually update the gmc_run.scoring.yaml file after running gmc configure.
Some suggestions about how to improve this
The not_fragmentary_min_value is used in the mikado not_fragmentary expression. Currently we make use of both *_qCov and *_tCov. Again, in practice I have been setting *_qCov to 1, which effectively turns it off, as {operator: gt, value: 1} is never true. I would simply stop using *_qCov in the not_fragmentary section, as a high query coverage does not indicate that the model is not a fragment. We already use only *aln_aF1 in this section (i.e. we ignore *_aln_nF1, *_aln_jF1 and *_aln_eF1). Making this change means we still need only one not_fragmentary_min_value per metric_name_prefix.
For the multipliers it’s more useful to be able to specify multiple values. One “solution” would allow a comma-separated list of multiplier values that are then applied in a set order, i.e. metric classes aln_tran and aln_prot allow up to 4 values, applied to *aln_aF1, *_aln_nF1, *_aln_jF1 and *_aln_eF1, and metric class seq_prot allows 2 values, applied to *_qCov and *_tCov.
This is a bit messy, as it means the multiplier column allows 1, 2 or 4 values depending on the metric class. I think for convenience you would still want logic where, if only 1 value is given, it is applied to all the resulting metrics, but you would also need to deal with cases where someone specifies 3 values for metric classes aln_tran and aln_prot.
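To make the positional idea concrete, here is a rough sketch of how such a field could be expanded. It is only an illustration under assumptions: the function name `expand_multipliers` and the `METRIC_SUFFIXES` table are made up for this example, and the suffix order simply follows the order listed above.

```python
# Hypothetical suffix order per metric class, following the order suggested above.
METRIC_SUFFIXES = {
    "aln_tran": ("aF1", "nF1", "jF1", "eF1"),
    "aln_prot": ("aF1", "nF1", "jF1", "eF1"),
    "seq_prot": ("qCov", "tCov"),
}


def expand_multipliers(metric_class, multiplier_field):
    """Map a comma-separated multiplier field onto the metrics of a class.

    A single value is broadcast to every metric of the class; otherwise
    the number of values must match the number of metrics exactly.
    """
    suffixes = METRIC_SUFFIXES[metric_class]
    values = [float(v) for v in multiplier_field.split(",")]
    if len(values) == 1:
        # Convenience case: one value applies to all resulting metrics.
        values = values * len(suffixes)
    if len(values) != len(suffixes):
        raise ValueError(
            f"{metric_class} expects 1 or {len(suffixes)} multiplier values, "
            f"got {len(values)}"
        )
    return dict(zip(suffixes, values))
```

Here `expand_multipliers("seq_prot", "2")` gives both *_qCov and *_tCov a multiplier of 2, while `expand_multipliers("aln_tran", "1,2,3")` raises an error instead of guessing which metric was left out.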
I’m open to other ways of doing this.
At the moment it does mean that every time I run gmc I have to make this manual change, which is not ideal, even if I do expect users to sometimes adjust the gmc_run.scoring.yaml file prior to running gmc run.