generate_augustus_test_and_train default --target_mono_exonic_pct #34

Open
swarbred opened this issue Jun 27, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@swarbred
Collaborator

I suggest we lower the default --target_mono_exonic_pct from 20% to 5%.

For some species with smaller gene sets, finding 20% of the 1,200 train and test genes (240 mono-exonic models) won't be possible; this was the case for a recent fungal genome.
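As a quick check of the arithmetic, the sketch below assumes the mono-exonic target is a percentage of --train_max plus --test_max (1000 + 200). That assumption matches the "Requested minimum number of mono-exonic models: 240" the script reports in the -f run in this issue, but `required_mono_exonic` is a hypothetical helper, not the script's own code.

```python
def required_mono_exonic(train_max: int, test_max: int, target_pct: float) -> int:
    """Mono-exonic models implied by a percentage target (hypothetical helper).

    Assumes the percentage applies to train_max + test_max combined.
    """
    total = train_max + test_max
    return round(total * target_pct / 100)


print(required_mono_exonic(1000, 200, 20))  # 240
print(required_mono_exonic(1000, 200, 5))   # 60
```

At the proposed 5% default, the same calculation asks for 60 mono-exonic models instead of 240.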

REAT Failed, the following file might contain information with the reasons behind the failure
/ei/.project-scratch/e/e701c73c-45b1-4784-9385-6c69cf3272cf/CB-GENANNO-508_ERGA_Spongipellis_delectans/Analysis/reat-dev-issue25/Prediction/cromwell-executions/ei_prediction/d18b476e-faa4-4c2f-98a7-b5797c30ddde/call-SelectAugustusTestAndTrain/execution/stderr
+ generate_augustus_test_and_train /ei/.project-scratch/e/e701c73c-45b1-4784-9385-6c69cf3272cf/CB-GENANNO-508_ERGA_Spongipellis_delectans/Analysis/reat-dev-issue25/Prediction/cromwell-executions/ei_prediction/d18b476e-faa4-4c2f-98a7-b5797c30ddde/call-SelectAugustusTestAndTrain/inputs/-1046222641/with_utr.extra.gff --train_min 400 --train_max 1000 --test_max 200 --target_mono_exonic_pct 20
+ gff2gbSmallDNA.pl test.gff /ei/.project-scratch/e/e701c73c-45b1-4784-9385-6c69cf3272cf/CB-GENANNO-508_ERGA_Spongipellis_delectans/Analysis/reat-dev-issue25/Prediction/cromwell-executions/ei_prediction/d18b476e-faa4-4c2f-98a7-b5797c30ddde/call-SelectAugustusTestAndTrain/inputs/1001504700/gfSpoDele1_1.curated_primary.softmasked.fa 200 test.gb
Couldn't open test.gff.

On inspection, we simply don't have 240 single-exon genes. The generate_augustus_test_and_train script then produces no output and writes nothing to an error log, so it isn't obvious to a user what caused the failure.
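One way to make this failure visible is to abort early with a readable message on stderr, which a workflow engine such as Cromwell would then capture. The sketch below is illustrative only: `ensure_enough_mono_exonic` and its arguments are hypothetical names, not part of generate_augustus_test_and_train.

```python
import sys


def ensure_enough_mono_exonic(available: int, required: int) -> None:
    """Abort with a clear message instead of silently writing no output.

    Hypothetical check; the names are illustrative, not the script's real code.
    """
    if available < required:
        # sys.exit() with a string raises SystemExit; if uncaught, Python prints
        # the string to stderr and exits with status 1.
        sys.exit(
            f"generate_augustus_test_and_train: only {available} mono-exonic "
            f"models available, but --target_mono_exonic_pct requires {required}. "
            "Try lowering --target_mono_exonic_pct."
        )
```

With a check like this, the stderr file referenced in the REAT failure message would state the cause directly, rather than only showing the downstream "Couldn't open test.gff." from gff2gbSmallDNA.pl.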

Note that the -f (force) option does not override the 20% target_mono_exonic_pct requirement, although it does at least produce an error indication:

generate_augustus_test_and_train /ei/.project-scratch/e/e701c73c-45b1-4784-9385-6c69cf3272cf/CB-GENANNO-508_ERGA_Spongipellis_delectans/Analysis/reat-dev-issue25/Prediction/cromwell-executions/ei_prediction/2482d9fe-d7e9-42dc-bbaa-8259e9e25fb8/call-SelectAugustusTestAndTrain/inputs/-578101069/with_utr.extra.gff  --train_min 400 --train_max 1000 --test_max 200 --target_mono_exonic_pct 20 -f
Requested minimum number of mono-exonic models: 240
Real possible minimum number of mono-exonic models: 6
Number of train models: 32
Number of mono-exonic models in train set: 6
Traceback (most recent call last):
  File "/ei/software/cb/reat/dev-issue32/x86_64/bin/generate_augustus_test_and_train", line 138, in <module>
    main()
  File "/ei/software/cb/reat/dev-issue32/x86_64/bin/generate_augustus_test_and_train", line 101, in main
    test_models = random.sample(train_models, args.test_max)
  File "/ei/software/cb/reat/dev-issue32/x86_64/lib/python3.9/random.py", line 449, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
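The traceback above comes from `random.sample` being asked for 200 items (`--test_max 200`) from only 32 train models; `random.sample` raises `ValueError` whenever the sample size exceeds the population. One possible guard, sketched here with a hypothetical `sample_test_models` helper, clamps the sample size and warns instead of crashing. Whether clamping or a hard, clearly worded error is preferable is a design choice for the maintainers.

```python
import random
import sys


def sample_test_models(train_models, test_max, seed=None):
    """Sample up to test_max models without raising when too few exist.

    Hypothetical helper: random.sample raises ValueError if k > len(population),
    so k is clamped to the number of available models.
    """
    k = min(test_max, len(train_models))
    if k < test_max:
        print(
            f"Warning: only {len(train_models)} models available; "
            f"sampling {k} instead of --test_max {test_max}",
            file=sys.stderr,
        )
    rng = random.Random(seed)  # local RNG so a seed gives reproducible sets
    return rng.sample(train_models, k)
```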

The intention was for target_mono_exonic_pct to set a maximum percentage of single-exon genes, but as coded it works as a target. That being the case, I would simply lower it to 5%.
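To illustrate the difference between a maximum and a target, the sketch below treats the percentage as an upper bound: it takes as many mono-exonic models as are available up to the cap and fills the remainder with multi-exonic models. `select_models` and its parameters are hypothetical and do not reflect how generate_augustus_test_and_train actually partitions models.

```python
import random


def select_models(mono, multi, n_total, max_mono_pct, seed=None):
    """Pick up to n_total models, with mono-exonic models capped at max_mono_pct.

    Hypothetical sketch: the percentage is an upper bound, so having fewer
    mono-exonic models than the cap is acceptable rather than an error.
    """
    rng = random.Random(seed)
    cap = int(n_total * max_mono_pct / 100)
    n_mono = min(cap, len(mono))                 # never exceed the cap
    n_multi = min(n_total - n_mono, len(multi))  # fill the rest if possible
    return rng.sample(mono, n_mono) + rng.sample(multi, n_multi)
```

Under this reading, a genome with only a handful of mono-exonic models would still yield a full set, just with a lower mono-exonic fraction.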

@swarbred swarbred added the enhancement New feature or request label Jun 27, 2022
@ljyanesm
Contributor

ljyanesm commented Jul 2, 2022

Dear @swarbred,

Sorry about the unexpected behaviour of this flag. Would you mind uploading the input data triggering the behaviour to this issue? I'll have a look.

Thanks

@swarbred
Collaborator Author

swarbred commented Jul 3, 2022

with_utr.extra.gff.zip
Hi @ljyanesm
@gemygk may have already looked at this anyway, but I've attached the file
Thanks


2 participants