Releases: EI-CoreBioinformatics/mikado
Going to BED in a haste
Release notes:
- Solved a nasty bug in Mikado compare for cases where more than one gene was located at exactly the same location (e.g. AT1G78880.1 and AT1G78882.1 in Arabidopsis)
- Switched from a pool implementation to stable Process objects for multiprocessing in mikado.py pick. WARNING: with preloading, this multiplies the amount of data in memory, as it has to be copied over to each process!
- Added the possibility of printing Transcript objects in BED12
- Added the possibility of specifying an output directory for all mikado steps
- Added the "introns" property to reference gene objects
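As an illustration of the BED12 output mentioned above, here is a minimal sketch of how a transcript could be serialised into a BED12 line. The function name `to_bed12` and its arguments are hypothetical and are not Mikado's API; it assumes exon and CDS coordinates are given as 1-based inclusive pairs (GFF style) and converts them to the 0-based, half-open convention used by BED.

```python
def to_bed12(chrom, name, strand, exons, cds=None, score=0):
    """Return a BED12 line for a transcript (hypothetical helper, not Mikado's API).

    exons: list of (start, end) tuples, 1-based inclusive.
    cds:   optional (start, end) tuple, 1-based inclusive, for the coding region.
    """
    exons = sorted(exons)
    start = exons[0][0] - 1          # BED starts are 0-based
    end = exons[-1][1]               # BED ends are exclusive, so unchanged
    if cds is None:
        # Common convention for non-coding features: thickStart == thickEnd
        thick_start = thick_end = start
    else:
        thick_start, thick_end = cds[0] - 1, cds[1]
    block_sizes = ",".join(str(e - s + 1) for s, e in exons)
    # Block starts are relative to the transcript start
    block_starts = ",".join(str(s - 1 - start) for s, _ in exons)
    fields = [chrom, start, end, name, score, strand,
              thick_start, thick_end, "0", len(exons),
              block_sizes, block_starts]
    return "\t".join(str(field) for field in fields)
```

For a two-exon transcript with exons at 101-200 and 301-400, the line has a block count of 2, block sizes `100,100` and relative block starts `0,200`.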
Major speedup
The main highlight of this version is the creation of a pure single-threaded mode, which is now obligatory in the case of preloading. This makes Mikado ~3-5 times faster when using a cached database, albeit at the cost of multiprocessing.
Other highlights:
- Fixed #39, #38, #51
- Due to a bug, multiple secondary redundant AS events could be selected. This is now fixed.
- Other minor bug fixes in grep.py, GFF/GTF, remove_utr
Minor speedups, v1
Improvements for this version:
- DB access greatly sped up, especially for multiprocessing
- added a parameter, "min_score_perc", to further filter the AS events
- Removed minor speed bottlenecks by memoization
- Solved an important bug by which split transcripts were devoid of any verified intron
- added a new utility, gtf_to_gff3
- Utilities are now installed together with mikado.py
- BLAST data loaded also for Split/Nosplit
NOTE: at the moment, multiprocessing makes sense for DB access but not for preloading. Investigation ongoing.
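The memoization mentioned in the list above can be sketched with the standard library's `functools.lru_cache`. The function below is a hypothetical example of a small, frequently repeated computation; it is not taken from the Mikado code base, and it assumes intervals are hashable `(start, end)` tuples with inclusive coordinates.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def overlap(first, second):
    """Number of bases shared by two (start, end) intervals, inclusive coordinates.

    Hypothetical helper: results are cached, so repeated calls with the
    same pair of intervals skip the computation.
    """
    return max(0, min(first[1], second[1]) - max(first[0], second[0]) + 1)
```

Caching of this kind only pays off when the same arguments recur often and the arguments are hashable; `overlap.cache_info()` reports hits and misses, which helps confirm whether it is worthwhile.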
Proper boxes
Interval switching
Main features of the release:
- switched to Interval objects (from intervaltree) to describe the internal features of transcripts, such as exons and introns
- unified the find_retained_intron function inside Abstractlocus
- increased test coverage
- Various bug fixes
Roadblock removal
Solved issue #37 by profiling the data loading with kernprof. Now Mikado should be as fast as in the past, but with additional sanity checks regarding the ORFs.
Transcript rationalization
The main change is the subdividing of Transcript into multiple modules, allowing for a finer control of what happens inside the code and quicker debugging or (hopefully) understanding for other coders.
BLASTing the store
Change-log since 0.9.2:
- Solved an issue by which Mikado was loading too few hits from the database when multiple hits had the same evalue (e.g. 4 hits with evalue 0 and max_target_seqs = 3 led to loading only 3 hits, chosen arbitrarily depending on the SQLite driver present on the system). Now we select up to max_target_seqs hits, plus all subsequent hits with the same evalue as the last selected hit. See issue #33 for details.
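The selection rule described above (take up to max_target_seqs hits, then keep every further hit tied on evalue with the last one selected) can be sketched as follows. This is a minimal illustration under stated assumptions, not Mikado's actual implementation: `select_hits` is a hypothetical name, and hits are assumed to be plain `(hit_id, evalue)` tuples rather than database rows.

```python
def select_hits(hits, max_target_seqs):
    """Pick the best hits by evalue, keeping ties with the last selected hit.

    hits: list of (hit_id, evalue) tuples (hypothetical representation).
    """
    if max_target_seqs <= 0:
        return []
    # Lower evalue is better; sorted() is stable, so input order breaks ties
    ranked = sorted(hits, key=lambda hit: hit[1])
    if len(ranked) <= max_target_seqs:
        return ranked
    # evalue of the last hit that falls within the limit
    cutoff = ranked[max_target_seqs - 1][1]
    # Keep everything at or below the cutoff, so tied hits are never dropped
    return [hit for hit in ranked if hit[1] <= cutoff]
```

With four hits at evalue 0 and max_target_seqs = 3, all four are returned, so the result no longer depends on the order in which the database driver happens to return tied rows.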
Tidying the house configuration
Main changes since 0.9.1:
- Worked with Dan to fix PortCullis (v. 0.10) and junction loading
- transitioned most of the options to the configuration file
- completed work on using JSON schemas to check / generate / insert missing values for configuration files.
- Prepare now chucks out transcripts with mixed strands in every case, and tags the proportion of introns which are canonical.
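To illustrate the canonical-intron tag mentioned above, here is a minimal sketch of how the proportion of canonical introns might be computed. Everything in it is an assumption made for illustration: the function name, the 1-based inclusive intron coordinates, and the set of splice-site pairs treated as canonical (GT-AG, GC-AG, AT-AC); the definition used by Mikado itself may differ.

```python
# Assumed set of canonical (donor, acceptor) dinucleotide pairs
CANONICAL = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_proportion(sequence, introns, strand="+"):
    """Fraction of introns whose splice sites are canonical (hypothetical helper).

    sequence: genomic sequence of the region, as a string.
    introns:  list of (start, end) tuples, 1-based inclusive, on `sequence`.
    """
    if not introns:
        return 0.0  # assumption: no introns means nothing to tag
    sequence = sequence.upper()
    canonical = 0
    for start, end in introns:
        intron = sequence[start - 1:end]
        if strand == "-":
            # Read the intron in transcript orientation
            intron = intron.translate(_COMPLEMENT)[::-1]
        if (intron[:2], intron[-2:]) in CANONICAL:
            canonical += 1
    return canonical / len(introns)
```

The same genomic intron can be canonical on one strand and not on the other, which is why the strand has to be known before the splice sites are checked.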
Bridging the gap
Bug-fix release: gaps were not handled correctly. Code slightly simplified.