- Can run the other scripts (ViralGenome_analysis.py, motifAnalysis.py) outside the installation directory
- Need to move all global variables to cfg.py
- Add hg38 and mm10 annotation files
- docs.tar.gz needs to be updated
- Add more analysis of retroviral data (currently in another script)
- New flag to control the STAR N-over-L ratio (mismatches per base). Default is 0.08
- Thresholding now implemented correctly for tRNA, repeat, and genome
- Maps to retroviral index now
- ReadsPerPipeFile now outputs counts
- Script runs to completion with --clipper
- Figure 1 is no longer stretched out vertically
- When not using CLIPper, protein-coding genes are now correctly enumerated
- Folder structure is now correct
- Installation procedure using setup.py
- Now runs STAR instead of bowtie2 for genome index mapping. STAR is faster and maps across splice junctions.
- Users now have to supply a STAR index (-s flag required)
- CLIPper is now optional (and no longer recommended). Instead, reads are now overlapped with snoRNAs and miRNAs before being overlapped with genes.
- Logging is improved, with a --verbose flag to see all command lines run
- Histograms of tRNA isotypes are now output, along with a comparison across replicates
- Files are now written to a more manageable directory tree
- Code split into several files to improve maintainability
- tRNA indexes now end with CCA modification
- iCLIPro is now run on output
- The 5' end of RT stops on the minus strand is now correctly reported (fixed an off-by-one error)
- Both + and - signals are now reported for each nucleotide in PlotData_RepeatRNAHist_*
- Mean and standard deviation for each significant RT stop are now reported in files that end with _mergedRT.bed_stats
- Added the ability to run fasticlip outside its installation directory once the appropriate environment variables are set
- Can now handle any number of replicates. Added default thresholds for RT stop significance for different numbers of replicates.
- Users can define significance rules for RT stops for repeat and non-repeat RNAs separately with -tr and -tn
- ViralGenome_analysis.py added; can now map to arbitrary viral genome Bowtie2 indexes
- Can now specify filename prefix in motifAnalysis.py
- In Figure 5, if PlotData files are empty the plots will not be drawn (rather than crashing)
- Clarified the definition of exons, CDS, introns, and UTRs in both fasticlip and motifAnalysis.py
- Added the file PlotData_ReadsPerGene_proteinCoding_byRegion to count up reads for each of the above regions for each gene
- Removed the -f value from the bedtools intersect commands, since the default (-f 1e-9) should work
- Users can now specify the minimum MAPQ quality they want to keep using the -q flag
- Now maps to tRNA indexes (docs.tar.gz also updated to include the tRNA indexes)
- tRNA and repeat masker reads are now reported
- Default lower limit for read length is 15
- Duplicate removal of reads is now done before 3'/5' trimming to save time
- Various redundancy fixes to speed up making Figures 1, 3, 4
- Now uses the -sorted flag in bedtools intersect to speed up the repeat and blacklist masking
- Added a list of genes that have 5' and 3' UTR binding, but no CDS binding
- In motifAnalysis.py, lists of RT stops are now filtered so that there is only one representative per significant "window"
- rDNA histogram is now the correct length
- CLIP clusters are now merged if 1) they are duplicates of each other, or 2) they overlap
- Changed location of docs.tar.gz and updated ./configure
- Added a script to map to retroviral indexes
- Added the necessary retroviral fasta files to the docs.tar.gz file
- Added motifAnalysis.py to analyze motifs using HOMER
- All isoforms of each gene are now combined when making plots
- Added a minimum read length post-5' trim
- Changed -f13 to -f14 for the 5' trim so that the first 13 nucleotides are trimmed by default. -f represents the (1-based) position of the first nucleotide that is kept.
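
The -f convention in the last entry is easy to get wrong by one, so here is a minimal Python sketch of it. It is an illustration only, not the pipeline's actual trimming code: the function name `trim_five_prime` and the example read are hypothetical, and the only thing taken from the entry above is that -f is the 1-based position of the first nucleotide kept.

```python
def trim_five_prime(read: str, first_kept: int = 14) -> str:
    """Return `read` starting at 1-based position `first_kept`.

    Hypothetical helper: first_kept=14 drops the first 13 nucleotides.
    """
    if first_kept < 1:
        raise ValueError("first_kept is 1-based and must be >= 1")
    # Python slices are 0-based, so position 14 is index 13.
    return read[first_kept - 1:]


# Made-up read: a 13-nt 5' segment followed by a 12-nt insert.
read = "ACGTACGTACGTA" + "GGGCCCTTTAAA"

trimmed_f14 = trim_five_prime(read, first_kept=14)  # removes 13 nt
trimmed_f13 = trim_five_prime(read, first_kept=13)  # removes only 12 nt
```

With `first_kept=13` the last nucleotide of the 5' segment is left on the read, which is the off-by-one behavior the change from -f13 to -f14 addresses.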