If you do not already have one, you will need a de novo assembly. If you are using the Ohio Super Computer (OSC) and have access to PAS1046, skip installation to OSC USE once you have added the line to your profile from here.
Use Singularity (or Docker) to pull the prebuilt container
singularity pull docker://xonq/funannotate_mask:1.8.1
Repeat masking software is contained within a software container. To access RepeatMasker software you will have to activate the container:
/fs/project/PAS1046/software/containers/funannotate/funannotate_mask.sif
Press CTRL + D or run exit
to exit it. Only run the container for repeat masking commands otherwise you will experience unexpected problems.
RepeatModeler outputs more files than you could ever deal with, so to circumnavigate project and user folder file limits, we will make an output directory in your scratch folder (/fs/scratch/<PAS####>/<USER>
). Once done, we prepare an NCBI database for RepeatModeler
to reference when actually running RepeatModeler. Make sure you have the container active as explained above.
mkdir <SCRATCH/OUTPUT/DIRECTORY>
cd <SCRATCH/OUTPUT/DIRECTORY>
BuildDatabase -name <NAME> -engine ncbi <YOUR/ASSEMBLY>
Now, deactivate the container, create a plain text (UTF-8) shell file with the following command, save it with a .sh
extension, and transfer to OSC. This file is for job submission. NOTE - reference the <NAME>
used after -name
in the previous command, no file extensions. So, if in the above command you used 'psicub1' for <NAME>
you will input <DATABASE/PATH>/psicub1
.
cd </SCRATCH/OUTPUT/DIRECTORY>
RepeatModeler -pa 8 -database <SCRATCH/OUTPUT/DIRECTORY/NAME> -engine ncbi 2>&1 | tee repeatmodeler.log
Edit the command below and submit the job. Remember, you do not want the container active when submitting the job - the job itself calls upon the container. Confusing, yes, but what you are doing is 1. telling the computer to submit a command to the supercomputer job scheduler, 2. telling the job itself to invoke singularity, and 3. telling singularity to run your shell file prepared above.
echo -e 'singularity exec /fs/project/PAS1046/software/containers/funannotate/funannotate_mask.sif bash <YOUR/CMD>.sh' | sbatch --time=100:00:00 --nodes=1 --ntasks-per-node=8 -A PAS<####> --job-name=repeatmodeler
Keep it clean - if you know cd
, mv
, ls
and rm
then you know everything you need to move files around and create a clean directory structure with informative directory names and file paths. You will not need most of the files - instead, the <NAME>-families.fa
is the de novo repeat library you reference when masking your assembly for gene prediction. Let's move that file into your organism(s)' folder(s) that we created when starting assembly.
mkdir <ORGANISM>/repeatmodeler
mv <REPEATMODELER>/<NAME>-families.fa <ORGANISM>/repeatmodeler
This installation of RepeatModeler references the Dfam 3.0 repeat library. In addition to citing RepeatMasker software, cite the Dfam library as well.