Cristina Tuñí i Domínguez edited this page Sep 11, 2020 · 3 revisions

Welcome to my Master's Thesis wiki!

Pipeline

These files create a GUI that runs a tRNA alignment pipeline developed by Marina Murillo.

Running it on Unix-based systems

Requirements

The pipeline is designed to automate the remaining requirements (e.g. the installation of packages and programs). If the automation step fails, one must manually install the following and add them to PATH:

This can be done by installing Anaconda and then installing those three programs through Bioconda.

The pysam package for Python is also required.
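A quick way to see which requirements are still missing is to look each program up on PATH. The sketch below is not part of the pipeline; it names only bowtie2 and conda because those are the programs this page mentions, so the tuple is a hypothetical default that should be extended with the remaining required programs.

```python
import shutil

# Hypothetical default: bowtie2 and conda are named on this page; the other
# programs the pipeline needs should be added here.
REQUIRED_PROGRAMS = ("bowtie2", "conda")


def missing_programs(programs=REQUIRED_PROGRAMS):
    """Return the programs that cannot be found on PATH, in the order given."""
    return [name for name in programs if shutil.which(name) is None]
```

For example, `missing_programs()` returns an empty list once everything is installed and on PATH. A non-empty result indicates which manual installation steps are still pending.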

This automation step is carried out by anaconda_setup.sh. If the automation step fails, the commands in that script can be executed one by one to reach the same final state manually.
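Because the contents of anaconda_setup.sh are not reproduced on this page, the following sketch makes no assumptions about them. It extracts the commands from any shell script (skipping the shebang, comments, and blank lines, and joining backslash-continued lines) so they can be pasted into a terminal one at a time. It deliberately does not execute them: each subprocess would start a fresh shell, so state such as `cd` or `conda activate` would be lost between commands.

```python
from pathlib import Path


def setup_commands(script_path):
    """Return the commands in a shell script, one complete command per entry.

    Blank lines and lines starting with '#' (including the shebang) are skipped.
    A line ending in a backslash is joined with the line that follows it.
    """
    commands, pending = [], ""
    for raw in Path(script_path).read_text().splitlines():
        line = raw.strip()
        # Skip comments and blank lines, unless we are inside a continued command.
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            # Drop the trailing backslash and keep accumulating.
            pending += line[:-1].rstrip() + " "
            continue
        commands.append((pending + line).strip())
        pending = ""
    if pending:
        commands.append(pending.strip())
    return commands
```

For example, `for i, cmd in enumerate(setup_commands("anaconda_setup.sh"), 1): print(i, cmd)` prints a numbered checklist to work through in the terminal.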

The installation and/or import of Python modules is handled by modules.py. As above, all of these Python packages can also be installed manually with conda or with the user's preferred installer.
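The usual "import, or install then import" pattern that a script like modules.py can follow is sketched below. This is an illustration, not the actual contents of modules.py: the helper names are hypothetical, and the default installer uses pip for the running interpreter, whereas a conda-based installer could be passed in instead.

```python
import importlib
import subprocess
import sys


def pip_install(package):
    """Install `package` with pip for the interpreter running this script."""
    subprocess.run([sys.executable, "-m", "pip", "install", package], check=True)


def ensure_module(name, package=None, installer=pip_install):
    """Import module `name`, installing `package` (defaults to `name`) if the import fails."""
    try:
        return importlib.import_module(name)
    except ImportError:
        installer(package or name)
        # Refresh the import system's caches so the freshly installed package is found.
        importlib.invalidate_caches()
        return importlib.import_module(name)
```

For example, `pysam = ensure_module("pysam")` imports pysam if it is present and otherwise tries to install it first. If the installation does not make the module importable, the final import raises `ImportError`.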

Running it on Windows

Requirements

First of all, the user should know that bowtie2, the aligner this pipeline is based on, is designed to work only on Unix-based systems such as Ubuntu or macOS, so a workaround was needed. Before running the pipeline, the user must install Windows Subsystem for Linux (WSL from now on). WSL allows Unix commands to run on Windows, making it possible for Windows users to run bowtie2 and other "Unix-only" programs.

The user must also install Python 3.x and add it to PATH.

Once these two steps are done, the automation step should work in the same way as on Unix-based systems. If it does not, one can run the commands in anaconda_setup.sh one by one in the WSL window.
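To confirm that a Python session is actually running inside WSL rather than natively on Windows, a common heuristic is to check the kernel release string, which in both WSL 1 and WSL 2 contains "microsoft". The sketch below relies on that heuristic. It is not a check performed by the pipeline itself.

```python
import platform


def running_under_wsl(release=None):
    """Heuristic: WSL kernels include 'microsoft' in their release string.

    `release` defaults to the current kernel release reported by platform.uname().
    """
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()
```

On native Windows, `platform.system()` returns `"Windows"` instead, which is a signal that the pipeline should be started from the WSL window.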

Optional steps

Before running the pipeline, the user can complete some optional steps that are separate from the pipeline itself. Some of these steps prepare what the pipeline needs in order to run, but once done they do not need to be repeated:

  • First, one can install all of the required packages by clicking the "Download additonal programs" button.

This will try to automatically make sure that all of the requirements are fulfilled. A console will open, and the user must follow the instructions shown there. If Anaconda is not installed, its license agreement will be displayed: press "ENTER" until the end of the agreement, type "yes" to accept it, and press "ENTER" again to confirm the path where Anaconda will be installed. When prompted again, we recommend answering "yes", which makes the conda installer available.

This will install Anaconda and the three programs required to run the pipeline. If Anaconda is already installed, this step may fail.

  • Users can also download the human genome and the bowtie2 indexes required to run the pipeline.

Clicking the "Download Genome" button automatically downloads the genome, the alignment index, and the annotation. ATTENTION: this file weighs approximately 14 GB when compressed. Make sure you have enough disk space and a stable internet connection.

  • Fastq files can also be downloaded from the program.

Users can enter an accession code of the form SRRXXXXXXX (for example, SRR7216347). The program will access the EBI-ENA FTP server and download a compressed fastq file. Once the download has finished (not before), the user can decompress it by clicking the "Untar and delete .gz file" button. This deletes the compressed file and keeps the uncompressed, usable .fastq file.
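The two operations above can be sketched with the standard library. The URL builder assumes ENA's public directory layout for run accessions (`vol1/fastq/<first six characters>/[<extra subdirectory>/]<accession>/`, where accessions with more than six digits get a subdirectory made of the trailing digits, zero-padded to three characters). That layout is an assumption, and the program's own download code may differ. Whether a run is stored as a single `.fastq.gz` or as `_1`/`_2` paired files depends on the run. Because a `.fastq.gz` is a plain gzip stream, Python's `gzip` module is enough to decompress it.

```python
import gzip
import os
import re
import shutil


def ena_fastq_url(accession, read=None):
    """Build an ENA FTP URL for a run accession such as SRR7216347.

    Assumes ENA's layout: vol1/fastq/<acc[:6]>/[<subdir>/]<acc>/<acc>[_N].fastq.gz.
    `read` selects the _1 or _2 file of a paired-end run; None means a single file.
    """
    if not re.fullmatch(r"[SED]RR\d{6,9}", accession):
        raise ValueError(f"unexpected accession: {accession!r}")
    digits = accession[3:]
    parts = ["ftp://ftp.sra.ebi.ac.uk/vol1/fastq", accession[:6]]
    if len(digits) > 6:
        # Digits beyond the sixth form an extra subdirectory, e.g. "7" -> "007".
        parts.append(digits[6:].zfill(3))
    suffix = "" if read is None else f"_{read}"
    parts.append(f"{accession}/{accession}{suffix}.fastq.gz")
    return "/".join(parts)


def gunzip_and_delete(gz_path):
    """Decompress `name.fastq.gz` to `name.fastq`, remove the .gz file, return the new path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file")
    out_path = gz_path[:-3]
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks; the file is never fully in memory
    os.remove(gz_path)
    return out_path
```

For the example accession, `ena_fastq_url("SRR7216347")` gives `ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR721/007/SRR7216347/SRR7216347.fastq.gz`. The file can then be fetched with `urllib.request.urlretrieve(url, filename)`, which supports `ftp://` URLs, and decompressed with `gunzip_and_delete(filename)`.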

Usage

Once the user has:

  • Fulfilled the requirements to run the pipeline.
  • Downloaded the human genome.
  • Downloaded and uncompressed the fastq files to analyze.

The pipeline can finally be run! To do this, the user must click on the button "Choose folder" and select the folder where the fastq files are stored. Then, they must click the button "Choose file", and select the file to analyze.
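Since the pipeline expects uncompressed .fastq files, it can help to check the chosen folder before picking a file. The sketch below separates ready files from ones that still need decompressing. The helper name `fastq_candidates` is hypothetical and not part of the app.

```python
from pathlib import Path


def fastq_candidates(folder):
    """Return (ready, compressed): sorted names of .fastq and .fastq.gz files in `folder`."""
    folder = Path(folder)
    # "*.fastq" only matches names ending in .fastq, so .fastq.gz files are excluded.
    ready = sorted(p.name for p in folder.glob("*.fastq"))
    compressed = sorted(p.name for p in folder.glob("*.fastq.gz"))
    return ready, compressed
```

A non-empty `compressed` list means some downloads have not been decompressed yet, so they should not be selected.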

Once this is done, clicking the "Submit" button at the bottom of the app window starts the pipeline. The pipeline can take several hours to complete and creates several large files, so we recommend two things: running it on a sufficiently powerful machine, and making sure there is enough disk space.
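Free disk space can be checked before a long run (or before the roughly 14 GB genome download) using only the standard library. In this minimal sketch, the required amount is left to the user, since this page does not state how much space the pipeline's output files need.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def has_free_space(path, required_gib):
    """Return True if the filesystem containing `path` has at least `required_gib` GiB free."""
    return shutil.disk_usage(path).free >= required_gib * GIB
```

For example, `has_free_space(".", 14)` checks the current directory's filesystem before starting the genome download.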
