    extract_fasteris_inserts.sh
    (Uncommon - Legacy) Description: Given a directory of Fasteris inserts (no adaptors) and an interval of libraries, the libraries are extracted, concatenated and converted to fasta. Fastq quality scores are plotted.
      inputs: [First_lib] [Last_lib]
      outputs:
        [workdir]/data/fastq
        [workdir]/data/fasta
        [workdir]/data/quality
        [workdir]/count
      dependencies:
        tar
        fastq_to_fasta
        fastx_quality_stats
        fastq_quality_boxplot_graph.sh
        fastq_xtract.sh, lib_cat, fq_to_fa.sh
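The extract, concatenate and convert steps above can be sketched in Python as follows. This is a minimal illustration and not the code of fastq_xtract.sh, lib_cat or fq_to_fa.sh. It assumes plain, unwrapped four-line FASTQ records held in memory, and the function names are hypothetical. The FASTX fastq_to_fasta tool also has options this sketch ignores, such as whether reads containing N are kept.

```python
from typing import List


def fastq_to_fasta(lines: List[str]) -> List[str]:
    """Convert 4-line FASTQ records into 2-line FASTA records.

    Assumes plain FASTQ: @header, sequence, +[header], quality (no wrapping).
    """
    records = [line.rstrip("\n") for line in lines]
    if len(records) % 4 != 0:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    fasta = []
    for i in range(0, len(records), 4):
        header, seq, plus, _quality = records[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        fasta.append(">" + header[1:])  # keep the read name, swap '@' for '>'
        fasta.append(seq)               # the quality line is discarded
    return fasta


def concat_libs_to_fasta(libs: List[List[str]]) -> List[str]:
    """Concatenate several FASTQ libraries, then convert the result to FASTA."""
    merged = [line for lib in libs for line in lib]
    return fastq_to_fasta(merged)


lib1 = ["@r1\n", "ACGT\n", "+\n", "IIII\n"]
lib2 = ["@r2\n", "GGCC\n", "+r2\n", "####\n"]
print(concat_libs_to_fasta([lib1, lib2]))  # -> ['>r1', 'ACGT', '>r2', 'GGCC']
```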
    extract_lcscience_inserts.sh
    (Deprecated, soon to be removed) Description: The libraries in .fastq.gz format are extracted and converted to fasta. Fastq quality scores are plotted. The template argument is required if a range of libraries is given.
    The template must be the part of the file name that immediately precedes the lib number. Template + lib number should identify exactly one file in the INSERTS_DIR directory.
      Configs: config/workdir.cfg
        INSERTS_DIR used when a range of libraries is supplied
        ADAPTOR adaptor sequence to be clipped
        LCSCIENCE_LIB used when only one lib is to be extracted
    inputs: [First_lib] [Last_Lib] [TEMPLATE]
    outputs:
      [workdir]/data/fastq
      [workdir]/data/fasta
      [workdir]/data/quality
    dependencies:
      tar
      fastq_to_fasta
      fastx_quality_stats
      fastq_quality_boxplot_graph.sh
      fastq_xtract.sh, lib_cat, fq_to_fa.sh
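The sketch below checks the rule that template + lib number must identify exactly one file. It assumes plain substring matching over a list of file names, which stands in for the INSERTS_DIR listing. The function name and example file names are hypothetical. Substring matching is the reason uniqueness matters: if one file ends in lib1 and another in lib10, the key lib1 matches both.

```python
from typing import Dict, List


def resolve_lib_files(filenames: List[str], template: str,
                      first_lib: int, last_lib: int) -> Dict[int, str]:
    """Map each lib number to the single file containing template + lib number."""
    resolved = {}
    for lib in range(first_lib, last_lib + 1):
        key = f"{template}{lib}"
        matches = [name for name in filenames if key in name]
        if len(matches) != 1:
            # Zero matches: library missing. Several: template too vague.
            raise ValueError(f"{key!r} matches {len(matches)} files: {matches}")
        resolved[lib] = matches[0]
    return resolved
```

For example, with Sample_lib1.fastq.gz and Sample_lib10.fastq.gz in the directory, the template Sample_lib raises an error for lib 1 instead of silently choosing one of the two files.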
    pipe_trim_adaptor.sh
    Description: Trims adaptors from fasta libs.
    The adaptor sequence must be set in the ADAPTOR variable of the config/workdir.cfg configuration file.
      Configs: config/workdir.cfg
        ADAPTOR adaptor sequence to be clipped
    inputs: [First_lib] [Last_Lib]
    outputs:
      [workdir]/data/fasta/libxx_report.txt
      [workdir]/data/fasta/libxx_
    dependencies:
      fastx_clipper
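Conceptually, adaptor trimming removes the ADAPTOR sequence, and everything after it, from the 3' end of each read. The sketch below shows that idea using exact matching only. It also handles a partial adaptor that runs off the end of the read, and it drops reads that become too short. This is a simplified stand-in and not fastx_clipper's algorithm, because the real tool has its own matching rules and defaults. The parameters min_overlap and min_length, and the function names, are hypothetical.

```python
from typing import Dict, List, Tuple


def clip_adaptor(read: str, adaptor: str, min_overlap: int = 5) -> str:
    """Remove a 3' adaptor using exact matching only (simplified)."""
    pos = read.find(adaptor)
    if pos != -1:
        return read[:pos]  # full adaptor found: drop it and everything after
    # Otherwise the read may end partway into the adaptor: try the longest
    # adaptor prefix that the read ends with, down to min_overlap bases.
    longest = min(len(adaptor) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    return read  # no adaptor detected


def trim_library(reads: List[str], adaptor: str,
                 min_length: int = 15) -> Tuple[List[str], Dict[str, int]]:
    """Clip every read and drop those shorter than min_length afterwards."""
    kept: List[str] = []
    stats = {"input": len(reads), "clipped": 0, "too_short": 0}
    for read in reads:
        clipped = clip_adaptor(read, adaptor)
        if clipped != read:
            stats["clipped"] += 1
        if len(clipped) < min_length:
            stats["too_short"] += 1
        else:
            kept.append(clipped)
    return kept, stats
```

The stats dictionary serves the same purpose as the per-library libxx_report.txt, but its fields are only illustrative.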
    pipe_filter_wbench.sh
    Description: Given an interval of libraries, the script filters them through the workbench filter using the settings in the config file. The miRBase database is set in the config file workpath.cfg.
      input: [First_lib] [Last_lib]
      Output: Filtered fasta
      filter_overview/Libxx_filt-${FILTER_SUF}.{csv,fa}
    pipe_filter_genome_mirbase.sh
    Description: Given an interval of libraries, the script aligns them to a reference genome with Patman and keeps the reads that align with up to X mismatches. The retained reads are then aligned against miRBase v20 mature.fa. Reads that align are written to the cons file, and those that don't are written to the noncons file. Filter settings are read from the config file, and the miRBase database is set in the config file workpath.cfg.
    Configs: config/workdir.cfg [THREADS] [GENOME]
    Note: a config file for this script is still missing (planned for the next update).
      input: [First_lib] [Last_lib]
      Output:
        [workdir]/data/filter_genome/libxx_filt-${FILTER_SUF}_${GENOME}{_REPORT.csv,.fa}
        [workdir]/mirprof/libxx_filt-${FILTER_SUF}_${GENOME}_mirbase{.uniq,_profile.csv,srna.fa}
        Cons fasta - libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_cons.fa
        Noncons fasta - libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_noncons.fa
        [workdir]/data/count (?)
      Dependencies:
        java >= 7
        Patman
        UEA workbench (mirprof)
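The cons/noncons split described above amounts to partitioning the genome-matching reads according to whether they also aligned to miRBase mature.fa. The sketch below makes three assumptions. First, the alignment step has already produced the set of read IDs that hit miRBase (Patman output parsing is not shown). Second, libxx means a two-digit lib number. Third, the function names and example values are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_cons_noncons(reads: Dict[str, str], mirbase_hits: Set[str]
                       ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition genome-matching reads by whether they aligned to miRBase."""
    cons = {rid: seq for rid, seq in reads.items() if rid in mirbase_hits}
    noncons = {rid: seq for rid, seq in reads.items() if rid not in mirbase_hits}
    return cons, noncons


def cons_noncons_names(lib: int, filter_suf: str, genome: str) -> Tuple[str, str]:
    """Names following libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_{cons,noncons}.fa.

    Assumes "xx" is the lib number zero-padded to two digits.
    """
    base = f"lib{lib:02d}_filt-{filter_suf}_{genome}_mirbase"
    return base + "_cons.fa", base + "_noncons.fa"
```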
    pipe_mircat.sh
    Description: Processes an interval of libraries through UEA workbench mircat.
    This script is memory intensive. Java has to be run with explicit memory settings, and big genomes have to be broken down into parts. A 32G machine can handle parts of around 3-4Gb, so experiment with these parameters.
      Configure: Set the MEMORY and THREADS variables in config/workdir.cfg.
      input: [First_lib] [Last_lib] [Genome]
      output:
        basename=libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_noncons
        mircat/${basename}_miRNA.fa
        mircat/${basename}_miRNA_hairpins.txt
        mircat/${basename}_output.csv
      Dependencies:
        Java ~1.7
        UEA workbench (mircat)
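The sketch below shows one way to break a big genome into parts for mircat. Whole sequences (chromosomes or scaffolds) are packed greedily into parts up to a size limit, for example 3-4Gb per part on a 32G machine as suggested above. Size is measured in bases and individual sequences are never cut. The function names are hypothetical, and the sketch does not cover how pipe_mircat.sh expects the parts to be supplied.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_genome(records: List[Record], max_part_bases: int) -> List[List[Record]]:
    """Greedily pack whole sequences into parts of at most max_part_bases.

    A sequence longer than the limit ends up alone in its own (oversized)
    part; this sketch never cuts a sequence in two.
    """
    parts: List[List[Record]] = []
    current: List[Record] = []
    size = 0
    for header, seq in records:
        if current and size + len(seq) > max_part_bases:
            parts.append(current)  # close the current part before it overflows
            current, size = [], 0
        current.append((header, seq))
        size += len(seq)
    if current:
        parts.append(current)
    return parts
```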
    pipe_tasi.sh
    Description: Processes a range of libraries through the UEA workbench TASI tool. This script is not memory intensive, so no memory settings are needed to run the Java program. So far no genome size restrictions have been detected (tested with genomes up to 18G).
      Configuration: Set the TASI_GENOME variable in config/workdir.cfg.
      inputs: [First_lib] [Last_lib]
      outputs: [workdir]/data/tasi/libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_noncons_tasi_{srnas.txt,locuslist.csv}
      Dependencies:
        Java ~1.7
        UEA workbench
    pipe_fasta.sh
    Description: Copies fasta files to the workdir based on a template.
    The template must be an identifying string of characters that sits immediately before the serial number.
    Ex: for Test-data-1.fa use --fasta data- or --fasta Test-data-
      Configuration: Set the inserts_dir variable in config/workdir.cfg.
      inputs: [First_lib] [Last_lib] [template]
      outputs: [workdir]/data/fasta/
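The Test-data-1.fa example can be used to make the template rule concrete. In the sketch below, a file matches when the template is immediately followed by the serial number and that number is not followed by a further digit, so 1 does not also select 11. The regular-expression approach, the trailing-digit guard and the function name are all assumptions made for illustration.

```python
import re


def matches_template(filename: str, template: str, serial: str) -> bool:
    """True if `template` is immediately followed by `serial` in filename,
    and `serial` is not followed by another digit (so 1 does not match 11)."""
    pattern = re.escape(template) + re.escape(serial) + r"(?!\d)"
    return re.search(pattern, filename) is not None
```

With this rule, data- and Test-data- both select Test-data-1.fa. Test- does not, because it is not immediately before the 1.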
    pipe_fastq.sh
    Description: Copies fastq files to the workdir based on a template.
    The template must be an identifying string of characters that sits immediately before the serial number.
    Ex: for Test-data-1.fq use --fastq data- or --fastq Test-data-
    Can process a single file if only the first argument is given.
    If no .fastq or .fq file is present in the directory (var inserts in the config file), the script looks for .fastq.gz or .fq.gz files with the given template and extracts them.
    Serial numbers must be zero-padded, e.g. 1 should be 01, 2 should be 02, ...
    Adaptors are currently not removed; a flag for this will be added later.
      Configuration: Set the inserts_dir variable in config/workdir.cfg.
      inputs: [First_lib] [Last_lib] [template]
      outputs: [workdir]/data/fasta/
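The sketch below follows the lookup order described above: plain .fastq/.fq files first, then .fastq.gz/.fq.gz as a fallback, with two-digit zero-padded serial numbers. It assumes the file name ends with template + serial + extension and that the first match in sorted order wins. Those two rules, like the function names, are assumptions and not the script's documented behavior. gzip.open and shutil.copyfileobj are standard-library calls.

```python
import gzip
import shutil
from typing import List, Optional

PLAIN_EXTS = (".fastq", ".fq")
GZ_EXTS = (".fastq.gz", ".fq.gz")


def find_fastq(filenames: List[str], template: str, lib: int) -> Optional[str]:
    """Return the file for `lib`, preferring plain FASTQ over gzipped FASTQ.

    The serial number is zero-padded to two digits (1 -> "01").
    """
    stem = f"{template}{lib:02d}"
    for exts in (PLAIN_EXTS, GZ_EXTS):  # plain files first, .gz as fallback
        for name in sorted(filenames):
            if any(name.endswith(stem + ext) for ext in exts):
                return name
    return None


def gunzip(src: str, dst: str) -> None:
    """Decompress a .gz file to dst (the extraction step for the fallback)."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```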
    counts_merge.sh
    Description: Produces the count tables for the project and merges them together.
      Configuration: Set THREADS and workdir in config/workdir.cfg.
      inputs: none; only the config file is read.
      outputs: [workdir]/counts/
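Merging per-library count tables can be pictured as an outer join. Each unique sequence gets one row, each library gets one column, and a sequence absent from a library gets a count of 0. The table layout and function names below are assumptions for illustration. The actual format written to [workdir]/counts/ is not documented here.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Row = List[Union[str, int]]


def count_library(reads: List[str]) -> Dict[str, int]:
    """Count how many times each unique sequence occurs in one library."""
    return dict(Counter(reads))


def merge_counts(tables: Dict[str, Dict[str, int]]) -> Tuple[List[str], List[Row]]:
    """Outer-join per-library counts: one row per sequence, one column per lib.

    Sequences absent from a library get a count of 0.
    """
    libs = sorted(tables)
    seqs = sorted(set().union(*(t.keys() for t in tables.values())))
    header = ["sequence"] + libs
    rows: List[Row] = [[seq] + [tables[lib].get(seq, 0) for lib in libs] for seq in seqs]
    return header, rows
```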