- extract_fasteris_inserts.sh
(Uncommon - Legacy) Description: Given a directory of Fasteris inserts (no adaptors) and an interval of libraries, the script extracts the libraries, concatenates them and converts them to fasta. Fastq quality scores are plotted.
- inputs: [First_lib][Last_Lib]
- outputs:
- [workdir]/data/fastq
- [workdir]/data/fasta
- [workdir]/data/quality
- [workdir]/count
- dependencies:
- tar
- fastq_to_fasta
- fastx_quality_stats
- fastq_quality_boxplot_graph.sh
- fastq_xtract.sh, lib_cat, fq_to_fa.sh
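The fastq to fasta conversion step can be pictured with the sketch below. It is not the pipeline's code: `fq_to_fa_sketch` is a hypothetical awk stand-in for `fastq_to_fasta`, and it assumes strict 4-line fastq records.

```shell
# Hypothetical stand-in for fastq_to_fasta (assumes 4 lines per fastq record).
fq_to_fa_sketch() {
    # Line 1 of each record is the header (@id), line 2 is the sequence.
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }
         NR % 4 == 2 { print }' "$1"
}

# Demo on a throwaway file with two made-up reads.
tmp_fq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$tmp_fq"
fq_to_fa_sketch "$tmp_fq"
rm -f "$tmp_fq"
```

The quality lines are dropped here; in the real script they feed `fastx_quality_stats` and the boxplot.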
- extract_lcscience_inserts.sh
(Deprecated-Soon to be removed) Description: The libraries in [.fastq.gz] format are extracted and converted to fasta. Fastq quality scores are plotted. The template argument is required if a range of libs is given.
The template must be a substring of the file name preceding the lib number. Template + lib number should identify only one file in the inserts_dir directory.
- Configs: config/workdir.cfg
- INSERTS_DIR: used if a range of libs is supplied
- ADAPTOR: adaptor sequence to be clipped
- LCSCIENCE_LIB: used if only one lib is to be extracted
- inputs: [First_lib] [Last_Lib] [TEMPLATE]
- outputs:
- [workdir]/data/fastq
- [workdir]/data/fasta
- [workdir]/data/quality
- dependencies:
- tar
- fastq_to_fasta
- fastx_quality_stats
- fastq_quality_boxplot_graph.sh
- fastq_xtract.sh, lib_cat, fq_to_fa.sh
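The "template + lib number should identify only one file" rule can be checked along these lines. This is a minimal sketch, not the script's actual logic; `resolve_lib` and the file names in the demo are hypothetical.

```shell
# resolve_lib DIR TEMPLATE LIB_NUMBER
# Prints the single file in DIR whose name contains TEMPLATE directly
# followed by LIB_NUMBER; fails if zero or several files match.
resolve_lib() {
    local dir=$1 template=$2 lib=$3
    local count=0 match="" f
    for f in "$dir"/*"${template}${lib}"*; do
        if [ -e "$f" ]; then
            count=$((count + 1))
            match=$f
        fi
    done
    if [ "$count" -ne 1 ]; then
        echo "expected 1 match for '${template}${lib}', found $count" >&2
        return 1
    fi
    printf '%s\n' "$match"
}

# Demo with made-up file names in a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/run_lib01.fastq.gz" "$demo_dir/run_lib02.fastq.gz"
resolve_lib "$demo_dir" lib 02
rm -rf "$demo_dir"
```

A template that is too short, or a lib number that is a prefix of another one, matches several files and is rejected.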
- pipe_trim_adaptor.sh
Description: Trims adaptors from fasta libs.
The adaptor sequence must be set in the ADAPTOR variable in the workdirs.cfg configuration file.
- Configs: config/workdir.cfg
- ADAPTOR adaptor sequence to be clipped
- inputs: [First_lib] [Last_Lib]
- outputs:
- [workdir]/data/fasta/libxx_report.txt
- [workdir]/data/fasta/libxx_
- dependencies:
- fastx_clipper
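As a rough illustration of the clipping step, the sketch below only prints the `fastx_clipper` command it would run (a dry run). The adaptor value and the file names are placeholders, and `clip_cmd` is a hypothetical helper, not part of the pipeline.

```shell
# Placeholder adaptor; in the pipeline ADAPTOR comes from the config file.
ADAPTOR="TGGAATTCTCGGGTGCCAAGG"

# clip_cmd ADAPTOR INPUT OUTPUT
# Prints a fastx_clipper command line without running it:
#   -a adaptor sequence, -i input file, -o output file, -v verbose report
clip_cmd() {
    local adaptor=$1 in_fa=$2 out_fa=$3
    printf 'fastx_clipper -a %s -i %s -o %s -v\n' "$adaptor" "$in_fa" "$out_fa"
}

# Made-up file names, for illustration only.
clip_cmd "$ADAPTOR" lib01.fa lib01_clipped.fa
```

With `-o` given, `fastx_clipper -v` writes its report to stdout, so redirecting stdout is one way to capture a per-lib report file.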
- pipe_filter_wbench.sh
Description: Given an interval of libraries, the script filters them through the workbench filter using the settings in the config file. The miRBase database is set in the config file workpath.cfg.
- input: [First_lib] [Last_lib]
- Output: Filtered fasta
- filter_overview/Libxx_filt-${FILTER_SUF}.{csv,fa}
- pipe_filter_genome_mirbase.sh
Description: Given an interval of libraries, the script aligns them to a reference genome with patman and keeps the reads that align within a mismatch setting of X. The retained reads are then aligned against miRBase v20 mature.fa. Reads that align are sent to the cons file, while those that don't are sent to the noncons file. The filter uses the settings in the config file. The miRBase database is set in the config file workpath.cfg.
- Configs: config/workdirs.cfg [THREAD] [GENOME] (a config file is still missing; planned for the next update)
- input: [First_lib] [Last_lib]
- Output:
- [workdir]/data/filter_genome/libX_filt-${FILTER_SUF}_${GENOME}{_REPORT.csv,.fa}
- [workdir]/mirprof/libxx_filt-${FILTER_SUF}_${GENOME}_mirbase{.uniq,_profile.csv,srna.fa}
- Cons fasta - libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_cons.fa
- Noncons fasta - libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_noncons.fa
- [workdir]/data/count (?)
- Dependencies:
- java >= 7
- Patman
- UEA workbench (mirprof)
- pipe_mircat.sh
Description: Processes an interval of libraries through UEA workbench mircat.
This is a memory-intensive script, so java has to be run with explicit memory settings. Big genomes have to be broken down into parts. A 32G machine can handle parts of around 3-4Gb, so adjust these parameters to your hardware.
- Configure: Set the MEMORY and THREADS vars in the config/workdirs.cfg file.
- input: [First lib] [Last lib] [Genome]
- output:
- basename=libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_noncons
- mircat/${basename}_miRNA.fa
- mircat/${basename}_miRNA_hairpins.txt
- mircat/${basename}_output.csv
- Dependencies:
- Java ~1.7
- UEA workbench (mircat)
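Breaking a big genome into parts, as the description requires, could be done as sketched below. This is an assumption about one possible approach, not the pipeline's own tooling: `split_genome` and the `_partNN.fa` naming are hypothetical. A new part starts at the first fasta header seen after the current part has reached the base limit, so a part can overshoot the limit by up to one sequence.

```shell
# split_genome FASTA MAX_BASES OUT_PREFIX
# Streams a multi-fasta file into OUT_PREFIX_part01.fa, _part02.fa, ...
# Records are never split across parts.
split_genome() {
    local fasta=$1 max=$2 prefix=$3
    awk -v max="$max" -v prefix="$prefix" '
        /^>/ {
            # Open the first part, or roll over once the limit is reached.
            if (part == 0 || size >= max) {
                if (part > 0) close(out)
                part++
                out = sprintf("%s_part%02d.fa", prefix, part)
                size = 0
            }
        }
        { print > out }                  # copy every line to the current part
        !/^>/ { size += length($0) }     # count bases on sequence lines only
    ' "$fasta"
}

# Example call (not run here), aiming at roughly 3 billion bases per part:
#   split_genome genome.fa 3000000000 genome
```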
- pipe_tasi.sh
Description: Processes various files through the Tasi tool from UEA workbench. This script is not memory intensive, so no memory settings are needed to run the java file. So far no genome size restrictions have been detected (tested with genomes up to 18G).
- Configuration: Set the TASI_GENOME var in config/workdir.cfg
- inputs: [First_lib][Last_lib]
- outputs: [workdir]/data/tasi/libxx_filt-${FILTER_SUF}_${GENOME}_mirbase_noncons_tasi_{srnas.txt,locuslist.csv}
- Dependencies:
- Java ~1.7
- UEA workbench
- pipe_fasta.sh
Description: Copies fasta files to workdir based on template.
The template must be any identifying string of characters immediately before the serial number.
Ex: for Test-data-1.fa use --fasta data- or --fasta Test-data-
- Configuration: Set the inserts_dir var in config/workdir.cfg
- inputs: [First_lib][Last_lib][template]
- outputs: [workdir]/data/fasta/
- pipe_fastq.sh
Description: Copies fastq files to workdir based on template.
The template must be any identifying string of characters immediately before the serial number.
Ex: for Test-data-1.fq use --fastq data- or --fastq Test-data-
The script can run on a single file if only the first argument is given.
If no .fastq or .fq file is present in the directory (the inserts var in the config file), the script checks for .fastq.gz or .fq.gz files matching the given template and extracts them.
Serial numbers must be zero-padded, e.g. 1 should be 01, 2 should be 02, ...
Adaptors are not currently removed; a flag for this will be added later.
- Configuration: Set the inserts_dir var in config/workdir.cfg
- inputs: [First_lib][Last_lib][template]
- outputs: [workdir]/data/fasta/
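The [First_lib] [Last_lib] arguments and the 01, 02, ... numbering can be expanded as in the sketch below. `lib_range` is a hypothetical helper written for this example; it assumes two-digit numbering and falls back to a single lib when only the first argument is given.

```shell
# lib_range FIRST [LAST]
# Prints two-digit library numbers from FIRST to LAST, one per line.
lib_range() {
    local first=$1 last=${2:-$1} i
    # 10# forces base 10, so inputs such as 08 are not parsed as octal.
    for ((i = 10#$first; i <= 10#$last; i++)); do
        printf '%02d\n' "$i"
    done
}

# Build candidate file names from a made-up template.
for n in $(lib_range 8 10); do
    echo "Test-data-${n}.fq"
done
```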
- counts_merge.sh
Description: Produces and merges the count tables for the project.
- Configuration: Set THREADS and workdir in config/workdir.cfg
- inputs: Config file only; no arguments necessary
- outputs: [workdir]/counts/
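Since this script takes no arguments, everything depends on the config file being complete. The sketch below assumes the config is a plain KEY=value file that bash can source, which this README does not confirm; `require_vars` and the values shown are hypothetical.

```shell
# require_vars NAME...
# Fails if any of the named variables is unset or empty.
require_vars() {
    local name missing=0
    for name in "$@"; do
        if [ -z "${!name:-}" ]; then
            echo "config: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Throwaway config with placeholder values, not project defaults.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
workdir=/tmp/srna_project
THREADS=4
EOF

. "$cfg"
require_vars workdir THREADS && echo "config ok"
rm -f "$cfg"
```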