Influenza viruses continually evolve to evade population immunity. We have developed a publicly available Galaxy workflow, the Influenza Classification Suite, for rapid clade mapping of sequenced influenza viruses. The suite provides a rapid, high-resolution view of the evolution of circulating influenza strains to inform estimates of influenza vaccine effectiveness and the potential need for vaccine reformulation.
- In the Galaxy Admin panel, select 'Install New Tools'
- Select the 'Galaxy Test Toolshed'
- Search for `flu_classification_suite`
- Click the button labeled `flu_classification_suite` and select 'Preview and install'
- Click 'Install to Galaxy'
- Select a tool panel section to install the tools under, or create a new section. We recommend creating a section called 'Influenza Classification Suite'
- Click 'Install'
- Create a new conda environment on the command line:
conda create -n biopython biopython
- Activate the environment:
conda activate biopython
- Clone the `flu_classification_suite` repository as follows:
  - Select "Clone or download" on the main repository page
  - Copy the URL by selecting the clipboard icon
  - Type the following, pasting the copied URL in place of <repo_url>:
git clone <repo_url>
Task | Action |
---|---|
Upload a fasta file | |
Upload a comma-separated value (csv) file | |
View the contents of a file | |
Edit a file | |
Remove a file | |
Download a file | |
View the metadata of a file | |
Use an individual tool | |
Use a workflow | |
Determine if a tool is running | The operation will display as highlighted in grey while it is waiting to start on the server, as yellow during execution, and as green when complete |
New to Galaxy? Try the Galaxy 101 tutorial.
Template files and sample input and output files can be found in the 'test-data' folder for each respective tool.
Each tool can be selected from the “Influenza Classification Suite” menu and used individually or chained together to create a workflow.
Renames definition lines in fasta files. Requires the fasta file whose sequence names are to be changed and a 2-column renaming file (either tab-delimited text or csv). The tool searches for fasta definition lines matching column 1 of the renaming file and, where a match is found, replaces the definition line with the string specified in column 2.
Input - Sequence file to be renamed (fasta), 2-column renaming file (csv or txt)
Output - fasta
Command line usage
Using csv renaming file:
python change_fasta_def_lines.py csv_rename_file.csv fasta_2_rename.fasta renamedSequences.fasta
Using tsv renaming file:
python change_fasta_def_lines.py tab_delim_rename_file.txt -t fasta_2_rename.fasta renamedSequences.fasta
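The renaming logic described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code of `change_fasta_def_lines.py` itself; the function names `load_rename_map` and `rename_deflines` and the sample names are hypothetical, and the sketch assumes that definition lines are matched on the full text after the `>` character.

```python
import csv
import io


def load_rename_map(handle, tab_delimited=False):
    """Read a 2-column renaming file into {current_name: new_name}."""
    delimiter = "\t" if tab_delimited else ","
    mapping = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) >= 2:
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_deflines(fasta_lines, mapping):
    """Yield fasta lines, replacing definition lines found in the mapping."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].strip()
            # Names missing from the mapping are left unchanged.
            line = ">" + mapping.get(name, name)
        yield line


# Made-up example data: column 1 = current name, column 2 = desired name.
rename_csv = io.StringIO("seq1,new_name_1\nseq2,new_name_2\n")
fasta = [">seq1", "MKTIIALSYI", ">seq3", "MKTIIALSHI"]
renamed = list(rename_deflines(fasta, load_rename_map(rename_csv)))
print("\n".join(renamed))
```

In this sketch `seq1` is renamed, while `seq3` is left untouched because it does not appear in column 1 of the renaming file.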
Galaxy tool usage
- Create a renaming file in Excel with current sequence names in column 1 and desired names in column 2 and export in csv or tsv format
- Upload this file into Galaxy
- If the renaming file is tab-delimited text, select the pencil icon beside the file name, select datatypes, and ensure the datatype displayed is "csv"
- Select the Change Fasta Deflines tool
- Choose a fasta file under the “input_fasta” parameter
- Choose a renaming file under the “key_value_pairs” parameter
- Select Yes or No under “Names file is tab-delimited” (Note: The default renaming file format is csv, so the default selection is “No”)
- Press Execute to start the operation
Assigns clade designations to influenza HA amino acid fasta files.
Input - Sequence files (fasta), clade definition file (csv)
Output - fasta
Command line usage
python assign_clades.py input_sequences.fasta clade_definitions.csv clade-assigned-output-sequences.fasta
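To illustrate the general idea of clade assignment, the sketch below matches each sequence against a set of clade-defining residues. It is an assumption-laden example, not the logic of `assign_clades.py`: the in-memory definition format (clade name mapped to 1-based position and amino acid), the rule of preferring the clade with the most defining positions, the `No_Match` label and the `_` separator are all hypothetical. Consult the sample clade definition files in 'test-data' for the real file layout.

```python
def assign_clade(sequence, clade_definitions):
    """Return the most specific clade whose defining residues all match."""
    best_name, best_size = "No_Match", -1
    for name, residues in clade_definitions.items():
        # Positions are treated as 1-based (an assumption of this sketch).
        matches = all(
            pos <= len(sequence) and sequence[pos - 1] == aa
            for pos, aa in residues.items()
        )
        if matches and len(residues) > best_size:
            best_name, best_size = name, len(residues)
    return best_name


# Toy definitions and sequences, invented for illustration only.
clade_definitions = {
    "cladeA": {2: "K"},
    "cladeA.1": {2: "K", 5: "I"},
}
sequences = {"s1": "MKTIIAL", "s2": "MKTILAL", "s3": "MRTIIAL"}

assigned = {
    name: assign_clade(seq, clade_definitions)
    for name, seq in sequences.items()
}
for name, clade in assigned.items():
    # Append the clade call to the definition line.
    print(">{}_{}".format(name, clade))
```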
Galaxy tool usage
- Select the Assign Clades tool
- Select the fasta file (without clades assigned) containing the amino acid sequences under input_fasta
- Select the clade definition file under clade_definitions (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Execute the operation
- Download the output file
Extracts antigenic amino acids from influenza hemagglutinin (HA) sequences, using a flu type-specific array of the amino acid positions to be extracted (e.g. for H3 or H1), and outputs them as a fasta file by default or, optionally, as a csv file.
Input - Assign Clades output (fasta) (e.g. Flu A/H3), amino acid index array (csv) (e.g. H3 index array)
Output - fasta, csv
Command line usage
Output extracted antigenic amino acids to fasta:
python antigenic_site_extraction.py input.fasta index-array.csv extracted-antigenic-sites.fasta
Output extracted antigenic amino acids to csv:
python antigenic_site_extraction.py input.fasta index-array.csv -c extracted-antigenic-sites.csv
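Conceptually, the extraction step picks out the residues at the positions listed in the index array. The sketch below shows that idea with the standard library only. It assumes the index array is a csv of 1-based integer positions, which may not match the layout of the provided index array files, and the names `read_index_array` and `extract_sites` are hypothetical.

```python
import csv
import io


def read_index_array(handle):
    """Collect every integer position found in a csv index array."""
    positions = []
    for row in csv.reader(handle):
        positions.extend(int(cell) for cell in row if cell.strip())
    return positions


def extract_sites(sequence, positions):
    """Return the residues at the given 1-based positions, in order."""
    # A position beyond the end of the sequence raises IndexError.
    return "".join(sequence[pos - 1] for pos in positions)


# Toy index array and sequence, invented for illustration only.
positions = read_index_array(io.StringIO("2,4,7\n"))
antigenic_map = extract_sites("MKTIIAL", positions)
print(antigenic_map)
```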
Galaxy tool usage
- Select the Antigenic Site Extraction tool
- Select the fasta file with protein sequences to extract under input_fasta
- Select the antigenic site index array file under index_array (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Choose Yes to output results in csv format (Note: The default No selection outputs results in fasta)
- Execute the operation
- Download the output file
Transforms fasta files of flu antigenic site amino acids into line lists, comparing antigenic maps to that of a reference sequence.
Input - Antigenic Site Extraction output (fasta) (e.g. Flu A/H3 antigenic sites), reference strain (fasta), amino acid index array (csv) (e.g. Flu A/H3 index array), clade definition file (csv)
Output - csv
Command line usage
python linelisting.py input.fasta reference-antigenic-sites.fasta index-array.csv clade-definitions.csv output.csv
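The comparison against a reference can be pictured as follows. This sketch is illustrative only and does not reproduce the output layout of `linelisting.py`: the use of `.` to mark residues identical to the reference, the column headings and the function names are assumptions made for the example.

```python
import csv
import io


def diff_against_reference(sample, reference):
    """Return one cell per site: '.' if identical to the reference."""
    if len(sample) != len(reference):
        raise ValueError("antigenic maps must have the same length")
    return ["." if s == r else s for s, r in zip(sample, reference)]


def build_line_list(samples, reference, positions):
    """Build rows: a header, the reference, then one row per sample."""
    rows = [["Sequence"] + [str(pos) for pos in positions]]
    rows.append(["Reference"] + list(reference))
    for name, seq in samples:
        rows.append([name] + diff_against_reference(seq, reference))
    return rows


# Toy antigenic maps, invented for illustration only.
samples = [("s1", "KIL"), ("s2", "KTL")]
rows = build_line_list(samples, "KIL", [2, 4, 7])

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Opening the resulting csv in a spreadsheet then shows at a glance which antigenic sites differ from the reference.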
Galaxy tool usage
- Select the Line List tool
- Input the fasta file with extracted antigenic maps and clade calls under Sample Sequences fasta
- Input the fasta file with the reference sequence antigenic map under Reference Sequence fasta
- Select the index array under Antigenic Site Index Array File (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Select the clade definition file under Clade Definition File (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Execute the operation
- Download the output file
- View the line list as a spreadsheet to compare antigenic amino acid sequences of samples to the reference.
Transforms fasta files of flu antigenic site amino acids into aggregated line lists, comparing antigenic maps to that of a reference sequence and collapsing and enumerating identical sequences.
Input - Antigenic Site Extraction output (fasta) (e.g. Flu A/H3 extracted antigenic sites), reference strain (fasta), amino acid index array (csv), clade definition file (csv)
Output - csv
Command line usage
python aggregate_linelisting.py input.fasta reference-antigenic-sites.fasta index-array.csv clade-definitions.csv output.csv
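The collapsing and enumerating step can be sketched with `collections.Counter`. As with the other examples, this is a hedged illustration under simple assumptions (samples held as name and antigenic map pairs, most frequent map listed first) rather than the implementation in `aggregate_linelisting.py`; the function name `aggregate_maps` is hypothetical.

```python
from collections import Counter


def aggregate_maps(samples):
    """Collapse identical antigenic maps and count their occurrences."""
    counts = Counter(seq for _, seq in samples)
    # most_common() sorts by descending count.
    return counts.most_common()


# Toy antigenic maps, invented for illustration only.
samples = [("s1", "KIL"), ("s2", "KTL"), ("s3", "KIL")]
aggregated = aggregate_maps(samples)
for antigenic_map, count in aggregated:
    print(antigenic_map, count)
```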
Galaxy tool usage
- Select the Aggregate Line List tool
- Input the fasta file with extracted antigenic maps and clade calls under Sample Sequences fasta
- Input the fasta file with the reference sequence antigenic map under Reference Sequence fasta
- Select the index array under Antigenic Site Index Array File (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Select the clade definition file under Clade Definition File (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Execute the operation
- Download the output file
- View the line list as a spreadsheet to compare antigenic amino acid sequences of samples to the reference.
Parses format of USearch-collapsed fasta output files and applies a custom format to the fasta definition lines.
Input - USearch-outputted sequence files (fasta)
Output - sequence files with custom-formatted definition lines (fasta)
Command line usage
python reformat_usearch_collapsed_fasta.py usearch_collapsed_sequences.fasta output.fasta
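As a rough illustration of definition-line reformatting, the sketch below parses the `;size=N;` annotation that USearch adds to labels when size annotations are written out, and rewrites it. The target format (`label_xN`) is purely hypothetical and is not the custom format applied by `reformat_usearch_collapsed_fasta.py`; see the sample output in 'test-data' for the real one.

```python
import re

# Matches labels such as "seqA;size=12;" (trailing semicolon optional).
SIZE_RE = re.compile(r"^(?P<label>.*?);size=(?P<size>\d+);?$")


def reformat_defline(defline):
    """Rewrite '>label;size=N;' as '>label_xN' (hypothetical format)."""
    text = defline.lstrip(">").strip()
    match = SIZE_RE.match(text)
    if match is None:
        # Leave definition lines without a size annotation unchanged.
        return ">" + text
    return ">{}_x{}".format(match.group("label"), match.group("size"))


for line in [">seqA;size=12;", ">seqB"]:
    print(reformat_defline(line))
```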
Galaxy tool usage
- Select the Reformat USearch-Collapsed Fasta tool
- Select the USearch-collapsed fasta file to reformat under input_fasta
- Execute the operation
- Download the output file
Although each tool can be selected from the “Influenza Classification Suite” menu and used individually, workflows have been created by chaining tools into pipelines that automate a series of tasks in a standardized, user-friendly manner.
This workflow assigns and appends clade names to fasta definition lines of flu HA amino acid sequences. It then extracts the antigenic amino acids and outputs the resulting antigenic maps in fasta format.
Input - Sequence files (fasta), clade definition file (csv), amino acid index array (csv) (Note: Use the provided clade definition and amino acid index array files or provide your own respective versions of these files)
Output - fasta
Galaxy tool usage
- Select the workflow Assign clades and extract antigenic maps
- Select whether to send the results to a new history (Note: This is not required but facilitates convenient tracking and deletion of files within an analysis run)
- Select the clade definition file under Clade Definitions (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Select the index array under Antigenic Amino Acid Index Array (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Select the fasta file on which to perform all operations as the input_fasta under Assign Clades
- Select Run workflow to execute the operations
- Wait for each workflow step to complete (highlighted in green)
- Download or delete output files from each step as desired
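Outside Galaxy, the same two steps can be approximated by running the command line scripts one after the other, using the usage patterns documented for the Assign Clades and Antigenic Site Extraction tools. The sketch below builds the two commands and offers a helper to run them; the function names and the default intermediate file names are assumptions, and the scripts are assumed to be in the current working directory.

```python
import subprocess


def workflow_commands(input_fasta, clade_definitions, index_array,
                      clades_out="clade-assigned.fasta",
                      sites_out="extracted-antigenic-sites.fasta"):
    """Return the two commands, in order, as argument lists."""
    return [
        ["python", "assign_clades.py",
         input_fasta, clade_definitions, clades_out],
        # The clade-assigned fasta is the input to the extraction step.
        ["python", "antigenic_site_extraction.py",
         clades_out, index_array, sites_out],
    ]


def run_workflow(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = workflow_commands(
    "input_sequences.fasta", "clade_definitions.csv", "index-array.csv"
)
for command in commands:
    print(" ".join(command))
```

Calling `run_workflow(commands)` executes the steps; it is left uncalled here so that the example can be inspected without the scripts or input files present.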
This workflow assigns and appends clade names to fasta definition lines of flu HA amino acid sequences, extracts the antigenic sites and displays the resulting sequences in relation to a reference (e.g. vaccine strain) sequence in a csv file.
Input - Sequence files (fasta), reference antigenic map (e.g. vaccine influenza strain) (fasta), clade definition file (csv), amino acid index array (csv) (Note: Use the provided reference antigenic map, clade definition and amino acid index array files or provide your own respective versions of these files)
Output - csv
Galaxy tool usage
- Select the workflow Assign clades, extract antigenic maps and output to line list
- Select whether to send the results to a new history (Note: This is not required but facilitates convenient tracking and deletion of files within an analysis run)
- Select the clade definition file under Clade Definitions (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Select the index array under Antigenic Amino Acid Index Array (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Select the reference antigenic map under Reference Antigenic Map (Note: The reference antigenic map file must be the extracted antigenic site amino acids in fasta format and of the correct version for the respective flu type to obtain accurate results)
- Select the fasta file on which to perform all operations as the input_fasta under Assign Clades
- Select Run workflow to execute the operations
- Wait for each workflow step to complete (highlighted in green)
- Download or delete output files from each step as desired
This workflow assigns and appends clade names to fasta definition lines of flu HA amino acid sequences, extracts antigenic sites and displays the resulting sequences in relation to a reference (e.g. vaccine strain) sequence in a csv file. In addition, the aggregated view collapses and enumerates identical antigenic map sequences among samples.
Input - Sequence files (fasta), reference antigenic map (e.g. vaccine influenza strain) (fasta), clade definition file (csv), amino acid index array (csv) (Note: Use the provided reference antigenic map, clade definition and amino acid index array files or provide your own respective versions of these files)
Output - csv
Galaxy tool usage
- Select the workflow Assign clades, extract antigenic maps and output to aggregated line list
- Select whether to send the results to a new history (Note: This is not required but facilitates convenient tracking and deletion of files within an analysis run)
- Select the clade definition file under Clade Definitions (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Select the index array under Antigenic Amino Acid Index Array (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
- Select the reference antigenic map under Reference Antigenic Map (Note: The reference antigenic map file must be the extracted antigenic site amino acids in fasta format and of the correct version for the respective flu type to obtain accurate results)
- Select the fasta file on which to perform all operations as the input_fasta under Assign Clades
- Select Run workflow to execute the operations
- Wait for each workflow step to complete (highlighted in green)
- Download or delete output files from each step as desired