Influenza Classification Suite

Influenza viruses continually evolve to evade population immunity. We have developed a publicly available Galaxy workflow, Influenza Classification Suite, for rapid clade-mapping of sequenced influenza viruses. This suite provides a rapid, high-resolution view of how circulating influenza strains are evolving, to inform assessments of influenza vaccine effectiveness and the potential need for vaccine reformulation.

Installation

Galaxy Tools

In the Galaxy Admin panel, select 'Install New Tools'
Select the 'Galaxy Test Toolshed'
Search for flu_classification_suite
Click the button labeled flu_classification_suite and select 'Preview and install'
Click 'Install to Galaxy'
Select a tool panel section to install the tools under, or create a new section. We recommend creating a section called 'Influenza Classification Suite'
Click 'Install'

Command Line Tools

Create a new conda environment on the command line: conda create -n biopython biopython
Activate the environment: conda activate biopython
Clone the flu_classification_suite repository as follows:
- Select "Clone or download" on the main repository page
- Copy the url by selecting the clipboard icon
- Type the following, pasting the copied url in place of <repo_url>: git clone <repo_url>

General How-To

Task	Action
Upload a fasta file	Select Get Data from the Tools menu; select Upload File from your computer; drag the fasta file(s) into the window; select Start to upload the file(s); select Close once each file's green progress bar reads 100%; collapse the Get Data menu by selecting it
Upload a comma-separated value (csv) file	Select Get Data from the Tools menu; select Upload File from your computer; drag the csv file(s) into the window; select "csv" under the Type column; select Start to upload the file(s); select Close once each file's green progress bar reads 100%; collapse the Get Data menu by selecting it
View the contents of a file	Select the eye icon by the filename
Edit a file	Select the pencil icon by the filename
Remove a file	Select the `X` icon by the filename
Download a file	Select the computer disk icon by the filename
View the metadata of a file	Select the information icon by the filename
Use an individual tool	Select the tool under Flu Classification Suite in the Tools pane; input the required files; execute the operation
Use a workflow	Select the workflow under Flu Classification Suite in the Tools pane; input the required files; select Run workflow to execute the operations
Determine if a tool is running	The operation will display as highlighted in grey while it is waiting to start on the server, as yellow during execution, and as green when complete

New to Galaxy? Try the Galaxy 101 tutorial.

Tools

Template files and sample input and output files can be found in the 'test-data' folder for each respective tool.

Each tool can be selected from the “Influenza Classification Suite” menu and used individually or chained together to create a workflow.

Change Fasta Deflines

Renames definition lines in fasta files. Requires the fasta file whose sequences are to be renamed and a 2-column renaming file (either tab-delimited text or csv). The tool searches for fasta definition lines matching column 1 of the renaming file and, where a match is found, replaces the definition line with the string specified in column 2.

Input - Sequence file to be renamed (fasta), 2-column renaming file (csv or txt)
Output - fasta

Command line usage

Using csv renaming file:

  python change_fasta_def_lines.py csv_rename_file.csv fasta_2_rename.fasta renamedSequences.fasta

Using tsv renaming file:

  python change_fasta_def_lines.py tab_delim_rename_file.txt -t fasta_2_rename.fasta renamedSequences.fasta
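
The renaming rule described above can be sketched in a few lines of Python. This is only an illustration of the column 1 to column 2 lookup, not the code in change_fasta_def_lines.py; the function names and the sample sequence names are made up for the example, and only the standard library is used.

```python
import csv
import io

def load_rename_map(handle, tab_delimited=False):
    """Read a 2-column renaming file into a dict: current name -> new name."""
    delimiter = "\t" if tab_delimited else ","
    rename_map = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) >= 2:  # skip blank or malformed rows
            rename_map[row[0].strip()] = row[1].strip()
    return rename_map

def rename_deflines(fasta_lines, rename_map):
    """Yield fasta lines, replacing any definition line found in rename_map."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].strip()
            # Names missing from the renaming file are left unchanged.
            line = ">" + rename_map.get(name, name)
        yield line

# Made-up example data: one sequence is renamed, the other has no match.
rename_file = io.StringIO("sample_001,A/Example/01/2018\n")
fasta = [">sample_001", "MKTIIALSYI", ">sample_002", "MKTIIALSYV"]
renamed = list(rename_deflines(fasta, load_rename_map(rename_file)))
print(renamed)
```

Passing tab_delimited=True mirrors the -t option shown above for tab-delimited renaming files.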

Galaxy tool usage

Create a renaming file in Excel with current sequence names in column 1 and desired names in column 2 and export in csv or tsv format
Upload this file into Galaxy
If the renaming file is tab-delimited text, select the pencil icon beside the file name, select datatypes, and ensure the datatype displayed is "csv"
Select the Change Fasta Deflines tool
Choose a fasta file under the “input_fasta” parameter
Choose a renaming file under the “key_value_pairs” parameter
Select Yes or No under the “Names file is tab-delimited” option (Note: The default renaming file format is csv, so the default selection is “No”)
Press Execute to start the operation

Assign Clades

Assigns clade designations to influenza HA amino acid fasta files.

Input - Sequence files (fasta), clade definition file (csv)
Output - fasta

Command line usage

python assign_clades.py input_sequences.fasta clade_definitions.csv clade-assigned-output-sequences.fasta
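
To make the idea of clade assignment concrete, here is a minimal sketch. It assumes, hypothetically, that each clade is defined by a set of amino acids at given 1-based positions and that a sequence is assigned the matching clade with the most defining positions. The actual layout of the clade definition csv and the matching rules of assign_clades.py are not described here, so the data structure, function names and example clades below are illustrative only.

```python
def matches(sequence, definition):
    """True if the sequence carries every (1-based position -> residue) pair."""
    return all(
        pos <= len(sequence) and sequence[pos - 1] == aa
        for pos, aa in definition.items()
    )

def assign_clade(sequence, clade_definitions):
    """Return the matching clade with the most defining positions, or None."""
    candidates = [
        (len(definition), name)
        for name, definition in clade_definitions.items()
        if matches(sequence, definition)
    ]
    return max(candidates)[1] if candidates else None

# Hypothetical definitions: cladeA.1 refines cladeA with one extra residue.
clade_definitions = {
    "cladeA": {3: "K"},
    "cladeA.1": {3: "K", 5: "N"},
}
print(assign_clade("MAKTNL", clade_definitions))  # matches both, picks cladeA.1
print(assign_clade("MAKTIL", clade_definitions))  # matches cladeA only
print(assign_clade("MARTIL", clade_definitions))  # matches nothing
```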

Galaxy tool usage

Select the Assign Clades tool
Select the fasta file (without clades assigned) containing the amino acid sequences under input_fasta
Select the clade definition file under clade_definitions (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Execute the operation
Download the output file

Antigenic Site Extraction

Extracts antigenic amino acids from influenza hemagglutinin (HA) sequences, using a flu type-specific array of amino acid positions to be extracted (e.g. for H3 or H1), and outputs the result as a fasta file or, optionally, a csv file.

Input - Assign Clades output (fasta) (e.g. Flu A/H3), amino acid index array (csv) (e.g. H3 index array)
Output - fasta, csv

Command line usage

Output extracted antigenic amino acids to fasta:

  python antigenic_site_extraction.py input.fasta index-array.csv extracted-antigenic-sites.fasta

Output extracted antigenic amino acids to csv:

  python antigenic_site_extraction.py input.fasta index-array.csv -c extracted-antigenic-sites.csv
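
The extraction step amounts to picking residues at listed positions. The sketch below assumes the index array is a single csv row of 1-based integer positions; that layout, the function names and the toy sequence are assumptions for illustration, so check the template files in 'test-data' for the real format.

```python
import csv
import io

def parse_index_array(handle):
    """Read 1-based positions from the first row of a csv handle."""
    first_row = next(csv.reader(handle))
    return [int(value) for value in first_row if value.strip()]

def extract_sites(sequence, index_array):
    """Return the residues at the given 1-based positions, in order."""
    return "".join(sequence[pos - 1] for pos in index_array)

# Toy data: positions 2, 4 and 5 of an 8-residue sequence.
positions = parse_index_array(io.StringIO("2,4,5\n"))
antigenic_map = extract_sites("MKTIIALS", positions)
print(antigenic_map)
```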

Galaxy tool usage

Select the Antigenic Site Extraction tool
Select the fasta file with protein sequences to extract under input_fasta
Select the antigenic site index array file under index_array (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Choose Yes to output results in csv format (Note: The default No selection outputs results in fasta)
Execute the operation
Download the output file

Line List

Transforms fasta files of flu antigenic site amino acids into line lists, comparing antigenic maps to that of a reference sequence.

Input - Antigenic Site Extraction output (fasta) (e.g. Flu A/H3 antigenic sites), reference strain (fasta), amino acid index array (csv) (e.g. Flu A/H3 index array), clade definition file (csv)
Output - csv

Command line usage

python linelisting.py input.fasta reference-antigenic-sites.fasta index-array.csv clade-definitions.csv output.csv
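
A line list places each sample's antigenic map next to the reference so that differences stand out. The sketch below uses one common convention, a dot where the sample matches the reference and the sample's residue where it differs. Whether linelisting.py uses this convention, and what other columns it writes (for example the clade), is not stated here, so treat the layout and names as assumptions.

```python
import csv
import io

def compare_to_reference(sample, reference, same="."):
    """Mask residues identical to the reference; keep residues that differ."""
    if len(sample) != len(reference):
        raise ValueError("antigenic maps must have the same length")
    return [same if s == r else s for s, r in zip(sample, reference)]

def build_line_list(samples, reference, positions):
    """Return rows: a header of positions, the reference, then one row per sample."""
    rows = [["name"] + [str(p) for p in positions]]
    rows.append(["reference"] + list(reference))
    for name, antigenic_map in samples.items():
        rows.append([name] + compare_to_reference(antigenic_map, reference))
    return rows

# Toy data: s1 is identical to the reference, s2 differs at position 4.
rows = build_line_list({"s1": "KII", "s2": "KTI"}, "KII", [2, 4, 5])
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```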

Galaxy tool usage

Select the Line List tool
Input the fasta file with extracted antigenic maps and clade calls under Sample Sequences fasta
Input the fasta file with the reference sequence antigenic map under Reference Sequence fasta
Select the index array under Antigenic Site Index Array File (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Select the clade definition file under Clade Definition File (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Execute the operation
Download the output file
View the line list as a spreadsheet to compare antigenic amino acid sequences of samples to the reference.

Aggregate Line List

Transforms fasta files of flu antigenic site amino acids into aggregated line lists, comparing antigenic maps to that of a reference sequence and collapsing and enumerating identical sequences.

Input - Antigenic Site Extraction output (fasta) (e.g. Flu A/H3 extracted antigenic sites), reference strain (fasta), amino acid index array (csv), clade definition file (csv)
Output - csv

Command line usage

python aggregate_linelisting.py input.fasta reference-antigenic-sites.fasta index-array.csv clade-definitions.csv output.csv
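
Collapsing and enumerating identical sequences can be illustrated with a counter keyed on the antigenic map. This is a simplified sketch with made-up sample names. It leaves out the comparison to the reference and any clade information that the real output may carry.

```python
from collections import Counter

def aggregate(samples):
    """Count identical antigenic maps; most frequent first."""
    counts = Counter(samples.values())
    return counts.most_common()

# Toy data: two samples share the same antigenic map.
samples = {"s1": "KII", "s2": "KTI", "s3": "KII"}
aggregated = aggregate(samples)
for antigenic_map, count in aggregated:
    print(antigenic_map, count)
```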

Galaxy tool usage

Select the Aggregate Line List tool
Input the fasta file with extracted antigenic maps and clade calls under Sample Sequences fasta
Input the fasta file with the reference sequence antigenic map under Reference Sequence fasta
Select the index array under Antigenic Site Index Array File (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Select the clade definition file under Clade Definition File (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Execute the operation
Download the output file
View the line list as a spreadsheet to compare antigenic amino acid sequences of samples to the reference.

Reformat USearch-Collapsed Fasta

Parses format of USearch-collapsed fasta output files and applies a custom format to the fasta definition lines.

Input - USearch-outputted sequence files (fasta)

Output - sequence files with custom-formatted definition lines (fasta)

Command line usage

python reformat_usearch_collapsed_fasta.py usearch_collapsed_sequences.fasta output.fasta
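
As an illustration of what reformatting a collapsed definition line can involve, the sketch below assumes the input lines carry a USearch-style abundance annotation of the form ">label;size=N;" and rewrites them with a template. The target format "{label}_x{size}" is hypothetical; the custom format actually applied by reformat_usearch_collapsed_fasta.py is not described here.

```python
import re

# Assumed input shape: ">label;size=N;" (trailing semicolon optional).
SIZE_PATTERN = re.compile(r"^>(?P<label>.*?);size=(?P<size>\d+);?$")

def reformat_defline(defline, template="{label}_x{size}"):
    """Rewrite a size-annotated definition line; leave other lines unchanged."""
    match = SIZE_PATTERN.match(defline.strip())
    if match is None:
        return defline.strip()
    return ">" + template.format(label=match["label"], size=int(match["size"]))

print(reformat_defline(">seqA;size=12;"))
print(reformat_defline(">seqB"))
```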

Galaxy tool usage

Select the Reformat USearch-Collapsed Fasta tool
Select the USearch-collapsed fasta file to reformat under input_fasta
Execute the operation
Download the output file

Workflows

While each tool can be selected from the “Influenza Classification Suite” menu and used individually, the workflows below chain tools into a pipeline to automate a series of tasks in a standardized, user-friendly manner.

Assign clades and extract antigenic maps

This workflow assigns and appends clade names to fasta definition lines of flu HA amino acid sequences. It then extracts the antigenic amino acids and outputs the resulting antigenic maps in fasta format.

Input - Sequence files (fasta), clade definition file (csv), amino acid index array (csv) (Note: Use the provided clade definition and amino acid index array files or provide your own respective versions of these files)

Output - fasta

Galaxy tool usage

Select the workflow Assign clades and extract antigenic maps
Select whether to send the results to a new history (Note: This is not required but facilitates convenient tracking and deletion of files within an analysis run)
Select the clade definition file under Clade Definitions (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Select the index array under Antigenic Amino Acid Index Array (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Select the fasta file to perform all operations as the input_fasta under Assign Clades
Select Run workflow to execute the operations
Wait for each workflow step to complete (highlighted in green)
Download or delete output files from each step as desired
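
Outside Galaxy, the same two steps can be approximated by running the command-line tools one after the other, feeding the Assign Clades output into Antigenic Site Extraction. The sketch below only builds the commands in the form shown under each tool's command line usage; it assumes the scripts are in the current directory, and the intermediate and output file names are placeholders.

```python
import subprocess
import sys

def build_commands(input_fasta, clade_definitions, index_array,
                   clades_fasta="clade-assigned.fasta",
                   sites_fasta="extracted-antigenic-sites.fasta"):
    """Return the two commands, with step 1's output used as step 2's input."""
    return [
        [sys.executable, "assign_clades.py",
         input_fasta, clade_definitions, clades_fasta],
        [sys.executable, "antigenic_site_extraction.py",
         clades_fasta, index_array, sites_fasta],
    ]

def run_workflow(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)

commands = build_commands("input_sequences.fasta",
                          "clade_definitions.csv",
                          "index-array.csv")
for command in commands:
    print(" ".join(command[1:]))
```

Calling run_workflow(commands) would execute the steps; it is left out of the example so the sketch can be read and tested without the scripts or input files present.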

Assign clades, extract antigenic maps and output to line list

This workflow assigns and appends clade names to fasta definition lines of flu HA amino acid sequences, extracts the antigenic sites and displays the resulting sequences in relation to a reference (e.g. vaccine strain) sequence in a csv file.

Input - Sequence files (fasta), reference antigenic map (e.g. vaccine influenza strain) (fasta), clade definition file (csv), amino acid index array (csv) (Note: Use the provided reference antigenic map, clade definition and amino acid index array files or provide your own respective versions of these files)

Output - csv

Galaxy tool usage

Select the workflow Assign clades, extract antigenic maps and output to line list
Select whether to send the results to a new history (Note: This is not required but facilitates convenient tracking and deletion of files within an analysis run)
Select the clade definition file under Clade Definitions (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Select the index array under Antigenic Amino Acid Index Array (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Select the reference antigenic map under Reference Antigenic Map (Note: The reference antigenic map file must be the extracted antigenic site amino acids in fasta format and of the correct version for the respective flu type to obtain accurate results)
Select the fasta file to perform all operations as the input_fasta under Assign Clades
Select Run workflow to execute the operations
Wait for each workflow step to complete (highlighted in green)
Download or delete output files from each step as desired

Assign clades, extract antigenic maps and output to aggregated line list

This workflow assigns and appends clade names to fasta definition lines of flu HA amino acid sequences, extracts antigenic sites and displays the resulting sequences in relation to a reference (e.g. vaccine strain) sequence in a csv file. In addition, the aggregated view collapses and enumerates identical antigenic map sequences among samples.

Input - Sequence files (fasta), reference antigenic map (e.g. vaccine influenza strain) (fasta), clade definition file (csv), amino acid index array (csv) (Note: Use the provided reference antigenic map, clade definition and amino acid index array files or provide your own respective versions of these files)

Output - csv

Galaxy tool usage

Select the workflow Assign clades, extract antigenic maps and output to aggregated line list
Select whether to send the results to a new history (Note: This is not required but facilitates convenient tracking and deletion of files within an analysis run)
Select the clade definition file under Clade Definitions (Note: The clade definition file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Select the index array under Antigenic Amino Acid Index Array (Note: The index array file must be in csv format and of the correct version for the respective flu type to obtain accurate results)
Select the reference antigenic map under Reference Antigenic Map (Note: The reference antigenic map file must be the extracted antigenic site amino acids in fasta format and of the correct version for the respective flu type to obtain accurate results)
Select the fasta file to perform all operations as the input_fasta under Assign Clades
Select Run workflow to execute the operations
Wait for each workflow step to complete (highlighted in green)
Download or delete output files from each step as desired

Public-Health-Bioinformatics/flu_classification_suite
