Consensus Sequence #3

j23414 · 2018-08-29T16:25:43Z

Any thoughts on smof being able to output a consensus sequence when given an alignment file?

input fasta

>seq_1
AAATATAT
>seq_2
AAAATTAT
>seq_3
AAATATAA

output fasta

>Consensus
AAATATAT

arendsee · 2018-08-29T20:07:03Z

@j23414 Thanks for the suggestion. It's a fine idea and I think falls within the intended scope of smof. I'll put it on the TODO list.

I can now build a consensus table that counts the characters in each column of the alignment. If we want to reduce these to a single consensus string, then we need to decide how to resolve ties.

```
>seq_1
AAATATAT
>seq_2
AAAATTAT
>seq_3
AAATATAA
```

```
A	T
3	0
3	0
3	0
1	2
2	1
0	3
3	0
1	2
```
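For readers following along, the per-column count table described above can be sketched in a few lines of Python. This is only an illustration of the idea, not smof's actual code; the function name `column_counts` is hypothetical.

```python
from collections import Counter

def column_counts(seqs):
    """Count the characters in each column of an alignment.

    `seqs` is a list of equal-length strings (hypothetical helper,
    not part of smof).
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) yields one tuple of characters per alignment column
    return [Counter(col) for col in zip(*seqs)]

seqs = ["AAATATAT", "AAAATTAT", "AAATATAA"]
counts = column_counts(seqs)

# Print a tab-delimited table with one row per alignment column
alphabet = sorted(set("".join(seqs)))
print("\t".join(alphabet))
for c in counts:
    print("\t".join(str(c[ch]) for ch in alphabet))
```

A `Counter` returns 0 for characters that never occur in a column, so the table needs no special handling for missing characters.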

arendsee · 2018-08-29T22:55:45Z

I've partially implemented a consensus function over on the dev branch. You can check out the comment in the commit message.

The current output is a table of character counts across the columns in the alignment. But if we want to go from this table to a single consensus string, we will need to decide how to resolve ties.

j23414 · 2018-08-30T14:09:46Z

For ties, would it make sense to output both options (both consensus sequences)?

Although I could see that becoming many consensus sequences (ties in more than 1 position)...

arendsee · 2018-08-30T14:49:30Z

I can't do that in FASTA format.

Perhaps if there is a near tie, I could use a wildcard, like *. For the cutoff, I could use Shannon entropy. This is the standard measure of conservation used in LOGO plots and the like.
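To make the entropy idea concrete, here is a rough Python sketch that writes a wildcard wherever a column's Shannon entropy exceeds a cutoff. The function names and the `max_entropy=0.5` default are assumptions chosen for illustration; nothing in this thread fixes a cutoff value, and this is not how smof implements anything.

```python
import math
from collections import Counter

def shannon_entropy(column):
    """Shannon entropy (in bits) of the character frequencies in one column."""
    n = len(column)
    return -sum((k / n) * math.log2(k / n) for k in Counter(column).values())

def consensus_with_wildcard(seqs, max_entropy=0.5, wildcard="*"):
    """Most common character per column, or `wildcard` for poorly conserved columns.

    `max_entropy` is a hypothetical cutoff in bits, chosen only for this example.
    """
    out = []
    for col in zip(*seqs):
        if shannon_entropy(col) > max_entropy:
            out.append(wildcard)
        else:
            out.append(Counter(col).most_common(1)[0][0])
    return "".join(out)

seqs = ["AAATATAT", "AAAATTAT", "AAATATAA"]
print(consensus_with_wildcard(seqs))  # AAA**TA*
```

In this small alignment, the 2-to-1 columns have an entropy of about 0.92 bits, so they fall above the assumed 0.5-bit cutoff and are masked, while fully conserved columns have zero entropy and are kept.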

j23414 · 2018-08-30T15:00:36Z

A standard input for LOGO plots would be acceptable.

By multiple consensuses I'm talking about:

>consensus_1
AATATAT
>consensus_2
AAAATAT

I'm looking at influenza virus sequences, which group as H1.alpha, H1.beta, H1.gamma, etc.
I need the consensus for each group (alpha, beta, gamma, ...) and am aligning their consensus(es) so I can determine the major nucleotide (or amino acid) differences between groups.

This might explain why a vaccine works for alpha (targets certain amino acids), but not for gamma, etc...

No worries either way, I can work with the table of character counts. : )

arendsee · 2018-08-30T15:55:17Z

Hmm, I'd worry about a combinatorial explosion.
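The size of the problem is easy to estimate: if every tie is expanded into a separate sequence, the number of consensus sequences is the product, over all columns, of the number of characters tied for the top count. A small Python sketch (the function name is hypothetical):

```python
from collections import Counter

def n_tied_consensuses(seqs):
    """Number of distinct consensus sequences if every tie were expanded."""
    total = 1
    for col in zip(*seqs):
        counts = Counter(col)
        top = max(counts.values())
        # Each column multiplies the total by its number of tied top characters
        total *= sum(1 for v in counts.values() if v == top)
    return total

# Two sequences that disagree at 10 positions: every column is a 1-to-1 tie
print(n_tied_consensuses(["A" * 10, "T" * 10]))  # 1024
```

So ten tied two-way columns already yield 2^10 = 1024 sequences, and the count doubles with each additional tied column.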

arendsee · 2018-09-18T18:54:51Z

@j23414 You can check out the latest version (2.13.0) of smof. Now the consensus command prints the consensus (resolving ties alphabetically) by default. Printing a table is optional.

Is good?
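For anyone curious what "resolving ties alphabetically" amounts to, here is a minimal Python sketch of that rule. It is an illustration only and not taken from the smof source; it assumes "alphabetically" means ordinary character ordering, and the function name is hypothetical.

```python
from collections import Counter

def consensus(seqs):
    """Most common character per column; ties go to the alphabetically first."""
    out = []
    for col in zip(*seqs):
        counts = Counter(col)
        top = max(counts.values())
        # min() picks the alphabetically first of the characters tied at the top
        out.append(min(ch for ch, n in counts.items() if n == top))
    return "".join(out)

print(consensus(["AAATATAT", "AAAATTAT", "AAATATAA"]))  # AAATATAT
print(consensus(["AT", "TA"]))                          # AA (both columns tie)
```

The first call reproduces the consensus from the original example in this issue; the second shows the tie rule, where each column is a 1-to-1 tie between A and T and A wins.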

j23414 · 2018-09-19T22:37:27Z

Good! : ) Consensus and table look good.

smof consensus fasta_aln.fna > consensus.fna
smof consensus -t fasta_aln.fna > consensus.tsv

arendsee added the enhancement label Aug 29, 2018

arendsee self-assigned this Aug 29, 2018

arendsee closed this as completed in 620d227 Sep 18, 2018
