Hello Jon Palmer,
I ran the hybrid taxonomy method in AMPtk on a small set of ASVs (about 4,000) from metabarcoding data generated with the 12S MiMammal primers and processed with DADA2, using the 12S MIDORI2 reference dataset MIDORI2_UNIQ_NUC_GB261_srRNA_SINTAX.fasta.
The output of amptk taxonomy does not behave as described in the AMPtk taxonomy documentation on Read the Docs, especially regarding the LCA method.
GS method (1267/3939 ASVs): all ASVs in this category have a final hybrid assignment at species level. In most cases (N = 1115) both the USEARCH Global Alignment (GA) and SINTAX were above their thresholds (97% and 0.8, respectively). In the remaining cases GA provided more complete taxonomic information than SINTAX, and the hybrid assignment followed the full GA assignment instead of applying the LCA (that is, stopping at the lowest taxonomic level above the SINTAX threshold). I expected the LCA based on this description in the docs: "6. If the best Global Alignment result is greater than 97% identical then that hit is retained. A final LCA algorithm is applied to the Global Alignment hit and the Best Bayesian Classifier hit". More importantly, in 41 cases GA and SINTAX disagreed, but the ASV was neither flagged for disagreement nor corrected with LCA.
GSL method (286 ASVs): GA agrees with SINTAX up to a given taxonomic level, and beyond that level the hybrid assignment keeps whichever result (GA or SINTAX) retains the most taxonomic levels. This does not really correspond to LCA.
SS method (2386 ASVs): in at least 50 cases GA and SINTAX disagreed, but the ASV was neither flagged for disagreement nor corrected with LCA. This could be a more widespread problem.
The GDL method was not used. Is this output behavior normal?
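To make the expectation concrete, here is a minimal bash sketch of what I understand by LCA between two assignments: keep the shared leading ranks and stop at the first rank where they differ. This is not AMPtk's code. The function name `lca` is hypothetical, and I am assuming taxonomy strings are comma-separated `rank:name` labels; the example lineages are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper, NOT part of AMPtk: returns the ranks shared by two
# comma-separated taxonomy strings, stopping at the first disagreement.
lca() {
  local -a a b out=()
  local i
  # Split each taxonomy string on commas into an array of rank labels.
  IFS=',' read -r -a a <<< "$1"
  IFS=',' read -r -a b <<< "$2"
  # Walk both lineages from the root and keep ranks while they match.
  for ((i = 0; i < ${#a[@]} && i < ${#b[@]}; i++)); do
    [[ "${a[i]}" == "${b[i]}" ]] || break
    out+=("${a[i]}")
  done
  # Join the shared ranks back together with commas.
  local IFS=','
  printf '%s\n' "${out[*]}"
}

# Illustrative inputs only (assumed format, invented lineages).
ga='k:Metazoa,p:Chordata,c:Mammalia,o:Carnivora,f:Canidae,g:Canis,s:Canis lupus'
sintax='k:Metazoa,p:Chordata,c:Mammalia,o:Carnivora,f:Felidae'
lca "$ga" "$sintax"
```

With these two inputs I would expect the hybrid assignment to stop at order (Carnivora), not to follow the full GA lineage down to species.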
Here is the .sh script below:
#!/bin/bash
#SBATCH --account=def-mcristes
#SBATCH --mem-per-cpu=4775M
#SBATCH --cpus-per-task=10
#SBATCH --time=24:00:00
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=ALL
cd /home/msalamon/
module load StdEnv/2020 gcc/9.3.0 python/3.9 vsearch/2.28.1
virtualenv --no-download $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip install --no-index --upgrade pip
pip install --no-index -r amptk-reqs.txt
cd /home/msalamon/projects/def-mcristes/msalamon/scripts/AMPtk/12SMimammal/
amptk taxonomy -f ASVs_12SMimammal_length_filter_DADA2.fasta -o AMPtk_12SMimammal_Midori2_res.txt --usearch_db MIDORI2_UNIQ_NUC_GB261_srRNA_SINTAX.fasta --method hybrid --sintax_cutoff 0.8 --cpus $SLURM_CPUS_PER_TASK
AMPtk_12SMimammal_Midori2_res.taxonomy.txt
AMPtk_12SMimammal_Midori2_results.xlsx