Cannot run pipeline on single sample #1

Open
dfermin opened this issue Mar 4, 2022 · 1 comment
Comments

@dfermin

dfermin commented Mar 4, 2022

Hello.

I'm trying to run SCOUT on a single sample from a 10X Genomics run.
I ran the BAM file through the GATK recalibration steps and am now trying to use SCOUT to call genotypes.

This is the command I use:

python ../SCOUT/bin/SCOUT_WholeGenome.py -N samp154 -r $FASTA -i ../samp154.recal.bam -o ../out -P 20 -c chr2 -S 1000

The program runs for a bit and then I get this error:

Fri Mar  4 15:18:01 2022: The first Annotation finished!
Fri Mar  4 15:18:01 2022: The second Annotation finished!
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/bin/../SCOUT.git/bin/SCOUT.py", line 30, in WorkPip
    PipEST.MakeCandidateDf(chrom, Start, End)
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/SCOUT.git/source/Calculate/Candidate.py", line 296, in MakeCandidateDf
    self.GetCutoffSimple()
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/SCOUT.git/source/Calculate/Candidate.py", line 316, in GetCutoffSimple
    estimator.fit(pd.DataFrame(MixDf['RawRate']))
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/site-packages/sklearn/cluster/_agglomerative.py", line 917, in fit
    X = self._validate_data(X, ensure_min_samples=2, estimator=self)
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/site-packages/sklearn/base.py", line 566, in _validate_data
    X = check_array(X, **check_params)
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/site-packages/sklearn/utils/validation.py", line 805, in check_array
    raise ValueError(
ValueError: Found array with 1 sample(s) (shape=(1, 1)) while a minimum of 2 is required by AgglomerativeClustering.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/bin/../SCOUT.git/bin/SCOUT.py", line 169, in <module>
    main(sys.argv[1:])
  File "/nfs/scratch/projects/dfermin.scRNAseq_genotyping/bin/../SCOUT.git/bin/SCOUT.py", line 147, in main
    res = ResultPool[k].get()
  File "/home/dfermin/.conda/envs/scout/lib/python3.10/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: Found array with 1 sample(s) (shape=(1, 1)) while a minimum of 2 is required by AgglomerativeClustering.

This happens with both SCOUT scripts.
Any suggestions on how to fix it?

Thanks
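The traceback shows that `GetCutoffSimple` in `Candidate.py` passed a one-row DataFrame built from `MixDf['RawRate']` (shape (1, 1)) to scikit-learn's `AgglomerativeClustering.fit`, which requires at least two samples. The following is a minimal sketch, not SCOUT's code or an official fix, of a guard that skips clustering when there are too few candidate rows. The function name `cluster_raw_rates`, the `n_clusters=2` default, and the choice to return `None` are all assumptions: the traceback does not show how SCOUT configures the estimator or what cutoff it should fall back to.

```python
def cluster_raw_rates(raw_rates, n_clusters=2):
    """Hypothetical guard: cluster 1-D RawRate values, or return None if too few.

    Assumption: returning None (instead of a fallback cutoff) is illustrative;
    the caller must decide what to do when clustering is skipped.
    """
    rates = [float(r) for r in raw_rates]
    # AgglomerativeClustering needs at least 2 samples, and cannot produce
    # more clusters than there are samples, so stop before calling fit().
    if len(rates) < max(2, n_clusters):
        return None
    # Imported here so the guard path does not need scikit-learn loaded.
    from sklearn.cluster import AgglomerativeClustering

    # One feature per sample: shape (n_samples, 1), like pd.DataFrame(Series).
    X = [[r] for r in rates]
    return AgglomerativeClustering(n_clusters=n_clusters).fit(X).labels_
```

Calling `AgglomerativeClustering().fit([[0.5]])` directly raises the same `ValueError` shown above, whereas `cluster_raw_rates([0.5])` returns `None`. Why only one candidate row reached this step for this sample is not established in this thread.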

@Goatofmountain
Owner

Hello,
I've run SCOUT on single-cell WGS and bulk WGS data, and the pipeline works fine.
Would you mind providing one of your BAM files so I can reproduce this error?

Thanks,
Kailing Tu
