Error in subgenome separation with K-mer and TEs #10

Open
kashiff007 opened this issue May 27, 2022 · 3 comments


@kashiff007

kashiff007 commented May 27, 2022

Hi, I am trying to run polycracker directly from the command line. I have an allotetraploid genome (two subgenomes) and want to separate the subgenomes. I have assembled the genome sequence in FASTA format. I am using the following command:

polycracker my_pipeline

Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
N E X T F L O W  ~  version 19.04.1
Launching `polycracker.nf` [high_cori] - revision: 34523bee09
./blast_files/
./kmercount_files/
./test_data/test_fasta_files/
./bed_files/
5
4
algae.fa
1
2
3
50000
0
26
13
linear
30
0
cosine
30
20
10,2
50000
1
0
0
0
1
0
0
2000000
1
0
1
1
3
tsne
SpectralClustering
1
1
1
1
0
1
0
1
1
1
1
1
[warm up] executor > local
executor >  local (5)
[1b/c37745] process > splitFastaProcess [100%] 1 of 1 ✔
[ec/38f1f0] process > writeKmerCount    [100%] 1 of 1 ✔
[ca/b2b9d8] process > kmer2Fasta        [100%] 1 of 1 ✔
[cc/139af8] process > createOrigDB      [100%] 1 of 1 ✔
[02/0dbb46] process > BlastOff          [100%] 1 of 1, failed: 1 ✘


WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
ERROR ~ Error executing process > 'BlastOff (1)'

Caused by:
  Process `BlastOff (1)` terminated with an error exit status (1)

Command executed:

  #!/bin/bash
  cwd=$(pwd)
  cd /workdir/polycracker
  polycracker blast_kmers -m 5 -t 4 -r /workdir/polycracker/./test_data/test_fasta_files/algae_split.fa -k $cwd/query.fa -o results.sam -pm 0 -kl 13
  cd -
  mv /workdir/polycracker/results.sam blast_result

Command exit status:
  1

Command output:
  /workdir/polycracker/work/02/0dbb46caaaed2efb5ac9767c297e62

Command error:
  java -Djava.library.path=/opt/conda/opt/bbmap-38.49-0/jni/ -ea -Xmx5g -cp /opt/conda/opt/bbmap-38.49-0/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 vslow=t ambiguous=all noheader=t secondary=t k=13 perfectmode=f threads=4 maxsites=2000000000 outputunmapped=f ref=/workdir/polycracker/./test_data/test_fasta_files/algae_split.fa in=query.fa outm=results.sam -Xmx5g
  Picked up _JAVA_OPTIONS: -Xms5G -Xmx5G
  Executing align2.BBMap [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, ambiguous=all, noheader=t, secondary=t, k=13, perfectmode=f, threads=4, maxsites=2000000000, outputunmapped=f, ref=/workdir/polycracker/./test_data/test_fasta_files/algae_split.fa, in=query.fa, outm=results.sam, -Xmx5g]
  Version 38.49
  
  Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.250
  Set threads to 4
  Set OUTPUT_MAPPED_ONLY to true
  Retaining all best sites for ambiguous mappings.
  NOTE:	Ignoring reference file because it already appears to have been processed.
  NOTE:	If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt
  Set genome to 1
  
  Loaded Reference:	1.895 seconds.
  Loading index for chunk 1-1, build 1
  Generated Index:	2.473 seconds.
  /opt/conda/bin/bbmap.sh: line 377:  3047 Killed                  java -Djava.library.path=/opt/conda/opt/bbmap-38.49-0/jni/ -ea -Xmx5g -cp /opt/conda/opt/bbmap-38.49-0/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 vslow=t ambiguous=all noheader=t secondary=t k=13 perfectmode=f threads=4 maxsites=2000000000 outputunmapped=f ref=/workdir/polycracker/./test_data/test_fasta_files/algae_split.fa in=query.fa outm=results.sam -Xmx5g
  mv: cannot stat '/workdir/polycracker/results.sam': No such file or directory

Work dir:
  /workdir/polycracker/work/02/0dbb46caaaed2efb5ac9767c297e62

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
Traceback (most recent call last):
  File "/opt/conda/bin/polycracker", line 11, in <module>
    sys.exit(polycracker())
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py", line 4547, in final_stats
    ctx.invoke(convert_subgenome_output_to_pickle,input_dir=polycracker_bed, scaffolds_pickle='scaffolds_stats.p', output_pickle='scaffolds_stats.poly.labels.p')
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py", line 1908, in convert_subgenome_output_to_pickle
    for file in os.listdir(input_dir):
OSError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/clusterResults/'
Traceback (most recent call last):
  File "/opt/conda/bin/polycracker", line 11, in <module>
    sys.exit(polycracker())
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py", line 1898, in convert_subgenome_output_to_pickle
    scaffolds = pickle.load(open(scaffolds_pickle,'rb'))
IOError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/scaffolds_connect.p'
cp: cannot stat 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/model_subgenome_*.fa': No such file or directory
cp: cannot stat 'polycracker.stats.analysis.csv': No such file or directory
cp: cannot stat 'SpectralClusteringmain_tsne_2_n3ClusterTest.html': No such file or directory
awk: cannot open blasted_merged.bed (No such file or directory)
/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py:1372: UserWarning:

genfromtxt: Empty input file: "<open file 'awk \'{print gsub(/,/,"")+1}\' blasted_merged.bed', mode 'r' at 0x7f817cef95d0>"

/opt/conda/lib/python2.7/site-packages/seaborn/distributions.py:198: RuntimeWarning:

Mean of empty slice.

/opt/conda/lib/python2.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning:

invalid value encountered in double_scalars

Traceback (most recent call last):
  File "/opt/conda/bin/polycracker", line 11, in <module>
    sys.exit(polycracker())
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py", line 1266, in plotPositions
    labels = pickle.load(open(labels_pickle,'rb'))
IOError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/scaffolds_connect.p'
Please see results in ./test_results.

Original genome in ./test_data/test_fasta_files .

exit
root@68f656bdbf8c:~/polycracker# exit
exit
tar: Error opening archive: Failed to open './test_data/test_fasta_files/algae.fa.tar.gz'
Traceback (most recent call last):
  File "/usr/local/bin/polycracker", line 5, in <module>
    from polycracker.polycracker import polycracker
  File "/usr/local/lib/python3.9/site-packages/polycracker/polycracker.py", line 189
    print f
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(f)?

I have tried different parameters, but every run (for example, polycracker subgenomeExtraction -s PolyCracker/main -os PolyCracker/ -p genome/ -g assembled.fasta -bb 1 -b 3 -i 3 -r 0 -o 1) produced an error. Am I heading in the right direction?
My aim is to separate the subgenomes using k-mers and TEs.

@jlevy44
Owner

jlevy44 commented May 28, 2022

Hi @kashiff007, polycracker was built for Python 2.7, but you appear to have installed it under Python 3.9, hence the print error. Can you try reinstalling with Python 2.7 and running it with Python 2?
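
For readers hitting the same SyntaxError, here is a minimal sketch of why the interpreter version matters. It only checks whether the two forms of `print` parse under whichever Python runs it; the helper name `parses` is made up for illustration and is not part of polycracker.

```python
import sys


def parses(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Statement form, as on the line quoted in the traceback (polycracker.py, line 189).
print_statement = "print f"
# Function-call form, accepted by both Python 2.7 and Python 3.
print_function = "print(f)"

running_python2 = sys.version_info[0] == 2

for label, source in [("print f", print_statement), ("print(f)", print_function)]:
    print("%-9s parses under Python %d.%d: %s"
          % (label, sys.version_info[0], sys.version_info[1], parses(source)))
```

Under Python 3 the statement form is rejected at import time, before any polycracker code runs, which matches the traceback from `/usr/local/lib/python3.9`.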

@kashiff007
Author

Hi @jlevy44 Thank you very much for your quick response.

I have also tried it on my local macOS machine using Docker. After running the polycracker test_pipeline command, it shows a similar error:

Picked up _JAVA_OPTIONS: -Xms3G -Xmx5G
N E X T F L O W  ~  version 19.04.1
Launching `polycracker.nf` [ecstatic_allen] - revision: 34523bee09
./blast_files/
./kmercount_files/
./test_data/test_fasta_files/
./bed_files/
5
4
algae.fa
1
2
3
50000
0
26
13
linear
30
0
cosine
30
20
10,2
50000
1
0
0
0
1
0
0
2000000
1
0
1
1
3
tsne
SpectralClustering
1
1
1
1
0
1
0
1
1
1
1
1
[warm up] executor > local
executor >  local (5)
[46/973da5] process > splitFastaProcess [100%] 1 of 1 ✔
[ee/fa92fa] process > writeKmerCount    [100%] 1 of 1 ✔
[21/3e0649] process > kmer2Fasta        [100%] 1 of 1 ✔
[27/5178ec] process > createOrigDB      [100%] 1 of 1 ✔
[1b/aa5196] process > BlastOff          [100%] 1 of 1, failed: 1 ✘


WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
ERROR ~ Error executing process > 'BlastOff (1)'

Caused by:
  Process `BlastOff (1)` terminated with an error exit status (1)

Command executed:

  #!/bin/bash
  cwd=$(pwd)
  cd /workdir/polycracker
  polycracker blast_kmers -m 5 -t 4 -r /workdir/polycracker/./test_data/test_fasta_files/algae_split.fa -k $cwd/query.fa -o results.sam -pm 0 -kl 13
  cd -
  mv /workdir/polycracker/results.sam blast_result

Command exit status:
  1

Command output:
  /workdir/polycracker/work/1b/aa5196fbaaacc055f91149cc83e56a

Command error:
  java -Djava.library.path=/opt/conda/opt/bbmap-38.49-0/jni/ -ea -Xmx5g -cp /opt/conda/opt/bbmap-38.49-0/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 vslow=t ambiguous=all noheader=t secondary=t k=13 perfectmode=f threads=4 maxsites=2000000000 outputunmapped=f ref=/workdir/polycracker/./test_data/test_fasta_files/algae_split.fa in=query.fa outm=results.sam -Xmx5g
  Picked up _JAVA_OPTIONS: -Xms5G -Xmx5G
  Executing align2.BBMap [tipsearch=150, minhits=1, minratio=0.25, rescuemismatches=50, rescuedist=3000, build=1, overwrite=true, fastareadlen=500, ambiguous=all, noheader=t, secondary=t, k=13, perfectmode=f, threads=4, maxsites=2000000000, outputunmapped=f, ref=/workdir/polycracker/./test_data/test_fasta_files/algae_split.fa, in=query.fa, outm=results.sam, -Xmx5g]
  Version 38.49
  
  Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.250
  Set threads to 4
  Set OUTPUT_MAPPED_ONLY to true
  Retaining all best sites for ambiguous mappings.
  NOTE:	Ignoring reference file because it already appears to have been processed.
  NOTE:	If you wish to regenerate the index, please manually delete ref/genome/1/summary.txt
  Set genome to 1
  
  Loaded Reference:	1.726 seconds.
  Loading index for chunk 1-1, build 1
  Generated Index:	1.929 seconds.
  /opt/conda/bin/bbmap.sh: line 377: 26615 Killed                  java -Djava.library.path=/opt/conda/opt/bbmap-38.49-0/jni/ -ea -Xmx5g -cp /opt/conda/opt/bbmap-38.49-0/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 vslow=t ambiguous=all noheader=t secondary=t k=13 perfectmode=f threads=4 maxsites=2000000000 outputunmapped=f ref=/workdir/polycracker/./test_data/test_fasta_files/algae_split.fa in=query.fa outm=results.sam -Xmx5g
  mv: cannot stat '/workdir/polycracker/results.sam': No such file or directory

Work dir:
  /workdir/polycracker/work/1b/aa5196fbaaacc055f91149cc83e56a

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
Traceback (most recent call last):
  File "/opt/conda/bin/polycracker", line 11, in <module>
    sys.exit(polycracker())
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py", line 4547, in final_stats
    ctx.invoke(convert_subgenome_output_to_pickle,input_dir=polycracker_bed, scaffolds_pickle='scaffolds_stats.p', output_pickle='scaffolds_stats.poly.labels.p')
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py", line 1908, in convert_subgenome_output_to_pickle
    for file in os.listdir(input_dir):
OSError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/clusterResults/'
Traceback (most recent call last):
  File "/opt/conda/bin/polycracker", line 11, in <module>
    sys.exit(polycracker())
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py", line 1898, in convert_subgenome_output_to_pickle
    scaffolds = pickle.load(open(scaffolds_pickle,'rb'))
IOError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/scaffolds_connect.p'
cp: cannot stat 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/bootstrap_0/model_subgenome_*.fa': No such file or directory
cp: cannot stat 'polycracker.stats.analysis.csv': No such file or directory
cp: cannot stat 'SpectralClusteringmain_tsne_2_n3ClusterTest.html': No such file or directory
awk: cannot open blasted_merged.bed (No such file or directory)
/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py:1372: UserWarning:

genfromtxt: Empty input file: "<open file 'awk \'{print gsub(/,/,"")+1}\' blasted_merged.bed', mode 'r' at 0x7f5faebb25d0>"

/opt/conda/lib/python2.7/site-packages/seaborn/distributions.py:198: RuntimeWarning:

Mean of empty slice.

/opt/conda/lib/python2.7/site-packages/numpy/core/_methods.py:85: RuntimeWarning:

invalid value encountered in double_scalars

Traceback (most recent call last):
  File "/opt/conda/bin/polycracker", line 11, in <module>
    sys.exit(polycracker())
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/polycracker/polycracker.py", line 1266, in plotPositions
    labels = pickle.load(open(labels_pickle,'rb'))
IOError: [Errno 2] No such file or directory: 'analysisOutputs/SpectralClusteringmain_tsne_2_n3/scaffolds_connect.p'
Please see results in ./test_results.

Original genome in ./test_data/test_fasta_files .

As you can see, the first line of the error output says that I need to install Graphviz on my laptop:

WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
ERROR ~ Error executing process > 'BlastOff (1)'

I did that, but the issue remains the same. I am using Python 2.7 on my laptop.
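
A note on reading the output above: the Graphviz line is tagged WARN, while the failure itself is the separate ERROR line naming `BlastOff (1)`, and the command error section shows the BBMap java process being `Killed`. Below is a rough sketch that pulls those three pieces out of a pasted log. The function name `summarize` and the shortened sample are made up for illustration. A `Killed` message means the process received SIGKILL; one common cause is the kernel's out-of-memory killer, but nothing in this thread confirms that.

```python
import re

# Shortened excerpt of the console output quoted in this thread
# (the long java command line after "Killed" is cut off for brevity).
SAMPLE_LOG = """\
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
ERROR ~ Error executing process > 'BlastOff (1)'
  /opt/conda/bin/bbmap.sh: line 377: 26615 Killed                  java -Djava.library.path=/opt/conda/opt/bbmap-38.49-0/jni/ -ea -Xmx5g
  mv: cannot stat '/workdir/polycracker/results.sam': No such file or directory
"""


def summarize(log_text):
    """Separate warnings from the failing process and any killed PIDs."""
    lines = [line.strip() for line in log_text.splitlines()]
    return {
        # Lines Nextflow tags as warnings (not the cause of the failure).
        "warnings": [line for line in lines if line.startswith("WARN:")],
        # Process names reported in "Error executing process > '...'".
        "failed_processes": re.findall(r"Error executing process > '([^']+)'", log_text),
        # PIDs from shell messages of the form "line N: PID Killed".
        "killed_pids": re.findall(r"line \d+:\s+(\d+) Killed", log_text),
    }


summary = summarize(SAMPLE_LOG)
for key, value in summary.items():
    print(key, "->", value)
```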

@kashiff007
Author

Hi, it’s working perfectly on Ubuntu with Python 2.7. Thanks for the support.
