Hello!
I'm using AnnotSV to annotate a CNVkit VCF.
When trying to convert the AnnotSV TSV file, I get this error for all my VCFs, at some line of each file:
variantconvert convert -i /scratch4/nsobrei2/ggama1/somatic_SVs/cnvkit/vcfs/annotated/BH14418_TUMOR_call_no_theta_.cnv.vcf.tsv -o /scratch4/nsobrei2/ggama1/somatic_SVs/cnvkit/vcfs/annotated/BH14418_TUMOR_call_no_theta_.cnv.vcf -c GRCh38/annotsv2.json
2024-07-09 14:24:20 [INFO] running variantconvert 2.0.1
Traceback (most recent call last):
File "/home/ggama1/.conda/envs/annotsv/bin/variantconvert", line 8, in <module>
sys.exit(main())
^^^^^^
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/__main__.py", line 222, in main
main_convert(args)
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/__main__.py", line 77, in main_convert
converter.convert(args.inputFile, args.outputFile)
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/converters/vcf_from_annotsv.py", line 398, in convert
self.input_df = self._build_input_dataframe()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/converters/vcf_from_annotsv.py", line 44, in _build_input_dataframe
df = pd.read_csv(
^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 626, in _read
return parser.read(nrows)
^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1923, in read
) = self._engine.read( # type: ignore[attr-defined]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 239, in read
data = self._reader.read(nrows)
^^^^^^^^^^^^^^^^^^^^^^^^
File "parsers.pyx", line 820, in pandas._libs.parsers.TextReader.read
File "parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_rows
File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 8112
(annotsv) [ggama1@c003 GRCh38]$ variantconvert convert -i /scratch4/nsobrei2/ggama1/somatic_SVs/cnvkit/vcfs/annotated/annot_sample_tumor.vcf.tsv -o /scratch4/nsobrei2/ggama1/somatic_SVs/cnvkit/vcfs/annotated/annot_sample_tumor.vcf -c GRCh38/annotsv2.json
2024-07-09 14:25:08 [INFO] running variantconvert 2.0.1
Traceback (most recent call last):
File "/home/ggama1/.conda/envs/annotsv/bin/variantconvert", line 8, in <module>
sys.exit(main())
^^^^^^
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/__main__.py", line 222, in main
main_convert(args)
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/__main__.py", line 77, in main_convert
converter.convert(args.inputFile, args.outputFile)
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/converters/vcf_from_annotsv.py", line 398, in convert
self.input_df = self._build_input_dataframe()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/converters/vcf_from_annotsv.py", line 44, in _build_input_dataframe
df = pd.read_csv(
^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 626, in _read
return parser.read(nrows)
^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1923, in read
) = self._engine.read( # type: ignore[attr-defined]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 239, in read
data = self._reader.read(nrows)
^^^^^^^^^^^^^^^^^^^^^^^^
File "parsers.pyx", line 820, in pandas._libs.parsers.TextReader.read
File "parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_rows
File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at row 1447
This error happens with variantconvert 2.0.1 installed with conda, and also when installed with pip inside a conda virtual environment.
Traceback (most recent call last):
File "/home/ggama1/.conda/envs/annotsv/bin/variantconvert", line 8, in <module>
sys.exit(main())
^^^^^^
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/__main__.py", line 222, in main
main_convert(args)
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/__main__.py", line 77, in main_convert
converter.convert(args.inputFile, args.outputFile)
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/converters/vcf_from_annotsv.py", line 404, in convert
self.input_df = self._build_input_dataframe()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/converters/vcf_from_annotsv.py", line 48, in _build_input_dataframe
df = pd.read_csv(
^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 626, in _read
return parser.read(nrows)
^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1923, in read
) = self._engine.read( # type: ignore[attr-defined]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/python_parser.py", line 252, in read
content = self._get_lines(rows)
^^^^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/python_parser.py", line 1140, in _get_lines
next_row = self._next_iter_line(row_num=self.pos + rows + 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/python_parser.py", line 834, in _next_iter_line
self._alert_malformed(msg, row_num)
File "/home/ggama1/.conda/envs/annotsv/lib/python3.12/site-packages/pandas/io/parsers/python_parser.py", line 781, in _alert_malformed
raise ParserError(msg)
pandas.errors.ParserError: unexpected end of data
(annotsv) [ggama1@c003 GRCh38]$ variantconvert convert -i /scratch4/nsobrei2/ggama1/somatic_SVs/cnvkit/vcfs/annotated/annot_sample_tumor.vcf.tsv -o /scratch4/nsobrei2/ggama1/somatic_SVs/cnvkit/vcfs/annotated/annot_sample_tumor.vcf -c GRCh38/annotsv2.json
2024-07-09 14:42:42 [INFO] running variantconvert 2.0.1
Traceback (most recent call last):
File "/home/ggama1/.conda/envs/annotsv/bin/variantconvert", line 8, in <module>
sys.exit(main())
^^^^^^
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/__main__.py", line 222, in main
main_convert(args)
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/__main__.py", line 77, in main_convert
converter.convert(args.inputFile, args.outputFile)
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/converters/vcf_from_annotsv.py", line 404, in convert
self.input_df = self._build_input_dataframe()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch4/nsobrei2/programs/variantconvert/src/variantconvert/converters/vcf_from_annotsv.py", line 57, in _build_input_dataframe
[self.config["VCF_COLUMNS"]["#CHROM"], self.config["VCF_COLUMNS"]["INFO"]["SV_start"]],
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'SV_start'
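The carets in the traceback above point at the `["SV_start"]` lookup on `self.config["VCF_COLUMNS"]["INFO"]`, so one thing worth checking is whether the JSON config passed with `-c` actually defines that key. A minimal sketch of such a check follows. The helper name `missing_info_keys` and the example config are hypothetical, and the nested layout is assumed only from the line shown in the traceback, not from the real `annotsv2.json`.

```python
import json
import tempfile
from pathlib import Path


def missing_info_keys(config_path, required=("SV_start",)):
    """Return the names in `required` absent from config["VCF_COLUMNS"]["INFO"]."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    # Assumed layout, inferred from the traceback line only.
    info = config.get("VCF_COLUMNS", {}).get("INFO", {})
    return [key for key in required if key not in info]


# Hypothetical config fragment for illustration, not the real annotsv2.json.
example = {"VCF_COLUMNS": {"#CHROM": "SV_chrom", "INFO": {"SV_end": "SV_end"}}}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.json"
    path.write_text(json.dumps(example), encoding="utf-8")
    missing = missing_info_keys(path)
print(missing)
```

Running the same function on the config file given to `-c` would show whether the key is missing there, which could point to a mismatch between the config and the source checkout being run.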
Attached is the file used.
annot_sample_tumor.vcf.tsv.txt
I've also found this blog post:
https://www.shanelynn.ie/pandas-csv-error-error-tokenizing-data-c-error-eof-inside-string-starting-at-line/
It suggests the problem is related to pandas reading the file with its C parser engine instead of the Python one.
I've tried to modify vcf_from_annotsv.py by adding:
But this gives other errors:
Another proposed solution would be:
Which gives me a KeyError for 'SV_start'.
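For reference, the tokenizing error can be reproduced outside variantconvert. The sketch below is only an illustration under stated assumptions, not the change made to vcf_from_annotsv.py: the sample data and the function names are hypothetical, and it assumes the failing rows contain an unbalanced double quote. It shows that pandas' default quoting raises a `ParserError` on such input, while `quoting=csv.QUOTE_NONE` reads the quote as ordinary text.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for an AnnotSV TSV: the second data row has a lone
# double quote, which the default parser treats as the start of a quoted field.
SAMPLE_TSV = (
    "SV_chrom\tSV_start\tAnnotation\n"
    "1\t100\tfirst\n"
    "1\t200\t\"unterminated text\n"
    "2\t300\tlast\n"
)


def read_default(text):
    """Read with default quoting; expected to fail on an unbalanced quote."""
    return pd.read_csv(io.StringIO(text), sep="\t")


def read_without_quoting(text):
    """Read with quote handling disabled, so '"' is kept as ordinary data."""
    return pd.read_csv(io.StringIO(text), sep="\t", quoting=csv.QUOTE_NONE)


try:
    read_default(SAMPLE_TSV)
    default_failed = False
except pd.errors.ParserError as err:
    default_failed = True
    print(f"default parser failed: {err}")

df = read_without_quoting(SAMPLE_TSV)
print(df)
```

If the real file behaves the same way, finding the rows with an odd number of `"` characters would confirm whether quoting is the cause.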