Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error converting raw/bim to h5/matrix_header files #12

Open
saurabhbelsare opened this issue May 5, 2022 · 2 comments
Open

Error converting raw/bim to h5/matrix_header files #12

saurabhbelsare opened this issue May 5, 2022 · 2 comments

Comments

@saurabhbelsare
Copy link

Hi, I'm trying to run a toy example with GEPSi, but am running into an error at the first step, to convert my genotypes to h5 format the way GEPSi expects. Here is my toy vcf generated from an msprime simulation:

##fileformat=VCFv4.2
##source=tskit 0.4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=10000>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	tsk_0indv	tsk_1indv	tsk_2indv	tsk_3indv	tsk_4indv	tsk_5indv	tsk_6indv	tsk_7indv	tsk_8indv	tsk_9indv
1	409	.	T	A	.	PASS	.	GT	0|1	0|0	0|1	0|0	0|0	0|0	0|0	0|0	1|0	0|0
1	752	.	C	T	.	PASS	.	GT	0|0	1|1	0|0	1|1	1|1	1|0	1|1	1|0	0|0	0|1
1	1043	.	C	A	.	PASS	.	GT	0|0	1|0	0|0	0|0	0|0	0|0	0|0	0|0	0|0	0|0
1	1207	.	A	C	.	PASS	.	GT	0|0	1|0	0|0	0|1	0|0	1|0	0|0	0|0	0|0	0|0
1	1952	.	C	A	.	PASS	.	GT	1|0	0|0	0|0	0|0	0|0	0|1	0|0	0|0	0|0	0|0
1	2521	.	T	C	.	PASS	.	GT	0|0	0|0	0|0	0|1	0|0	1|0	0|0	0|0	0|0	0|0
1	2542	.	C	T	.	PASS	.	GT	1|0	0|0	0|0	0|0	0|0	0|1	0|0	0|0	0|0	0|0
1	2707	.	C	T	.	PASS	.	GT	0|0	0|1	0|0	1|0	1|1	0|0	1|1	1|0	0|0	0|1
1	4352	.	C	T	.	PASS	.	GT	0|0	0|0	1|0	0|0	0|0	0|0	0|0	0|1	0|1	1|0
1	5416	.	T	A	.	PASS	.	GT	0|0	1|1	0|0	1|1	1|1	1|0	1|1	1|0	0|0	0|1
1	6656	.	A	G	.	PASS	.	GT	1|1	0|0	1|1	0|0	0|0	0|1	0|0	0|1	1|1	1|0
1	7850	.	T	A	.	PASS	.	GT	0|0	0|0	1|0	0|0	0|0	0|0	0|0	0|1	0|1	1|0
1	9263	.	A	T	.	PASS	.	GT	0|0	1|1	0|0	1|1	1|1	1|0	1|1	1|0	0|0	0|1

I converted this vcf to .raw and .bim files using both, plink1.9 and plink2 with these commands:

plink --vcf output.vcf --recode A --out output
plink --vcf output.vcf --make-just-bim --out output

I try to convert this to the h5/matrix_header format using the following command:

gepsi genotype --data_path ./ --matrix_name output.raw --snplist_name output.bim --ignore_gene_map --features "NONE" --annotation_name NONE

However, I'm getting errors at this step. They are different errors for plink1.9 and plink2.
The error while trying to convert raw/bim files generated from plink1.9 is

OverflowError: can't convert negative value to hsize_t
Exception ignored in: 'tables.utilsextension.malloc_dims'
Traceback (most recent call last):
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
    self._v_objectid = self._create_carray(self._v_new_title)
OverflowError: can't convert negative value to hsize_t
OverflowError: can't convert negative value to hsize_t
Exception ignored in: 'tables.utilsextension.malloc_dims'
Traceback (most recent call last):
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
    self._v_objectid = self._create_carray(self._v_new_title)
OverflowError: can't convert negative value to hsize_t
Traceback (most recent call last):
  File "/home/sbelsare/software/anaconda3/envs/gepsi/bin/gepsi", line 8, in <module>
    sys.exit(main())
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/main.py", line 22, in main
    data = genotype_sim.simulate_genotype()
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 36, in simulate_genotype
    return self.generate_genotype_file()
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 87, in generate_genotype_file
    sf, risk_alleles = self.script_result_clean()
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 136, in script_result_clean
    array_c = f.create_earray(f.root, 'data', tables.IntCol(), (0,num_feat), expectedrows=self.num_patients,filters=tables.Filters(complib='zlib', complevel=1))
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/file.py", line 1380, in create_earray
    ptobj = EArray(parentnode, name,
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 158, in __init__
    super(EArray, self).__init__(parentnode, name, atom, shape, title,
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 219, in __init__
    super(Array, self).__init__(parentnode, name, new, filters,
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/leaf.py", line 286, in __init__
    super(Leaf, self).__init__(parentnode, name, _log)
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/node.py", line 264, in __init__
    self._v_objectid = self._g_create()
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 182, in _g_create
    return self._g_create_common(self._v_expectedrows)
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
    self._v_objectid = self._create_carray(self._v_new_title)
  File "tables/hdf5extension.pyx", line 1366, in tables.hdf5extension.Array._create_carray
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5S.c", line 1507, in H5Screate_simple
    invalid dataspace information

End of HDF5 error back trace

Problems creating the EArray.
Closing remaining open files:./genotype_100k_NONE.h5...done

and the error while trying to convert raw/bim files generated from plink2 is:

Traceback (most recent call last):
  File "/home/sbelsare/software/anaconda3/envs/gepsi/bin/gepsi", line 8, in <module>
    sys.exit(main())
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/main.py", line 22, in main
    data = genotype_sim.simulate_genotype()
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 36, in simulate_genotype
    return self.generate_genotype_file()
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 87, in generate_genotype_file
    sf, risk_alleles = self.script_result_clean()
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 139, in script_result_clean
    self.h5_append(df,f)
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 183, in h5_append
    f.root.data.append(a)
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 219, in append
    self._check_shape_append(nparr)
  File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 196, in _check_shape_append
    raise ValueError(("the shapes of the appended object and the "
ValueError: the shapes of the appended object and the ``/data`` EArray differ in non-enlargeable dimension 1
Closing remaining open files:./genotype_100k_NONE.h5...done

Which version of plink should I use, and how should I fix the corresponding error to get the h5/matrix_header files?

Thank you!

@mahdikafi
Copy link

Hi, I hope you are doing well.
I have the same problem. I wanted to create phenotypes for sim1000g simulated genotypes.

@avantikalal
Copy link
Contributor

@mahdikafi can you try the following.

  • In configs/genotype.yaml, change memory_cautious to False and repeat pip installation of gepsi (pip install .)
  • Please check the format of your .raw and .bim files - are they tab or space separated? If the .raw file is space-separated and the .bim is tab-separated, can you convert the .raw to be tab separated using pandas/bash and try?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants