You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I'm trying to run a toy example with GEPSi, but am running into an error at the first step, to convert my genotypes to h5 format the way GEPSi expects. Here is my toy vcf generated from an msprime simulation:
##fileformat=VCFv4.2
##source=tskit 0.4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=1,length=10000>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tsk_0indv tsk_1indv tsk_2indv tsk_3indv tsk_4indv tsk_5indv tsk_6indv tsk_7indv tsk_8indv tsk_9indv
1 409 . T A . PASS . GT 0|1 0|0 0|1 0|0 0|0 0|0 0|0 0|0 1|0 0|0
1 752 . C T . PASS . GT 0|0 1|1 0|0 1|1 1|1 1|0 1|1 1|0 0|0 0|1
1 1043 . C A . PASS . GT 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
1 1207 . A C . PASS . GT 0|0 1|0 0|0 0|1 0|0 1|0 0|0 0|0 0|0 0|0
1 1952 . C A . PASS . GT 1|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|0
1 2521 . T C . PASS . GT 0|0 0|0 0|0 0|1 0|0 1|0 0|0 0|0 0|0 0|0
1 2542 . C T . PASS . GT 1|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|0
1 2707 . C T . PASS . GT 0|0 0|1 0|0 1|0 1|1 0|0 1|1 1|0 0|0 0|1
1 4352 . C T . PASS . GT 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|1 0|1 1|0
1 5416 . T A . PASS . GT 0|0 1|1 0|0 1|1 1|1 1|0 1|1 1|0 0|0 0|1
1 6656 . A G . PASS . GT 1|1 0|0 1|1 0|0 0|0 0|1 0|0 0|1 1|1 1|0
1 7850 . T A . PASS . GT 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|1 0|1 1|0
1 9263 . A T . PASS . GT 0|0 1|1 0|0 1|1 1|1 1|0 1|1 1|0 0|0 0|1
I converted this vcf to .raw and .bim files using both, plink1.9 and plink2 with these commands:
However, I'm getting errors at this step. They are different errors for plink1.9 and plink2.
The error while trying to convert raw/bim files generated from plink1.9 is
OverflowError: can't convert negative value to hsize_t
Exception ignored in: 'tables.utilsextension.malloc_dims'
Traceback (most recent call last):
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
self._v_objectid = self._create_carray(self._v_new_title)
OverflowError: can't convert negative value to hsize_t
OverflowError: can't convert negative value to hsize_t
Exception ignored in: 'tables.utilsextension.malloc_dims'
Traceback (most recent call last):
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
self._v_objectid = self._create_carray(self._v_new_title)
OverflowError: can't convert negative value to hsize_t
Traceback (most recent call last):
File "/home/sbelsare/software/anaconda3/envs/gepsi/bin/gepsi", line 8, in <module>
sys.exit(main())
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/main.py", line 22, in main
data = genotype_sim.simulate_genotype()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 36, in simulate_genotype
return self.generate_genotype_file()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 87, in generate_genotype_file
sf, risk_alleles = self.script_result_clean()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 136, in script_result_clean
array_c = f.create_earray(f.root, 'data', tables.IntCol(), (0,num_feat), expectedrows=self.num_patients,filters=tables.Filters(complib='zlib', complevel=1))
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/file.py", line 1380, in create_earray
ptobj = EArray(parentnode, name,
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 158, in __init__
super(EArray, self).__init__(parentnode, name, atom, shape, title,
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 219, in __init__
super(Array, self).__init__(parentnode, name, new, filters,
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/leaf.py", line 286, in __init__
super(Leaf, self).__init__(parentnode, name, _log)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/node.py", line 264, in __init__
self._v_objectid = self._g_create()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 182, in _g_create
return self._g_create_common(self._v_expectedrows)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/carray.py", line 250, in _g_create_common
self._v_objectid = self._create_carray(self._v_new_title)
File "tables/hdf5extension.pyx", line 1366, in tables.hdf5extension.Array._create_carray
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "H5S.c", line 1507, in H5Screate_simple
invalid dataspace information
End of HDF5 error back trace
Problems creating the EArray.
Closing remaining open files:./genotype_100k_NONE.h5...done
and the error while trying to convert raw/bim files generated from plink2 is:
Traceback (most recent call last):
File "/home/sbelsare/software/anaconda3/envs/gepsi/bin/gepsi", line 8, in <module>
sys.exit(main())
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/main.py", line 22, in main
data = genotype_sim.simulate_genotype()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 36, in simulate_genotype
return self.generate_genotype_file()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 87, in generate_genotype_file
sf, risk_alleles = self.script_result_clean()
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 139, in script_result_clean
self.h5_append(df,f)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/scripts/genotype_simulator.py", line 183, in h5_append
f.root.data.append(a)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 219, in append
self._check_shape_append(nparr)
File "/home/sbelsare/software/anaconda3/envs/gepsi/lib/python3.8/site-packages/tables/earray.py", line 196, in _check_shape_append
raise ValueError(("the shapes of the appended object and the "
ValueError: the shapes of the appended object and the ``/data`` EArray differ in non-enlargeable dimension 1
Closing remaining open files:./genotype_100k_NONE.h5...done
Which version of plink should I use, and how should I fix the corresponding error to get the h5/matrix_header files?
Thank you!
The text was updated successfully, but these errors were encountered:
In configs/genotype.yaml, change memory_cautious to False and repeat pip installation of gepsi (pip install .)
Please check the format of your .raw and .bim files - are they tab or space separated? If the .raw file is space-separated and the .bim is tab-separated, can you convert the .raw to be tab separated using pandas/bash and try?
Hi, I'm trying to run a toy example with GEPSi, but am running into an error at the first step, to convert my genotypes to h5 format the way GEPSi expects. Here is my toy vcf generated from an msprime simulation:
I converted this vcf to .raw and .bim files using both, plink1.9 and plink2 with these commands:
I try to convert this to the h5/matrix_header format using the following command:
gepsi genotype --data_path ./ --matrix_name output.raw --snplist_name output.bim --ignore_gene_map --features "NONE" --annotation_name NONE
However, I'm getting errors at this step. They are different errors for plink1.9 and plink2.
The error while trying to convert raw/bim files generated from plink1.9 is
and the error while trying to convert raw/bim files generated from plink2 is:
Which version of plink should I use, and how should I fix the corresponding error to get the h5/matrix_header files?
Thank you!
The text was updated successfully, but these errors were encountered: