Stackoverflow in imputation of multiparental population #2

TrineAalborg · 2024-11-21T09:27:05Z

Hello,

I am currently experiencing an issue using PolyOrigin to impute a panel of F1 offspring from an incomplete diallel among 18 tetraploid potato breeding clones.

The panel consists of 768 F1 clones that have been genotyped by sequencing. After filtering on quality and read depth, I have a dataset of roughly 105,000 biallelic SNPs with 17% missing data that I would like to impute. I have the phased genotypes of the 18 parents for all of these positions and use them as input for the imputation. I have tried both the impute_LA function of the polyBreedR R wrapper for PolyOrigin on Windows (64 GB RAM) and PolyOrigin itself on Linux (75 threads, 375 GB RAM). In both setups (running each chromosome separately), the run eventually fails with a stack overflow error. The memory monitor showed that memory usage never exceeded 126 GB during the run, so the failure does not appear to be caused by running out of memory.

I have tested different data reductions. Imputing single families (there are 119 in total, with 1 to 14 offspring each) works for all SNPs on all chromosomes, and runs successfully on Windows with the polyBreedR wrapper. I can also run the imputation across all families at once, but only if I reduce the data to 2,000 SNPs.

However, because of the complex family structure of the 18-parent diallel, imputing subsets of full-sibs does not adequately capture the relationships among half-sibs (the similarity of full-sibs relative to half-sibs appears to be overestimated).

Reducing marker density below the full set for a single chromosome (around 9,000 SNPs) disrupts the linkage groups, and restricting the analysis to full-sibs disrupts the pedigree, so I cannot reduce my data any further.
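To put the scale of the problem in numbers, the figures quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the 105,000 SNPs are spread roughly evenly over the 12 potato chromosomes and that the diallel excludes selfs and reciprocal crosses. It does not call PolyOrigin or polyBreedR.

```python
from math import comb

# Figures taken from the post above
N_PARENTS = 18
N_FAMILIES = 119
N_OFFSPRING = 768
N_SNPS = 105_000
MISSING_RATE = 0.17

# Assumption: SNPs are spread evenly over the 12 potato chromosomes
N_CHROMOSOMES = 12

# Assumption: no selfs and no reciprocals, so each unordered parent pair is one family
max_crosses = comb(N_PARENTS, 2)
diallel_coverage = N_FAMILIES / max_crosses
mean_family_size = N_OFFSPRING / N_FAMILIES

# Size of one per-chromosome imputation run
snps_per_chrom = N_SNPS / N_CHROMOSOMES
calls_per_chrom = N_OFFSPRING * snps_per_chrom
missing_per_chrom = MISSING_RATE * calls_per_chrom

print(f"Possible crosses among {N_PARENTS} parents: {max_crosses}")
print(f"Diallel coverage: {diallel_coverage:.1%}")
print(f"Mean family size: {mean_family_size:.1f}")
print(f"SNPs per chromosome: {snps_per_chrom:,.0f}")
print(f"Offspring genotype calls per chromosome run: {calls_per_chrom:,.0f}")
print(f"Missing calls to impute per chromosome run: {missing_per_chrom:,.0f}")
```

Under these assumptions, about 8,750 SNPs per chromosome agrees with the "around 9,000 SNPs" figure, and each per-chromosome run covers roughly 6.7 million offspring genotype calls.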

Can anything be done to overcome the stack overflow in PolyOrigin with high-dimensional data (large population size, many parents, and high SNP density)?

I have also raised this issue on the PolyOrigin GitHub (#13).

Sincerely, Trine

jendelman · 2024-12-05T22:40:36Z

Trine, thank you for raising this issue. I checked with the developer of PolyOrigin, Dr. Chaozhi Zheng, and he is working to address it with a future release.
