Stackoverflow in imputation of multiparental population #2

Open
TrineAalborg opened this issue Nov 21, 2024 · 1 comment

TrineAalborg commented Nov 21, 2024

Hello,

I am currently experiencing an issue with using PolyOrigin for imputation of a panel of F1 offspring from an incomplete diallel of 18 tetraploid potato breeding clones.

The panel consists of 768 F1 clones that have been genotyped by sequencing. I have filtered the data based on quality and read depth and have a dataset of roughly 105,000 biallelic SNPs with 17% missing data that I would like to impute. I have the phased genotypes of the 18 parents for all of the positions and use these as input for the imputation. I have tried both the impute_LA function of the polyBreedR R wrapper for PolyOrigin in a Windows setup (64 GB) and PolyOrigin itself in a Linux environment (75 threads, 375 GB). However, when I run the program in either setup (I run each chromosome separately), I hit a stack overflow error at some point. The memory monitor indicated that memory usage never exceeded 126 GB during the run, so memory does not appear to be the limiting factor.

I have tested different data reductions. I can run the imputation for single families (there are 119 in total, with 1-14 offspring in each) for all SNPs on all chromosomes, and that runs successfully in my Windows setup using the polyBreedR wrapper. I can also run the program across all families, but only if I reduce the set to 2000 SNPs.
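One way a marker reduction like the one described above could be carried out is to keep an evenly spaced subset of SNPs along each chromosome. The following Python sketch is only an illustration under that assumption: how the 2000 SNPs were actually selected is not stated, and the function name `thin_markers` and its arguments are hypothetical, not part of PolyOrigin or polyBreedR.

```python
def thin_markers(positions, target):
    """Return indices of at most `target` markers, evenly spaced by rank.

    `positions` holds the physical positions of the SNPs on ONE chromosome.
    Spacing is even in marker rank (every k-th marker), not in base pairs.
    The returned indices refer to `positions` and are ordered by position.
    """
    # Rank markers by physical position without assuming the input is sorted.
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    n = len(order)
    if target >= n:
        return order          # nothing to thin
    if target <= 0:
        return []
    if target == 1:
        return [order[n // 2]]  # keep the middle marker
    # Integer arithmetic keeps the first and last marker and guarantees
    # distinct picks, because (n - 1) / (target - 1) > 1 whenever target < n.
    return [order[(k * (n - 1)) // (target - 1)] for k in range(target)]


# Hypothetical chromosome with 9000 SNPs, one every 500 bp.
positions = list(range(1000, 1000 + 9000 * 500, 500))
keep = thin_markers(positions, 2000)
print(len(keep), keep[0], keep[-1])
```

Thinning by rank keeps the relative marker density of the original data; thinning by physical or genetic distance instead would be an equally plausible choice.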

However, due to the complex family structure of the 18-parent diallel, performing the imputation for subsets of full-sibs does not capture the population structure between half-sibs successfully (the similarity of full-sibs compared to half-sibs seems overestimated).

As linkage groups are disrupted by reducing the marker density below that of a single complete chromosome (around 9000 SNPs), and the pedigree is disrupted by using full-sibs alone, I cannot reduce my data further.
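The dimensions quoted in this report can be put side by side with some quick arithmetic. The Python sketch below uses only figures given above, plus two labeled assumptions that are not stated in the report: that the SNPs are spread over the 12 chromosomes of the potato base set, and that a complete diallel among 18 parents is counted as one cross per unordered parent pair (no selfs or reciprocals).

```python
from math import comb

# Figures taken from the report.
n_parents = 18
n_families = 119
n_offspring = 768
n_snps = 105_000
missing_fraction = 0.17

# Assumption: 12 chromosomes (potato base chromosome number).
n_chromosomes = 12

# Assumption: a complete diallel has one cross per unordered parent pair.
possible_crosses = comb(n_parents, 2)              # 153
diallel_coverage = n_families / possible_crosses   # fraction of crosses realized

genotype_calls = n_offspring * n_snps              # offspring x SNP matrix size
missing_calls = round(genotype_calls * missing_fraction)

snps_per_chromosome = n_snps / n_chromosomes       # 8750.0, i.e. "around 9000"
mean_family_size = n_offspring / n_families

print(f"crosses realized: {n_families} of {possible_crosses} ({diallel_coverage:.0%})")
print(f"genotype calls: {genotype_calls:,} ({missing_calls:,} to impute)")
print(f"SNPs per chromosome: {snps_per_chromosome:.0f}")
print(f"mean offspring per family: {mean_family_size:.1f}")
```

Under these assumptions, each per-chromosome run covers all 768 offspring at roughly 8,750 SNPs, which is consistent with the figure of around 9000 SNPs per chromosome.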

Can anything be done to overcome the stack overflow in PolyOrigin when using data with high dimensionality (population size, number of parents, and SNP density all being large)?

I have also raised this issue on the PolyOrigin GitHub (#13).

Sincerely, Trine

jendelman (Owner) commented

Trine, thank you for raising this issue. I checked with the developer of PolyOrigin, Dr. Chaozhi Zheng, and he is working to address it in a future release.
