Error in results[[elt]][[K_index, ll_index]] <- el(res, where = elt) : incompatible types (from NULL to list) in [[ assignment (aka NAs are not supported) #6

Open
Rahel14350 opened this issue Aug 6, 2017 · 27 comments

Comments

@Rahel14350

Hi,
I have RRBS data from 44 lung cancer samples (controls and disease). The samples are highly heterogeneous and contain other cell types besides the tumor cells, which is why, after pre-processing with RnBeads, I submitted the beta value matrix to MeDeCom. Now I get this error. Can you please help me? Many thanks in advance,

>medecom.result<-runMeDeCom(persites_betaValue, Ks=3:10, lambdas=c(0,10^(-5:-1)), NINIT=10, NFOLDS=10, ITERMAX=300, NCORES=9)

[Main:] checking inputs
[Main:] preparing data
[Main:] preparing jobs
[Main:] 2768 factorization runs in total
[Main:] finished all jobs. Creating the object
Error in results[[elt]][[K_index, ll_index]] <- el(res, where = elt) :
incompatible types (from NULL to list) in [[ assignment
In addition: Warning messages:
1: In mclapply(rev(concurr_indices), function(index_group) { :
80 function calls resulted in an error
2: In mclapply(rev(concurr_indices), function(index_group) { :
8 function calls resulted in an error

@shmilyfhh

I have the same problem as well.

@lutsik
Member

lutsik commented Aug 14, 2017

This error message is unspecific; it merely indicates that the worker processes have failed in a multicore setting. To get the actual exception, try rerunning with NCORES=1.
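To illustrate why the real exception goes missing, here is a minimal sketch in base R. It is not MeDeCom's own code: the `worker` function and the `jobs` list are hypothetical stand-ins. Wrapping each call in `tryCatch` returns the error message as an ordinary value, so it is still available after the parallel call finishes.

```r
library(parallel)

# Hypothetical worker: fails when its input contains NA
worker <- function(x) {
  if (anyNA(x)) stop("input contains NA")
  sum(x)
}

# Return the error message as a value instead of losing it in the worker
safe_worker <- function(x) {
  tryCatch(worker(x),
           error = function(e) paste("ERROR:", conditionMessage(e)))
}

jobs <- list(c(1, 2), c(3, NA), c(5, 6))

# mc.cores = 1 keeps the sketch portable; raise it on Unix-like systems
res <- mclapply(jobs, safe_worker, mc.cores = 1)

# Flag the jobs that failed and print their messages
failed <- vapply(res, is.character, logical(1))
print(res[failed])
```

Running on a single core achieves the same thing more directly, at the cost of speed, which is why a small test subset is the practical way to use it.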

@nikitaved
Collaborator

Could be a problem within a solver. Could you please share the dataset and/or your script?

@shmilyfhh

shmilyfhh commented Aug 14, 2017 via email

@shmilyfhh

shmilyfhh commented Aug 14, 2017 via email

@lutsik
Member

lutsik commented Aug 14, 2017

Regardless of decomposition, substituting NAs with zeroes is a bad idea anyway since 0 is one of the two extreme methylation values and will definitely lead to artefacts in your analysis.

I would suggest reducing the data set to something minimal (a few thousand rows) and running it on one core to identify the source of your problem.

Did the test data set work fine for you?

@shmilyfhh

shmilyfhh commented Aug 14, 2017 via email

@Rahel14350
Author

I also have a problem with missing values: it is not even possible to get 1000 rows without NAs to run the test. Is there any solution for the NAs, for example replacing them with 0.5? Or should I carry the analysis through to the end with a small subset of my data (for example 500 rows)? ...
Many thanks in advance,

@shmilyfhh

shmilyfhh commented Aug 14, 2017 via email

@nikitaved
Collaborator

nikitaved commented Aug 14, 2017

The solver does not support NAs, that is for sure.
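A quick way to check an input matrix before submitting it is sketched below in base R. The toy `beta` matrix is made up for illustration (rows are CpGs, columns are samples); the same calls should apply to a real beta value matrix.

```r
# Toy beta matrix (rows = CpGs, columns = samples) with some missing values
set.seed(1)
beta <- matrix(runif(60), nrow = 10, ncol = 6)
beta[c(2, 5), 3] <- NA
beta[7, 1] <- NA

anyNA(beta)                         # TRUE: the matrix contains missing values
na_per_row <- rowSums(is.na(beta))  # number of missing values per CpG
table(na_per_row)

# Keep only CpGs measured in every sample
beta_complete <- beta[na_per_row == 0, , drop = FALSE]
dim(beta_complete)
```

The `table(na_per_row)` summary also shows how many rows would survive a looser rule, such as tolerating a single missing value per CpG before imputation.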

@Rahel14350
Author

For me too, everything works in MeDeCom once I replace the NA values with some number, but I know that this is wrong. So in the end there is no solution for NAs? If I continue with only a subset of my methylation values, are the analysis and output still correct?

@nikitaved
Collaborator

nikitaved commented Aug 16, 2017

No, NAs are not supported as of now. Should we just ignore their contribution in the loss function? We will check how people handle missing values and, if it is not too much work, we will add this feature once we have a bit more time... But I certainly agree it is worth doing.

@nikitaved nikitaved added this to the Add support for NAs milestone Aug 16, 2017
@lutsik
Member

lutsik commented Aug 17, 2017

As a temporary solution, I would recommend looking into imputation methods if removing all NAs seems like overkill.
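For orientation only, the simplest possible imputation can be sketched in base R as below. Row-mean imputation is a naive stand-in chosen for brevity, not a method endorsed in this thread; dedicated tools (for example k-nearest-neighbour imputation) are generally a better choice for real data. The function name `impute_row_mean` and the toy matrix are hypothetical.

```r
# Toy beta matrix (rows = CpGs, columns = samples) with missing values
beta <- rbind(c(0.1, 0.3, NA, 0.2),
              c(0.8, NA, 0.9, 0.7),
              c(0.5, 0.5, 0.5, 0.5))

# Replace each NA with the mean of the observed values in the same row
impute_row_mean <- function(m) {
  row_means <- rowMeans(m, na.rm = TRUE)
  missing <- which(is.na(m), arr.ind = TRUE)  # (row, col) index pairs
  m[missing] <- row_means[missing[, "row"]]
  m
}

beta_imputed <- impute_row_mean(beta)
print(beta_imputed)
```

A row with no observed values at all would yield `NaN` here, so such rows should be dropped beforehand.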

In fact I started implementing something along these lines a while ago (see argument na.values and its usage in onerun.alternate, which is factorisations.R:445 and below). The idea was to disregard NAs when calculating the loss, and exclude NA-containing rows for the T and A updates. Not sure this was the right way, though.
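The idea of disregarding NAs in the loss can be sketched as a squared reconstruction error summed over observed entries only. This is an illustration under the assumption that the data matrix D is approximated by a product of two factors; it is not the code in factorisations.R, and the names `masked_loss`, `Tmat` and `Amat` are hypothetical.

```r
# Squared reconstruction error over the observed entries only.
# D is the beta matrix (may contain NAs), Tmat is CpGs x components,
# Amat is components x samples.
masked_loss <- function(D, Tmat, Amat) {
  resid <- D - Tmat %*% Amat
  observed <- !is.na(D)
  sum(resid[observed]^2)
}

Tmat <- cbind(c(0, 1, 0.5), c(1, 0, 0.5))  # 3 CpGs x 2 components
Amat <- rbind(c(0.2, 0.7), c(0.8, 0.3))    # 2 components x 2 samples

D <- Tmat %*% Amat
D[1, 2] <- NA                              # one missing measurement
loss_exact <- masked_loss(D, Tmat, Amat)

D[2, 1] <- D[2, 1] + 0.1                   # perturb one observed entry
loss_perturbed <- masked_loss(D, Tmat, Amat)
```

The missing entry contributes nothing to either value, while the perturbed observed entry adds its squared deviation.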

@nikitaved
Collaborator

nikitaved commented Aug 17, 2017

> In fact I started implementing something along these lines a while ago (see argument na.values and its usage in onerun.alternate, which is factorisations.R:445 and below). The idea was to disregard NAs when calculating the loss, and exclude NA-containing rows for the T and A updates. Not sure this was the right way, though.

I think it would not be that much different from simply removing NAs from the dataset (maybe slower convergence but smaller matrices)... I think we could develop an algorithm to solve an adapted version of the matrix completion problem (0-1 constraints) and then apply MeDeCom...

@Rahel14350
Author

Dear Nikitaved and Dear Pavlo,
Many thanks for your effort and time spent on this issue. I was also thinking of imputing all the NAs using the impute package in R (if that is the right thing to do), but you are talking about removing all the probes with NAs (if I understand correctly). In my case, removing probes with NAs would remove 90% of the probes. Is it then correct to continue with the MeDeCom analysis using only the remaining 10% of probes? Many thanks in advance,

@nikitaved
Collaborator

Well, the more you use, the better, but also the slower =) It also depends on your problem: it could be that you lose no information after all the filtering. Unfortunately, MeDeCom is not designed to deal with missing data as of now, because inferring unknown observations is a somewhat different task.

@nikitaved nikitaved changed the title Error in results[[elt]][[K_index, ll_index]] <- el(res, where = elt) : incompatible types (from NULL to list) in [[ assignment Error in results[[elt]][[K_index, ll_index]] <- el(res, where = elt) : incompatible types (from NULL to list) in [[ assignment (aka NAs are not supported) Aug 18, 2017
@Rahel14350
Author

Dear Nikitaved and Dear Pavlo,
I have imputed all the NAs in my beta value matrix using ChAMP (I now have more than a million sites in 44 samples). Then I ran MeDeCom again, but I got the same error. I am sure that there are no NAs in the matrix. Would you please help me here ...
medecom.result <- runMeDeCom(as.matrix(beta), Ks=2:10, lambdas=c(0,10^(-5:-1)), NINIT=10, NFOLDS=10, ITERMAX=300, NCORES=10)

[Main:] finished all jobs. Creating the object
Error in results[[elt]][[K_index, ll_index]] <- el(res, where = elt) :
incompatible types (from NULL to list) in [[ assignment
In addition: Warning message:
In mclapply(rev(concurr_indices), function(index_group) { :
3 function calls resulted in an error

warnings()
Warning message:
In mclapply(rev(concurr_indices), function(index_group) { ... :
3 function calls resulted in an error

@lutsik
Member

lutsik commented Sep 5, 2017

Dear Rahel,

As already explained, in a multicore setting the exception pointing at the actual problem is not accessible. Please, rerun with NCORES=1.

Best,

Pavlo

@Rahel14350
Author

Dear Pavlo, many thanks for your prompt reply. With this number of cores it took more than a week to reach this error, so a single core will be even slower, but I will do it if it is the only solution.
Kind regards,
Rahel

@lutsik
Member

lutsik commented Sep 5, 2017

First check whether a small subset of your data runs without problems. Take 100 randomly selected rows and run with 1 core.
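Drawing such a test subset could look like the sketch below. The simulated `beta` matrix is only a stand-in for the real data, and the commented-out `runMeDeCom` call simply reuses the arguments quoted earlier in this thread with `NCORES = 1`; it is not executed here.

```r
set.seed(123)  # make the random selection reproducible

# Toy stand-in for the full beta matrix (rows = CpGs, columns = samples)
beta <- matrix(runif(5000 * 44), nrow = 5000, ncol = 44)

n_test <- 100
idx <- sample(nrow(beta), n_test)  # 100 distinct row indices
beta_small <- beta[idx, , drop = FALSE]
dim(beta_small)

# With MeDeCom loaded, the single-core test run would then be along the lines of:
# medecom.test <- runMeDeCom(beta_small, Ks = 2:10, lambdas = c(0, 10^(-5:-1)),
#                            NINIT = 10, NFOLDS = 10, ITERMAX = 300, NCORES = 1)
```

Fixing the seed makes it possible to rerun exactly the same rows if the small test fails.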

@Rahel14350
Author

Yes, I already did that and it worked well, which is why I imputed the NAs and ran it on all the data ...

@lutsik
Member

lutsik commented Sep 5, 2017

What is the total number of rows (CpGs)?

@Rahel14350
Author

1282369 CpGs and 44 samples

@lutsik
Member

lutsik commented Sep 5, 2017

Applying MeDeCom to over 1 million CpGs makes little sense since the vast majority of them are not informative. Calculate the row-wise variance and make a histogram of it. You will see that most of your rows have zero or near-zero variance and contribute nothing to the deconvolution. Find a reasonable cutoff and select 50-100k rows at most.
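The variance filter described above can be sketched in base R as follows. The simulated matrix and the value of `n_keep` are assumptions for illustration; on real data the cutoff should be read off the histogram, and `n_keep` would be in the 50-100k range suggested above.

```r
set.seed(7)

# Toy beta matrix: 1000 CpGs x 44 samples, most rows nearly constant
beta <- matrix(rep(runif(1000), 44), nrow = 1000) +
  matrix(rnorm(1000 * 44, sd = 0.001), nrow = 1000)
# Make the first 50 rows genuinely variable across samples
beta[1:50, ] <- matrix(runif(50 * 44), nrow = 50)

# Row-wise variance across samples
row_var <- apply(beta, 1, var)

# Histogram of the variances; drop plot = FALSE to draw it
h <- hist(row_var, breaks = 50, plot = FALSE)

# Keep the n_keep most variable rows (hypothetical cutoff for this toy data)
n_keep <- 50
keep <- order(row_var, decreasing = TRUE)[seq_len(min(n_keep, nrow(beta)))]
beta_var <- beta[keep, , drop = FALSE]
dim(beta_var)
```

For matrices with over a million rows, `apply` can be slow; a vectorised row-variance routine would be a reasonable substitute.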

@Rahel14350
Author

OK, I am going to do that and continue with a subset of CpGs. One last question (sorry if it is not related to the question above): is it possible, after all, to extract the methylomes that reflect the underlying biology of the constituent cell types? I mean, if I continue with a subset of CpGs and the result shows that I have 5 different cell types, can I take only the methylomes most related to tumor cells and continue with a differential methylation analysis? (Please let me know if I am wrong and did not get the idea of MeDeCom.)

@lutsik
Member

lutsik commented Sep 5, 2017

If the method works as expected and reveals 5 underlying cell types, for each of them you will get proportions in each sample as well as an "average" methylation signature (but only in the rows that you have used for the deconvolution). For the differential methylation analysis you should include the obtained proportions as covariates.
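One way to read "include the obtained proportions as covariates" is a per-CpG linear model with the proportions added next to the group term. The sketch below uses simulated data and base R's `lm`; the variable names (`prop_lmc1`, `beta_cpg`) are hypothetical, and this is an illustration of the idea rather than a recommended pipeline.

```r
set.seed(42)
n <- 44
group <- factor(rep(c("ctrl", "tumor"), each = n / 2))

# Hypothetical proportion of one deconvolved component in each sample
prop_lmc1 <- runif(n)

# Simulated beta values for one CpG, driven by the component proportion only
beta_cpg <- 0.2 + 0.5 * prop_lmc1 + rnorm(n, sd = 0.02)

# Model without and with the proportion as a covariate
fit_naive <- lm(beta_cpg ~ group)
fit_adj <- lm(beta_cpg ~ group + prop_lmc1)

print(coef(summary(fit_adj)))
```

With several components, the proportions in each sample sum to one, so one of them would normally be left out of the model to avoid collinearity with the intercept.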

@Rahel14350
Author

I am so grateful for your prompt reply. I will therefore continue with only a subset of my data. What if I make several subsets of CpGs and at the end merge all the tumor cell proportions for each sample? I hope this is also correct.
