Error when number of genes <= batch_size[1] #143

Open
le-ander opened this issue Mar 30, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@le-ander
Member

When I run batchglm through diffxpy (diffxpy.test.wald()) and my dataset has at most as many features as I have set for the second dimension of the batch size, I get an error.
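
For reference, roughly what my call looks like (a minimal sketch; adata_de, form and the tested coefficient "condition" are placeholders for my actual AnnData object, design formula and contrast, and remaining arguments are omitted):

import patsy
import diffxpy.api as de

# Minimal sketch of the failing call; adata_de and form are placeholders for my
# AnnData object and patsy design formula, and "condition" stands in for the
# coefficient I actually test.
dxp = de.test.wald(
    data=adata_de,
    dmat_loc=patsy.dmatrix(form, adata_de.obs),
    factor_loc_totest="condition",
    batch_size=(1e9, 256),  # errors whenever adata_de has <= 256 genes
)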

So for example using data with 255 (or 256) genes and a batch_size of (1e9, 256), I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-29-2450422e51e3> in <module>
     11 
     12 with dask.config.set(**{'array.slicing.split_large_chunks': True}):
---> 13     dxp_hsp_d70 = de.test.wald(
     14                     data = adata_de,
     15                     dmat_loc = patsy.dmatrix(form, adata_de.obs),

~/.local/lib/python3.8/site-packages/diffxpy/testing/tests.py in wald(data, factor_loc_totest, coef_to_test, formula_loc, formula_scale, as_numeric, init_a, init_b, gene_names, sample_description, dmat_loc, dmat_scale, constraints_loc, constraints_scale, noise_model, size_factors, batch_size, backend, train_args, training_strategy, quick_scale, dtype, **kwargs)
    722 
    723     # Fit model.
--> 724     model = _fit(
    725         noise_model=noise_model,
    726         data=data,

~/.local/lib/python3.8/site-packages/diffxpy/testing/tests.py in _fit(noise_model, data, design_loc, design_scale, design_loc_names, design_scale_names, constraints_loc, constraints_scale, init_model, init_a, init_b, gene_names, size_factors, batch_size, backend, training_strategy, quick_scale, train_args, close_session, dtype)
    242         pass
    243 
--> 244     estim.train_sequence(
    245         training_strategy=training_strategy,
    246         **train_args

~/.local/lib/python3.8/site-packages/batchglm/models/base/estimator.py in train_sequence(self, training_strategy, **kwargs)
    122                         (x, str(d[x]), str(kwargs[x]))
    123                     )
--> 124             self.train(**d, **kwargs)
    125             logger.debug("Training sequence #%d complete", idx + 1)
    126 

~/.local/lib/python3.8/site-packages/batchglm/train/numpy/base_glm/estimator.py in train(self, max_steps, method_b, update_b_freq, ftol_b, lr_b, max_iter_b, nproc, **kwargs)
    137                 idx_update = self.model.idx_not_converged
    138                 if self._train_loc:
--> 139                     a_step = self.iwls_step(idx_update=idx_update)
    140                     # Perform trial update.
    141                     self.model.a_var = self.model.a_var + a_step

~/.local/lib/python3.8/site-packages/batchglm/train/numpy/base_glm/estimator.py in iwls_step(self, idx_update)
    299             else:
    300                 if np.linalg.cond(a.compute(), p=None) < 1 / sys.float_info.epsilon:
--> 301                     delta_theta[:, idx_update] = np.expand_dims(
    302                         np.linalg.solve(a[0], b[0]).compute(),
    303                         axis=-1

ValueError: assignment destination is read-only
le-ander added the bug (Something isn't working) label on Mar 30, 2022
@ilan-gold
Collaborator

ilan-gold commented Apr 20, 2022

@le-ander Sorry to only just get to this; I was taking some time off. I understand if it's out of scope for you to make a smaller reproducible example, but could you highlight what makes you think that this difference in size is the issue? I saw this type of error while fixing the unit tests a few months ago, so hopefully it goes away with the upcoming release.
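
For context, that traceback ends in the generic NumPy error raised when writing into an array whose writeable flag is off; a minimal illustration, independent of batchglm and dask (this only shows the NumPy behaviour, not a claim about where the read-only buffer comes from here):

import numpy as np

# Assigning into an array whose writeable flag is False raises exactly
# "ValueError: assignment destination is read-only".
a = np.zeros((3, 4))
a.flags.writeable = False
a[:, [0, 1]] = 1.0  # raises ValueError: assignment destination is read-only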

@le-ander
Member Author

Hey there! :)
I tested a couple of different scenarios back then:

  • 2 genes, batch_size=(1e9, 256): error
  • 255 genes, batch_size=(1e9, 256): error
  • 256 genes, batch_size=(1e9, 256): error
  • 257 genes, batch_size=(1e9, 256): no error
  • 300 genes, batch_size=(1e9, 256): no error

So I figured that having the number of genes less than or equal to batch_size[1] must be the culprit here.
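
In case it helps, an untested sketch of what a smaller, self-contained reproduction might look like, assuming random counts and a single two-level condition (only the number of genes relative to batch_size[1] should matter):

import anndata
import numpy as np
import pandas as pd
import diffxpy.api as de

# Untested sketch of a smaller reproduction: random counts, one two-level condition.
# With n_genes <= 256 (== batch_size[1]) I would expect the error above; with 257+ I would not.
n_obs, n_genes = 200, 256
rng = np.random.default_rng(0)
counts = rng.negative_binomial(5, 0.3, size=(n_obs, n_genes)).astype(np.float32)
obs = pd.DataFrame({"condition": np.repeat(["a", "b"], n_obs // 2)})
adata = anndata.AnnData(X=counts, obs=obs)

res = de.test.wald(
    data=adata,
    formula_loc="~ 1 + condition",
    factor_loc_totest="condition",
    batch_size=(1e9, 256),
)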
