Support for multiple response variables #42

Open
eb8680 opened this issue Oct 14, 2019 · 4 comments

Comments

@eb8680
Member

eb8680 commented Oct 14, 2019

The mvbrmsformula function in brms provides support for multiple formulas with shared inputs:

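library(brms)
# Two univariate formulas that share the input z; x is also a predictor of y.
xf <- bf(x ~ z + 1)
yf <- bf(y ~ x + z + 1)
# Combine them into one multivariate formula and fit a single joint model.
formula <- mvbf(yf, xf, ...)
fitted_model <- brm(formula, ...)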

It would be nice if brmp supported this behavior as well. I believe that in code generation this would just correspond to generating a single model with multiple response sample statements.
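
To make the "single model, multiple sample statements" idea concrete, here is a minimal sketch of what such code generation might look like. It is not brmp's actual generator: the function `generate_model_source`, the parameter names (`b_<response>_<term>`, `sigma_<response>`), and the `data`/`params` dictionaries are all assumptions made for illustration, priors are left out, and only Normal responses are handled. The emitted text uses Pyro's `pyro.sample(name, dist, obs=...)` form but is only generated here, not executed.

```python
from typing import Dict, List


def generate_model_source(responses: Dict[str, List[str]]) -> str:
    """Emit Pyro-style source for one model with one sample site per response.

    `responses` maps each response name to its predictor names, e.g.
    {"x": ["z"], "y": ["x", "z"]}. Hypothetical helper: Normal responses
    with an intercept only, and no priors.
    """
    lines = ["def model(data, params):"]
    for name, predictors in responses.items():
        # Linear predictor: intercept plus one coefficient per predictor.
        terms = [f'params["b_{name}_Intercept"]']
        terms += [f'params["b_{name}_{p}"] * data["{p}"]' for p in predictors]
        lines.append(f"    mu_{name} = " + " + ".join(terms))
        # One observed sample statement per response variable.
        lines.append(
            f'    pyro.sample("{name}", '
            f'dist.Normal(mu_{name}, params["sigma_{name}"]), '
            f'obs=data["{name}"])'
        )
    return "\n".join(lines)


# Mirrors the two formulas above: x ~ z + 1 and y ~ x + z + 1.
source = generate_model_source({"x": ["z"], "y": ["x", "z"]})
print(source)
```

Both responses end up as sample sites in the same generated function, so inputs such as `z` are shared without any extra machinery.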

Additional question for discussion: would this require implementing brms's (expr | ID | group) grouping syntax as well?

@eb8680
Member Author

eb8680 commented Oct 15, 2019

@null-a what do you think about the feasibility of this? Are there any major stumbling blocks you're aware of?

@null-a
Collaborator

null-a commented Oct 16, 2019

> what do you think about the feasibility of this? Are there any major stumbling blocks you're aware of?

@eb8680: To the extent that I'm familiar with this feature, it seems perfectly feasible to me -- I see no major stumbling blocks.

> I believe that in code generation this would just correspond to generating a single model with multiple response sample statements.

Yes, that's my understanding too. When multiple responses share the same family, this could perhaps also be a single sample statement from a multivariate distribution. It seems that when the response is either multivariate normal or Student's t, brms models residual correlations by default. (In particular, the response distribution is parameterised by standard deviations and a correlation matrix.)
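
For reference, a parameterisation by standard deviations and a correlation matrix corresponds to the covariance Sigma = diag(sd) * Corr * diag(sd). A small sketch of that relationship in plain Python follows; the function name `covariance_from_sd_corr` is made up for this example and is not part of brms or brmp.

```python
from typing import List


def covariance_from_sd_corr(
    sds: List[float], corr: List[List[float]]
) -> List[List[float]]:
    """Return Sigma = diag(sds) @ corr @ diag(sds) as nested lists."""
    n = len(sds)
    # Entry (i, j) is sd_i * corr_ij * sd_j.
    return [[sds[i] * corr[i][j] * sds[j] for j in range(n)] for i in range(n)]


sds = [2.0, 3.0]
corr = [[1.0, 0.5], [0.5, 1.0]]
sigma = covariance_from_sd_corr(sds, corr)
print(sigma)  # [[4.0, 3.0], [3.0, 9.0]]
```

With an identity correlation matrix the off-diagonal entries vanish, which recovers the uncorrelated-residuals case.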

> would this require implementing brms's (expr | ID | group) grouping syntax as well?

My understanding is that it is possible to implement multivariate responses without this, but that once we do, it would be a useful extension to add, because it allows group-level terms in different formulas to be modeled as correlated.

Some further thoughts, noted for future reference:

  • Would we want to support models in which response variables have different response families? (e.g. y1 comes from a normal and y2 from a Bernoulli.) I don't see how to do this in brms, but it wouldn't surprise me if it's possible; it's very flexible.

  • There's perhaps some overlap here with models with a categorical response, as they also result in models in which mu is a vector rather than a scalar. "Distributional models" are also similar (though perhaps only superficially), since they too are specified with multiple formulas.

  • It seems most of the work would be in extending code generation. The mechanism used to specify priors would need to be extended so that parameters from individual formulas can be picked out. (brms appears to do this using the name of the response variable.) fitted would return an array with an extra dimension for the response. Formula parsing and design matrix coding would be unchanged, I guess.

  • The mvbind notation for setting up a multivariate model (in which each element of the response uses the same formula) seems like it would be useful eventually. Perhaps that could also be written as e.g. [y1,y2] ~ 1 + x. Perhaps for high-dimensional data there's a way to specify a range of columns, e.g. have y[0:2] ~ 1 + x be equivalent to [y0,y1,y2] ~ 1 + x.
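
A rough sketch of how the bracket and range notations from the last point might be expanded into response names. Both notations are only proposals from this comment, so everything here is hypothetical: the helper names `expand_responses` and `split_formula`, and the assumption that the upper bound of a range is inclusive (so that y[0:2] gives y0, y1, y2 as in the example).

```python
import re
from typing import List, Tuple

# Hypothetical grammar: "name[lo:hi]" with an inclusive upper bound.
_RANGE = re.compile(r"^(\w+)\[(\d+):(\d+)\]$")


def expand_responses(lhs: str) -> List[str]:
    """Expand the left-hand side of a formula into a list of response names."""
    lhs = lhs.strip()
    m = _RANGE.match(lhs)
    if m:
        name, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
        return [f"{name}{i}" for i in range(lo, hi + 1)]
    if lhs.startswith("[") and lhs.endswith("]"):
        # Explicit list of columns, e.g. "[y1,y2]".
        return [part.strip() for part in lhs[1:-1].split(",")]
    # Ordinary single response.
    return [lhs]


def split_formula(formula: str) -> Tuple[List[str], str]:
    """Split "lhs ~ rhs" into (response names, right-hand side text)."""
    lhs, rhs = formula.split("~", 1)
    return expand_responses(lhs), rhs.strip()


print(split_formula("y[0:2] ~ 1 + x"))   # (['y0', 'y1', 'y2'], '1 + x')
print(split_formula("[y1,y2] ~ 1 + x"))  # (['y1', 'y2'], '1 + x')
```

Since every expanded response shares the same right-hand side, the existing formula parsing and design matrix coding could be reused as-is for each one.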

@eb8680
Member Author

eb8680 commented Oct 16, 2019

> Would we want to support models in which response variables have different response families?

I think a first version supporting only Normal responses with uncorrelated residuals would be fine, but in general we should be able to support all response families in the case where responses are fully observed, and at least Normal and Categorical/Bernoulli when responses are missing (#43, #44). Support for the general case might involve doing relatively naive code generation in brmp and expecting the Pyro backend to be smarter about simplifying the resulting model.
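
As a sketch of why the Normal, uncorrelated-residuals case is a simple starting point: with fully observed responses the joint log-likelihood is just the sum of the per-response Normal log-densities. The code below is illustrative only; `joint_log_likelihood` and its dictionary-based inputs are assumptions for this example, not brmp code.

```python
import math
from typing import Dict, Sequence


def normal_logpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def joint_log_likelihood(
    observed: Dict[str, Sequence[float]],
    mu: Dict[str, Sequence[float]],
    sigma: Dict[str, float],
) -> float:
    """Sum independent Normal log-densities over all responses and rows."""
    total = 0.0
    for name, ys in observed.items():
        # Uncorrelated residuals: each response contributes its own term.
        total += sum(
            normal_logpdf(y, m, sigma[name]) for y, m in zip(ys, mu[name])
        )
    return total


ll = joint_log_likelihood(
    observed={"x": [0.0], "y": [1.0]},
    mu={"x": [0.0], "y": [1.0]},
    sigma={"x": 1.0, "y": 1.0},
)
print(ll)
```

Each response's term has the same shape as a univariate model's likelihood, which is what makes one sample statement per response sufficient in this case.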

Modelling correlations in residuals across arbitrary families seems more difficult; we can punt on that for now.

> There's perhaps some overlap here with models with a categorical response, as they also result in models in which mu is a vector rather than a scalar ...

I wonder if we might want to draw a distinction between multiple variables and vector/tensor-valued variables or means, in the same way that Pyro allows tensor-valued sample statements via Distribution.to_event. That way brmp could support non-scalar responses of different shapes, families, etc, and we could naturally support mvbind and generalizations. That's sort of what I was thinking when I opened #46 although I don't have a fully formed proposal for that.
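
To illustrate the distinction, here is a toy sketch of the shape bookkeeping behind Pyro's `to_event`, which reinterprets the rightmost batch dimensions of a distribution as event dimensions. It is plain Python operating on shape tuples, not Pyro's implementation, and the standalone `to_event` function below is a stand-in written for this example.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def to_event(batch_shape: Shape, event_shape: Shape, n: int) -> Tuple[Shape, Shape]:
    """Reinterpret the rightmost n batch dimensions as event dimensions."""
    if not 0 <= n <= len(batch_shape):
        raise ValueError("n must be between 0 and len(batch_shape)")
    split = len(batch_shape) - n
    return batch_shape[:split], batch_shape[split:] + event_shape


# 100 rows, 3 scalar Normal responses treated as independent batch entries.
print(to_event((100, 3), (), 0))  # ((100, 3), ())
# The same values viewed as one vector-valued response per row.
print(to_event((100, 3), (), 1))  # ((100,), (3,))
```

Under this view, "multiple variables" would be separate sample sites, while a vector-valued response would be a single site whose event shape is non-scalar.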

I would also like to see categorical responses supported eventually, since one of my motivating examples for this series of issues is a hierarchical HMM where the transition and emission distributions can each be written as GLMMs.

> It seems most of the work would be in extending code generation

Yeah, that sounds right. Maybe a good starting point would be to collect some examples that brms handles?

@null-a
Collaborator

null-a commented Oct 17, 2019

> I wonder if we might want to draw a distinction between multiple variables and vector/tensor-valued variables or means

This sounds like a promising direction to me.
