6 glm for cue #7
I guess it does make sense to try different distributions on this side, but the issue would be that we don't currently have any way to utilise alternative functional implementations within the
We could do it cheaply with a config option that chooses the function. This might be a case where we hard-code two competing theoretical approaches rather than having to build a registry. But one option for now is fine! Also, you don't need to go anywhere near the GLM itself. The parameterisation does that and you just need to use an appropriate expression to turn the parameters into predictions.
Ahh I meant that I would have to have the mechanistic model be written in such a way that it could accept arbitrary GLM parameters, which a mechanistic representation of a particular distribution wouldn't do (e.g. there's no parameterisation by which you can get But yes for now, I would probably only want to implement a specific distribution, because the alternative is not a scalable approach to soil model parameterisation.
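A minimal sketch of the config-option idea mentioned above, assuming two hard-coded functional forms that share one signature. Every name here (`cue_linear`, `cue_logistic`, `CUE_FUNCTIONS`, `calculate_cue`, and the config keys) is hypothetical and does not exist in `virtual_ecosystem`; the parameter values in the usage note are purely illustrative, not fitted estimates.

```python
import math
from typing import Callable, Dict

# Hypothetical sketch: none of these names exist in virtual_ecosystem.


def cue_linear(temp: float, intercept: float, slope: float, temp_ref: float) -> float:
    """Linear form: can leave [0, 1] at extreme temperatures."""
    return intercept + slope * (temp - temp_ref)


def cue_logistic(temp: float, intercept: float, slope: float, temp_ref: float) -> float:
    """Inverse-logit form: stays strictly between 0 and 1."""
    eta = intercept + slope * (temp - temp_ref)
    # Numerically stable inverse logit (avoids overflow for large |eta|).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    exp_eta = math.exp(eta)
    return exp_eta / (1.0 + exp_eta)


# Two competing forms, hard-coded rather than built into a registry system.
CUE_FUNCTIONS: Dict[str, Callable[[float, float, float, float], float]] = {
    "linear": cue_linear,
    "logistic": cue_logistic,
}


def calculate_cue(temp: float, config: dict) -> float:
    """Pick the functional form named in the config and apply it."""
    name = config.get("cue_function")
    if name not in CUE_FUNCTIONS:
        raise ValueError(f"Unknown cue_function: {name!r}")
    return CUE_FUNCTIONS[name](
        temp, config["intercept"], config["slope"], config["temp_ref"]
    )
```

With a config such as `{"cue_function": "logistic", "intercept": 0.0, "slope": -0.1, "temp_ref": 15.0}`, the same call site serves either form, and the GLM fit only has to supply the intercept and slope.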
I've chatted to Rob and @jacobcook1995 about notebook formats. We think it is probably going to be better to use Jupyter notebooks rather than RMarkdown. The main reason is that these outputs render better on GitHub.
I've uploaded a couple of files. One is MyST Markdown - the only real advantage there is that GitHub knows it is Markdown and tries to render it. It doesn't know that `.Rmd` is Markdown. The `.ipynb` format is rendered properly though and includes the binary data for the images. That makes them a bit bulky but they are also much easier to read and review.
None of this is nailed down yet, but Jupyter is what we'd use for Python notebooks and having the same framework and proper rendering is a real advantage.
The model was fitted with Bayesian inference using the `brms` package in `R`.

```{r Model}
m <- brm(
  CUE ~ 1 + Temp_centered + (1 | Author),
  data = dat_cue,
  family = Beta(),
  prior =
    prior(normal(0, 0.5), ub = 0, class = b),
  warmup = 3000,
  iter = 4000,
  cores = 4,
  file = "out/model_cue",
  file_refit = "on_change"
)
```
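For reference, a sketch of what this chunk specifies, assuming the `brms` defaults for `Beta()` (logit link on the mean, mean-precision parameterisation). The symbols are notation introduced here, not taken from the PR: $i$ indexes observations, $j$ indexes authors, $T^{c}$ is `Temp_centered`, and $\phi$ is the precision.

$$
\begin{aligned}
\mathrm{CUE}_{ij} &\sim \mathrm{Beta}\bigl(\mu_{ij}\,\phi,\ (1-\mu_{ij})\,\phi\bigr) \\
\operatorname{logit}(\mu_{ij}) &= \beta_0 + \beta_1 T^{c}_{ij} + u_j, \qquad u_j \sim \mathcal{N}\bigl(0, \sigma_{\text{Author}}^{2}\bigr) \\
\beta_1 &\sim \mathcal{N}\bigl(0, 0.5^{2}\bigr) \ \text{truncated to } \beta_1 \le 0
\end{aligned}
$$

The upper bound on $\beta_1$ (`ub = 0`) encodes the assumption that CUE does not increase with temperature, and the logit link is what keeps the predicted mean between 0 and 1.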
Are you using Bayes rather than the frequentist `betareg` package here because of the random effect of Author?
Using Bayes only because it's my routine these days... For a frequentist approach there is `glmmTMB`, which has a beta family. I think if we want speed / ease of running on everyone's computer, then we might want frequentist / maximum likelihood as a default going forward?
I think broadly we should use the "best" analysis - and it's great to have a Bayesian viewpoint as I'm very frequentist (but only from habit, really). I don't think everybody has to run every analysis - although we want the repo set up so people can.
Good point. In addition to the config option @davidorme suggested, a simpler (but less general?) approach is to add an (inverse) link function to a
And We do not need to worry about generalising this if we only pick one GLM for now, so these are more like notes for the future.
Thanks @davidorme, I'm all for integration so happy to favour Jupyter over RMarkdown. That said, would it be possible to do a bit of both by building an auto-conversion into git's workflow? There seems to be It would be ideal if anyone on the data team like me could stick to the RStudio routine but still deliver Jupyter notebooks as an end product. But if this is too clunky then I'm still happy to switch over 😄
Yes these are definitely useful notes for the future! For mainly academic background reasons (only just learning what GLMs are 🫠), I'm pretty strongly in favour of trying to use solely mechanistically derived process representations for the soil model. But something Rob has talked about before is trying to implement alternative empirically derived representations, so down the road we could split the soil model into two (e.g.
On the data side I think using @qiao2019 is pretty much a no-brainer (vs data drawn from a single system at only 3 temperatures!). The lack of tropical study sites is obviously an issue, but I think this is something we are going to have to learn to generally accept (both the tropics and soils are generally understudied, in combo it's really bad). Looking at the plotted distribution I do prefer your model to the linear models shown. Staying with the range of If we want to implement your model + parameterisation, I would need to change the functional form of
(Obviously
Agreed, and now I can more clearly see where you were coming from. I think our recent discussion is quite relevant in this regard, i.e., on one hand we have Arrhenius equations that represent the mechanistic / process-based part, and on the other hand we have the CUE line, which is a curve-fitting "empirical" model. At some point, it would be great for the data team to chat about our "default" approach / model choice:
Yeap, it would be what python calls expit and what R tends to call inverse-logit or logistic. But the slope is missing:
For renaming there are a few options:
Hey Hao Ran, below some comments:
This PR addresses #6 (stems from ImperialCollegeLondon/virtual_ecosystem#746), which is to set up a generalised linear regression to estimate parameters for temperature-dependent microbial carbon use efficiency (CUE) without predicting CUE out of bounds (predictions stay between 0 and 1).
There are two aspects to review: (1) model choice and (2) folder structure of this repo going forward.
Model choice:
Folder structure:
- `data` directory, but it wasn't committed because the csv files etc. are gitignored. How do we envision data download? Do we always include the URL in the code for others to manually download it...?
- `code/soil/cue` for my case, so the `code` directory is a bit like the `models` directory in VE. Not sure if this is best.
- `bib` directory to address Bibliography #1 and followed the same filename as `virtual_ecosystem`. The bibliography is supposed to include data sources and refs used in html reports that I create with RMarkdown, which leads to:
- `.R` script, but decided to trial with `.Rmd` to also generate a report at the end. It is in html, intended for anyone to jump right in without having to worry about the details. But the html file would easily go >500 kb (which is the lintr limit), and mine was 900 kb due to figures, so I didn't commit it.

That's all for now off the top of my head. Looking forward to pinning down a folder structure for this repo :)