GSFE01 - F. Guillaume Blanchet & Steve Vissault
https://steveviss.github.io/PR-GSFE01/INLA/index.html
https://steveviss.github.io/PR-GSFE01/INLA/practiceMesh.html
https://steveviss.github.io/PR-GSFE01/INLA/practiceINLASpace.html
https://steveviss.github.io/PR-GSFE01/INLA/practiceINLASpaceTime.html
https://steveviss.github.io/PR-GSFE01/INLA/practiceINLASpaceTimeCont.html
So far, all the techniques we have used relied on the frequentist paradigm when inferring results from data.
Frequentists want to find the best model parameter(s) for the data at hand
Likelihood: P(Data | Model)
They are interested in maximizing the Likelihood
They need data
Bayesians want to find how good the model parameter(s) are given some data
Posterior: P(Model | Data)
They are interested in the posterior distribution
They need data and prior information
The general framework used in Bayesian modelling is
\underbrace{P(\text{Model}|\text{Data})}_{\text{Posterior}} \propto \underbrace{P(\text{Data}|\text{Model})}_{\text{Likelihood}}\ \underbrace{P(\text{Model})}_{\text{Prior}}
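As a small illustration of this proportionality (not from the original slides), here is a hedged R sketch that approximates the posterior of a Gaussian mean on a grid by multiplying likelihood and prior and renormalizing; the data and prior values are invented for illustration only.

## Minimal sketch: posterior ∝ likelihood × prior, evaluated on a grid
set.seed(42)
y <- rnorm(20, mean = 3, sd = 1)           # some simulated data
mu.grid <- seq(0, 6, length.out = 500)     # candidate values of the mean

# Likelihood: P(Data | Model), here a Gaussian with known sd = 1
lik <- sapply(mu.grid, function(mu) prod(dnorm(y, mean = mu, sd = 1)))

# Prior: P(Model), here a vague Gaussian prior on the mean
prior <- dnorm(mu.grid, mean = 0, sd = 10)

# Posterior: P(Model | Data) ∝ likelihood × prior (renormalized on the grid)
post <- lik * prior
post <- post / sum(post * diff(mu.grid)[1])

plot(mu.grid, post, type = "l", xlab = expression(mu), ylab = "Posterior density")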
INLA is an efficient way to estimate Bayesian model parameters.
To understand how INLA works, we need to learn about:
To understand how to use INLA for spatial and spatiotemporal models, we need to also learn about:
So... A lot of rather complex mathematics is ahead of us...
Let's jump in !
Latent Gaussian models are a form of hierarchical model with three levels
A special case of a latent Gaussian model is multiple regression, which is commonly presented as follows:
y_i = \beta_jX_{ij} + \varepsilon_i where
The likelihood function for our multiple regression (assuming a Gaussian error) can be written as
\mathbf{y} | \mathbf{X}, \boldsymbol{\beta},\sigma^2 \sim \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_i - \beta X_i)^2}{2\sigma^2}}
where
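To make the likelihood above concrete, here is a hedged R sketch (simulated data and invented coefficient values) evaluating the Gaussian log-likelihood of a regression with dnorm(); the product over observations becomes a sum on the log scale.

## Sketch: Gaussian log-likelihood of a multiple regression, simulated data
set.seed(1)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))          # design matrix (intercept + 2 covariates)
beta <- c(1, 0.5, -0.3)                    # illustrative coefficients
sigma <- 1
y <- as.vector(X %*% beta + rnorm(n, sd = sigma))

# log P(y | X, beta, sigma^2) = sum of log Gaussian densities
loglik <- sum(dnorm(y, mean = X %*% beta, sd = sigma, log = TRUE))
loglik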
In our multiple regression, the latent field is
\mathbf{X}, \boldsymbol{\beta}|\boldsymbol{\theta} \sim N(0,\mathbf{\Sigma}(\boldsymbol{\theta})) where
This is key to latent Gaussian models. That is, for a model to be a latent Gaussian model, it is essential that
\beta\sim N(0,\mathbf{\Sigma}(\boldsymbol{\theta}))
The hyperparameters are parameters that are not themselves part of the Gaussian latent field from which the regression parameters are derived. As was seen in the conceptual representation of the latent Gaussian model, there are two types of hyperparameters
Latent Gaussian models are not limited to multiple regression models; they are a framework that encompasses
In this respect, in INLA
the multiple regression we wrote as
y_i = \beta_jX_{ij} + \varepsilon_i
is presented as
y_i = \mathbf{A}\mathbf{X}_i + \varepsilon_i where \mathbf{A} is general notation representing an observation matrix, which includes the set of parameters used to model \mathbf{y}.
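As a loose illustration (not from the slides) of what an "observation matrix" looks like in the simplest case, the sketch below builds a design matrix with model.matrix() and computes the linear predictor A %*% beta with invented parameter values; in the spatial SPDE models later on, R-INLA builds the analogous projector matrix with inla.spde.make.A().

## Sketch: the design/observation matrix of a plain regression
set.seed(1)
dat <- data.frame(x1 = rnorm(10), x2 = runif(10))
A <- model.matrix(~ x1 + x2, data = dat)   # one row per observation, one column per parameter
beta <- c(2, 0.5, -1)                      # illustrative parameter values
eta <- A %*% beta                          # linear predictor for each observation
head(eta)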
We will discuss three ways to estimate latent Gaussian models
INLA is designed to estimate a latent Gaussian model to learn about its:
Simplified Laplace approximation is the default estimation procedure in INLA, which can be understood through a brief overview of
In a Gaussian approximation we assume that \mathbf{X}, \boldsymbol{\beta}_j|\boldsymbol{\theta}_k, \mathbf{y} \sim N(\mu_j(\boldsymbol{\theta}),\sigma_j^2(\boldsymbol{\theta})) where
This approach is very fast (Yééé !)
But... the assumptions are too strong and often result in poor estimates (Bouh !)
Let's suppose that the distribution of \boldsymbol{\beta}_j looks like
A Laplace approximation can approximate any unimodal distribution through a clever use of the Gaussian approximation.
In other words, Laplace approximation can be conceptually understood as a "weighted" Gaussian approximation.
If we look at the probability density of a Gaussian distribution
\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - \mu)^2}{2\sigma^2}}
There are two parameters associated with the Gaussian distribution
The Laplace approximation is designed to find the mean and variance of the Gaussian distribution that best fits the distribution associated with \boldsymbol\beta_j.
Visually, the difference between the distribution of \boldsymbol\beta_j and its Laplace approximation looks like
Advantage : More precise than a Gaussian approximation (it handles skewness much better)
Disadvantage: Slow (Bouh!)
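A hedged R sketch of the idea (not the INLA internals): approximate a skewed unimodal density, here a Gamma chosen only for illustration, by a Gaussian whose mean is the mode and whose variance is minus the inverse of the second derivative of the log-density at the mode.

## Sketch of a Laplace approximation for a skewed unimodal density
## Target: Gamma(shape = 3, rate = 1), chosen only for illustration
logdens <- function(x) dgamma(x, shape = 3, rate = 1, log = TRUE)

# 1. Find the mode of the target density
mode <- optimize(logdens, interval = c(0.01, 20), maximum = TRUE)$maximum

# 2. Curvature of the log-density at the mode (numerical second derivative)
h <- 1e-4
d2 <- (logdens(mode + h) - 2 * logdens(mode) + logdens(mode - h)) / h^2
sigma2 <- -1 / d2                          # variance of the approximating Gaussian

# 3. Compare the target and its Laplace approximation
curve(dgamma(x, shape = 3, rate = 1), from = 0, to = 10, lwd = 2,
      xlab = expression(beta[j]), ylab = "Density")
curve(dnorm(x, mean = mode, sd = sqrt(sigma2)), add = TRUE,
      lwd = 2, lty = 2, col = "darkblue")
legend("topright", legend = c("Target", "Laplace approximation"),
       lwd = 2, lty = c(1, 2), col = c("black", "darkblue"))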
Simplified Laplace approximation was proposed by Rue et al. (2009) and is designed to use the best out of Gaussian approximation and Laplace approximation.
Specifically, simplified Laplace approximation is
The theoretical development underlying the simplified Laplace approximation is beyond the scope of this course; in essence, it relies on a series of smart tricks.
Integrated Nested Laplace Approximation (INLA) has a number of interesting properties:
The last few slides present a very (!) general overview of the theory behind INLA.
If you want to know more about how INLA works, take a look at http://www.r-inla.org
The most commonly used approach to estimate Bayesian models is Markov Chain Monte Carlo (MCMC).
JAGS is the most commonly used software to implement various types of Bayesian models.
Let's compare JAGS and INLA to estimate the parameters and hyperparameter of a linear model y = \alpha + \beta x where \beta \sim N(0,\sigma^2)
In this model the parameters to estimate are
# Simulate data
N <- 500
x <- rnorm(N, mean = 6, sd = 2)
y <- rnorm(N, mean = x, sd = 1)

dat <- list(x = x, y = y, N = N)
library(rjags)

# Write the JAGS model to a file
cat('model{
  for(i in 1:N) {
    y[i] ~ dnorm(mu[i], tau)
    mu[i] <- alpha + beta*x[i]
  }
  alpha ~ dnorm(0, 0.001)
  beta ~ dnorm(0, 0.001)
  tau ~ dgamma(0.01, 0.01)
}', file = "JAGSmodel.txt")

JAGSmodel <- jags.model(file = "JAGSmodel.txt", data = dat, n.chains = 3, n.adapt = 5000)
params <- c("alpha", "beta", "tau", "mu")
system.time(resJAGS <- coda.samples(JAGSmodel, params, n.iter=50000 ))
##    user  system elapsed 
##  16.638   4.524  21.301
library(INLA)

system.time(resINLA <- inla(y ~ x, family = "gaussian", data = dat))
##    user  system elapsed 
##   0.791   0.612   1.612
par(mfrow = c(1, 2), mar = c(2, 3, 4, 1))

# Intercept (alpha): JAGS posterior sample vs INLA marginal
plot(resINLA$marginals.fixed$`(Intercept)`, type = "n", xlab = "", ylab = "",
     main = expression(alpha), las = 1, cex.main = 4)
hist(resJAGS[[1]][, 1], breaks = 100, freq = FALSE, add = TRUE,
     col = "lightblue", border = "white")
lines(resINLA$marginals.fixed$`(Intercept)`, lwd = 3, col = "darkblue")
legend("topright", legend = c("JAGS", "INLA"), col = c("lightblue", "darkblue"),
       lty = c(1, 1), lwd = c(6, 3))

# Slope (beta): JAGS posterior sample vs INLA marginal
plot(resINLA$marginals.fixed$x, type = "n", xlab = "", ylab = "",
     main = expression(beta), las = 1, cex.main = 4)
hist(resJAGS[[1]][, 2], breaks = 100, freq = FALSE, add = TRUE,
     col = "orange", border = "white")
lines(resINLA$marginals.fixed$x, lwd = 3, col = "brown")
legend("topright", legend = c("JAGS", "INLA"), col = c("orange", "brown"),
       lty = c(1, 1), lwd = c(6, 3))
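Beyond the histograms, a quick way to compare the two fits numerically is to look at the posterior summaries each package returns (using the same objects created above):

## Posterior summaries from both approaches
summary(resJAGS)$statistics[c("alpha", "beta"), ]  # MCMC means and SDs from JAGS
resINLA$summary.fixed                              # posterior means, SDs and quantiles from INLA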
Spatial modelling is computationally intensive.
When the number of samples becomes large, the traditional ways of estimating spatial models fall apart, especially when making spatial interpolations.
Specifically, computation time increases very rapidly (roughly cubically for dense covariance matrices) as the number of samples increases.
Using INLA, it is possible to bypass some of the time consuming procedures and, as such, we can construct accurate spatial and spatiotemporal models even with a very large number of samples.
As mentioned earlier, to understand how to use INLA for spatial and spatiotemporal models, we need to also learn about:
Knowing about Gaussian Markov random fields is actually important for the basic INLA algorithm too, but we skimmed over it earlier; now we need to look into it in more detail.
A Gaussian Markov random field can be illustrated using the following transect
A Gaussian Markov random field \mathbf{y} is defined as
\mathbf{y}\sim N(\boldsymbol{\mu}, \mathbf{\Sigma}) which satisfies p(y_i | \{y_j: j\ne i\}) = p(y_i | \{y_j: j \in \mathcal{N}_i\}) where \mathcal{N}_i is the neighbourhood around sample i
When defining a Gaussian Markov random field above, we relied on the covariance matrix \mathbf{\Sigma}.
Another way to present Gaussian Markov random field is to rely on precision (\mathbf{Q}).
Precision is the inverse of covariance. This can be formalized as \mathbf{\Sigma} = \mathbf{Q}^{-1}
What has been known for a while is that when y_i and y_j are uncorrelated, their covariance is 0. Mathematically, this can be written as
y_i \perp y_j \Leftrightarrow \mathbf{\Sigma}_{ij} = 0
What Rue et al. (2009) showed is that, conditional on \mathbf{y}_{-ij} (shorthand for all elements of \mathbf{y} except y_i and y_j), when y_i and y_j are uncorrelated, the corresponding entry of the precision matrix is 0. Mathematically, this can be written as
y_i \perp y_j | \mathbf{y}_{-ij} \Leftrightarrow \mathbf{Q}_{ij} = 0

This discovery by Rue et al. (2009) is especially useful because the resulting precision matrix \mathbf{Q} only has non-zero values for neighbours and diagonal elements. It is thus a sparse matrix, making it much easier to invert (a necessary procedure when estimating models).
Note: Computationally, inverting a matrix is challenging especially when the matrix is large and dense. However, the more zeros a matrix has the easier it gets to invert.
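To make the sparsity argument concrete, here is a hedged R sketch (a first-order, AR(1)-type neighbour structure on a 1-D transect, with an invented correlation value) showing that the precision matrix of such a GMRF is tridiagonal, while its inverse, the covariance matrix, is dense.

## Sketch: precision vs covariance for a simple GMRF on a 1-D transect
## AR(1)-type neighbour structure with an illustrative correlation rho
n <- 8
rho <- 0.7
Sigma <- rho^abs(outer(1:n, 1:n, "-"))   # dense covariance matrix
Q <- solve(Sigma)                        # precision matrix Q = Sigma^{-1}

round(Q, 2)      # tridiagonal: Q[i, j] = 0 whenever i and j are not neighbours
round(Sigma, 2)  # dense: almost no zeros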
As you now know, in geostatistical context, the semivariogram serves as reference to make interpolations.
Through some fancy theoretical work by Lindgren et al. (2011) relying on stochastic partial differential equations, we learned that a well-suited model for deriving a good precision matrix \mathbf{Q} is the Matérn covariance model
\gamma(\boldsymbol{h}) = \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}\left(\kappa \boldsymbol{h}\right)^\nu K_\nu\left(\kappa \boldsymbol{h}\right) where
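As a hedged sketch (not from the slides), the Matérn function above can be evaluated directly in base R with besselK(); the parameter values below are invented for illustration.

## Sketch: Matérn covariance as written above, using the modified Bessel function K_nu
matern <- function(h, sigma2 = 1, kappa = 1, nu = 1) {
  out <- sigma2 / (2^(nu - 1) * gamma(nu)) * (kappa * h)^nu * besselK(kappa * h, nu)
  out[h == 0] <- sigma2    # limit as h -> 0 is the variance
  out
}

h <- seq(0, 5, length.out = 200)
plot(h, matern(h, sigma2 = 1, kappa = 1.5, nu = 1), type = "l",
     xlab = "Distance (h)", ylab = "Covariance")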
Discrete approximation of the Matérn covariance model can be constructed using weighted basis functions.
Concretely, we can decide on a mesh underlying the study area to approximate the Matérn covariance model and, as such, the precision matrix \mathbf{Q}
We can thus visually see the difference between the INLA approach and the classical approach
What we want is a mesh that looks like this
In INLA, estimation is performed at each point of the mesh and the interpolation is made between each vertex with an approach akin to a regression.
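A hedged R-INLA sketch of building such a mesh from a set of sample locations (the coordinates are random and invented for illustration; the max.edge, cutoff and offset values are arbitrary and should be tuned to your study area):

## Sketch: building a constrained Delaunay mesh with R-INLA
library(INLA)

set.seed(1)
coords <- cbind(runif(100, 0, 10), runif(100, 0, 10))  # illustrative sample locations

mesh <- inla.mesh.2d(loc = coords,
                     max.edge = c(0.5, 2),  # max triangle edge inside / outside the domain
                     cutoff = 0.2,          # minimum distance between mesh vertices
                     offset = c(1, 3))      # inner and outer extension around the points

plot(mesh)
points(coords, col = "red", pch = 16, cex = 0.6)

The max.edge and cutoff arguments are the main levers for avoiding the overly fine, overly coarse, or unevenly sized meshes discussed in the next slides.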
You should be careful about a mesh that looks like this
It is not wrong, but it will take longer to estimate
You should be careful about a mesh that looks like this
It is not wrong, but it will give a very (!) coarse interpolation
A mesh like this should not be used!
The triangles are very different in size.
https://steveviss.github.io/PR-GSFE01/INLA/index.html
https://steveviss.github.io/PR-GSFE01/INLA/practiceMesh.html