
Integrated Nested Laplace Approximation (INLA)

GSFE01 - F. Guillaume Blanchet & Steve Vissault

1 / 51

Bayesian inference


4 / 51

Frequentist

So far, all the techniques we used relied on the frequentist paradigm when inferring results from data.

Frequentists want to find the best model parameter(s) for the data at hand

\underbrace{P(\text{Data} \,|\, \text{Model})}_{\text{Likelihood}}

They are interested in maximizing the Likelihood

They need data

Estimating model parameters

  • Minimizing the sums of squares
  • Simulated annealing
  • Nelder-Mead Simplex
  • ...
5 / 51

Bayesian

Bayesians want to find how good the model parameter(s) are given some data

\underbrace{P(\text{Model} \,|\, \text{Data})}_{\text{Posterior}}

They are interested in the posterior distribution

They need data and prior information

The general framework used in Bayesian modelling is

\underbrace{P(\text{Model} \,|\, \text{Data})}_{\text{Posterior}} \propto \underbrace{P(\text{Data} \,|\, \text{Model})}_{\text{Likelihood}} \times \underbrace{P(\text{Model})}_{\text{Prior}}
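To make this relationship concrete, here is a minimal sketch (not from the original slides) that approximates the posterior of a single mean parameter on a grid, multiplying the likelihood by the prior and then normalizing:

set.seed(1)
y <- rnorm(20, mean = 2, sd = 1)                      # some data
mu_grid <- seq(-2, 6, length.out = 500)               # candidate values of the "Model" (here, mu)
prior <- dnorm(mu_grid, mean = 0, sd = 10)            # P(Model)
likelihood <- sapply(mu_grid, function(m) prod(dnorm(y, mean = m, sd = 1)))  # P(Data | Model)
posterior <- likelihood * prior                       # proportional to P(Model | Data)
posterior <- posterior / sum(posterior * diff(mu_grid)[1])   # normalize to a density
plot(mu_grid, posterior, type = "l", xlab = expression(mu), ylab = "Posterior density")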

Estimating model parameters

  • Markov Chain Monte Carlo
  • INLA
  • ...
6 / 51

Our way of thinking is Bayesian


7 / 51

Integrated Nested Laplace Approximation


8 / 51

INLA

INLA is an efficient way to estimate Bayesian model parameters.

To understand how INLA works, we need to learn about:

  • Latent Gaussian models

To understand how to use INLA for spatial and spatiotemporal models, we need to also learn about:

  • Gaussian Markov Random Fields

So... A lot of rather complex mathematics is ahead of us...

Let's jump in!

9 / 51

Latent Gaussian models


10 / 51

Conceptual representation

Latent Gaussian models are a form of hierarchical model with three levels

11 / 51

How it works

Using multiple regression as an example

A special case of a latent Gaussian model is multiple regression, which is commonly presented as follows:

y_i = \beta_j X_{ij} + \varepsilon_i where

  • yi is the value of a response variable sampled at site i out of n
  • Xij is the value of the jth explanatory variable out of p sampled at site i
  • βj is the regression parameter associated to the jth explanatory variable
  • εi is the model error at site i. This is assumed to follow a Gaussian distribution
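As a concrete illustration (a minimal sketch with simulated data, not part of the original slides), this model can be simulated and fitted in R:

set.seed(1)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), n, p)            # explanatory variables X_ij
beta <- c(1, -0.5, 2)                      # regression parameters beta_j
y <- drop(X %*% beta) + rnorm(n, sd = 1)   # Gaussian error epsilon_i
summary(lm(y ~ X))                         # frequentist fit, for comparison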
12 / 51

How it works

Using multiple regression as an example

Likelihood model

The likelihood function for our multiple regression (assuming a Gaussian error) can be written as

\mathbf{y} | \mathbf{X}, \boldsymbol{\beta},\sigma^2 \sim \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_i - \beta X_i)^2}{2\sigma^2}}

where

  • y_i is the value of a response variable sampled at site i
  • \mathbf{X}_{i} are the values of all explanatory variables sampled at site i
  • \boldsymbol{\beta} is a vector of regression parameters for all explanatory variables
  • \sigma^2 is the variance of the Gaussian distribution underlying the model
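Below is a minimal sketch (not part of the original slides) of this log-likelihood written as an R function and evaluated for candidate values of \boldsymbol{\beta} and \sigma^2 (it assumes X and y objects like those simulated in the previous sketch):

loglik <- function(beta, sigma2, X, y) {
  mu <- drop(X %*% beta)                                 # X_i beta for each site i
  sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}
loglik(beta = c(1, -0.5, 2), sigma2 = 1, X = X, y = y)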
13 / 51

How it works

Using multiple regression as an example

Latent field

In our multiple regression, the latent field is

\mathbf{X}, \boldsymbol{\beta}|\boldsymbol{\theta} \sim N(0,\mathbf{\Sigma}(\boldsymbol{\theta})) where

  • \mathbf{\Sigma} is a covariance matrix
  • \boldsymbol{\theta} is a hyperparameter that influences the structure of the covariance matrix \mathbf{\Sigma}. This could be a spatial/temporal constraint.

This is key to latent Gaussian models. That is, for a model to be a latent Gaussian model, it is essential that

\beta\sim N(0,\mathbf{\Sigma}(\boldsymbol{\theta}))
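To make the latent field more tangible, here is a minimal sketch (not part of the original slides) that draws a vector of regression parameters from N(0, \mathbf{\Sigma}(\boldsymbol{\theta})), where the hyperparameter \boldsymbol{\theta} is assumed to impose an autoregressive-type correlation structure on \mathbf{\Sigma}:

library(MASS)
p <- 10                                       # number of latent effects
theta <- 0.7                                  # hyperparameter controlling the structure of Sigma
Sigma <- theta^abs(outer(1:p, 1:p, "-"))      # Sigma(theta), e.g. a temporal constraint
beta <- mvrnorm(1, mu = rep(0, p), Sigma = Sigma)
beta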

14 / 51

How it works

Using multiple regression as an example

Hyperparameters

The hyperparameters are the parameters that are not part of the latent Gaussian field, that is, they are not themselves drawn from the Gaussian distribution from which the regression parameters are derived. As was seen in the conceptual representation of the latent Gaussian model, there are two types of hyperparameters

  • Hyperparameters associated to the likelihood
    • Variance ( \sigma^2 in our multiple regression)
    • Overdispersion (e.g. Negative binomial model)
    • Scale (e.g. Gamma model)
    • ...
  • Hyperparameters associated to the Latent field
    • Variance constraints
    • Spatial/temporal correlation
    • Autoregressive coefficients
    • ...
15 / 51

General aspects of latent Gaussian models

Latent Gaussian models are not limited to multiple regression models; they are a framework that encompasses

  • Generalized linear models
  • Generalized linear mixed models
  • Generalized additive models
  • Generalized additive mixed models
  • Spatially and temporally correlated effects
  • Latent variable models

In this respect, in INLA the multiple regression we wrote as

y_i = \beta_jX_{ij} + \varepsilon_i

is presented as

y_i = \mathbf{A}\mathbf{X}_i + \varepsilon_i where \mathbf{A} is a general notation representing an observation matrix, which includes the set of parameters used to model \mathbf{y}.

16 / 51

Estimating latent Gaussian model

We will discuss three ways to estimate latent Gaussian models

  1. Gaussian approximation
  2. Laplace approximation
  3. Simplified Laplace approximation

INLA is designed to estimate a latent Gaussian model to learn about its:

  • Parameters
    • Regression coefficients
  • Hyperparameters
    • Correlation in an autoregressive model
    • Variance of a random effect

Simplified Laplace approximation is the default estimation procedure in INLA, which can be understood through a brief overview of

  • Gaussian approximation
  • Laplace approximation
17 / 51


Gaussian approximation

Using multiple regression as an example

In a Gaussian approximation we assume that \mathbf{X}, \boldsymbol{\beta}_j|\boldsymbol{\theta}_k, \mathbf{y} \sim N(\mu_j(\boldsymbol{\theta}),\sigma_j^2(\boldsymbol{\theta})) where

  • the mean \mu_j(\boldsymbol{\theta})
  • the marginal variance \sigma_j^2(\boldsymbol{\theta}) are estimated through a careful use of the Gaussian Markov random field

This approach is very fast (Yay!)

But... the assumptions are too strong and often result in poor estimates (Boo!)

18 / 51

Laplace approximation

Using multiple regression as an example

Let's suppose that the distribution of \boldsymbol{\beta}_j looks like the one shown on this slide.

A Laplace approximation can approximate any unimodal distribution through a clever use of a Gaussian approximation.

In other words, Laplace approximation can be conceptually understood as a "weighted" Gaussian approximation.

19 / 51

Laplace approximation

Using multiple regression as an example

If we look at the probability density function of a Gaussian distribution

\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y - \mu)^2}{2\sigma^2}}

There are two parameters associated to the Gaussian distribution

  • \mu : The mean of the Gaussian distribution
  • \sigma^2 : The variance of the Gaussian distribution

The Laplace approximation is designed to find the mean and variance of the Gaussian distribution that best fits the distribution associated to \boldsymbol\beta_j.
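As a minimal sketch of this idea (not INLA's internal code, and using an assumed Gamma-shaped target density for \boldsymbol\beta_j), a Laplace approximation can be built by finding the mode of the log-density and using its curvature at the mode to set the variance:

log_dens <- function(b) dgamma(b, shape = 3, rate = 1, log = TRUE)    # assumed skewed density
opt <- optimize(log_dens, interval = c(0.01, 20), maximum = TRUE)
mu <- opt$maximum                                         # mean of the approximating Gaussian
h <- 1e-4
d2 <- (log_dens(mu + h) - 2 * log_dens(mu) + log_dens(mu - h)) / h^2  # curvature at the mode
sigma2 <- -1 / d2                                         # variance of the approximating Gaussian
curve(dgamma(x, 3, 1), 0, 10, lwd = 2, ylab = "Density")
curve(dnorm(x, mean = mu, sd = sqrt(sigma2)), add = TRUE, lwd = 2, lty = 2)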

20 / 51

Laplace approximation

Using multiple regression as an example

Visually, the difference between the distribution of \boldsymbol\beta_j and its Laplace approximation looks like the figure on this slide.

Advantage: more precise than a Gaussian approximation (it handles skewness much better)

Disadvantage: slow (Boo!)

21 / 51

Simplified Laplace approximation

Simplified Laplace approximation was proposed by Rue et al. (2009) and is designed to take the best of both the Gaussian approximation and the Laplace approximation.

Specifically, simplified Laplace approximation is

  • Computationally very fast
  • Almost as precise as Laplace approximation

The theoretical development underlying the simplified Laplace approximation is beyond the scope of this course; in essence, it relies on a series of smart tricks.

22 / 51

Generalities about INLA

Integrated Nested Laplace Approximation (INLA) has a number of interesting properties:

  • It is to be used within a Bayesian framework
    • This means that priors need to be chosen and a likelihood needs to be calculated
  • The types of models estimated are all variants of a latent Gaussian model
    • This is a very broad group of models
  • INLA is a parameter (and hyperparameter) estimation procedure (nothing more, nothing less)
  • It is an approximation, so it is not perfect... However, it does work most of the time
  • Compared to classic estimation approaches, it is blisteringly fast

The last few slides presented a very (!) general overview of the theory behind INLA.

If you want to know more about how INLA works, take a look at http://www.r-inla.org

23 / 51

How fast is INLA ?

The most commonly used approach to estimate Bayesian models is Markov Chain Monte Carlo (MCMC).

JAGS is one of the most commonly used software packages for implementing various types of Bayesian models.

Let's compare JAGS and INLA to estimate the parameters and hyperparameter of a linear model y = \alpha + \beta x + \varepsilon where \varepsilon \sim N(0,\sigma^2)

In this model the parameters to estimate are

  • \alpha
  • \beta
  • \sigma^2
24 / 51

How fast is INLA ?

Data Simulation

# Simulate data: y has mean x (i.e. alpha = 0, beta = 1) and residual sd = 1
N <- 500
x <- rnorm(N, mean = 6, sd = 2)
y <- rnorm(N, mean = x, sd = 1)
dat <- list(x = x, y = y, N = N)
25 / 51

How fast is INLA ?

JAGS code

library(rjags)

# Write the JAGS model definition to a file
cat('model{
  for(i in 1:N) {
    y[i] ~ dnorm(mu[i], tau)          # likelihood (tau is the precision)
    mu[i] <- alpha + beta * x[i]
  }
  alpha ~ dnorm(0, 0.001)             # vague priors
  beta ~ dnorm(0, 0.001)
  tau ~ dgamma(0.01, 0.01)
}', file = "JAGSmodel.txt")

JAGSmodel <- jags.model(file = "JAGSmodel.txt", data = dat,
                        n.chains = 3, n.adapt = 5000)
params <- c("alpha", "beta", "tau", "mu")

system.time(
  resJAGS <- coda.samples(JAGSmodel, params, n.iter = 50000)
)
## user system elapsed
## 16.638 4.524 21.301
26 / 51

How fast is INLA ?

INLA code

library(INLA)

system.time(
  resINLA <- inla(y ~ x,
                  family = "gaussian",
                  data = dat)
)
## user system elapsed
## 0.791 0.612 1.612
27 / 51

How fast is INLA ?

Do the results match?

28 / 51

How fast is INLA ?

Do the results match?

# Overlay INLA marginal posteriors (lines) on JAGS posterior samples (histograms)
par(mfrow = c(1, 2), mar = c(2, 3, 4, 1))

plot(resINLA$marginals.fixed$`(Intercept)`,
     type = "n", xlab = "", ylab = "",
     main = expression(alpha), las = 1, cex.main = 4)
hist(resJAGS[[1]][, 1], breaks = 100,
     freq = FALSE, add = TRUE, col = "lightblue", border = "white")
lines(resINLA$marginals.fixed$`(Intercept)`, lwd = 3, col = "darkblue")
legend("topright",
       legend = c("JAGS", "INLA"),
       col = c("lightblue", "darkblue"),
       lty = c(1, 1), lwd = c(6, 3))

plot(resINLA$marginals.fixed$x,
     type = "n", xlab = "", ylab = "",
     main = expression(beta), las = 1, cex.main = 4)
hist(resJAGS[[1]][, 2], breaks = 100,
     freq = FALSE, add = TRUE, col = "orange", border = "white")
lines(resINLA$marginals.fixed$x, lwd = 3, col = "brown")
legend("topright",
       legend = c("JAGS", "INLA"),
       col = c("orange", "brown"),
       lty = c(1, 1), lwd = c(6, 3))
29 / 51

Spatial and spatiotemporal modelling using INLA


30 / 51

Why use INLA for space(-time) modelling?

Spatial modelling is computationally intensive.

When the number of samples becomes large, the traditional ways of estimating spatial models fall apart, especially when making spatial interpolations.

Specifically, computation time increases very quickly (roughly cubically, because dense covariance matrices must be manipulated and inverted) as the number of samples increases.

Using INLA, it is possible to bypass some of the time-consuming procedures and, as such, we can construct accurate spatial and spatiotemporal models even with a very large number of samples.

31 / 51

Space(-time) modelling using INLA

As mentioned earlier, to understand how to use INLA for spatial and spatiotemporal models, we also need to learn about:

  • Gaussian Markov Random Fields

Knowing about Gaussian Markov Random Fields is actually also important for the basic INLA algorithm, but we skimmed over it earlier; now we need to look into it in more detail.

32 / 51

Gaussian Markov Random Field


33 / 51

Conceptual representation

Gaussian Markov Random Field can be illustrated using the following transect

Properties of a Gaussian Markov Random Field

  • A stochastic process relying on the Gaussian distribution
  • A memoryless process that depends only on the present to decide the next outcome, not on the events that occurred in the past
41 / 51

Conceptual representation in 2D

42 / 51

Mathematics of Gaussian Markov Random Field

A Gaussian Markov random field \mathbf{y} is defined as

\mathbf{y}\sim N(\mathbf{\mu}, \mathbf{\Sigma}) which satisfies p(y_i | \{y_j: i\ne j\}) = p(y_i | \{y_j: j \in \mathcal{N}_i\}) where \mathcal{N}_i is the neighbourhood around sample i

A trick about Gaussian Markov random field

When defining a Gaussian Markov random field above, we relied on the covariance matrix \mathbf{\Sigma}.

Another way to present a Gaussian Markov random field is to rely on the precision matrix (\mathbf{Q}).

Precision is the inverse of covariance. This can be formalized as \mathbf{\Sigma} = \mathbf{Q}^{-1}

43 / 51

Why should we rely on \mathbf{Q} and not \mathbf{\Sigma}?

What has been known for a while is that when y_i and y_j are independent, their covariance is 0. Mathematically, this can be written as

y_i \perp y_j \Leftrightarrow \mathbf{\Sigma}_{ij} = 0

What Rue et al. (2009) exploited is that, conditional on \mathbf{y}_{-ij} (a shorthand for all elements of \mathbf{y} other than y_i and y_j), when y_i and y_j are conditionally independent, the corresponding entry of the precision matrix is 0. Mathematically, this can be written as

y_i \perp y_j | \mathbf{y}_{-ij} \Leftrightarrow \mathbf{Q}_{ij} = 0

This property is especially useful because the resulting precision matrix \mathbf{Q} only has non-zero values for neighbours and diagonal elements. It is thus a sparse matrix, making it much easier to invert (a necessary procedure when estimating models).

Note: Computationally, inverting a matrix is challenging, especially when the matrix is large and dense. However, the more zeros a matrix has, the easier it is to invert.
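A minimal sketch (not part of the original slides, using an assumed AR(1) process along a transect) shows this sparsity: the covariance matrix \mathbf{\Sigma} is fully dense, while its inverse \mathbf{Q} is tridiagonal:

library(Matrix)
n <- 100                                   # number of sites along a transect
rho <- 0.8                                 # correlation between neighbouring sites
Sigma <- rho^abs(outer(1:n, 1:n, "-"))     # dense covariance: Sigma_ij = rho^|i - j|
Q <- solve(Sigma)                          # precision matrix
Q[abs(Q) < 1e-8] <- 0                      # zero-out numerical noise
mean(Sigma != 0)                           # 1: fully dense
mean(Q != 0)                               # ~0.03: sparse (diagonal + first neighbours only)
image(Matrix(Q))                           # visualize the sparsity pattern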

44 / 51

Constructing \mathbf{Q}

As you now know, in a geostatistical context, the semivariogram serves as a reference for making interpolations.

Through some fancy theoretical work by Lindgren et al. (2011) relying on stochastic partial differential equations, we learned that a well-suited model from which to derive a good precision matrix \mathbf{Q} is the Matérn covariance model

\gamma(\boldsymbol{h}) = \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}\left(\kappa \boldsymbol{h}\right)^\nu K_\nu\left(\kappa \boldsymbol{h}\right) where

  • \boldsymbol{h} is the distance between two samples
  • \sigma^2 is the variance
  • \nu and \kappa are scaling parameters (always > 0)
  • K_\nu is a modified Bessel function of the second kind of order \nu
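Here is a minimal sketch (not part of the original slides) of this Matérn model written as an R function, using besselK() for the modified Bessel function, and plotted for two values of \nu:

matern <- function(h, sigma2 = 1, kappa = 1, nu = 1) {
  out <- sigma2 / (2^(nu - 1) * gamma(nu)) * (kappa * h)^nu * besselK(kappa * h, nu)
  out[h == 0] <- sigma2                     # limit as the distance h goes to 0
  out
}
h <- seq(0, 5, length.out = 200)
plot(h, matern(h, nu = 0.5), type = "l", lwd = 2, xlab = "Distance", ylab = "Covariance")
lines(h, matern(h, nu = 2), lwd = 2, lty = 2)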

45 / 51

Constructing \mathbf{Q}

Discrete approximation of the Matérn covariance model can be constructed using weighted basis functions.

Concretely, we can decide on a mesh underlying the study area to approximate the Matérn covariance model and, as such, the precision matrix \mathbf{Q}

We can thus visually see the difference between the classical approach and the INLA approach (figure with two panels: "Classic" and "INLA").

46 / 51

Constructing the mesh

What we want is a mesh that looks like this

47 / 51

Constructing the mesh

Properties of a good mesh

  • The samples do not need to be located at the mesh vertices (the corners of the triangles)
  • Triangles formed in the mesh are roughly of the same size
  • The number of vertices (points in the mesh) is reasonable
  • There should be a buffer around the points

Important technical consideration

In INLA, estimation is performed at each point of the mesh and the interpolation is made between each vertex with an approach akin to a regression.
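Following the properties listed above, here is a minimal sketch (with assumed coordinates, not the course data) of how such a mesh can be built with R-INLA's inla.mesh.2d():

library(INLA)
set.seed(42)
coords <- cbind(runif(100), runif(100))       # hypothetical sampling locations
mesh <- inla.mesh.2d(loc = coords,
                     max.edge = c(0.1, 0.3),  # triangle size inside the domain / in the buffer
                     cutoff = 0.02,           # merge points closer than this
                     offset = c(0.1, 0.3))    # inner and outer buffer widths
plot(mesh)
points(coords, col = "red", pch = 19)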

48 / 51

Constructing the mesh

A few examples of bad meshes

You should be careful about a mesh that looks like this. It is not wrong, but it will take longer to estimate.

49 / 51

Constructing the mesh

A few examples of bad meshes

You should be careful about a mesh that looks like this. It is not wrong, but it will give a very (!) coarse interpolation.

50 / 51

Constructing the mesh

A few examples of bad meshes

A mesh like this should not be used! The triangles are very different in size.

51 / 51