Moran I test under randomisation
data: df_local$Data_Value
weights: lw
Moran I statistic standard deviate = 15.968, p-value < 2.2e-16
alternative hypothesis: two.sided
sample estimates:
Moran I statistic Expectation Variance
0.4687127949 -0.0022935780 0.0008700179
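For reference, Moran's I can be computed directly from its definition, $I = \frac{n}{\sum_{ij} w_{ij}} \cdot \frac{\sum_{ij} w_{ij}(x_i - \bar x)(x_j - \bar x)}{\sum_i (x_i - \bar x)^2}$. A minimal Python sketch with an assumed toy contiguity matrix (not the heart-disease data or the `lw` weights above):

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x and a spatial weights matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()               # deviations from the mean
    num = z @ W @ z                # weighted cross-products of neighbours
    den = z @ z                    # total sum of squares
    n = len(x)
    return (n / W.sum()) * num / den

# Toy example: 4 areas on a line, binary contiguity weights
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0])   # a smoothly increasing, "clustered" trend
print(morans_i(x, W))                # ≈ 0.3333, positive spatial autocorrelation
```

Positive values indicate that similar values cluster in space, as in the test output above; values near the expectation $-1/(n-1)$ indicate no spatial autocorrelation.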
Conditionally Autoregressive (CAR) models
Let Yi be our data at areal unit i
We assume $Y_i \mid \{Y_j,\ j \neq i\} \sim N\!\left(\sum_j b_{ij} y_j,\ \tau_i^2\right)$, $i = 1, \dots, n$
We apply Brook’s lemma (see book p. 78) to find the joint density from all the full conditionals
We obtain
Y∼N(0,(I−B)−1D)
B is the matrix of bij and D is diagonal with elements dii=τi2
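As a sketch of how the joint covariance arises, the snippet below builds one common CAR parameterisation (assumed here for illustration: $b_{ij} = \rho\, w_{ij}/m_i$ and $\tau_i^2 = \tau^2/m_i$, with $m_i$ the number of neighbours of area $i$) and checks that $(I - B)^{-1}D$ is a valid symmetric covariance:

```python
import numpy as np

# Binary adjacency for 4 areas on a line (assumed toy neighbourhood)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
m = W.sum(axis=1)                  # number of neighbours of each area
rho, tau2 = 0.5, 1.0               # |rho| < 1 keeps I - B invertible here

B = rho * W / m[:, None]           # b_ij = rho * w_ij / m_i
D = np.diag(tau2 / m)              # tau_i^2 = tau2 / m_i

Sigma = np.linalg.inv(np.eye(4) - B) @ D   # joint covariance (I - B)^{-1} D

# Brook's lemma requires b_ij * tau_j^2 = b_ji * tau_i^2; this scaling
# satisfies it, so the implied joint covariance is symmetric:
print(np.allclose(Sigma, Sigma.T))         # True
```

The symmetry condition is exactly why $b_{ij}$ and $\tau_i^2$ cannot be chosen freely: they must be compatible for the full conditionals to define a proper joint distribution.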
Moran I test under randomisation
data: model_residuals
weights: lw
Moran I statistic standard deviate = -5.1855, p-value = 2.154e-07
alternative hypothesis: two.sided
sample estimates:
Moran I statistic Expectation Variance
-0.1548429467 -0.0022935780 0.0008654362
In this case there is still spatial dependence in the residuals
We did not use any covariates
One worry we should have: do we need an offset?
Offset: an additive or multiplicative constant that adjusts the observed value by the size of the relative area
For example, if we were counting the absolute number of cases, we should offset by population
Not doing so may result in a residual pattern that looks just like the area size!
In the heart disease case we have data per 100k population so we do not need offsets
Fitting a CAR model using geostan
The model we saw did not allow for the introduction of measurement error
We thus fit a CAR model to the data directly
geostan seems to lack the ability to fit a latent model to Gaussian data
Simultaneously Autoregressive models (SAR)
In this case we model all outcomes Y as arising from independent innovations:
Define $B$, as earlier, as a weight matrix (possibly binary)
Define $R$ as diagonal with region-specific diagonal entries $r_{ii}$
We can simplify and let $R = \sigma^2 I_n$
$\varepsilon \sim N(0, R)$
$Y = X\beta + (I_n - \rho B)^{-1} \varepsilon$
As a result of this assumption, we have
$Y \sim N\!\left(X\beta,\ (I_n - \rho B)^{-1} R\, (I_n - \rho B)^{-T}\right)$
This implies that the precision matrix is $(I_n - \rho B)^{T} R^{-1} (I_n - \rho B)$
Any SAR model can be cast as a CAR model; the converse is not true
CAR models are thus a larger class
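The SAR covariance and its stated precision can be verified numerically. A Python sketch with an assumed toy row-standardised weight matrix and made-up parameter values:

```python
import numpy as np

# Row-standardised weights for 4 areas on a line (toy assumption)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
B = W / W.sum(axis=1, keepdims=True)
n, rho, sigma2 = 4, 0.4, 1.0
R = sigma2 * np.eye(n)

A = np.eye(n) - rho * B
Sigma = np.linalg.inv(A) @ R @ np.linalg.inv(A).T   # SAR covariance
Q = A.T @ np.linalg.inv(R) @ A                      # claimed precision matrix

print(np.allclose(Sigma @ Q, np.eye(n)))            # True: Q = Sigma^{-1}
```

Note that the precision $Q$ is sparse whenever $B$ is sparse, which is what makes SAR models computationally attractive for lattice data.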
Crazy idea: fit a GP to areal data using centroids?
Gaussian “Process” is a misnomer for the model we are fitting
We do not want to estimate a smooth infinite-dimensional surface on a spatial domain
CAR/SAR models take into account neighbor structure
They do so for ease of computations and interpretation
Restrictions can be hard to swallow
Gaussian “Processes” in areal data?
All we need is a coordinate system for the areal dataset
First option is to use regions’ centroids
The distance between centroids serves as a proxy for the typical distance between pairs of points across the two regions
Alternatives: compute centroids using population distribution within each region
Once we have centroids, we can compute distances!
Once we can compute distances, we can model covariance
The covariance function will lead to a fixed set of pairwise correlations between regions
Because we are not interested in predictions at new locations, we technically do not even need a covariance function!
Gaussian “Processes” in areal data?
For the vector Y of outcomes at each region, we have
$w \mid \theta \sim N(0, C_\theta)$
$Y \mid \beta, \tau^2, w \sim N(X\beta + w,\ \tau^2 I_n)$
Here, $C_\theta(i,j)$ will be interpreted as the “strength of connection” between $i$ and $j$
The corresponding graphical model interpretation is obtained by looking at $C_\theta^{-1}(i,j)$
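A Python sketch of this construction, using assumed toy centroids, an exponential covariance function, and made-up parameter values:

```python
import numpy as np
rng = np.random.default_rng(0)

# Assumed toy centroids for 5 regions
coords = rng.uniform(0, 10, size=(5, 2))
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Exponential covariance on centroid distances, theta = (sigma^2, phi)
sigma2, phi, tau2 = 1.0, 2.0, 0.1
C = sigma2 * np.exp(-dists / phi)

# w | theta ~ N(0, C_theta);  Y | beta, tau2, w ~ N(X beta + w, tau2 I)
X = np.column_stack([np.ones(5), rng.normal(size=5)])
beta = np.array([2.0, -1.0])
w = rng.multivariate_normal(np.zeros(5), C)
Y = X @ beta + w + rng.normal(scale=np.sqrt(tau2), size=5)
print(Y.round(2))
```

Once $C_\theta$ is built from centroid distances, fitting proceeds exactly as for point-referenced data; the only conceptual change is the interpretation of the entries of $C_\theta$.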
Experiment
Prepare the coordinate matrix and the data (centered, for simplicity)
Moran I test under randomisation
data: Y - yhat
weights: lw
Moran I statistic standard deviate = -3.4487, p-value = 0.0005634
alternative hypothesis: two.sided
sample estimates:
Moran I statistic Expectation Variance
-0.1040171021 -0.0022935780 0.0008700435
We have information about the spatial occurrence of some event
We are mostly interested in learning about the pattern of locations
Where did the events occur?
The dataset this gives rise to is a pattern of locations (points)
Our dataset boils down to the set of observed locations {S1,…,Sn}
We model these locations as random
If we were also interested in other information about the events, we would talk of a marked point pattern. In that case both {S1,…,Sn} and {Y(S1),…,Y(Sn)} are random
Recall that with point-referenced data we only model {Y(s1),…,Y(sn)}, assuming the locations are not random (i.e. known, fixed, constant)
Areal data as aggregates of point patterns
Suppose we want to model the spread of a disease
Or the spread of a type of invasive species
Or the occurrence of forest fires
Or the location of traffic accidents
Areal data as aggregates of point patterns
Aggregation into areas
Rather than directly using the dataset of locations, we summarize information across pre-specified areas
Typically, we count how many events occurred in each area
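A minimal Python sketch of this aggregation step, binning a simulated point pattern into a grid of equal-area cells (all values are toy assumptions):

```python
import numpy as np
rng = np.random.default_rng(1)

# Simulated point pattern: 500 event locations on the unit square
S = rng.uniform(0, 1, size=(500, 2))

# Aggregate into areal data: count events in a 4 x 4 grid of equal-area cells
counts, xedges, yedges = np.histogram2d(S[:, 0], S[:, 1], bins=4,
                                        range=[[0, 1], [0, 1]])
print(counts.astype(int))
print(int(counts.sum()))   # 500: every event falls in exactly one cell
```

With equal-area cells the raw counts are comparable across areas; with irregular areas (counties, ZIP codes) they are not, which is the point of the warning below about area size.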
Areal data as aggregates of point patterns
Be mindful of the size of the areas when we count!
Areal data as aggregates of point patterns
Motivation/Scenario 1
In many cases, we only have information at an aggregate level
Even if we have the exact location of each “disease” event, we may only have socio-demographic information at the ZIP, county, or state level
See book Section 6.4
Motivation/Scenario 2
We are interested in the intensity of the underlying process
Higher intensity means larger rate of occurrence of events
Estimating intensity continuously is a difficult problem in 2+ dimensions
By aggregating, we can simplify (i.e. approximate) and make the problem manageable
See book Section 8 (specifically 8.4.3)
Areal data as aggregates of point patterns
Let’s get modeling!
We have a set of areas or spatial indices or centroids or coordinates {s1,…,sn}
We have counts at those locations: {Y1,…,Yn} where Yi=Y(si)
Model those counts as Poisson random variables
Set up a spatial GLM model with log link
Call Ei the size of area i. This is problem dependent!
Example: if $Y_i$ are disease counts, $E_i$ should be the expected number of cases in area $i$
Call $n_i$ the population size at $i$ (not a random variable); then $E_i = n_i \bar{r}$, where $\bar{r}$ is the overall occurrence rate across the whole region
$Y_i \sim \text{Poisson}(E_i \eta_i)$
Our goal is to estimate ηi, i.e. the relative intensity or risk at area i
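A small Python sketch of internally standardised expected counts (populations and counts are toy assumptions):

```python
import numpy as np

# Assumed toy populations and observed disease counts for 3 areas
n_i = np.array([10_000, 50_000, 25_000])   # population sizes
y_i = np.array([12, 40, 33])               # observed counts

r_bar = y_i.sum() / n_i.sum()              # overall occurrence rate
E_i = n_i * r_bar                          # expected counts under equal risk
print(E_i.round(1))                        # [10. 50. 25.]
```

With this internal standardisation, $\sum_i E_i = \sum_i y_i$, so $\eta_i > 1$ flags areas with more cases than expected given their population.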
Conjugate Bayesian model for Poisson counts
$\eta_i \sim \text{Gamma}(a, b)$
$Y_i \mid \eta_i \sim \text{Poisson}(E_i \eta_i)$
The Gamma prior on ηi is conjugate to the Poisson model for Yi
This implies that the posterior is also a Gamma distribution
The posterior can be derived analytically and it is
$p(\eta_i \mid Y) = p(\eta_i \mid Y_i) = \text{Gamma}(y_i + a,\ E_i + b)$
Therefore the Bayesian point estimate for $\eta_i$ is $\hat\eta_i = E(\eta_i \mid Y) = \dfrac{y_i + a}{E_i + b}$
Conjugacy is convenient! However, it is difficult to model spatial dependence via a Gamma distribution
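The conjugate update is a one-liner. A Python sketch with assumed toy values of $a$, $b$, $y_i$, and $E_i$:

```python
import numpy as np

a, b = 1.0, 1.0                       # Gamma(a, b) prior on eta_i (b = rate)
y_i = np.array([12, 40, 33])          # observed counts (toy values)
E_i = np.array([10.0, 50.0, 25.0])    # expected counts / offsets (toy values)

# Posterior is Gamma(y_i + a, E_i + b); its mean is the point estimate
eta_hat = (y_i + a) / (E_i + b)
print(eta_hat.round(3))               # relative risk > 1 where counts exceed E_i
```

Note that the estimate shrinks the raw standardised ratio $y_i / E_i$ toward the prior mean $a/b$, with more shrinkage in areas where $E_i$ is small.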
Spatial Bayesian model for Poisson counts: setup
$\psi_i = X_i^\top \beta + w_i + \varepsilon_i$
$Y_i \mid \eta_i \sim \text{Poisson}(E_i\, e^{\psi_i})$
$\psi_i = \log \eta_i$ is the log-relative risk or log-intensity
We now have wi and εi which we can use to model
spatial correlation
non-spatial “noise” in the observed counts
What priors could we use for w and ε?
Spatial Bayesian model for Poisson counts
$\varepsilon_i \sim N(0, \tau^2)$
$\theta \sim p(\theta)$
$\{w_i\}_{i=1}^n \sim \text{Gaussian with sparse graph}(0, C_\theta)$
$\psi_i = X_i^\top \beta + w_i + \varepsilon_i$
$Y_i \mid \eta_i \sim \text{Poisson}(E_i\, e^{\psi_i})$
Spatial Bayesian model for Poisson counts: example 1
$\{w_i\}_{i=1}^n \sim \text{ICAR}$
We can use a CAR model based on neighbor relationships between areas
Spatial Bayesian model for Poisson counts: example 2
$\{w_i\}_{i=1}^n \sim \text{DAG-Gaussian}(0, \tilde{C}_\theta)$
We can use a DAG-based model using centroids and a user-defined covariance function
Notice this is not technically a GP prior because we have areas
However, setting up this model proceeds the same way as for a DAG-GP case (see previous slides)
Spatial Bayesian model for Poisson counts: computations
$\varepsilon_i \sim N(0, \tau^2)$
$\theta \sim p(\theta)$
$\{w_i\}_{i=1}^n \sim \text{Gaussian with sparse graph}(0, C_\theta)$
$\psi_i = X_i^\top \beta + w_i + \varepsilon_i$
$Y_i \mid \eta_i \sim \text{Poisson}(E_i\, e^{\psi_i})$
Regardless of our prior choice for w, MCMC generally works via Metropolis-within-Gibbs
Move τ2 with target p(τ2∣Y,…)
Move θ with target p(θ∣Y,…)
Move {wi}i=1n with target p(w∣Y,…)
Move β with target p(β∣Y,…)
Here, we let “…” refer to all other unknowns in the model
Spatial Bayesian model for Poisson counts: computations
Recall that when the model was Gaussian, i.e. Y∣⋯∼N(…) and with conjugate priors, we had:
Gibbs sample τ2 with target p(τ2∣Y,…) (Inverse Gamma)
Update θ with target p(θ∣Y,…) (Metropolis step) Tuning/choice of update is important for efficiency!
Gibbs sample {wi}i=1n with target p(w∣Y,…) (Gaussian in 1 block or visiting each node of the graphical model)
Gibbs sample β with target p(β∣Y,…) (Gaussian)
Thus, Metropolis-within-Gibbs but predominantly a Gibbs sampler with analytically tractable distributions. Now we have a Poisson outcome model.
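As an example of one such tractable step, here is a Python sketch of the Gibbs draw for $\tau^2$ from its Inverse-Gamma full conditional in the Gaussian case. The IG$(a_\tau, b_\tau)$ prior and all numbers are assumptions for illustration:

```python
import numpy as np
rng = np.random.default_rng(2)

def gibbs_tau2(resid, a_tau=2.0, b_tau=1.0):
    """One Gibbs draw of tau^2 from its Inverse-Gamma full conditional,
    given current residuals Y - X beta - w (Gaussian outcome model).
    Full conditional: IG(a_tau + n/2, b_tau + sum(resid^2)/2)."""
    n = len(resid)
    shape = a_tau + n / 2
    rate = b_tau + 0.5 * np.sum(resid**2)
    return 1.0 / rng.gamma(shape, 1.0 / rate)   # IG draw via reciprocal Gamma

resid = rng.normal(scale=0.5, size=100)          # toy residuals, true var 0.25
draws = np.array([gibbs_tau2(resid) for _ in range(2000)])
print(draws.mean().round(3))                     # concentrates near 0.25
```

Every draw is exact (no accept/reject step and no tuning), which is what makes the Gaussian-outcome sampler so much easier to run than the Poisson case discussed next.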
Spatial Bayesian model for Poisson counts: computations
This model is not conjugate
Recall: if the full conditional posteriors are available analytically, then we can run a Gibbs sampler
In that case, we would have a conditionally conjugate model
The model above is not conditionally conjugate
Here, p(w∣Y,θ,β,τ2) is not available analytically
The other full conditional densities are also typically not conditionally conjugate
p(w∣Y,θ,β,τ2) is typically the highest-dimensional and hence most problematic
Spatial Bayesian model for Poisson counts: computations
With Y∣⋯∼Poisson(…) we have:
Update τ2 with target p(τ2∣Y,…) (Metropolis step) Tuning/choice of update is important for efficiency!
Update θ with target p(θ∣Y,…) (Metropolis step) Tuning/choice of update is important for efficiency!
Update {wi}i=1n with target p(w∣Y,…) (Metropolis step) Tuning/choice of update is important for efficiency!
Update β with target p(β∣Y,…) (Metropolis step) Tuning/choice of update is important for efficiency!
At each step we need to
Formulate a proposal distribution
Accept or reject the proposed value
Tune the proposal according to some criterion
Doing so for each step complicates matters significantly
Spatial Bayesian model for Poisson counts: computations
MCMC can be difficult to set up, and can perform poorly, in high-dimensional non-Gaussian settings lacking (conditional) conjugacy
This motivates the development of MCMC-free posterior computing algorithms
See e.g. Laplace approximations, Integrated Nested Laplace Approximation (INLA), Variational approximations, Expectation propagation…
It also motivates research on new, more efficient MCMC methods
Example: MALA
Take the target density $p(x)$ and compute the gradient of its logarithm, $\nabla \log p(x)$
Fix a step size $\varepsilon$, then let
$q(x \mid x_{\text{old}}) = N\!\left(x;\ x_{\text{old}} + \tfrac{\varepsilon^2}{2} \nabla \log p(x_{\text{old}}),\ \varepsilon^2 I_d\right)$
In other terms, let $u \sim N(0, I_d)$; we accept/reject the proposal $x_{\text{prop}} = x_{\text{old}} + \tfrac{\varepsilon^2}{2} \nabla \log p(x_{\text{old}}) + \varepsilon u$
This proposal takes into account local information about the distribution we ultimately want to sample from
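A self-contained Python sketch of MALA on a standard bivariate Gaussian target (step size and iteration count are arbitrary choices):

```python
import numpy as np
rng = np.random.default_rng(3)

def mala(grad_logp, logp, x0, eps=0.5, n_iter=5000):
    """Metropolis-adjusted Langevin algorithm (MALA) sketch."""
    x = np.asarray(x0, dtype=float)
    chain, accepted = [x.copy()], 0
    for _ in range(n_iter):
        mu = x + 0.5 * eps**2 * grad_logp(x)          # drift toward high density
        prop = mu + eps * rng.normal(size=x.shape)    # Langevin proposal
        mu_back = prop + 0.5 * eps**2 * grad_logp(prop)
        # log acceptance ratio includes the asymmetric proposal densities
        log_alpha = (logp(prop) - logp(x)
                     - np.sum((x - mu_back)**2) / (2 * eps**2)
                     + np.sum((prop - mu)**2) / (2 * eps**2))
        if np.log(rng.uniform()) < log_alpha:
            x, accepted = prop, accepted + 1
        chain.append(x.copy())
    return np.array(chain), accepted / n_iter

# Target: standard 2-d Gaussian, so grad log p(x) = -x
chain, acc = mala(lambda x: -x, lambda x: -0.5 * np.sum(x**2), np.zeros(2))
print(round(acc, 2), chain.mean(axis=0).round(2))   # means near (0, 0)
```

Because the proposal is not symmetric, the Metropolis-Hastings correction must include the forward and reverse proposal densities; omitting it gives the unadjusted Langevin algorithm, which targets the wrong distribution for finite $\varepsilon$.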
Example: MALA with preconditioner
Take the target density $p(x)$ and compute the gradient of its logarithm, $\nabla \log p(x)$
Fix $\varepsilon$ and $M$, then let
$q(x \mid x_{\text{old}}) = N\!\left(x;\ x_{\text{old}} + \tfrac{\varepsilon^2}{2} M \nabla \log p(x_{\text{old}}),\ \varepsilon^2 M\right)$
In other terms, let $u \sim N(0, I_d)$; we accept/reject the proposal $x_{\text{prop}} = x_{\text{old}} + \tfrac{\varepsilon^2}{2} M \nabla \log p(x_{\text{old}}) + \varepsilon M^{1/2} u$
$M^{1/2}$ is a matrix square root (e.g. a Cholesky factor)
This proposal takes into account local information about the distribution we ultimately want to sample from
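A Python sketch of a single preconditioned proposal step, with an assumed anisotropic Gaussian target and $M$ set to the target covariance for illustration:

```python
import numpy as np
rng = np.random.default_rng(4)

# Assumed anisotropic Gaussian target N(0, Sigma): grad log p(x) = -Q x
Sigma = np.array([[4.0, 0.0],
                  [0.0, 0.25]])
Q = np.linalg.inv(Sigma)

eps = 0.8
M = Sigma                       # precondition with the target covariance
L = np.linalg.cholesky(M)       # M^{1/2} via Cholesky factorisation

x_old = np.array([1.0, -1.0])
grad = -Q @ x_old               # gradient of the log target at x_old
u = rng.normal(size=2)
# Preconditioned Langevin proposal: x + (eps^2/2) M grad + eps M^{1/2} u
x_prop = x_old + 0.5 * eps**2 * (M @ grad) + eps * (L @ u)
print(x_prop.round(3))
```

Scaling both the drift and the noise by $M$ lets the sampler take long steps along directions where the target is wide and short steps where it is narrow, which a single scalar $\varepsilon$ cannot do.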
We can use $M$ to further improve upon MALA, but choosing a fixed $M$ may be difficult. Alternatively:
$M = \left(-\nabla^2 \log p(x_{\text{old}})\right)^{-1}$ (the inverse of the negative Hessian of the log target density) leads to the simplified Riemannian-manifold MALA (Girolami and Calderhead 2010)
$M$ can be adapted using $\left(-\nabla^2 \log p(x)\right)^{-1}$, leading to a faster and more efficient algorithm in spatial models (P and Dunson 2024)
R package meshed uses MALA with adaptive preconditioner for non-Gaussian spatial models