Assuming independence (IND) is very common when we have replicated data, i.e. multiple observations of the same random variable
IND is inappropriate for spatial data because it assumes there is no dependence at all! We instead want dependence between random variables that is based on the notion of proximity or distance
The closer in space Y(s) and Y(s') are, the more dependent they should be
To model spatial data is to model (spatial) dependence within a set of random variables
We have not defined the function C(\cdot, \cdot) yet
C(\cdot, \cdot) tells us all we need to know about how our spatial random variables are dependent with each other. Why?
Building Gaussian Processes
Because we have made a Gaussian assumption
Covariance matrix completely defines a mean-zero MVN
Covariance function tells us how to build a covariance matrix on any set of locations
GP modeling boils down to covariance modeling
Our assumptions on C(\cdot, \cdot) will direct our inference
Typically, C(\cdot, \cdot) is a parametric function of a small number of unknown parameters
Covariance parameters describe the underlying process’s variance, smoothness, spatial range
Let \theta be a vector of covariance parameters, then we can write our covariance function as C_{\theta}(\cdot, \cdot)
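A minimal sketch of a parametric covariance function C_{\theta}(\cdot, \cdot): the exponential model with \theta = (\sigma^2, \phi). The function name and parameter values below are illustrative, not from the text.

```r
# C_theta(s, s') = sigma2 * exp(-phi * ||s - s'||), with theta = (sigma2, phi)
cov_exp <- function(s1, s2, theta) {
  sigma2 <- theta[1]
  phi    <- theta[2]
  d <- sqrt(sum((s1 - s2)^2))      # Euclidean distance between the two locations
  sigma2 * exp(-phi * d)
}

theta <- c(2, 0.5)                 # illustrative parameter values
cov_exp(c(0, 0), c(0, 0), theta)   # at distance 0 this returns sigma2 = 2
```

The same function evaluated at any pair of locations fills in one entry of the covariance matrix, which is all a mean-zero GP needs.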
Covariance function vs Covariance between random variables
Note we are constructing a covariance function that takes pairs of spatial locations as inputs
We use that function to model the covariance between random variables at that pair of spatial locations
In other words we are making this assumption:
Cov(Y(s), Y(s')) = f(s, s') for some function f of our choice. In other words we model covariance uniquely via the random variables’ spatial locations.
Compute the Cholesky decomposition of the covariance matrix C
L <- t(chol(C)) # chol() returns the upper Cholesky factor U = t(L); C = crossprod(U) = tcrossprod(L)
Sample the MVN
Y <- L %*% rnorm(nobs)
Plot the data
Exponential covariance model
Now we can keep sampling from the same Gaussian Process (at the same locations)
Y <- L %*% rnorm(nobs)
Plot the data
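A minimal sketch of repeated sampling: with the locations fixed, the Cholesky factor is computed once and every additional draw costs only a matrix-vector product. All numeric choices are illustrative.

```r
set.seed(42)
nobs   <- 50
coords <- cbind(runif(nobs), runif(nobs))
C <- exp(-5 * as.matrix(dist(coords)))     # exponential covariance, sigma2 = 1, phi = 5
L <- t(chol(C))                            # factor once...

draws <- replicate(4, as.numeric(L %*% rnorm(nobs)))  # ...then reuse for each draw
dim(draws)                                 # 50 x 4: one column per independent sample
```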
Stationary covariance functions
C(\cdot, \cdot) is stationary if C(s, s + h) = C(h) for all h \in \Re^d such that s, s+h \in D
In other words, covariance only depends on the separation vector h
In other words, C(s, s') = C(s-s'), we are modeling covariance as not depending on the absolute location of s and s' but only their relative position in the spatial domain
Isotropic covariance functions
C(\cdot, \cdot) is isotropic if C(s, s + h) = C(||h||) for all h \in \Re^d such that s, s+h \in D
In other words, covariance only depends on the length of the separation vector h
In other words, C(s, s') = C(||s-s'||), we are modeling covariance as not depending on the absolute location of s and s' but only their relative position in the spatial domain, without regard to orientation
We can write C(s, s')=C(d) where d = ||s-s'|| is the distance between s and s'
Example: Exponential covariance
C(d) = \sigma^2 \exp \{ -\phi d \}
\phi is the spatial decay parameter. Sometimes we use \alpha=1/\phi, called the range
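A quick numeric sketch of what the range \alpha = 1/\phi means: at distance d = \alpha the exponential covariance has dropped to exp(-1), about 37%, of \sigma^2. The parameter values are illustrative.

```r
sigma2 <- 1
phi    <- 4
alpha  <- 1 / phi                 # the range

C_exp <- function(d) sigma2 * exp(-phi * d)
C_exp(alpha) / sigma2             # exp(-1), about 0.368
```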
The matrix L_A L_A^T is symmetric positive definite by construction
If l_{11} = l_{22} = 1 and l_{21} = 0 then L_A L_A^T = I_2 and we are back to the isotropic case
But this is not very interpretable. How do we assign a “meaning” to l_{11} and l_{22}?
Instead, define S = \text{diag}\{ \sqrt{\phi}, \sqrt{s \phi} \}, R = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, where s, \phi > 0, -1<\rho<1. Then C(h) = \sigma^2 \exp \left\{ - (h^T S R S h )^{\frac{\gamma}{2}} \right\}
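The anisotropic covariance above can be sketched directly in R. All parameter values are illustrative; the key point is that two separation vectors of the same length but different orientation get different covariances.

```r
# C(h) = sigma2 * exp{ -(h' S R S h)^(gamma/2) },
# S = diag(sqrt(phi), sqrt(s * phi)), R = correlation matrix with off-diagonal rho
cov_aniso <- function(h, sigma2 = 1, phi = 2, s = 3, rho = 0.5, gamma = 1) {
  S <- diag(c(sqrt(phi), sqrt(s * phi)))
  R <- matrix(c(1, rho, rho, 1), 2, 2)
  q <- drop(t(h) %*% S %*% R %*% S %*% h)  # quadratic form in the separation vector
  sigma2 * exp(-q^(gamma / 2))
}

cov_aniso(c(1, 0))   # unit separation along the first axis
cov_aniso(c(0, 1))   # same distance, different orientation -> different value
```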
Visualizing a stationary anisotropic covariance function
In the stationary anisotropic cases we considered earlier, C(s, s') depended on the magnitude of s-s' (i.e. the distance between s and s') as well as the orientation
Therefore there is at least one new dimension to the plot
Discuss ideas for making covariance plots of nonstationary covariance models
No specific model in mind, just think of what nonstationary implies
Interpretation of covariance models
One reason we choose a specific covariance model is because we think it is a good model of spatial variation in the data
Another reason we choose a specific covariance model is to be able to interpret the results of our estimation
There is a tradeoff between ease of estimation, interpretability, and the flexibility to accommodate realistic assumptions
Parameters themselves are not always directly interpretable
Different covariance functions’ parameters may not be directly comparable in how they are interpreted
A covariance plot is an effective tool for interpretation
The covariance plot becomes more difficult to make the more complicated the covariance function is!
Interpretation of covariance models
Other questions that we could answer:
what is the signal-to-noise ratio?
how far do we need to take two locations in order for their correlation to be less than 0.95?
what is the distance between locations at which correlation falls under 0.05?
The last question is also called the effective range
Effective range
The distance d such that Cor(d)=0.05 where Cor(\cdot) is the implied correlation function.
Example: exponential correlation function Cor(d) = \exp\{ -\phi d \}.
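For the exponential correlation function the effective range has a closed form: solving \exp\{-\phi d\} = 0.05 gives d = \log(20)/\phi \approx 3/\phi. A quick check, with an illustrative \phi:

```r
phi <- 2                          # illustrative decay parameter
eff_range <- -log(0.05) / phi     # distance at which correlation hits 0.05

exp(-phi * eff_range)             # equals 0.05 by construction
```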
Is this an exponential correlation plot?
Some ways to obtain other covariance functions
Sum: if C_1(s, s') and C_2(s, s') are two covariance functions, then C(s, s') = C_1(s, s') + C_2(s, s') is also a valid covariance function
Product: if C_1(s, s') and C_2(s, s') are two covariance functions, then C(s, s') = C_1(s, s') \cdot C_2(s, s') is also a valid covariance function
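A numeric sanity check of these two facts (not a proof): on a random set of locations, both the sum and the elementwise product of two exponential covariance matrices remain positive definite. The locations and parameters are illustrative.

```r
set.seed(7)
coords <- cbind(runif(30), runif(30))
D  <- as.matrix(dist(coords))
C1 <- exp(-2 * D)                 # exponential covariance, phi = 2
C2 <- 0.5 * exp(-10 * D)          # another, with smaller variance and faster decay

# smallest eigenvalue of sum and of elementwise (Schur) product
min(eigen(C1 + C2, symmetric = TRUE, only.values = TRUE)$values) > 0
min(eigen(C1 * C2, symmetric = TRUE, only.values = TRUE)$values) > 0  # '*' is elementwise in R
```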
Some ways to obtain other covariance functions: cont’d
If w_1(\cdot) \sim GP(0, C_1(\cdot, \cdot)) is independent of w_2(\cdot) \sim GP(0, C_2(\cdot, \cdot)) then w(\cdot) = w_1(\cdot) + w_2(\cdot) \sim GP(0, C_1(\cdot, \cdot) + C_2(\cdot, \cdot))
If w_1(\cdot) \sim GP(0, C_1(\cdot, \cdot)) is independent of w_2(\cdot) \sim GP(0, C_2(\cdot, \cdot)) then w(\cdot) = w_1(\cdot) \cdot w_2(\cdot) is such that Cov(w(\cdot)) = C(\cdot, \cdot) = C_1(\cdot, \cdot) \cdot C_2(\cdot, \cdot). But notice w(\cdot) is not a GP.
We see that if w(\cdot) \sim GP(0, C) where C is the sum of an exponential covariance and a white noise (nugget) covariance, then we can write
w(\cdot) = v(\cdot) + \varepsilon(\cdot), where v(\cdot) is a GP with exponential covariance and \varepsilon(\cdot) is a white noise process
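A sketch of sampling such a process via its decomposition: draw the smooth component v from its exponential covariance and add independent white noise, so that Cov(w) = \sigma^2 \exp\{-\phi D\} + \tau^2 I. All parameter values are illustrative.

```r
set.seed(3)
n      <- 100
coords <- cbind(runif(n), runif(n))
D      <- as.matrix(dist(coords))
sigma2 <- 1; phi <- 6; tau2 <- 0.25

Cv  <- sigma2 * exp(-phi * D)     # covariance of the smooth component v
v   <- t(chol(Cv)) %*% rnorm(n)   # draw v ~ N(0, Cv)
eps <- sqrt(tau2) * rnorm(n)      # white noise ("nugget") component
w   <- v + eps                    # draw from the GP with covariance Cv + tau2 * I
```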
Some ways to obtain other covariance functions: cont’d
Direct sum: if C_1(s_1, s_1') is a covariance function on domain D_1 and C_2(s_2, s_2') is a covariance function on domain D_2 then C(s, s') = C_1(s_1, s_1') + C_2(s_2, s_2') is a covariance function on the domain D_1 \times D_2.
Example:
Suppose D_1=D \subset \Re^2 is the usual spatial domain, D_2=T\subset \Re is the time domain.
Let u(s,t) = w(s) + v(t), where w(\cdot) and v(\cdot) are GPs defined on D and T, respectively. Then u(s,t) is a GP on D \times T.
In other words, this is a simple way to define a spatiotemporal process
Tensor product: if C_1(s_1, s_1') is a covariance function on domain D_1 and C_2(s_2, s_2') is a covariance function on domain D_2 then C(s, s') = C_1(s_1, s_1') \cdot C_2(s_2, s_2') is a covariance function on the domain D_1 \times D_2
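A minimal sketch of the tensor-product construction for a separable space-time covariance: an exponential covariance in space multiplied by an exponential covariance in time. Function name and parameter values are illustrative.

```r
# C((s, t), (s', t')) = C_s(||s - s'||) * C_t(|t - t'|)
cov_st <- function(s1, t1, s2, t2, phi_s = 3, phi_t = 1) {
  cs <- exp(-phi_s * sqrt(sum((s1 - s2)^2)))  # spatial factor on D
  ct <- exp(-phi_t * abs(t1 - t2))            # temporal factor on T
  cs * ct
}

cov_st(c(0, 0), 0, c(0, 0), 0)    # same point in space and time: 1
```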
Examples of non-stationary covariances
Paciorek and Schervish (2006), based on the Matern model:
C(s_i, s_j) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} |\Sigma_i|^{\frac{1}{4}} |\Sigma_j|^{\frac{1}{4}} \left| \frac{\Sigma_i + \Sigma_j}{2} \right|^{-\frac{1}{2}} \left(2 \sqrt{\nu Q_{ij}} \right)^{\nu} K_{\nu}\left(2 \sqrt{\nu Q_{ij}} \right) where Q_{ij} = (s_i - s_j)^T \left(\frac{\Sigma_i + \Sigma_j}{2} \right)^{-1}(s_i - s_j) and \Sigma_i = \Sigma(s_i) is called the kernel matrix.
Another example
C(s_i, s_j) = \exp\{ \phi(s_i) + \phi(s_j) \} M(s_i, s_j) where M(s_i, s_j) is a Matern covariance function and \phi(x) = c_1 \phi_1(x) + \dots + c_p \phi_p(x).
Here, \phi_r(x) is the r-th basis function (of a total of p) evaluated at x, and c_r, r = 1, \dots, p, are nonstationary variance parameters
Why are these nonstationary?
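A sketch answering this question for the second construction above, taking M as an exponential covariance (the Matern with \nu = 1/2) and \phi(x) a single linear basis function; all numeric choices are illustrative. Two pairs of locations with the same separation vector get different covariances, which is exactly what nonstationarity means.

```r
phi_fun <- function(x) 0.5 * x[1]            # c_1 * phi_1(x) with p = 1

cov_ns <- function(si, sj) {
  M <- exp(-2 * sqrt(sum((si - sj)^2)))      # stationary Matern(1/2) part
  exp(phi_fun(si) + phi_fun(sj)) * M         # spatially varying variance
}

cov_ns(c(0, 0), c(1, 0))   # separation vector (1, 0)...
cov_ns(c(2, 0), c(3, 0))   # ...same separation, different absolute locations
```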
Paciorek and Schervish 2006: examples
Paciorek and Schervish 2006: simulated data examples
[896/Advanced topics] Extending GPs to multivariate data
We have considered GPs for univariate data
Univariate data: one random variable for each spatial location
Multivariate data: one random vector for each spatial location
Nothing in our discussion changes when we want to define a multivariate GP
Except: we need to define a cross-covariance function
Cross-covariance function: not only defines covariance in space, but also across variables
C(\ell_i, \ell_j) is a matrix-valued function. The (r,s) element of C(\ell_i, \ell_j) is the covariance between the r-th variable measured at location \ell_i and the s-th variable measured at location \ell_j
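One simple (separable) cross-covariance, sketched as an assumption rather than the book's construction: C(\ell_i, \ell_j) = \rho(\ell_i, \ell_j) T, where \rho is a scalar spatial correlation and T is a q x q covariance matrix across variables. All numeric choices are illustrative.

```r
q     <- 2
T_mat <- matrix(c(1, 0.7, 0.7, 2), q, q)     # covariance across the q variables

rho <- function(li, lj, phi = 4) exp(-phi * sqrt(sum((li - lj)^2)))

cross_cov <- function(li, lj) rho(li, lj) * T_mat   # q x q matrix-valued output

cross_cov(c(0, 0), c(0, 0))   # at the same location rho = 1, so this is T_mat
```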
See book Chapter 9.2.
More on this at the end of the semester.
First steps in modeling data via GPs
Probability models for point-referenced spatial data interpret observations as realizations of a set of spatially-indexed random variables
At each observed spatial location, we “see” 1 realization of a random variable
All random variables are dependent on each other
We assume that dependence generally decays with distance
We assume that a multivariate Gaussian distribution is appropriate for our collection of random variables
We assume a covariance model with a small number of parameters
We aim to learn about the covariance model using the observed data
Everything is in place to apply our simple GP model to point-referenced data