Gaussian Processes models for point-referenced spatial data, part 1

Spatial Statistics - BIOSTAT 696/896

Michele Peruzzi

University of Michigan

Introducing Gaussian Processes

  • We are considering the case of point-referenced data
  • We want to model spatial dependence of a set of random variables
  • We have one random variable for each location in \cal L = \{s_1, \dots, s_n \}: \{ Y(s_1), Y(s_2), \dots, Y(s_n) \} = \{ Y(s) : s \in \cal L \}
  • We need to define p(Y(s_1), \dots, Y(s_n)), that is our model of dependence for this specific set of random variables
  • What about p(Y(s_1), \dots, Y(s_n)) = \prod_{i=1}^n p(Y(s_i)) ?

Introducing Gaussian Processes

  • Independence assumption (IND): p(Y(s_1), \dots, Y(s_n)) = \prod_{i=1}^n p(Y(s_i))
  • IND is very common when we have replicated data, i.e. multiple observations of the same random variable
  • IND is inappropriate with spatial data because it assumes there is no dependence at all! We want dependence between random variables that is based on the notion of proximity or distance
  • The closer in space Y(s) and Y(s') are, the more they should be dependent on each other
  • To model spatial data is to model (spatial) dependence within a set of random variables
  • Multivariate Gaussian assumption (MVN): assume p(Y(s_1), \dots, Y(s_n)) = MVN(Y_{\cal L}; \mu, \Sigma)
  • In MVN, Y_{\cal L} is the column vector Y_{\cal L} = (Y(s_1), \dots, Y(s_n))^\top
  • Let’s write Y = Y_{\cal L} in short to keep notation light
  • \mu and \Sigma are the mean vector and covariance matrix for the random vector Y
  • What is Cov(Y(s_5), Y(s_8)) under this model?

Introducing Gaussian Processes

  • What is Cov(Y(s_5), Y(s_8)) under this model? \Sigma_{5,8}
  • We usually assume E(Y) = \mu = 0
  • We have just defined Cov(Y) = \Sigma
  • Under our Gaussian assumption, what else do we need to define to complete our model for p(Y)?

Introducing Gaussian Processes

  • Under our Gaussian assumption, what else do we need to define to complete our model for p(Y)?
  • Nothing. A MVN is completely defined by its mean vector and covariance matrix
  • How many parameters does this model have?

Y \sim MVN(\mu, \Sigma)

Introducing Gaussian Processes

  • Let \mu = 0 and estimating \Sigma we have n(n+1)/2 parameters in this model

Y \sim MVN(0, \Sigma)

  • We estimate \Sigma_{ij} = Cov(Y(s_i), Y(s_j)) for all i=1,\dots,n, j=1,\dots,i since \Sigma is symmetric
  • This is a difficult problem: very high dimensional and we need \Sigma positive definite!

In this model:

  • The set of locations \cal L is given and fixed
  • Our estimate for \Sigma is dependent on \cal L being exactly what it is

Therefore, is the above model useful?

  • What if we had a different fixed set of locations? How would we model that?
  • In other words, if we estimated the model above on \cal L, can we say anything about

Cov(Y(s_{n+1}), Y(s_{n+2})) for a pair of locations s_{n+1}, s_{n+2} \notin \cal L?

Introducing Gaussian Processes

  • What if we had a different fixed set of locations? How would we model that?
  • In other words, if we estimated the model above on \cal L, can we say anything about

Cov(Y(s_{n+1}), Y(s_{n+2})) for a pair of locations s_{n+1}, s_{n+2} \notin \cal L?

  • Even if we were able to estimate that difficult-to-estimate model, we still would not be able to say anything about Cov(Y(s_{n+1}), Y(s_{n+2}))

Introducing Gaussian Processes

  • We need a way to learn about Y(\cdot) at any point in the domain, based on observing data on any finite set \cal L

Building a Gaussian Process (overview)

  1. For any fixed \cal L, we assume that Y_{\cal L} is multivariate Gaussian
  2. We define the finite dimensional distribution for any set of random variables Y(s_1), \dots, Y(s_n) based on a choice of covariance function
  3. (Check Kolmogorov’s conditions for the existence of a corresponding stochastic process: that stochastic process is the Gaussian Process)

Building Gaussian Processes

  • Consider a set of locations in our spatial domain \cal L \subset \cal D \subset \Re^d
  • Define the mean function \mu(\cdot) : \cal D \to \Re
  • Typical assumption: \mu(s) = 0\quad \forall s\in \cal D
  • Define the covariance function C(\cdot, \cdot): \cal D \times \cal D \to \Re as a symmetric and positive definite function
  • We then let

Y_{\cal L} \sim N(0, C_{\cal L})

  • Where the (i,j) element of the matrix C_{\cal L} is C(s_i, s_j)
  • Therefore, E(Y(s_i) Y(s_j)) = ??

Building Gaussian Processes

  • By construction we have E(Y(s_i))=0 and E(Y(s_j)) = 0
  • Therefore, E(Y(s_i) Y(s_j)) = E(Y(s_i) Y(s_j)) - E(Y(s_i)) E(Y(s_j)) = Cov(Y(s_i), Y(s_j))
  • We have not defined the function C(\cdot, \cdot) yet
  • C(\cdot, \cdot) tells us all we need to know about how our spatial random variables are dependent with each other. Why?

Building Gaussian Processes

  • C(\cdot, \cdot) tells us all we need to know about how our spatial random variables are dependent with each other. Why?
  • Because we have made a Gaussian assumption
  • Covariance matrix completely defines a mean-zero MVN
  • Covariance function tells us how to build a covariance matrix on any set of locations
  • GP modeling boils down to covariance modeling
  • Our assumptions on C(\cdot, \cdot) will direct our inference
  • Typically, C(\cdot, \cdot) is a parametric function of a small number of unknown parameters
  • Covariance parameters describe the underlying process’s variance, smoothness, spatial range
  • Let \theta be a vector of covariance parameters, then we can write our covariance function as C_{\theta}(\cdot, \cdot)

Covariance function vs Covariance between random variables

  • Note we are constructing a covariance function that takes pairs of spatial locations as inputs
  • We use that function to model the covariance between random variables at that pair of spatial locations
  • In other words we are making this assumption:

Cov(Y(s), Y(s')) = f(s, s') for some function f of our choice. In other words we model covariance uniquely via the random variables’ spatial locations.

Exponential covariance model

  • Let’s consider a simple covariance model

C(s, s') = \sigma^2 \exp \{ -\phi \| s-s' \| \} \qquad \text{Exponential covariance}

  • Assume Y(\cdot) \sim GP(0, C(\cdot, \cdot)) with C(\cdot, \cdot) as defined above
  • On the set of locations \cal T = \{\ell_1, \dots, \ell_n\} we have

Y_{\cal T} \sim MVN(0, C_{\cal T}) where the i,jth element of C_{\cal T} is C_{\cal T}(i,j) = C(\ell_i, \ell_j) = \sigma^2 \exp\{ -\phi \| s_i - s_j \| \}

Exponential covariance model

set.seed(696)
nobs <- 2000
coords <- data.frame(xcoord = runif(nobs, 0, 1),
                     ycoord = runif(nobs, 0, 1))

head(coords, 5)
        xcoord    ycoord
1 0.1369217145 0.3383541
2 0.3150188241 0.4855444
3 0.0009209539 0.1195672
4 0.0533826188 0.2330217
5 0.2588940123 0.3262701

For example, distance between first and third location, \| s_1 - s_3 \| = \sqrt{(s_{1,x}-s_{3,x})^2 + (s_{1,y}-s_{3,y})^2}

sqrt(sum( (coords[1,]-coords[3,])^2 )) 
[1] 0.616742

Calculate all pairwise distances

Dmat <- as.matrix(dist(coords))
Dmat[1,3]
[1] 0.616742

Exponential covariance model

Fix covariance parameters for simulation

sigma2 <- 4
phi <- 3

Covariance between Y(s_1) and Y(s_3)

sigma2 * exp(-phi * Dmat[1,3])
[1] 0.6288065

Compute covariance element-wise for all locations

C <- sigma2 * exp(- phi * Dmat)
C[1:5, 1:5]
          1         2         3         4         5
1 4.0000000 0.5613734 0.6288065 0.3767015 2.0599914
2 0.5613734 4.0000000 1.5529928 1.6107210 1.0405359
3 0.6288065 1.5529928 4.0000000 2.2691874 0.9186762
4 0.3767015 1.6107210 2.2691874 4.0000000 0.5991294
5 2.0599914 1.0405359 0.9186762 0.5991294 4.0000000

Exponential covariance model

Compute Cholesky decomposition of covariance matrix C

L <- t(chol(C)) # chol() computes upper cholesky U=t(L), C = crossprod(U) = tcrossprod(L)

Sample the MVN

Y <- L %*% rnorm(nobs)

Plot the data

Exponential covariance model

Now we can keep sampling from the same Gaussian Process

Y <- L %*% rnorm(nobs)

Plot the data

Exponential covariance model

Now we can keep sampling from the same Gaussian Process (at the same locations)

Y <- L %*% rnorm(nobs)

Plot the data

Exponential covariance model

Now we can keep sampling from the same Gaussian Process (at the same locations)

Y <- L %*% rnorm(nobs)

Plot the data

Exponential covariance model

Now we can keep sampling from the same Gaussian Process (at the same locations)

Y <- L %*% rnorm(nobs)

Plot the data

Exponential covariance model

Now we can keep sampling from the same Gaussian Process (at the same locations)

Y <- L %*% rnorm(nobs)

Plot the data

Exponential covariance model

Now we can keep sampling from the same Gaussian Process (at the same locations)

Y <- L %*% rnorm(nobs)

Plot the data

Stationary covariance functions

  • C(\cdot, \cdot) is stationary if C(s, s + h) = C(h) for all h \in \Re^d such that s, s+h \in D
  • In other words, covariance only depends on the separation vector h
  • In other words, C(s, s') = C(s-s'), we are modeling covariance as not depending on the absolute location of s and s' but only their relative position in the spatial domain

Isotropic covariance functions

  • C(\cdot, \cdot) is isotropic if C(s, s + h) = C(||h||) for all h \in \Re^d such that s, s+h \in D
  • In other words, covariance only depends on the length of the separation vector h
  • In other words, C(s, s') = C(||s-s'||), we are modeling covariance as not depending on the absolute location of s and s' but only their relative position in the spatial domain, without regard to orientation
  • We can write C(s, s')=C(d) where d = ||s-s'|| is the distance between s and s'

Example: Exponential covariance

C(d) = \sigma^2 \exp \{ -\phi d \}

  • \phi is the spatial decay parameter. Sometimes we use \alpha=1/\phi, called the range
  • \sigma^2 is the spatial variance or sill

Example with \sigma^2=2,\ \phi=10:

Example: Squared exponential covariance (aka Gaussian covariance)

C(d) = \sigma^2 \exp \{ -\phi d^2 \}

  • \phi is the spatial decay parameter. Sometimes we use \alpha=1/\phi, called the range
  • \sigma^2 is the spatial variance or sill

Example with \sigma^2=2,\ \phi=20:

Example: the Matern model of covariance

C(d) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} (\phi d)^{\nu} K_{\nu}( \phi d^2 )

  • \Gamma(\cdot) is the Gamma function, K_{\nu}(\cdot) is a modified Bessel function
  • \nu is the smoothness parameter
  • If \nu=0.5 this is the exponential covariance model
  • At \nu\to\infty this becomes the squared exponential covariance model

Example: the Matern model of covariance

C(d) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} (\phi d)^{\nu} K_{\nu}( \phi d^2 ) Example with \sigma^2 = 2,\ \phi=15,\ \nu=1.2:

Comparing Matern covariances with different smoothness

Comparing Matern covariances with different smoothness

Comparing Matern covariances with different smoothness

Comparing Matern covariances with different smoothness

Comparing Matern covariances with different smoothness

Other examples

  • Power exponential covariance C(d) = \sigma^2 \exp \{ -\phi d^\gamma \} \qquad \text{where}\qquad \gamma\in (0,2]
  • Exponential covariance with nugget term C(d) = \sigma^2 \exp\{ -\phi d \} + \cal I_{d=0}(\tau^2)
    • \tau^2 is called the nugget, whereas \sigma^2 is the sill
    • the nugget term represents measurement error
    • we can add a nugget term to any covariance function
  • See textbook at page 28 for other examples

Nugget covariance

Stationary anisotropic covariance functions

  • All covariance functions need to be symmetric, so C(s, s') = C(s', s)
  • Stationary isotropic covariance function C(s, s') = C(\|h\|) depends only on distance \|h \| = \sqrt{ (s - s')^T (s- s') }
  • Stationary anisotropic? Just need C(s, s') = C(s-s') = C(h)
  • Suppose A is a d\times d dimensional positive definite matrix
  • Simple way to introduce anisotropy: make the covariance function depend on

\sqrt{ (s - s')^T A (s- s') }

  • Obtain the isotropic case by letting A = \phi I_d, with \phi > 0.
  • Example: anisotropic exponential covariance

C(h) = \sigma^2 \exp \left\{ - \sqrt{h^T A h} \right\}

  • Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - (h^T A h)^{\frac{\gamma}{2}} \right\}

Example: anisotropic power exponential covariance

  • Let’s simulate data with this covariance function:

C(h) = \sigma^2 \exp \left\{ - (h^T A h)^{\frac{\gamma}{2}} \right\}

  • We have A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
  • If we let a_{ij} \in \Re without constraint, we might break symmetry and positive definiteness of A which we require
  • How could we define our covariance function so that the resulting A matrix is always symmetric positive definite?
  • What we need to do is reparametrize our covariance function

Example: anisotropic power exponential covariance

  • How could we define our covariance function so that the resulting A matrix is always symmetric positive definite?
  • What we need to do is reparametrize our covariance function
  • One option: Cholesky. L_A = \begin{bmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{bmatrix}, then

C(h) = \sigma^2 \exp \left\{ - (h^T L_A L_A^T h )^{\frac{\gamma}{2}} \right\}

  • The matrix L_A L_A^T is symmetric positive definite by construction
  • If l_{21} = l_{22} = 1 then L_A L_A^T = I_d and we are back to the isotropic case
  • But this is not very interpretable. How do we assign a “meaning” to l_{11} and l_{22}?
  • Instead, define S = \text{diag}\{ \sqrt{\phi}, \sqrt{s \phi} \}, R = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, where s, \phi > 0, -1<\rho<1. Then C(h) = \sigma^2 \exp \left\{ - (h^T S R S h )^{\frac{\gamma}{2}} \right\}

Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 1,\ s = 1,\ \rho = 0,\ \gamma = 1

Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 1,\ \rho = 0,\ \gamma = 1.5

Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 3,\ \rho = 0,\ \gamma = 1.5

Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 10,\ \rho = 0,\ \gamma = 1.5

Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 25,\ \rho = 0,\ \gamma = 1.5

Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 25,\ \rho = 0.3,\ \gamma = 1.5

Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 25,\ \rho = 0.9,\ \gamma = 1.5

Example: anisotropic power exponential covariance

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 25,\ \rho = -0.9,\ \gamma = 1.5

Visualizing a stationary anisotropic covariance function

  • In the stationary anisotropic cases we considered earlier, C(s, s') depended on the magnitude of s-s' (ie the distance between s and s') as well as the orientation
  • Therefore there is at least one new dimension to the plot

Visualizing a stationary anisotropic covariance function

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 25,\ \rho = 0.3,\ \gamma = 1.5

Visualizing a stationary anisotropic covariance function

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 25,\ \rho = 0.3,\ \gamma = 1.5

Visualizing a stationary anisotropic covariance function

C(h) = \sigma^2 \exp \left\{ - \left(h^T \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \begin{bmatrix} \sqrt{\phi} & 0 \\ 0 & \sqrt{s \phi} \end{bmatrix} h \right)^{\frac{\gamma}{2}} \right\}, with \sigma^2 = 1,\ \phi = 5,\ s = 5,\ \rho = 0.9,\ \gamma = 1.5

Interpretation of covariance models

  • We have made covariance plot for
    • stationary isotropic covariances
    • stationary anisotropic covariances
  • Discuss ideas for making covariance plots of nonstationary covariance models
  • No specific model in mind, just think of what nonstationary implies

Interpretation of covariance models

  • One reason we choose a specific covariance model is because we think it is a good model of spatial variation in the data
  • Another reason we choose a specific covariance model is to be able to interpret the results of our estimation
  • There is a tradeoff between ease of estimation, interpretation, and flexibility to realistic assumptions
  • Parameters themselves are not always directly interpretable
  • Different covariance functions’ parameters may not be directly comparable in how they are interpreted
  • Covariance plot is an effective tool for interpretation
  • Covariance plot becomes more difficult to do the more the covariance function is complicated!

Interpretation of covariance models

  • Other questions that we could answer:
    • what is the signal-to-noise ratio?
    • how far do we need to take two locations in order for their correlation to be less than 0.95?
    • what is the distance between locations at which correlation falls under 0.05?
  • The second question is also called the effective range

Effective range

The distance d such that Cor(d)=0.05 where Cor(\cdot) is the implied correlation function.

Example: exponential correlation function Cor(d) = \exp\{ -\phi d \}.

Is this an exponential correlation plot?

Some ways to obtain other covariance functions

  • Sum: if C_1(s, s') and C_2(s, s') are two covariance functions, then C(s, s') = C_1(s, s') + C_2(s, s') is also a valid covariance function

  • Product: if C_1(s, s') and C_2(s, s') are two covariance functions, then C(s, s') = C_1(s, s') \cdot C_2(s, s') is also a valid covariace function

Some ways to obtain other covariance functions: cont’d

  • If w_1(\cdot) \sim GP(0, C_1(\cdot, \cdot)) is independent of w_2(\cdot) \sim GP(0, C_2(\cdot, \cdot)) then w(\cdot) = w_1(\cdot) + w_2(\cdot) \sim GP(0, C_1(\cdot, \cdot) + C_2(\cdot, \cdot))

  • If w_1(\cdot) \sim GP(0, C_1(\cdot, \cdot)) is independent of w_2(\cdot) \sim GP(0, C_2(\cdot, \cdot)) then w(\cdot) = w_1(\cdot) \cdot w_2(\cdot) is such that Cov(w(\cdot)) = C(\cdot, \cdot) = C_1(\cdot, \cdot) \cdot C_2(\cdot, \cdot). But notice w(\cdot) is not a GP.

Revisiting the nugget term

C(s, s') = \sigma^2 \exp\{ -\phi \| s-s' \| \} + \cal I_{s=s'}(\tau^2)

  • We see that if w(\cdot) \sim GP(0, C) then we can write

w(\cdot) = v(\cdot) + \varepsilon(\cdot) where v(\cdot) is a GP with exponential covariance and \varepsilon(\cdot) is a white noise process

Some ways to obtain other covariance functions: cont’d

  • Direct sum: if C_1(s_1, s_1') is a covariance function on domain D_1 and C_2(s_2, s_2') is a covariance function on domain D_2 then C(s, s') = C_1(s_1, s_1') + C_2(s_2, s_2') is a covariance function on the domain D_1 \times D_2.

Example:

Suppose D_1=D \subset \Re^2 is the usual spatial domain, D_2=T\subset \Re is the time domain.

Let u(s,t) = w(s) + v(t) where w(\cdot) and v(\cdot) are GPs defined on D and T, respectively, we have that u(s,t) is a GP on D \times T.

In other words, this is a simple way to define a spatiotemporal process

  • Tensor product: if C_1(s_1, s_1') is a covariance function on domain D_1 and C_2(s_2, s_2') is a covariance function on domain D_2 then C(s, s') = C_1(s_1, s_1') \cdot C_2(s_2, s_2') is a covariance function on the domain D_1 \times D_2

Examples of non-stationary covariances

  • Paciorek and Schervish (2006), based on the Matern model:

C(s_i, s_j) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} |\Sigma_i|^{-\frac{1}{4}} |\Sigma_j|^{\frac{1}{4}} \left| \frac{\Sigma_i + \Sigma_j}{2} \right|^{-\frac{1}{2}} \left(2 \sqrt{\nu Q_{ij}} \right)^{\nu} K_{\nu}\left(2 \sqrt{\nu Q_{ij}} \right) where Q_{ij} = (s_i - s_j)^T \left(\frac{\Sigma_i + \Sigma_j}{2} \right)^{-1}(s_i - s_j) and \Sigma_i = \Sigma(x_i) is called the kernel matrix.

  • Another example

C(s_i, s_j) = \exp\{ \phi(s_i) + \phi(s_j) \} M(s_i, s_j) where M(s_i, s_j) is a Matern covariance function and \phi(x) = c_1 \phi_1(x) + \dots + c_p \phi_p(x).

Here, \phi_r(x) is the rth basis function (of a total of p) evaluated at x, and c_r,\ r=1, \dots p are nonstationary variance parameters

  • Why are these nonstationary?

Paciorek and Schervish 2006: examples

Paciorek and Schervish 2006: simulated data examples

Paciorek and Schervish 2006: simulated data examples

[896/Advanced topics] Extending GPs to multivariate data

  • We have considered GPs for univariate data
  • Univariate data: one random variable for each spatial location
  • Multivariate data: one random vector for each spatial location
  • Nothing in our discussion changes when we want to define a multivariate GP
  • Except: we need to define a cross-covariance function
  • Cross-covariance function: not only defines covariance in space, but also across variables
  • C(\ell_i, \ell_j) is a matrix valued function. The (r,s) element of C(\ell_i, \ell_j) is the covariance between the r variable measured at location \ell_i and the s variable measured at location \ell_j
  • See book Chapter 9.2.
  • More on this at the end of the semester.

First steps in modeling data via GPs

  • Probability models for point-referenced spatial data interpret observations as realizations of a set of spatially-indexed random variables
  • At each observed spatial location, we “see” 1 realization of a random variable
  • All random variables are dependent on each other
  • We assume that dependence generally decays with distance
  • We assume that a multivariate Gaussian distribution is appropriate for our collection of random variables
  • We assume a covariance model with a small number of parameters
  • We aim to learn about the covariance model using the observed data
  • Everything is in place to apply our simple GP model to point-referenced data