From charlesreid1

Revision as of 10:09, 6 November 2017 by Admin

GPM = Gaussian Process Modeling

Notes

Link to nice notebook explaining how to implement this in Python: https://blog.dominodatalab.com/fitting-gaussian-process-models-python/

Python Libraries

Several Python libraries can be used for Gaussian Process Modeling:

  • GPFlow (Gaussian Process Modeling using TensorFlow under the hood)
  • PyMC3 (PyMC implements Markov-chain Monte Carlo methods for GPM)
  • scikit-learn (implements several GPM routines/objects/functions)

These packages principally implement covariance functions that can be used to describe non-linearity in data, and methods for fitting GPM parameters.

Scikit-Learn

scikit-learn has several relevant functions/implementations of GPMs appropriate for different applications:

  • GaussianProcessRegressor - create this by specifying an appropriate covariance function (kernel). Fitting is accomplished by maximizing the log marginal likelihood, which avoids a computationally intensive cross-validation strategy. It does not allow a mean function to be specified: the prior mean is assumed to be zero (or the training-data mean when normalize_y=True), because the mean matters less than the covariance when computing the posterior.
  • GaussianProcessClassifier - useful for classification tasks and fitting categorical/binary data. Uses a latent Gaussian response variable and transforms it to the unit interval (or simplex). The result is a soft, probabilistic classification rather than a hard classification prediction. The user provides a kernel to describe the covariance in the data set. The posterior is non-normal, so the exact marginal likelihood is not available in closed form and a Laplace approximation is used in its place.
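The log marginal likelihood mentioned for GaussianProcessRegressor can be written out by hand for a zero-mean GP with Gaussian noise. The following is a minimal NumPy sketch, not scikit-learn's implementation: the squared-exponential kernel, the synthetic data, the fixed amplitude and noise values, and the crude grid search over the length scale are all assumptions made for illustration (scikit-learn uses a gradient-based optimizer instead).

```python
import numpy as np

def sq_exp_kernel(a, b, amplitude, length_scale):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = np.subtract.outer(a, b)
    return amplitude * np.exp(-0.5 * (diff / length_scale) ** 2)

def log_marginal_likelihood(x, y, amplitude, length_scale, noise):
    """log p(y | x, hyperparameters) for a zero-mean GP with Gaussian noise."""
    n = len(x)
    K = sq_exp_kernel(x, x, amplitude, length_scale) + noise * np.eye(n)
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                   # equals -0.5 * log|K|
            - 0.5 * n * np.log(2 * np.pi))

# Synthetic data (an assumption, for illustration only)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 25)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

# Crude grid search over the length scale with amplitude and noise held fixed
candidates = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
scores = [log_marginal_likelihood(x, y, 1.0, ls, 0.01) for ls in candidates]
best_length_scale = candidates[int(np.argmax(scores))]
print(best_length_scale)
```

Very short length scales treat the data as uncorrelated noise and very long ones cannot follow the sine wave, so the likelihood should favor an intermediate value; no cross-validation split is needed to see that.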

Kernels:

  • from sklearn import gaussian_process will import code/functions related to Gaussian process modeling
  • from sklearn.gaussian_process.kernels import Matern will import one of about a dozen GPM kernels
  • Matern covariance is a good, flexible first-choice:

$ k_M(x) = \dfrac{ \sigma^2 }{\Gamma(\nu) 2^{\nu - 1}} \left( \dfrac{ \sqrt{2 \nu} x }{ l } \right)^{\nu} K_{\nu} \left( \dfrac{ \sqrt{2 \nu } x }{ l } \right) $

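  • $ x $ is the distance between two input points, and $ K_{\nu} $ is the modified Bessel function of the second kind
  • $ \sigma $ is the amplitude, a scalar multiplier that controls y-axis scaling
  • $ l $ is the length scale, which scales realizations along the x-axis
  • $ \nu $ is the roughness, which controls the sharpness/smoothing of ridges in the covariance function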
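For half-integer values of $ \nu $ the Bessel-function form above reduces to simple closed forms, which makes the roles of the three parameters easy to check numerically. The sketch below uses only the standard library; the function name matern and its defaults are hypothetical, and only the cases $ \nu = 1/2, 3/2, 5/2 $ are covered.

```python
import math

def matern(x, sigma=1.0, length_scale=1.0, nu=1.5):
    """Matern covariance at distance x, closed forms for nu in {0.5, 1.5, 2.5}."""
    r = abs(x) / length_scale
    if nu == 0.5:
        return sigma**2 * math.exp(-r)
    if nu == 1.5:
        s = math.sqrt(3.0) * r
        return sigma**2 * (1.0 + s) * math.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5.0) * r
        return sigma**2 * (1.0 + s + s**2 / 3.0) * math.exp(-s)
    raise ValueError("closed form implemented only for nu = 0.5, 1.5, 2.5")

# Covariance at increasing distances for sigma=2, l=2, nu=3/2
values = [matern(d, sigma=2.0, length_scale=2.0, nu=1.5) for d in (0.0, 1.0, 4.0)]
print(values)
```

At zero distance the covariance equals $ \sigma^2 $, and it decays with distance at a rate set by $ l $; a larger $ \nu $ gives a flatter top near zero, which corresponds to smoother realizations.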

Example usage:

  • A single GPM kernel can be a combination of multiple kernels
  • Start with a Matern component (Matern), add an amplitude term (ConstantKernel), then add an observation noise term (WhiteKernel)

from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# parentheses let the sum span several lines
kernel = (ConstantKernel()
          + Matern(length_scale=2, nu=3/2)
          + WhiteKernel(noise_level=1))

Then create a GaussianProcessRegressor object:

gp = gaussian_process.GaussianProcessRegressor(kernel=kernel)
gp.fit(X, y)
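The snippets above do not show where X and y come from. The following sketch fills that gap with synthetic data, which is an assumption made purely for illustration; the point is that X must be two-dimensional (one row per observation, one column per feature) while y can be one-dimensional.

```python
import numpy as np
from sklearn import gaussian_process
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# Synthetic 1-D data (hypothetical; the original X and y are not shown)
rng = np.random.default_rng(42)
X = np.linspace(-5, 5, 40).reshape(-1, 1)   # shape (n_samples, n_features)
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(X.shape[0])

kernel = (ConstantKernel()
          + Matern(length_scale=2, nu=3/2)
          + WhiteKernel(noise_level=1))

gp = gaussian_process.GaussianProcessRegressor(kernel=kernel)
gp.fit(X, y)

# Kernel with hyperparameters optimized during fit()
print(gp.kernel_)
```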

To construct the regressor in a single call with every option written out, including the optimizer (1**2 is how scikit-learn displays a ConstantKernel with constant value 1):

gp = gaussian_process.GaussianProcessRegressor(alpha=1e-10, copy_X_train=True,
    kernel=1**2 + Matern(length_scale=2, nu=1.5) + WhiteKernel(noise_level=1),
    n_restarts_optimizer=0, normalize_y=False,
    optimizer='fmin_l_bfgs_b', random_state=None)

Most of these are the default values, which apply unless explicitly overridden:

  • The L-BFGS-B algorithm is used to optimize hyperparameters
  • The output variable is not normalized
  • n_restarts_optimizer helps avoid getting stuck in a local rather than the global maximum of the log marginal likelihood. It reruns the optimization the specified number of times from randomly sampled starting hyperparameters; the default of 0 means a single run from the kernel's initial values.
  • Attributes set by the fit method have an underscore appended to their names (for example, kernel_)
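The defaults and the trailing-underscore convention listed above can be inspected directly. This is a small sketch on synthetic data (an assumption for illustration); the choice of five restarts and the fixed random_state are arbitrary.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

gp = GaussianProcessRegressor(
    kernel=Matern(length_scale=2, nu=1.5) + WhiteKernel(noise_level=1),
    n_restarts_optimizer=5,   # 5 extra optimizer runs from random starting points
    random_state=0,           # makes the random restarts reproducible
)

# Options that were not overridden keep their defaults
params = gp.get_params()
print(params["optimizer"], params["normalize_y"], params["alpha"])

# Synthetic data (hypothetical)
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = np.cos(X).ravel() + 0.1 * rng.standard_normal(30)
gp.fit(X, y)

# Attributes created by fit() end in an underscore
print(gp.kernel_)
print(gp.log_marginal_likelihood_value_)
```

Each restart repeats the whole hyperparameter optimization, so fitting time grows roughly in proportion to the number of restarts.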

Fit vs. Predict:

  • The fit method runs an optimization procedure to fit the parameters
  • The predict method generates predicted outcomes $ y^{\star} $ given a new set of predictors $ X^{\star} $
  • Prediction is fulfilled by the posterior predictive distribution: the Gaussian process with its mean and covariance functions updated to their posterior forms after fitting:

$ p(y^{\star} \vert y, x, x^{\star}) = GP( m^{\star}(x^{\star}), k^{\star}(x^{\star}) ) $

To do this with scikit-learn, using the interval from -5 to 5 as the input:

x_pred = np.linspace(-5, 5).reshape(-1,1)
y_pred, sigma = gp.predict(x_pred, return_std=True)
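To see what predict computes, the posterior mean $ m^{\star} $ and covariance $ k^{\star} $ can be evaluated by hand for a zero-mean GP. This is a NumPy sketch under stated assumptions, not scikit-learn's code: it reuses the exponential_cov function that appears in the GPFlow section of these notes, and the three training points, the parameter values, the small noise term, and the helper name posterior are all hypothetical.

```python
import numpy as np

def exponential_cov(x, y, params):
    return params[0] * np.exp(-0.5 * params[1] * np.subtract.outer(x, y) ** 2)

def posterior(x_new, x_train, y_train, params, noise=1e-6):
    """Posterior mean and covariance of a zero-mean GP at x_new."""
    K = exponential_cov(x_train, x_train, params) + noise * np.eye(len(x_train))
    K_s = exponential_cov(x_train, x_new, params)    # train vs. new points
    K_ss = exponential_cov(x_new, x_new, params)     # new vs. new points
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

# Hypothetical training data
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.array([0.5, -0.3, 1.0])
x_pred = np.linspace(-5, 5)

m_star, k_star = posterior(x_pred, x_train, y_train, params=[1, 10])

# Predictive standard deviation; clip guards against tiny negative round-off
sigma = np.sqrt(np.clip(np.diag(k_star), 0, None))
```

Near the training points the posterior mean follows the observations and the standard deviation shrinks toward zero; far from the data it reverts to the prior (mean zero, standard deviation one for these parameters).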

GPFlow

GPFlow is derived from GPy, a Gaussian process fitting package from the Sheffield machine learning group. Both provide a set of classes for creating Gaussian process models, and a library of mean functions and kernels (covariance functions). Under the hood, GPFlow uses the TensorFlow library, which allows fitting larger and more complex Gaussian process models and supports more complicated methods such as variational inference.

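The API requires predictors (features) and outputs in tabular form, i.e. as two-dimensional arrays with one row per observation.

To reshape y to a tabular format:

theta = [1, 10]
sigma_0 = exponential_cov(0, 0, theta)
# wrap the sample in a NumPy array; a plain list has no reshape method
y = np.array([np.random.normal(scale=sigma_0)])

# reshape into tabular format
Y = y.reshape(-1,1)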

(As a reminder, the exponential_cov function is defined as follows; it and the numpy import must be in place before the snippet above is run.)

import numpy as np
def exponential_cov(x, y, params):
    return params[0] * np.exp(-0.5*params[1]*np.subtract.outer(x, y)**2)
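The effect of the reshape is easier to see with more than one observation. The sketch below draws a single realization of the GP prior at a handful of input locations and converts both arrays to column form; the input grid x and the random seed are hypothetical choices.

```python
import numpy as np

def exponential_cov(x, y, params):
    return params[0] * np.exp(-0.5 * params[1] * np.subtract.outer(x, y) ** 2)

theta = [1, 10]
x = np.linspace(-3, 3, 7)            # hypothetical input locations

# Draw one realization of the zero-mean GP prior at the points x
rng = np.random.default_rng(1)
K = exponential_cov(x, x, theta)
y = rng.multivariate_normal(np.zeros(x.size), K)

# One row per observation, one column per feature/output
X = x.reshape(-1, 1)
Y = y.reshape(-1, 1)
print(X.shape, Y.shape)
```

Passing -1 as the first dimension lets NumPy infer the number of rows, so the same line works for any number of observations.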

Now import the GPFlow module (the capitalized module name matches early GPflow releases; later releases are imported as gpflow):

import GPflow

GPFlow provides six GP classes, chosen according to the covariance structure (full or sparse approximation) and the model's likelihood (normal or non-normal).

Models with a non-normal likelihood can be fit using Markov chain Monte Carlo (MCMC) or approximated via variational inference.

Here is the same type of covariance function we used with scikit-learn, the Matern covariance (Matern32 corresponds to $ \nu = 3/2 $):

k = GPflow.kernels.Matern32(1, variance=1, lengthscales=1.2)

m = GPflow.gpr.GPR(X, Y, kern=k)

print(m)
