GPM: Difference between revisions
From charlesreid1
Revision as of 09:19, 6 November 2017
GPM = Gaussian Process Modeling
Notes
Link to nice notebook explaining how to implement this in Python: https://blog.dominodatalab.com/fitting-gaussian-process-models-python/
Python Libraries
Several Python libraries support Gaussian process modeling:
- GPflow (Gaussian Process Modeling using TensorFlow under the hood)
- PyMC3 (PyMC implements Markov chain Monte Carlo methods, which can be used to fit GPMs)
- scikit-learn (implements several GPM routines/objects/functions)
These packages principally implement covariance functions that can be used to describe non-linearity in data, and methods for fitting GPM parameters.
Scikit-Learn
scikit-learn has several relevant functions/implementations of GPMs appropriate for different applications:
- GaussianProcessRegressor - create this by specifying an appropriate covariance function (kernel). Fitting is accomplished by maximizing the log marginal likelihood, which avoids a computationally intensive cross-validation strategy. It does not allow a mean function to be specified: the prior mean is assumed to be zero (or the training data's mean when normalize_y=True). This is because the mean is not as important when computing the posterior.
- GaussianProcessClassifier - useful for classification tasks and fitting categorical/binary data. Uses a latent Gaussian response variable and transforms it to the unit interval (or simplex). This is a soft, probabilistic classification rather than a hard classification prediction. The user provides a kernel to describe the covariance in the data set. Because the posterior is non-normal, a Laplace approximation is used to approximate it rather than computing the marginal likelihood exactly.
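A minimal sketch of the soft classification described above. The data here is synthetic and purely illustrative (a 1-D input labeled by its sign), and the choice of a Matern kernel is an assumption made for consistency with the rest of these notes:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import Matern

# Hypothetical binary data: label is 1 when the input is positive.
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=40).reshape(-1, 1)
y = (X.ravel() > 0).astype(int)

clf = GaussianProcessClassifier(kernel=Matern(length_scale=1.0, nu=1.5))
clf.fit(X, y)

# predict_proba returns class probabilities (soft classification);
# predict returns the most probable class label (hard classification).
X_new = np.array([[-2.0], [0.0], [2.0]])
proba = clf.predict_proba(X_new)
labels = clf.predict(X_new)
print(proba)
print(labels)
```

Each row of proba sums to one, with one column per class.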
Kernels:
- from sklearn import gaussian_process will import code/functions related to Gaussian process modeling
- from sklearn.gaussian_process.kernels import Matern will import one of about a dozen GPM kernels
- Matern covariance is a good, flexible first choice:
$ k_M(x) = \dfrac{ \sigma^2 }{\Gamma(\nu) 2^{\nu - 1}} \left( \dfrac{ \sqrt{2 \nu} x }{ l } \right)^{\nu} K_{\nu} \left( \dfrac{ \sqrt{2 \nu } x }{ l } \right) $
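- $ x $ is the distance between two input points, $ \Gamma $ is the gamma function, and $ K_{\nu} $ is the modified Bessel function of the second kind
- $ \sigma $ is the amplitude, a scalar multiplier that controls y-axis scaling
- $ l $ is the length scale, which scales realizations along the x-axis
- $ \nu $ is the roughness, which controls the sharpness/smoothing of ridges in the covariance function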
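As a sanity check on the formula, the sketch below evaluates it directly with SciPy and compares the result against scikit-learn's Matern kernel. The helper name matern_cov is hypothetical, and the comparison assumes $ \sigma = 1 $ because scikit-learn's Matern has no amplitude parameter of its own (amplitude is supplied separately, e.g. by ConstantKernel).

```python
import numpy as np
from scipy.special import gamma, kv
from sklearn.gaussian_process.kernels import Matern

def matern_cov(r, sigma=1.0, length_scale=1.0, nu=1.5):
    """Evaluate the Matern covariance at distances r (hypothetical helper)."""
    r = np.asarray(r, dtype=float)
    scaled = np.sqrt(2.0 * nu) * r / length_scale
    # At r = 0 the formula is 0 * inf; its limit is sigma**2.
    out = np.full_like(scaled, sigma**2)
    pos = scaled > 0
    out[pos] = (sigma**2 / (gamma(nu) * 2.0**(nu - 1.0))
                * scaled[pos]**nu * kv(nu, scaled[pos]))
    return out

r = np.linspace(0.0, 3.0, 7)
manual = matern_cov(r, sigma=1.0, length_scale=2.0, nu=1.5)

# scikit-learn evaluates the kernel between two sets of points;
# using the origin as one set gives the covariance at distance r.
origin = np.zeros((1, 1))
library = Matern(length_scale=2.0, nu=1.5)(origin, r.reshape(-1, 1)).ravel()

print(np.allclose(manual, library))
```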
Example usage:
- A single GPM kernel can be a combination of multiple kernels
- Start with a Matern component (Matern), add an amplitude term (ConstantKernel), then add an observation noise term (WhiteKernel); the example here sums the three
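from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel
# Parentheses let the sum continue across lines.
kernel = (ConstantKernel()
          + Matern(length_scale=2, nu=3/2)
          + WhiteKernel(noise_level=1))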
Then create a GaussianProcessRegressor object:
gp = gaussian_process.GaussianProcessRegressor(kernel=kernel)
gp.fit(X, y)
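The snippet above assumes X and y already exist. A minimal end-to-end sketch, assuming synthetic training data (a noisy sine curve, chosen purely for illustration and not taken from the original notes), might look like this:

```python
import numpy as np
from sklearn import gaussian_process
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# Hypothetical training data: noisy samples of sin(x) on [-5, 5].
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=30).reshape(-1, 1)  # shape (n_samples, n_features)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)

kernel = (ConstantKernel()
          + Matern(length_scale=2, nu=3/2)
          + WhiteKernel(noise_level=1))

gp = gaussian_process.GaussianProcessRegressor(kernel=kernel)
gp.fit(X, y)

# kernel_ is the kernel with hyperparameters optimized during fit.
print(gp.kernel_)
print(gp.log_marginal_likelihood(gp.kernel_.theta))
```

Note that X must be two-dimensional, which is why the 1-D samples are reshaped to a single column.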
To do this all in a single call, while also setting the optimizer explicitly (here 1**2 is how scikit-learn prints a ConstantKernel with value 1):
GaussianProcessRegressor(alpha=1e-10, copy_X_train=True,
kernel=1**2 + Matern(length_scale=2, nu=1.5) + WhiteKernel(noise_level=1),
n_restarts_optimizer=0, normalize_y=False,
optimizer='fmin_l_bfgs_b', random_state=None)
Several of these arguments take default values unless set explicitly:
- The L-BFGS-B algorithm is used to optimize hyperparameters
- The output variable is not normalized
- n_restarts_optimizer is a setting to help prevent getting stuck in a local and not a global maximum. It restarts the optimization algorithm the number of times specified.
- Attributes estimated by the fit function have an underscore appended to their names (e.g. kernel_)
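The effect of n_restarts_optimizer and the trailing-underscore attributes can be sketched as follows. The training data is synthetic and the helper make_kernel is hypothetical; both are assumptions introduced only for this illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# Hypothetical training data: noisy samples of sin(x).
rng = np.random.default_rng(1)
X = rng.uniform(-5, 5, size=25).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=25)

def make_kernel():
    # Fresh kernel object for each model (hypothetical helper).
    return (ConstantKernel()
            + Matern(length_scale=2, nu=1.5)
            + WhiteKernel(noise_level=1))

gp_single = GaussianProcessRegressor(
    kernel=make_kernel(), n_restarts_optimizer=0, random_state=0).fit(X, y)
gp_multi = GaussianProcessRegressor(
    kernel=make_kernel(), n_restarts_optimizer=5, random_state=0).fit(X, y)

# Fitted quantities carry a trailing underscore.
lml_single = gp_single.log_marginal_likelihood_value_
lml_multi = gp_multi.log_marginal_likelihood_value_
print(gp_single.kernel_, lml_single)
print(gp_multi.kernel_, lml_multi)
```

The first optimizer run starts from the kernel's initial hyperparameters in both models, and the restarts only add further candidates, so the log marginal likelihood with restarts should be at least as high as without.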
Fit vs. Predict:
- The fit method runs an optimization procedure to fit the parameters
- The predict method generates predicted outcomes $ y^{\star} $ given a new set of predictors $ X^{\star} $
- The predictions come from the posterior predictive distribution - the Gaussian process mean and covariance, updated to their posterior forms:
$ p(y^{\star} \vert y, x, x^{\star}) = GP( m^{\star}(x^{\star}), k^{\star}(x^{\star}) ) $
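For reference, the posterior forms have standard closed-form expressions. The version below is a sketch under stated assumptions: a zero prior mean (as noted for GaussianProcessRegressor), Gaussian observation noise with variance $ \sigma_n^2 $ (a symbol introduced here and not used elsewhere in these notes), and $ K(a, b) $ denoting the matrix of prior kernel values between the points in $ a $ and $ b $:
$ m^{\star}(x^{\star}) = K(x^{\star}, x) \left[ K(x, x) + \sigma_n^2 I \right]^{-1} y $
$ k^{\star}(x^{\star}) = K(x^{\star}, x^{\star}) - K(x^{\star}, x) \left[ K(x, x) + \sigma_n^2 I \right]^{-1} K(x, x^{\star}) $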
To do this with scikit-learn, using points on the interval from -5 to 5 as input:
import numpy as np
x_pred = np.linspace(-5, 5).reshape(-1,1)
y_pred, sigma = gp.predict(x_pred, return_std=True)
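The returned standard deviation can be turned into an uncertainty band around the predicted mean. The sketch below is self-contained, so it refits a model on synthetic data first (the data and the 1.96 multiplier for an approximate 95% band are assumptions made for illustration):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# Hypothetical training data: noisy samples of sin(x).
rng = np.random.default_rng(2)
X = rng.uniform(-5, 5, size=30).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)

kernel = (ConstantKernel()
          + Matern(length_scale=2, nu=1.5)
          + WhiteKernel(noise_level=1))
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

# np.linspace returns 50 points by default.
x_pred = np.linspace(-5, 5).reshape(-1, 1)
y_pred, sigma = gp.predict(x_pred, return_std=True)

# Approximate 95% band, treating the predictive distribution as Gaussian.
lower = y_pred - 1.96 * sigma
upper = y_pred + 1.96 * sigma
print(y_pred[:3], lower[:3], upper[:3])
```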