mssm package
Subpackages
Submodules
mssm.models module
- class mssm.models.GAMM(formula: Formula, family: Family)
Bases:
object
Class to fit Generalized Additive Mixed Models.
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
- Variables:
pred ([float]) – The model prediction for the training data. Of the same dimension as
self.formula.__lhs
. Initialized withNone
.res ([float]) – The working residuals for the training data. Of the same dimension as
self.formula.__lhs
.Initialized withNone
.edf (float) – The model estimated degrees of freedom as a float. Initialized with
None
.term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with
None
.lvi (scipy.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with
None
.penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with
None
.hessian (scipy.sparse.csc_array) – Estimated hessian of the log-likelihood. Initialized with
None
.info (Fit_info) – A
Fit_info
instance, with information about convergence (speed) of the model.
- approx_smooth_p_values()
Function to compute approximate p-values for smooth terms, testing whether \(\mathbf{f}=\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\) based on the algorithm by Wood (2013).
Wood (2013, 2017) generalize the \(\boldsymbol{\beta}_j^T\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\boldsymbol{\beta}_j\) test-statistic for parametric terms (computed by function
mssm.models.print_parametric_terms()
) to the coefficient vector \(\boldsymbol{\beta}_j\) parameterizing smooth functions. \(\mathbf{V}\) here is the covariance matrix of the posterior distribution for \(\boldsymbol{\beta}\) (see Wood, 2017). The idea is to replace \(\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\) with a rank \(r\) pseudo-inverse (smooth blocks in \(\mathbf{V}\) are usually rank deficient). Wood (2013, 2017) suggest to base \(r\) on the estimated degrees of freedom for the smooth term in question - but that \(r\) is usually not integer.They provide a generalization that addresses the realness of \(r\), resulting in a test statistic \(T_r\), which follows a weighted Chi-square distribution under the Null. Following the recommendation in Wood (2012) we here approximate the reference distribution under the Null by means of a Gamma distribution with \(\alpha=r/2\) and \(\phi=2\). In case of a two-parameter distribution (i.e., estimated scale parameter \(\phi\)), the Chi-square reference distribution needs to be corrected, again resulting in a weighted chi-square distribution which should behave something like a F distribution with DoF1 = \(r\) and DoF2 = \(\epsilon_{DoF}\) (i.e., the residual degrees of freedom), which would be the reference distribution for \(T_r/r\) if \(r\) were integer and \(\mathbf{V}_{\boldsymbol{\beta}_j}\) full rank. Hence, we approximate the reference distribution for \(T_r/r\) with a Beta distribution, with \(\alpha=r/2\) and \(\beta=\epsilon_{DoF}/2\) (see Wikipedia for the specific transformation applied to \(T_r/r\) so that the resulting transformation is approximately beta distributed) - which is similar to the Gamma approximation used for the Chi-square distribution in the no-scale parameter case.
Warning: Because of the approximations of the Null reference distribution, the resulting p-values are even more approximate. They should only be treated as indicative - even more so than the values returned by
gam.summary
inmgcv
.Note: Just like in
mgcv
, the returned p-value is an average: two p-values are computed because of an ambiguity in forming \(T_r\) and averaged to get the final one. For \(T_r\) we return the max of the two alternatives.- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N. (2013). On p-values for smooth components of an extended generalized additive model.
testStat
function in mgcv, see: https://github.com/cran/mgcv/blob/master/R/mgcv.r#L3780
- Returns:
Tuple conatining two lists: first list holds approximate p-values for all smooth terms, second list holds test statistic.
- Return type:
([float],[float])
- fit(max_outer=50, max_inner=100, conv_tol=1e-07, extend_lambda=True, control_lambda=True, exclude_lambda=False, extension_method_lam='nesterov', restart=False, method='Chol', check_cond=1, progress_bar=True, n_cores=10, offset=None)
Fit the specified model. Additional keyword arguments not listed below should not be modified unless you really know what you are doing.
- Parameters:
max_outer (int,optional) – The maximum number of fitting iterations. Defaults to 50.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step updating the coefficients for Generalized models. Defaults to 100.
conv_tol (float,optional) – The relative (change in penalized deviance is compared against
conv_tol
* previous penalized deviance) criterion used to determine convergence.extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Enabled by default.
control_lambda (bool,optional) – Whether lambda proposals should be checked (and if necessary decreased) for actually improving the Restricted maximum likelihood of the model. Can lower the number of new smoothing penalty proposals necessary. Enabled by default.
exclude_lambda (bool,optional) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.
method (str,optional) – Which method to use to solve for the coefficients. The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but then also pivots for stability in order to get an estimate of rank defficiency. This takes substantially longer. This argument is ignored if
len(self.formula.file_paths)>0
that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to “Chol”.check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When
check_cond=0
, no check will be performed. Whencheck_cond=1
, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (seemssm.src.python.gamm_solvers.est_condition()
). Whencheck_cond=2
, an estimate of the condition number will be performed for each new system (at each iteration of the algorithm) and an error will be raised if the condition number is estimated as too high given the chosenmethod
. Is ignored, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to 1.progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
offset (float or [float],optional) – Mimics the behavior of the
offset
argument forgam
inmgcv
in R. If a value is provided here (can either be a float or anumpy.array
of shape (-1,1) - if it is an array, then the first dimension has to match the number of observations in the data. NANs present in the dependent variable will be excluded from the offset vector.) then it is consistently added to the linear predictor during estimation. It will not be used by any other function of theGAMM
class (e.g., for prediction). This argument is ignored iflen(self.formula.file_paths)>0
that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to None.
- get_llk(penalized: bool = True, ext_scale: float = None)
Get the (penalized) log-likelihood of the estimated model (float or None) given the trainings data. LLK can optionally be evaluated for an external scale parameter
ext_scale
.Will instead return
None
if called before fitting.- Parameters:
penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True
ext_scale (float, optional) – Optionally provide an external scale parameter at which to evaluate the log-likelihood, defaults to None
- Raises:
NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely.
- Returns:
llk score
- Return type:
float or None
- get_mmat(use_terms=None, drop_NA=True)
Returns exaclty the model matrix used for fitting as a scipy.sparse.csc_array. Will throw an error when called for a model for which the model matrix was never former completely - i.e., when \(\mathbf{X}^T\mathbf{X}\) was formed iteratively for estimation, by setting the
file_paths
argument of theFormula
to a non-empty list.Optionally, all columns not corresponding to terms for which the indices are provided via
use_terms
can be zeroed.- Parameters:
use_terms ([int], optional) – Optionally provide indices of terms in the formual that should be created. If this argument is provided columns corresponding to any term not included in this list will be zeroed, defaults to None
drop_NA (bool, optional) – Whether rows in the model matrix corresponding to NAs in the dependent variable vector should be dropped, defaults to True
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely
- Returns:
Model matrix \(\mathbf{X}\) used for fitting.
- Return type:
scp.sparse.csc_array
- get_pars()
Returns a tuple. The first entry is a np.array with all estimated coefficients. The second entry is the estimated scale parameter.
Will instead return
(None,None)
if called before fitting.- Returns:
Model coefficients and scale parameter that were estimated
- Return type:
(np.array,float) or (None, None)
- get_reml()
Get’s the (Laplace approximate) REML (Restricted Maximum Likelihood) score (as a float) for the estimated lambda values (see Wood, 2011).
References:
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely.
TypeError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
REML score
- Return type:
float
- get_resid(type='Pearson')
Returns the residuals \(e_i = y_i - \mu_i\) for additive models and (by default) the Pearson residuals \(w_i^{0.5}*(z_i - \eta_i)\) (see Wood, 2017 sections 3.1.5 & 3.1.7) for generalized additive models. Here \(w_i\) are the Fisher scoring weights, \(z_i\) the pseudo-data point for each observation, and \(\eta_i\) is the linear prediction (i.e., \(g(\mu_i)\) - where \(g()\) is the link function) for each observation.
If
type= "Deviance"
, the deviance residuals are returned, which are equivalent to \(sign(y_i - \mu_i)*D_i^{0.5}\), where \(\sum_{i=1,...N} D_i\) equals the model deviance (see Wood 2017, section 3.1.7).Throws an error if called before model was fitted.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
type (str,optional) – The type of residual to return for a Generalized model, “Pearson” by default, but can be set to “Deviance” as well. Ignorred for additive models with identity link.
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
Empirical residual vector
- Return type:
[float]
- predict(use_terms, n_dat, alpha=0.05, ci=False, whole_interval=False, n_ps=10000, seed=None)
Make a prediction using the fitted model for new data
n_dat
.But only using the terms indexed by
use_terms
. Importantly, predictions and standard errors are always returned on the scale of the linear predictor. When estimating a Generalized Additive Model, the mean predictions and standard errors (often referred to as the ‘response’-scale predictions) can be obtained by applying the link inverse function to the predictions and the CI-bounds on the linear predictor scale (DO NOT transform the standard error first and then add it to the transformed predictions - only on the scale of the linear predictor is the standard error additive):gamma_model = GAMM(gamma_formula,Gamma()) # A true GAM gamma_model.fit() # Now get predictions on the scale of the linear predictor pred,_,b = gamma_model.predict(None,new_dat,ci=True) # Then transform to the response scale mu_pred = gamma_model.family.link.fi(pred) mu_upper_CI = gamma_model.family.link.fi(pred + b) mu_lower_CI = gamma_model.family.link.fi(pred - b)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
- Parameters:
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or
None
in which case all terms will be used.n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the
use_terms
argument.alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (
alpha
/2) will be used to determine the critical cut-off value according to a N(0,1).ci (bool, optional) – Whether the standard error
se
for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred
-se
,pred
+se
]whole_interval (bool, optional) – Whether or not to adjuste the point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [
pred
-se
,pred
+se
]n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
- Returns:
A tuple with 3 entries. The first entry is the prediction
pred
based on the new datan_dat
. The second entry is the model matrix built forn_dat
that was post-multiplied with the model coefficients to obtainpred
. The third entry isNone
ifci``==``False
else the standard errorse
in the prediction.- Return type:
(np.array,scp.sparse.csc_array,np.array or None)
- predict_diff(dat1, dat2, use_terms, alpha=0.05, whole_interval=False, n_ps=10000, seed=None)
Get the difference in the predictions for two datasets.
Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case,
dat1
anddat2
should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see thepredict()
method for details.References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
get_difference
function fromitsadug
R-package: https://rdrr.io/cran/itsadug/man/get_difference.html
- Parameters:
dat1 – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the
use_terms
argument.dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this
dat1
will be returned.use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or
None
in which case all terms will be used.alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (
alpha
/2) will be used to determine the critical cut-off value according to a N(0,1).whole_interval (bool, optional) – Whether or not to adjuste the point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
- Returns:
A tuple with 2 entries. The first entry is the predicted difference (between the two data sets
dat1
&dat2
)diff
. The second entry is the standard errorse
of the predicted difference. The difference CI is then [diff
-se
,diff
+se
]- Return type:
(np.array,np.array)
- print_parametric_terms()
Prints summary output for linear/parametric terms in the model, not unlike the one returned in R when using the
summary
function formgcv
models.For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows a t-distribution for models in which an additional scale parameter was estimated (e.g., Gaussian, Gamma) and a standardized normal distribution for models in which the scale parameter is known or was fixed (e.g., Binomial). For the former case, the t-statistic, Degrees of freedom of the Null distribution (DoF.), and the p-value are printed as well. For the latter case, only the z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.
Note that, un-penalized coefficients that are part of a smooth function are not covered by this function.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Raises:
NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely.
- print_smooth_terms(pen_cutoff=0.2, p_values=False)
Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. <
pen_cutoff
will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. <pen_cutoff
can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).References:
Marra & Wood (2011). Practical variable selection for generalized additive models.
- Parameters:
pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to None
p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False
- sample_post(n_ps, use_post=None, deviations=False, seed=None)
Obtain
n_ps
samples from posterior \([\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where V is \([\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]^{-1}*/\phi\) (see Wood, 2017; section 6.10). To obtain samples for \(\boldsymbol{\beta}\), setdeviations
to false.see
sample_MVN()
for more details.References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
n_ps (int,optional) – Number of samples to obtain from posterior.
use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. By default all coefficients are sampled.
deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False
seed (int,optional) – A seed to use for the sampling, defaults to None
- Returns:
An np.array of dimension
[len(use_post),n_ps]
containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves/predictions.- Return type:
[float]
- class mssm.models.GAMMLSS(formulas: [<class 'mssm.src.python.formula.Formula'>], family: ~mssm.src.python.exp_fam.GAMLSSFamily)
Bases:
GAMM
Class to fit Generalized Additive Mixed Models of Location Scale and Shape (see Rigby & Stasinopoulos, 2005).
Example:
# Simulate 500 data points GAUMLSSDat = sim6(500,seed=20) # We need to model the mean: \mu_i = \alpha + f(x0) formula_m = Formula(lhs("y"), [i(),f(["x0"],nk=10)], data=GAUMLSSDat) # and the standard deviation as well: log(\sigma_i) = \alpha + f(x0) formula_sd = Formula(lhs("y"), [i(),f(["x0"],nk=10)], data=GAUMLSSDat) # Collect both formulas formulas = [formula_m,formula_sd] # Create Gaussian GAMMLSS family with identity link for mean # and log link for sigma family = GAUMLSS([Identity(),LOG()]) # Now define the model and fit! model = GAMMLSS(formulas,family) model.fit()
- References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
formulas ([Formula]) – A list of formulas for the GAMMLS model
family (GAMLSSFamily) – A
GAMLSSFamily
. CurrentlyGAUMLSS
,MULNOMLSS
, andGAMMALS
are supported.
- Variables:
overall_preds ([[float]]) – The predicted means for every parameter of
family
evaluated for each observation in the training data. Initialized withNone
.edf (float) – The model estimated degrees of freedom as a float. Initialized with
None
.overall_term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with
None
.lvi (scipy.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with
None
.penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with
None
.overall_coef ([int]) – Contains all coefficients estimated for the model. Initialized with
None
.coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of
family
. Initialized after fitting!hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood. Initialized with
None
.overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with
None
.info (Fit_info) – A
Fit_info
instance, with information about convergence (speed) of the model.
- approx_smooth_p_values()
Function to compute approximate p-values for smooth terms, testing whether \(\mathbf{f}=\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\) based on the algorithm by Wood (2013).
Wood (2013, 2017) generalize the \(\boldsymbol{\beta}_j^T\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\boldsymbol{\beta}_j\) test-statistic for parametric terms (computed by function
mssm.models.print_parametric_terms()
) to the coefficient vector \(\boldsymbol{\beta}_j\) parameterizing smooth functions. \(\mathbf{V}\) here is the covariance matrix of the posterior distribution for \(\boldsymbol{\beta}\) (see Wood, 2017). The idea is to replace \(\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\) with a rank \(r\) pseudo-inverse (smooth blocks in \(\mathbf{V}\) are usually rank deficient). Wood (2013, 2017) suggest to base \(r\) on the estimated degrees of freedom for the smooth term in question - but that \(r\) is usually not integer.They provide a generalization that addresses the realness of \(r\), resulting in a test statistic \(T_r\), which follows a weighted Chi-square distribution under the Null. Following the recommendation in Wood (2012) we here approximate the reference distribution under the Null by means of a Gamma distribution with \(\alpha=r/2\) and \(\phi=2\).
Warning: Because of the approximations of the Null reference distribution, the resulting p-values are even more approximate. They should only be treated as indicative - even more so than the values returned by
gam.summary
inmgcv
.Note: Just like in
mgcv
, the returned p-value is an average: two p-values are computed because of an ambiguity in forming \(T_r\) and averaged to get the final one. For \(T_r\) we return the max of the two alternatives.- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N. (2013). On p-values for smooth components of an extended generalized additive model.
testStat
function in mgcv, see: https://github.com/cran/mgcv/blob/master/R/mgcv.r#L3780
- Returns:
Tuple conatining two lists: first list holds approximate p-values for all smooth terms in a separate list for each distribution parameter, second list holds test statistic again in a separate list for each distribution parameter.
- Return type:
([[float]],[[float]])
- fit(max_outer=50, max_inner=200, min_inner=200, conv_tol=1e-07, extend_lambda=True, extension_method_lam='nesterov2', control_lambda=1, method='Chol', check_cond=1, piv_tol=np.float64(0.23651441168139897), should_keep_drop=True, progress_bar=True, n_cores=10, seed=0, init_lambda=None)
Fit the specified model. Additional keyword arguments not listed below should not be modified unless you really know what you are doing.
- Parameters:
max_outer (int,optional) – The maximum number of fitting iterations.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.
min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients.
conv_tol (float,optional) – The relative (change in penalized deviance is compared against
conv_tol
* previous penalized deviance) criterion used to determine convergence.extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Enabled by default.
control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approxiately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved. Set to 1 by default.
method (str,optional) – Which method to use to solve for the coefficients. The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. Defaults to “Chol”.
check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When
check_cond=0
, no check will be performed. Whencheck_cond=1
, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (seemssm.src.python.gamm_solvers.est_condition()
). Defaults to 1.piv_tol (float,optional) – Only used when
method='QR/Chol'
. The numerical pivoting strategy for the preceding QR decomposition then rotates columns to the end in case the norm of it is lower thanpiv_tol * sqrt(H.diag().abs().max())
- where H is the current estimate for the negative Hessian of the penalized likelihood. Defaults tonp.power(np.finfo(float).eps,0.04)
.should_keep_drop (bool,optional) – Only used when
method='QR/Chol'
. If set to True, any coefficients that are dropped during fitting - are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration - costly! Defaults to True.progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0
init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None
- get_llk(penalized: bool = True)
Get the (penalized) log-likelihood of the estimated model (float or None) given the trainings data.
Will instead return
None
if called before fitting.- Parameters:
penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True
- Returns:
llk score
- Return type:
float or None
- get_mmat(use_terms=None, drop_NA=True)
Returns a list containing exaclty the model matrices used for fitting as a
scipy.sparse.csc_array
. Will raise an error when fitting was not completed before calling this function.Optionally, all columns not corresponding to terms for which the indices are provided via
use_terms
can be zeroed.- Parameters:
use_terms ([int], optional) – Optionally provide indices of terms in the formual that should be created. If this argument is provided columns corresponding to any term not included in this list will be zeroed, defaults to None
drop_NA (bool, optional) – Whether rows in the model matrix corresponding to NAs in the dependent variable vector should be dropped, defaults to True
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
Model matrices \(\mathbf{X}\) used for fitting - one per parameter of
self.family
.- Return type:
[scp.sparse.csc_array]
- get_pars()
Returns a list containing all coefficients estimated for the model. Use
self.coef_split_idx
to split the vector into separate subsets per distribution parameter.Will return None if called before fitting was completed.
- Returns:
Model coefficients - before splitting!
- Return type:
[float] or None
- get_reml()
Get’s the Laplcae approximate REML (Restrcited Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).
References:
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
REML score
- Return type:
float
- get_resid()
Returns standarized residuals for GAMMLSS models (Rigby & Stasinopoulos, 2005).
The computation of the residual vector will differ a lot between different GAMMLSS models and is thus implemented as a method by each GAMMLSS family. These should be consulted to get more details. In general, if the model is specified correctly, the returned vector should approximately look like what could be expected from taking \(N\) independent samples from \(N(0,1)\).
References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Raises:
NotImplementedError – An error is raised in case the residuals are to be computed for a Multinomial GAMMLSS model, which is currently not supported.
ValueError – An error is raised in case the residuals are requested before the model has been fit.
- Returns:
A list of standardized residuals that should be \(\sim N(0,1)\) if the model is correct.
- Returns:
Empirical residual vector
- Return type:
[float]
- predict(par, use_terms, n_dat, alpha=0.05, ci=False, whole_interval=False, n_ps=10000, seed=None)
Make a prediction using the fitted model for new data
n_dat
using only the terms indexed byuse_terms
and for distribution parameterpar
.Importantly, predictions and standard errors are always returned on the scale of the linear predictor. For the Gaussian GAMMLSS model, the predictions for the standard deviation will thus reflect the log of the standard deviation. To get the predictions on the standard deviation scale, one can apply the inverse log-link function to the predictions and the CI-bounds on the scale of the respective linear predictor.:
model = GAMMLSS(formulas,GAUMLSS([Identity(),LOG()])) # Fit a Gaussian GAMMLSS model model.fit() # Mean predictions don't have to be transformed since the Identity link is used for this predictor. mu_mean,_,b_mean = model.predict(0,None,new_dat,ci=True) mean_upper_CI = mu_mean + b_mean mean_lower_CI = mu_mean - b_mean # Standard deviation predictions do have to be transformed - by default they are on the log-scale. eta_sd,_,b_sd = model.predict(1,None,new_dat,ci=True) mu_sd = model.family.links[1].fi(eta_sd) # Index to `links` is 1 because the sd is the second parameter! sd_upper_CI = model.family.links[1].fi(eta_sd + b_sd) sd_lower_CI = model.family.links[1].fi(eta_sd - b_sd)
Standard errors cannot currently be computed for Multinomial GAMMLSS models - attempting to set
ci=True
for such a model will result in an error.References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
- Parameters:
par (int) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean)
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or
None
in which case all terms will be used.n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the
use_terms
argument.alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (
alpha
/2) will be used to determine the critical cut-off value according to a N(0,1).ci (bool, optional) – Whether the standard error
se
for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred
-se
,pred
+se
]whole_interval (bool, optional) – Whether or not to adjuste the point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [
pred
-se
,pred
+se
]n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
- Raises:
ValueError – An error is raised in case the standard error is to be computed for a Multinomial GAMMLSS model, which is currently not supported.
- Returns:
A tuple with 3 entries. The first entry is the prediction
pred
based on the new datan_dat
. The second entry is the model matrix built forn_dat
that was post-multiplied with the model coefficients to obtainpred
. The third entry isNone
ifci``==``False
else the standard errorse
in the prediction.- Return type:
(np.array,scp.sparse.csc_array,np.array or None)
- predict_diff(dat1, dat2, par, use_terms, alpha=0.05, whole_interval=False, n_ps=10000, seed=None)
Get the difference in the predictions for two datasets and for distribution parameter
par
. Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case,dat1
anddat2
should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see thepredict()
method for details.References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
get_difference
function fromitsadug
R-package: https://rdrr.io/cran/itsadug/man/get_difference.html
- Parameters:
dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this dat1 will be returned.
par (int) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean)
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or
None
in which case all terms will be used.alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
whole_interval (bool, optional) – Whether or not to adjuste the point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
- Raises:
ValueError – An error is raised in case the predicted difference is to be computed for a Multinomial GAMMLSS model, which is currently not supported.
- Returns:
A tuple with 2 entries. The first entry is the predicted difference (between the two data sets
dat1
&dat2
)diff
. The second entry is the standard errorse
of the predicted difference. The difference CI is then [diff
-se
,diff
+se
]- Return type:
(np.array,np.array)
- print_parametric_terms()
Prints summary output for linear/parametric terms in the model, separately for each parameter of the family’s distribution.
For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows approximately a standardized normal distribution. The corresponding z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.
Note that, un-penalized coefficients that are part of a smooth function are not covered by this function.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Raises:
NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely.
- print_smooth_terms(pen_cutoff=0.2, p_values=False)
Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. <
pen_cutoff
will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. <pen_cutoff
can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).References:
Marra & Wood (2011). Practical variable selection for generalized additive models.
- Parameters:
pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to None
p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False
- sample_post(n_ps, use_post=None, deviations=False, seed=None, par=0)
Obtain
n_ps
samples from posterior \([\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) (see Wood et al., 2016; Wood 2017, section 6.10). \(\mathbf{H}\) here is the hessian of the log-likelihood (Wood et al., 2016;). To obtain samples for \(\boldsymbol{\beta}\), setdeviations
to false.see
sample_MVN()
for more details.References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
n_ps (int,optional) – Number of samples to obtain from posterior.
use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. By default all coefficients are sampled.
deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False
seed (int,optional) – A seed to use for the sampling, defaults to None
par (int) – The index corresponding to the distribution parameter for which to make the prediction (e.g., 0 = mean)
- Returns:
An np.array of dimension
[len(use_post),n_ps]
containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves.- Return type:
[float]
- class mssm.models.GSMM(formulas: [<class 'mssm.src.python.formula.Formula'>], family: ~mssm.src.python.exp_fam.GENSMOOTHFamily)
Bases:
GAMMLSS
Class to fit General Smooth/Mixed Models (see Wood, Pya, & Säfken; 2016). Estimation is possible via exact Newton method for coefficients of via BFGS (see example below).
Example:
class NUMDIFFGENSMOOTHFamily(GENSMOOTHFamily): # Implementation of the ``GENSMOOTHFamily`` class that uses ``numdifftools`` to obtain the # gradient and hessian of the likelihood to estimate a Gaussian GAMLSS via the general smooth code. # For BFGS :func:``gradient`` and :func:``hessian`` can also just return None. # References: # - Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models. # - P. Brodtkorb (2014). numdifftools. see https://github.com/pbrod/numdifftools # - Nocedal & Wright (2006). Numerical Optimization. Springer New York. def __init__(self, pars: int, links:[Link], llkfun:Callable, *llkargs) -> None: super().__init__(pars, links, *llkargs) self.llkfun = llkfun def llk(self, coef, coef_split_idx, y, Xs): return self.llkfun(coef, coef_split_idx, self.links, y, Xs,*self.llkargs) def gradient(self, coef, coef_split_idx, y, Xs): return Gradient(self.llkfun)(np.ndarray.flatten(coef),coef_split_idx,self.links,y,Xs,*self.llkargs) def hessian(self, coef, coef_split_idx, y, Xs): return scp.sparse.csc_array(Hessian(self.llkfun)(np.ndarray.flatten(coef),coef_split_idx,self.links,y,Xs)) def llk_gamm_fun(coef,coef_split_idx,links,y,Xs): # Likelihood for a Gaussian GAM(LSS) - implemented so # that the model can be estimated using the general smooth code. coef = coef.reshape(-1,1) split_coef = np.split(coef,coef_split_idx) eta_mu = Xs[0]@split_coef[0] eta_sd = Xs[1]@split_coef[1] mu_mu = links[0].fi(eta_mu) mu_sd = links[1].fi(eta_sd) family = GAUMLSS([Identity(),LOG()]) llk = family.llk(y,mu_mu,mu_sd) return llk # Simulate 500 data points GAUMLSSDat = sim6(500,seed=20) # We need to model the mean: \mu_i = lpha + f(x0) formula_m = Formula(lhs("y"), [i(),f(["x0"],nk=10)], data=GAUMLSSDat) # and the standard deviation as well: log(\sigma_i) = lpha + f(x0) formula_sd = Formula(lhs("y"), [i(),f(["x0"],nk=10)], data=GAUMLSSDat) # Collect both formulas formulas = [formula_m,formula_sd] links = [Identity(),LOG()] # Now define the general family + model and fit! gsmm_fam = NUMDIFFGENSMOOTHFamily(2,links,llk_gamm_fun) model = GSMM(formulas=formulas,family=gsmm_fam) # First fit with bfgs and, to speed things up, only then with Newton model.fit(init_coef=None,method="BFGS",extend_lambda=False,max_outer=100,seed=10,conv_tol=1e-3) # Use BFGS estimate as initial estimate for Newton model coef = model.overall_coef # Now re-fit with full Newton model.fit(init_coef=coef,method="Newton",extend_lambda=False,max_outer=100,seed=10,conv_tol=1e-7,restart=True)
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
- Parameters:
formulas ([Formula]) – A list of formulas, one per parameter of the likelihood that is to be modeled as a smooth model
family (GENSMOOTHFamily) – A GENSMOOTHFamily family.
- Variables:
edf (float) – The model estimated degrees of freedom as a float. Initialized with
None
.overall_term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with
None
.lvi (scipy.sparse.csc_array or scipy.sparse.linalg.LinearOperator) – Either the inverse of the Cholesky factor of the conditional model coefficient covariance matrix - or (in case the
L-BFGS-B
optimizer was used andform_VH
was set to False when callingmodel.fit()
) ascipy.sparse.linalg.LinearOperator
of the covariance matrix not the root. Initialized withNone
.penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with
None
.overall_coef ([int]) – Contains all coefficients estimated for the model. Initialized with
None
.coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of
family
. Initialized after fitting!hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood. Initialized with
None
.overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with
None
.info (Fit_info) – A
Fit_info
instance, with information about convergence (speed) of the model.
- fit(init_coef=None, max_outer=50, max_inner=200, min_inner=200, conv_tol=1e-07, extend_lambda=True, extension_method_lam='nesterov2', control_lambda=1, restart=False, optimizer='Newton', method='Chol', check_cond=1, piv_tol=np.float64(0.23651441168139897), progress_bar=True, n_cores=10, seed=0, drop_NA=True, init_lambda=None, form_VH=True, use_grad=False, build_mat=None, should_keep_drop=True, gamma=1, qEFSH='SR1', overwrite_coef=True, max_restarts=0, qEFS_init_converge=True, prefit_grad=False, init_bfgs_options=None, **bfgs_options)
Fit the specified model. Additional keyword arguments not listed below should not be modified unless you really know what you are doing.
- Parameters:
max_outer (int,optional) – The maximum number of fitting iterations.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.
min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients.
conv_tol (float,optional) – The relative (change in penalized deviance is compared against
conv_tol
* previous penalized deviance) criterion used to determine convergence.extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Enabled by default.
control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approxiately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved. Set to 1 by default.
restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.
optimizer (str,optional) – Deprecated. Defaults to “Newton”
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. If this is set to
'qEFS'
, then the coefficients are estimated via quasi netwon and the smoothing penalties are estimated from the quasi newton approximation to the hessian. This only requieres first derviative information. Defaults to “Chol”.check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When
check_cond=0
, no check will be performed. Whencheck_cond=1
, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (seemssm.src.python.gamm_solvers.est_condition()
). Defaults to 1.piv_tol (float,optional) – Only used when
method='QR/Chol'
. The numerical pivoting strategy for the preceding QR decomposition then rotates columns to the end in case the norm of it is lower thanpiv_tol * sqrt(H.diag().abs().max())
- where H is the current estimate for the negative Hessian of the penalized likelihood. Defaults tonp.power(np.finfo(float).eps,0.04)
.progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0
drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.
init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None
form_VH (bool,optional) – Whether to explicitly form matrix
V
- the estimated inverse of the negative Hessian of the penalized likelihood - andH
- the estimate of said Hessian - when using theqEFS
method. If set to False, onlyV
is returned - as ascipy.sparse.linalg.LinearOperator
- and available inself.overall_lvi
. Additionally,self.hessian
will then be equal toNone
. Note, that this will break default prediction/confidence interval methods - so do not call them. Defaults to Trueuse_grad (bool,optional) – Whether to pass the
self.family.gradient()
function to the quasi newton optimizer. If set to False, the gradient of the penalized likelihood will be approximated via finite differences. Defaults to Falsebuild_mat ([bool], optional) – An (optional) list, containing one bool per
mssm.src.python.formula.Formula
inself.formulas
- indicating whether the corresponding model matrix should be built. Useful if multiple formulas specify the same model matrix, in which case only one needs to be built. Do not make use of this (i.e., pass anything other than None) if you set ``method=’QR/Chol’``, since the rank deficiency handling will break for shared matrices. Defaults to None, which means all model matrices are built.should_keep_drop (bool,optional) – Only used when
method='QR/Chol'
. If set to True, any coefficients that are dropped during fitting - are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration - costly! Defaults to True.gamma (float,optional) – Setting this to a value larger than 1 promotes more complex (less smooth) models. Setting this to a value smaller than 1 (but must be > 0) promotes smoother models! Defaults to 1.
qEFSH (str,optional) – Should the hessian approximation use a symmetric rank 1 update (
qEFSH='SR1'
) that is forced to result in positive definiteness of the approximation or the standard bfgs update (qEFSH='BFGS'
) . Defaults to ‘SR1’.overwrite_coef (bool,optional) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when
method='qEFS'
. Setting this to False will be useful when passing coefficients from a simpler model to initialize a more complex one. Only has an effect whenqEFS_init_converge=True
. Defaults to True.max_restarts (int,optional) – How often to shrink the coefficient estimate back to a random vector when convergence is reached and when
method='qEFS'
. The optimizer might get stuck in local minima so it can be helpful to set this to 1-3. What happens is that if we converge, we shrink the coefficients back to a random vector and then continue optimizing once more. Defaults to 0.qEFS_init_converge (bool,optional) – Whether to optimize the un-penalzied version of the model and to use the hessian (and optionally coefficients, if
overwrite_coef=True
) to initialize the q-EFS solver. Ignored ifmethod!='qEFS'
. Defaults to True.prefit_grad (bool,optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients. Defaults to False.
init_bfgs_options (dict,optional) – An optional dictionary holding the same key:value pairs that can be passed to
bfgs_options
but pased to the optimizer of the un-penalized problem. If this is None, it will be set to a copy ofbfgs_options
. Defaults to None.bfgs_options (key=value,optional) – Any additional keyword arguments that should be passed on to the call of
scipy.optimize.minimize()
ifmethod=='qEFS'
. If none are provided, thegtol
argument will be initialized toconv_tol
. Note also, that in any case themaxiter
argument is automatically set tomax_inner
. Defaults to None.
- Raises:
ValueError – Will throw an error when
optimizer
is not one of ‘Newton’, ‘BFGS’, ‘L-BFGS-B’.
- get_llk(penalized: bool = True, drop_NA=True)
Get the (penalized) log-likelihood of the estimated model (float or None) given the trainings data.
Will instead return
None
if called before fitting.- Parameters:
penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True
drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped, defaults to True
- Returns:
llk score
- Return type:
float or None
- get_reml(drop_NA=True)
Get’s the Laplcae approximate REML (Restrcited Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).
References:
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped when computing the log-likelihood, defaults to True
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
REML score
- Return type:
float
- get_resid()
What qualifies as “residual” will differ vastly between different implementations of this class, so this method simply returns
None
.