mssm package

Subpackages

Submodules

mssm.models module

class mssm.models.GAMM(formula: Formula, family: Family)

Bases: object

Class to fit Generalized Additive Mixed Models.

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • formula (Formula) – A formula for the GAMM model

  • family (Family) – A distribution implementing the Family class. Currently Gaussian, Gamma, and Binomial are implemented.

Variables:
  • pred ([float]) – The model prediction for the training data. Of the same dimension as self.formula.__lhs. Initialized with None.

  • res ([float]) – The working residuals for the training data. Of the same dimension as self.formula.__lhs.Initialized with None.

  • edf (float) – The model estimated degrees of freedom as a float. Initialized with None.

  • term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.

  • lvi (scipy.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with None.

  • penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.

  • hessian (scipy.sparse.csc_array) – Estimated hessian of the log-likelihood. Initialized with None.

  • info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.

approx_smooth_p_values()

Function to compute approximate p-values for smooth terms, testing whether \(\mathbf{f}=\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\) based on the algorithm by Wood (2013).

Wood (2013, 2017) generalize the \(\boldsymbol{\beta}_j^T\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\boldsymbol{\beta}_j\) test-statistic for parametric terms (computed by function mssm.models.print_parametric_terms()) to the coefficient vector \(\boldsymbol{\beta}_j\) parameterizing smooth functions. \(\mathbf{V}\) here is the covariance matrix of the posterior distribution for \(\boldsymbol{\beta}\) (see Wood, 2017). The idea is to replace \(\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\) with a rank \(r\) pseudo-inverse (smooth blocks in \(\mathbf{V}\) are usually rank deficient). Wood (2013, 2017) suggest to base \(r\) on the estimated degrees of freedom for the smooth term in question - but that \(r\) is usually not integer.

They provide a generalization that addresses the realness of \(r\), resulting in a test statistic \(T_r\), which follows a weighted Chi-square distribution under the Null. Following the recommendation in Wood (2012) we here approximate the reference distribution under the Null by means of a Gamma distribution with \(\alpha=r/2\) and \(\phi=2\). In case of a two-parameter distribution (i.e., estimated scale parameter \(\phi\)), the Chi-square reference distribution needs to be corrected, again resulting in a weighted chi-square distribution which should behave something like a F distribution with DoF1 = \(r\) and DoF2 = \(\epsilon_{DoF}\) (i.e., the residual degrees of freedom), which would be the reference distribution for \(T_r/r\) if \(r\) were integer and \(\mathbf{V}_{\boldsymbol{\beta}_j}\) full rank. Hence, we approximate the reference distribution for \(T_r/r\) with a Beta distribution, with \(\alpha=r/2\) and \(\beta=\epsilon_{DoF}/2\) (see Wikipedia for the specific transformation applied to \(T_r/r\) so that the resulting transformation is approximately beta distributed) - which is similar to the Gamma approximation used for the Chi-square distribution in the no-scale parameter case.

Warning: Because of the approximations of the Null reference distribution, the resulting p-values are even more approximate. They should only be treated as indicative - even more so than the values returned by gam.summary in mgcv.

Note: Just like in mgcv, the returned p-value is an average: two p-values are computed because of an ambiguity in forming \(T_r\) and averaged to get the final one. For \(T_r\) we return the max of the two alternatives.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Wood, S. N. (2013). On p-values for smooth components of an extended generalized additive model.

  • testStat function in mgcv, see: https://github.com/cran/mgcv/blob/master/R/mgcv.r#L3780

Returns:

Tuple conatining two lists: first list holds approximate p-values for all smooth terms, second list holds test statistic.

Return type:

([float],[float])

fit(max_outer=50, max_inner=100, conv_tol=1e-07, extend_lambda=True, control_lambda=True, exclude_lambda=False, extension_method_lam='nesterov', restart=False, method='Chol', check_cond=1, progress_bar=True, n_cores=10, offset=None)

Fit the specified model. Additional keyword arguments not listed below should not be modified unless you really know what you are doing.

Parameters:
  • max_outer (int,optional) – The maximum number of fitting iterations. Defaults to 50.

  • max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step updating the coefficients for Generalized models. Defaults to 100.

  • conv_tol (float,optional) – The relative (change in penalized deviance is compared against conv_tol * previous penalized deviance) criterion used to determine convergence.

  • extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Enabled by default.

  • control_lambda (bool,optional) – Whether lambda proposals should be checked (and if necessary decreased) for actually improving the Restricted maximum likelihood of the model. Can lower the number of new smoothing penalty proposals necessary. Enabled by default.

  • exclude_lambda (bool,optional) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.

  • restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.

  • method (str,optional) – Which method to use to solve for the coefficients. The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but then also pivots for stability in order to get an estimate of rank defficiency. This takes substantially longer. This argument is ignored if len(self.formula.file_paths)>0 that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to “Chol”.

  • check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). When check_cond=2, an estimate of the condition number will be performed for each new system (at each iteration of the algorithm) and an error will be raised if the condition number is estimated as too high given the chosen method. Is ignored, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to 1.

  • progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.

  • n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.

  • offset (float or [float],optional) – Mimics the behavior of the offset argument for gam in mgcv in R. If a value is provided here (can either be a float or a numpy.array of shape (-1,1) - if it is an array, then the first dimension has to match the number of observations in the data. NANs present in the dependent variable will be excluded from the offset vector.) then it is consistently added to the linear predictor during estimation. It will not be used by any other function of the GAMM class (e.g., for prediction). This argument is ignored if len(self.formula.file_paths)>0 that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to None.

get_llk(penalized: bool = True, ext_scale: float = None)

Get the (penalized) log-likelihood of the estimated model (float or None) given the trainings data. LLK can optionally be evaluated for an external scale parameter ext_scale.

Will instead return None if called before fitting.

Parameters:
  • penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True

  • ext_scale (float, optional) – Optionally provide an external scale parameter at which to evaluate the log-likelihood, defaults to None

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely.

Returns:

llk score

Return type:

float or None

get_mmat(use_terms=None, drop_NA=True)

Returns exaclty the model matrix used for fitting as a scipy.sparse.csc_array. Will throw an error when called for a model for which the model matrix was never former completely - i.e., when \(\mathbf{X}^T\mathbf{X}\) was formed iteratively for estimation, by setting the file_paths argument of the Formula to a non-empty list.

Optionally, all columns not corresponding to terms for which the indices are provided via use_terms can be zeroed.

Parameters:
  • use_terms ([int], optional) – Optionally provide indices of terms in the formual that should be created. If this argument is provided columns corresponding to any term not included in this list will be zeroed, defaults to None

  • drop_NA (bool, optional) – Whether rows in the model matrix corresponding to NAs in the dependent variable vector should be dropped, defaults to True

Raises:
  • ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

  • NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely

Returns:

Model matrix \(\mathbf{X}\) used for fitting.

Return type:

scp.sparse.csc_array

get_pars()

Returns a tuple. The first entry is a np.array with all estimated coefficients. The second entry is the estimated scale parameter.

Will instead return (None,None) if called before fitting.

Returns:

Model coefficients and scale parameter that were estimated

Return type:

(np.array,float) or (None, None)

get_reml()

Get’s the (Laplace approximate) REML (Restricted Maximum Likelihood) score (as a float) for the estimated lambda values (see Wood, 2011).

References:

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

Raises:
  • ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

  • NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely.

  • TypeError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid(type='Pearson')

Returns the residuals \(e_i = y_i - \mu_i\) for additive models and (by default) the Pearson residuals \(w_i^{0.5}*(z_i - \eta_i)\) (see Wood, 2017 sections 3.1.5 & 3.1.7) for generalized additive models. Here \(w_i\) are the Fisher scoring weights, \(z_i\) the pseudo-data point for each observation, and \(\eta_i\) is the linear prediction (i.e., \(g(\mu_i)\) - where \(g()\) is the link function) for each observation.

If type= "Deviance", the deviance residuals are returned, which are equivalent to \(sign(y_i - \mu_i)*D_i^{0.5}\), where \(\sum_{i=1,...N} D_i\) equals the model deviance (see Wood 2017, section 3.1.7).

Throws an error if called before model was fitted.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

type (str,optional) – The type of residual to return for a Generalized model, “Pearson” by default, but can be set to “Deviance” as well. Ignorred for additive models with identity link.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

Empirical residual vector

Return type:

[float]

predict(use_terms, n_dat, alpha=0.05, ci=False, whole_interval=False, n_ps=10000, seed=None)

Make a prediction using the fitted model for new data n_dat.

But only using the terms indexed by use_terms. Importantly, predictions and standard errors are always returned on the scale of the linear predictor. When estimating a Generalized Additive Model, the mean predictions and standard errors (often referred to as the ‘response’-scale predictions) can be obtained by applying the link inverse function to the predictions and the CI-bounds on the linear predictor scale (DO NOT transform the standard error first and then add it to the transformed predictions - only on the scale of the linear predictor is the standard error additive):

gamma_model = GAMM(gamma_formula,Gamma()) # A true GAM
gamma_model.fit()
# Now get predictions on the scale of the linear predictor
pred,_,b = gamma_model.predict(None,new_dat,ci=True)
# Then transform to the response scale
mu_pred = gamma_model.family.link.fi(pred)
mu_upper_CI = gamma_model.family.link.fi(pred + b)
mu_lower_CI = gamma_model.family.link.fi(pred - b)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:
  • use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se]

  • whole_interval (bool, optional) – Whether or not to adjuste the point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [pred - se, pred + se]

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

Returns:

A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci``==``False else the standard error se in the prediction.

Return type:

(np.array,scp.sparse.csc_array,np.array or None)

predict_diff(dat1, dat2, use_terms, alpha=0.05, whole_interval=False, n_ps=10000, seed=None)

Get the difference in the predictions for two datasets.

Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

  • get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html

Parameters:
  • dat1 – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this dat1 will be returned.

  • use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • whole_interval (bool, optional) – Whether or not to adjuste the point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

Returns:

A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference. The difference CI is then [diff - se, diff + se]

Return type:

(np.array,np.array)

print_parametric_terms()

Prints summary output for linear/parametric terms in the model, not unlike the one returned in R when using the summary function for mgcv models.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows a t-distribution for models in which an additional scale parameter was estimated (e.g., Gaussian, Gamma) and a standardized normal distribution for models in which the scale parameter is known or was fixed (e.g., Binomial). For the former case, the t-statistic, Degrees of freedom of the Null distribution (DoF.), and the p-value are printed as well. For the latter case, only the z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that, un-penalized coefficients that are part of a smooth function are not covered by this function.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely.

print_smooth_terms(pen_cutoff=0.2, p_values=False)

Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:

  • Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:
  • pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to None

  • p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False

sample_post(n_ps, use_post=None, deviations=False, seed=None)

Obtain n_ps samples from posterior \([\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where V is \([\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]^{-1}*/\phi\) (see Wood, 2017; section 6.10). To obtain samples for \(\boldsymbol{\beta}\), set deviations to false.

see sample_MVN() for more details.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • n_ps (int,optional) – Number of samples to obtain from posterior.

  • use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. By default all coefficients are sampled.

  • deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False

  • seed (int,optional) – A seed to use for the sampling, defaults to None

Returns:

An np.array of dimension [len(use_post),n_ps] containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves/predictions.

Return type:

[float]

class mssm.models.GAMMLSS(formulas: [<class 'mssm.src.python.formula.Formula'>], family: ~mssm.src.python.exp_fam.GAMLSSFamily)

Bases: GAMM

Class to fit Generalized Additive Mixed Models of Location Scale and Shape (see Rigby & Stasinopoulos, 2005).

Example:

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now define the model and fit!
model = GAMMLSS(formulas,family)
model.fit()
References:
  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • formulas ([Formula]) – A list of formulas for the GAMMLS model

  • family (GAMLSSFamily) – A GAMLSSFamily. Currently GAUMLSS, MULNOMLSS, and GAMMALS are supported.

Variables:
  • overall_preds ([[float]]) – The predicted means for every parameter of family evaluated for each observation in the training data. Initialized with None.

  • edf (float) – The model estimated degrees of freedom as a float. Initialized with None.

  • overall_term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.

  • lvi (scipy.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with None.

  • penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.

  • overall_coef ([int]) – Contains all coefficients estimated for the model. Initialized with None.

  • coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of family. Initialized after fitting!

  • hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood. Initialized with None.

  • overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.

  • info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.

approx_smooth_p_values()

Function to compute approximate p-values for smooth terms, testing whether \(\mathbf{f}=\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\) based on the algorithm by Wood (2013).

Wood (2013, 2017) generalize the \(\boldsymbol{\beta}_j^T\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\boldsymbol{\beta}_j\) test-statistic for parametric terms (computed by function mssm.models.print_parametric_terms()) to the coefficient vector \(\boldsymbol{\beta}_j\) parameterizing smooth functions. \(\mathbf{V}\) here is the covariance matrix of the posterior distribution for \(\boldsymbol{\beta}\) (see Wood, 2017). The idea is to replace \(\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\) with a rank \(r\) pseudo-inverse (smooth blocks in \(\mathbf{V}\) are usually rank deficient). Wood (2013, 2017) suggest to base \(r\) on the estimated degrees of freedom for the smooth term in question - but that \(r\) is usually not integer.

They provide a generalization that addresses the realness of \(r\), resulting in a test statistic \(T_r\), which follows a weighted Chi-square distribution under the Null. Following the recommendation in Wood (2012) we here approximate the reference distribution under the Null by means of a Gamma distribution with \(\alpha=r/2\) and \(\phi=2\).

Warning: Because of the approximations of the Null reference distribution, the resulting p-values are even more approximate. They should only be treated as indicative - even more so than the values returned by gam.summary in mgcv.

Note: Just like in mgcv, the returned p-value is an average: two p-values are computed because of an ambiguity in forming \(T_r\) and averaged to get the final one. For \(T_r\) we return the max of the two alternatives.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Wood, S. N. (2013). On p-values for smooth components of an extended generalized additive model.

  • testStat function in mgcv, see: https://github.com/cran/mgcv/blob/master/R/mgcv.r#L3780

Returns:

Tuple conatining two lists: first list holds approximate p-values for all smooth terms in a separate list for each distribution parameter, second list holds test statistic again in a separate list for each distribution parameter.

Return type:

([[float]],[[float]])

fit(max_outer=50, max_inner=200, min_inner=200, conv_tol=1e-07, extend_lambda=True, extension_method_lam='nesterov2', control_lambda=1, method='Chol', check_cond=1, piv_tol=np.float64(0.23651441168139897), should_keep_drop=True, progress_bar=True, n_cores=10, seed=0, init_lambda=None)

Fit the specified model. Additional keyword arguments not listed below should not be modified unless you really know what you are doing.

Parameters:
  • max_outer (int,optional) – The maximum number of fitting iterations.

  • max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.

  • min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients.

  • conv_tol (float,optional) – The relative (change in penalized deviance is compared against conv_tol * previous penalized deviance) criterion used to determine convergence.

  • extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Enabled by default.

  • control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approxiately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved. Set to 1 by default.

  • method (str,optional) – Which method to use to solve for the coefficients. The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. Defaults to “Chol”.

  • check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). Defaults to 1.

  • piv_tol (float,optional) – Only used when method='QR/Chol'. The numerical pivoting strategy for the preceding QR decomposition then rotates columns to the end in case the norm of it is lower than piv_tol * sqrt(H.diag().abs().max()) - where H is the current estimate for the negative Hessian of the penalized likelihood. Defaults to np.power(np.finfo(float).eps,0.04).

  • should_keep_drop (bool,optional) – Only used when method='QR/Chol'. If set to True, any coefficients that are dropped during fitting - are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration - costly! Defaults to True.

  • progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.

  • n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.

  • seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0

  • init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None

get_llk(penalized: bool = True)

Get the (penalized) log-likelihood of the estimated model (float or None) given the trainings data.

Will instead return None if called before fitting.

Parameters:

penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True

Returns:

llk score

Return type:

float or None

get_mmat(use_terms=None, drop_NA=True)

Returns a list containing exaclty the model matrices used for fitting as a scipy.sparse.csc_array. Will raise an error when fitting was not completed before calling this function.

Optionally, all columns not corresponding to terms for which the indices are provided via use_terms can be zeroed.

Parameters:
  • use_terms ([int], optional) – Optionally provide indices of terms in the formual that should be created. If this argument is provided columns corresponding to any term not included in this list will be zeroed, defaults to None

  • drop_NA (bool, optional) – Whether rows in the model matrix corresponding to NAs in the dependent variable vector should be dropped, defaults to True

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

Model matrices \(\mathbf{X}\) used for fitting - one per parameter of self.family.

Return type:

[scp.sparse.csc_array]

get_pars()

Returns a list containing all coefficients estimated for the model. Use self.coef_split_idx to split the vector into separate subsets per distribution parameter.

Will return None if called before fitting was completed.

Returns:

Model coefficients - before splitting!

Return type:

[float] or None

get_reml()

Get’s the Laplcae approximate REML (Restrcited Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).

References:

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid()

Returns standarized residuals for GAMMLSS models (Rigby & Stasinopoulos, 2005).

The computation of the residual vector will differ a lot between different GAMMLSS models and is thus implemented as a method by each GAMMLSS family. These should be consulted to get more details. In general, if the model is specified correctly, the returned vector should approximately look like what could be expected from taking \(N\) independent samples from \(N(0,1)\).

References:

  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:
  • NotImplementedError – An error is raised in case the residuals are to be computed for a Multinomial GAMMLSS model, which is currently not supported.

  • ValueError – An error is raised in case the residuals are requested before the model has been fit.

Returns:

A list of standardized residuals that should be \(\sim N(0,1)\) if the model is correct.

Returns:

Empirical residual vector

Return type:

[float]

predict(par, use_terms, n_dat, alpha=0.05, ci=False, whole_interval=False, n_ps=10000, seed=None)

Make a prediction using the fitted model for new data n_dat using only the terms indexed by use_terms and for distribution parameter par.

Importantly, predictions and standard errors are always returned on the scale of the linear predictor. For the Gaussian GAMMLSS model, the predictions for the standard deviation will thus reflect the log of the standard deviation. To get the predictions on the standard deviation scale, one can apply the inverse log-link function to the predictions and the CI-bounds on the scale of the respective linear predictor.:

model = GAMMLSS(formulas,GAUMLSS([Identity(),LOG()])) # Fit a Gaussian GAMMLSS model
model.fit()
# Mean predictions don't have to be transformed since the Identity link is used for this predictor.
mu_mean,_,b_mean = model.predict(0,None,new_dat,ci=True)
mean_upper_CI = mu_mean + b_mean
mean_lower_CI = mu_mean - b_mean
# Standard deviation predictions do have to be transformed - by default they are on the log-scale.
eta_sd,_,b_sd = model.predict(1,None,new_dat,ci=True)
mu_sd = model.family.links[1].fi(eta_sd) # Index to `links` is 1 because the sd is the second parameter!
sd_upper_CI = model.family.links[1].fi(eta_sd + b_sd)
sd_lower_CI = model.family.links[1].fi(eta_sd - b_sd)

Standard errors cannot currently be computed for Multinomial GAMMLSS models - attempting to set ci=True for such a model will result in an error.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:
  • par (int) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean)

  • use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se]

  • whole_interval (bool, optional) – Whether or not to adjuste the point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [pred - se, pred + se]

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

Raises:

ValueError – An error is raised in case the standard error is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci``==``False else the standard error se in the prediction.

Return type:

(np.array,scp.sparse.csc_array,np.array or None)

predict_diff(dat1, dat2, par, use_terms, alpha=0.05, whole_interval=False, n_ps=10000, seed=None)

Get the difference in the predictions for two datasets and for distribution parameter par. Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

  • get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html

Parameters:
  • dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this dat1 will be returned.

  • par (int) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean)

  • use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • whole_interval (bool, optional) – Whether or not to adjuste the point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

Raises:

ValueError – An error is raised in case the predicted difference is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference. The difference CI is then [diff - se, diff + se]

Return type:

(np.array,np.array)

print_parametric_terms()

Prints summary output for linear/parametric terms in the model, separately for each parameter of the family’s distribution.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows approximately a standardized normal distribution. The corresponding z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that, un-penalized coefficients that are part of a smooth function are not covered by this function.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never former completely.

print_smooth_terms(pen_cutoff=0.2, p_values=False)

Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:

  • Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:
  • pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to None

  • p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False

sample_post(n_ps, use_post=None, deviations=False, seed=None, par=0)

Obtain n_ps samples from posterior \([\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) (see Wood et al., 2016; Wood 2017, section 6.10). \(\mathbf{H}\) here is the hessian of the log-likelihood (Wood et al., 2016;). To obtain samples for \(\boldsymbol{\beta}\), set deviations to false.

see sample_MVN() for more details.

References:

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • n_ps (int,optional) – Number of samples to obtain from posterior.

  • use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. By default all coefficients are sampled.

  • deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False

  • seed (int,optional) – A seed to use for the sampling, defaults to None

  • par (int) – The index corresponding to the distribution parameter for which to make the prediction (e.g., 0 = mean)

Returns:

An np.array of dimension [len(use_post),n_ps] containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves.

Return type:

[float]

class mssm.models.GSMM(formulas: [<class 'mssm.src.python.formula.Formula'>], family: ~mssm.src.python.exp_fam.GENSMOOTHFamily)

Bases: GAMMLSS

Class to fit General Smooth/Mixed Models (see Wood, Pya, & Säfken; 2016). Estimation is possible via exact Newton method for coefficients of via BFGS (see example below).

Example:

class NUMDIFFGENSMOOTHFamily(GENSMOOTHFamily):
    # Implementation of the ``GENSMOOTHFamily`` class that uses ``numdifftools`` to obtain the
    # gradient and hessian of the likelihood to estimate a Gaussian GAMLSS via the general smooth code.

    # For BFGS :func:``gradient`` and :func:``hessian`` can also just return None.

    # References:

    #    - Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
    #    - P. Brodtkorb (2014). numdifftools. see https://github.com/pbrod/numdifftools
    #    - Nocedal & Wright (2006). Numerical Optimization. Springer New York.


    def __init__(self, pars: int, links:[Link], llkfun:Callable, *llkargs) -> None:
        super().__init__(pars, links, *llkargs)
        self.llkfun = llkfun

    def llk(self, coef, coef_split_idx, y, Xs):
        return self.llkfun(coef, coef_split_idx, self.links, y, Xs,*self.llkargs)

    def gradient(self, coef, coef_split_idx, y, Xs):
        return Gradient(self.llkfun)(np.ndarray.flatten(coef),coef_split_idx,self.links,y,Xs,*self.llkargs)

    def hessian(self, coef, coef_split_idx, y, Xs):
        return scp.sparse.csc_array(Hessian(self.llkfun)(np.ndarray.flatten(coef),coef_split_idx,self.links,y,Xs))


def llk_gamm_fun(coef,coef_split_idx,links,y,Xs):
        # Likelihood for a Gaussian GAM(LSS) - implemented so
        # that the model can be estimated using the general smooth code.

        coef = coef.reshape(-1,1)
        split_coef = np.split(coef,coef_split_idx)
        eta_mu = Xs[0]@split_coef[0]
        eta_sd = Xs[1]@split_coef[1]

        mu_mu = links[0].fi(eta_mu)
        mu_sd = links[1].fi(eta_sd)

        family = GAUMLSS([Identity(),LOG()])
        llk = family.llk(y,mu_mu,mu_sd)
        return llk

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = lpha + f(x0)
formula_m = Formula(lhs("y"),
                [i(),f(["x0"],nk=10)],
                data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = lpha + f(x0)
formula_sd = Formula(lhs("y"),
                [i(),f(["x0"],nk=10)],
                data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]
links = [Identity(),LOG()]

# Now define the general family + model and fit!
gsmm_fam = NUMDIFFGENSMOOTHFamily(2,links,llk_gamm_fun)
model = GSMM(formulas=formulas,family=gsmm_fam)

# First fit with bfgs and, to speed things up, only then with Newton
model.fit(init_coef=None,method="BFGS",extend_lambda=False,max_outer=100,seed=10,conv_tol=1e-3)

# Use BFGS estimate as initial estimate for Newton model
coef = model.overall_coef

# Now re-fit with full Newton
model.fit(init_coef=coef,method="Newton",extend_lambda=False,max_outer=100,seed=10,conv_tol=1e-7,restart=True)
References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Parameters:
  • formulas ([Formula]) – A list of formulas, one per parameter of the likelihood that is to be modeled as a smooth model

  • family (GENSMOOTHFamily) – A GENSMOOTHFamily family.

Variables:
  • edf (float) – The model estimated degrees of freedom as a float. Initialized with None.

  • overall_term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.

  • lvi (scipy.sparse.csc_array or scipy.sparse.linalg.LinearOperator) – Either the inverse of the Cholesky factor of the conditional model coefficient covariance matrix - or (in case the L-BFGS-B optimizer was used and form_VH was set to False when calling model.fit()) a scipy.sparse.linalg.LinearOperator of the covariance matrix not the root. Initialized with None.

  • penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.

  • overall_coef ([int]) – Contains all coefficients estimated for the model. Initialized with None.

  • coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of family. Initialized after fitting!

  • hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood. Initialized with None.

  • overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.

  • info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.

fit(init_coef=None, max_outer=50, max_inner=200, min_inner=200, conv_tol=1e-07, extend_lambda=True, extension_method_lam='nesterov2', control_lambda=1, restart=False, optimizer='Newton', method='Chol', check_cond=1, piv_tol=np.float64(0.23651441168139897), progress_bar=True, n_cores=10, seed=0, drop_NA=True, init_lambda=None, form_VH=True, use_grad=False, build_mat=None, should_keep_drop=True, gamma=1, qEFSH='SR1', overwrite_coef=True, max_restarts=0, qEFS_init_converge=True, prefit_grad=False, init_bfgs_options=None, **bfgs_options)

Fit the specified model. Additional keyword arguments not listed below should not be modified unless you really know what you are doing.

Parameters:
  • max_outer (int,optional) – The maximum number of fitting iterations.

  • max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.

  • min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients.

  • conv_tol (float,optional) – The relative (change in penalized deviance is compared against conv_tol * previous penalized deviance) criterion used to determine convergence.

  • extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Enabled by default.

  • control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approxiately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved. Set to 1 by default.

  • restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.

  • optimizer (str,optional) – Deprecated. Defaults to “Newton”

  • method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi netwon and the smoothing penalties are estimated from the quasi newton approximation to the hessian. This only requieres first derviative information. Defaults to “Chol”.

  • check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). Defaults to 1.

  • piv_tol (float,optional) – Only used when method='QR/Chol'. The numerical pivoting strategy for the preceding QR decomposition then rotates columns to the end in case the norm of it is lower than piv_tol * sqrt(H.diag().abs().max()) - where H is the current estimate for the negative Hessian of the penalized likelihood. Defaults to np.power(np.finfo(float).eps,0.04).

  • progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.

  • n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.

  • seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0

  • drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.

  • init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None

  • form_VH (bool,optional) – Whether to explicitly form matrix V - the estimated inverse of the negative Hessian of the penalized likelihood - and H - the estimate of said Hessian - when using the qEFS method. If set to False, only V is returned - as a scipy.sparse.linalg.LinearOperator - and available in self.overall_lvi. Additionally, self.hessian will then be equal to None. Note, that this will break default prediction/confidence interval methods - so do not call them. Defaults to True

  • use_grad (bool,optional) – Whether to pass the self.family.gradient() function to the quasi newton optimizer. If set to False, the gradient of the penalized likelihood will be approximated via finite differences. Defaults to False

  • build_mat ([bool], optional) – An (optional) list, containing one bool per mssm.src.python.formula.Formula in self.formulas - indicating whether the corresponding model matrix should be built. Useful if multiple formulas specify the same model matrix, in which case only one needs to be built. Do not make use of this (i.e., pass anything other than None) if you set ``method=’QR/Chol’``, since the rank deficiency handling will break for shared matrices. Defaults to None, which means all model matrices are built.

  • should_keep_drop (bool,optional) – Only used when method='QR/Chol'. If set to True, any coefficients that are dropped during fitting - are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration - costly! Defaults to True.

  • gamma (float,optional) – Setting this to a value larger than 1 promotes more complex (less smooth) models. Setting this to a value smaller than 1 (but must be > 0) promotes smoother models! Defaults to 1.

  • qEFSH (str,optional) – Should the hessian approximation use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive definiteness of the approximation or the standard bfgs update (qEFSH='BFGS') . Defaults to ‘SR1’.

  • overwrite_coef (bool,optional) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS'. Setting this to False will be useful when passing coefficients from a simpler model to initialize a more complex one. Only has an effect when qEFS_init_converge=True. Defaults to True.

  • max_restarts (int,optional) – How often to shrink the coefficient estimate back to a random vector when convergence is reached and when method='qEFS'. The optimizer might get stuck in local minima so it can be helpful to set this to 1-3. What happens is that if we converge, we shrink the coefficients back to a random vector and then continue optimizing once more. Defaults to 0.

  • qEFS_init_converge (bool,optional) – Whether to optimize the un-penalzied version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS'. Defaults to True.

  • prefit_grad (bool,optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients. Defaults to False.

  • init_bfgs_options (dict,optional) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options but pased to the optimizer of the un-penalized problem. If this is None, it will be set to a copy of bfgs_options. Defaults to None.

  • bfgs_options (key=value,optional) – Any additional keyword arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'. If none are provided, the gtol argument will be initialized to conv_tol. Note also, that in any case the maxiter argument is automatically set to max_inner. Defaults to None.

Raises:

ValueError – Will throw an error when optimizer is not one of ‘Newton’, ‘BFGS’, ‘L-BFGS-B’.

get_llk(penalized: bool = True, drop_NA=True)

Get the (penalized) log-likelihood of the estimated model (float or None) given the trainings data.

Will instead return None if called before fitting.

Parameters:
  • penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True

  • drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped, defaults to True

Returns:

llk score

Return type:

float or None

get_reml(drop_NA=True)

Get’s the Laplcae approximate REML (Restrcited Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).

References:

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped when computing the log-likelihood, defaults to True

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid()

What qualifies as “residual” will differ vastly between different implementations of this class, so this method simply returns None.