api

mssm.models module

class mssm.models.GAMM(formula: Formula, family: Family)

Bases: GAMMLSS

Class to fit Generalized Additive Mixed Models.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

#### Binomial model example ####
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# By default, the Binomial family assumes binary data and uses the logit link.
# Count data is also possible though - see the `Binomial` family.
model = GAMM(formula,Binomial())
model.fit()

# Plot estimated effects on scale of the log-odds
plot(model)

#### Gaussian model with tensor smooth and p-values ####
sim_dat = sim3(n=500,scale=2,c=0,seed=20)

formula = Formula(lhs("y"),[i(),f(["x0","x3"],te=True,nk=9),f(["x1"]),f(["x2"])],data=sim_dat)
model = GAMM(formula,Gaussian())

model.fit()
model.print_smooth_terms(p_values=True)


#### Standard linear (mixed) models are also possible ####
# *li() with three variables: three-way interaction
sim_dat,_ = sim1(100,random_seed=100)

# Specify formula with three-way linear interaction and random intercept term
formula = Formula(lhs("y"),[i(),*li(["fact","x","time"]),ri("sub")],data=sim_dat)

# ... and model
model = GAMM(formula,Gaussian())

# then fit
model.fit()

# get estimates for linear terms
model.print_parametric_terms()

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • formula (Formula) – A formula for the GAMM model

  • family (Family) – A distribution implementing the Family class. Currently Gaussian, Gamma, and Binomial are implemented.

Variables:
  • formulas ([Formula]) – A list including the formula passed to the constructor.

  • lvi (scp.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with None.

  • coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.

  • preds ([[float]]) – The first index corresponds to the linear predictors for the mean of the family evaluated for each observation in the training data (after removing NaNs). Initialized with None.

  • mus ([[float]]) – The first index corresponds to the estimated value of the mean of the family evaluated for each observation in the training data (after removing NaNs). Initialized with None.

  • hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood used during fitting - will be the expected hessian for non-canonical models. Initialized with None.

  • edf (float) – The model estimated degrees of freedom as a float. Initialized with None.

  • edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.

  • term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.

  • term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.

  • penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.

  • overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.

  • info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.

  • res (np.ndarray) – The working residuals of the model (If applicable). Initialized with None.

  • Wr (scp.sparse.csc_array) – For generalized models a diagonal matrix holding the root of the Fisher weights at convergence. Initialized with None.

  • WN (scp.sparse.csc_array) – For generalized models a diagonal matrix holding the Newton weights at convergence. Initialized with None.

  • hessian_obs (scp.sparse.csc_array) – Observed hessian of the log-likelihood at final coefficient estimate. Not updated for strictly additive models (i.e., Gaussian with identity link). Initialized with None.

  • rho (float) – Optional auto-correlation at lag 1 parameter used during estimation. Initialized with None.

  • res_ar (np.ndarray) – Holding the working residuals of the model corrected for any auto-correlation parameter used during estimation. Initialized with None.
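
To illustrate how `preds` and `mus` relate: `mus` holds `preds` after the inverse of the link function has been applied. The sketch below uses plain Python instead of mssm itself. The helper names `inv_logit` and `inv_log` and all numeric values are hypothetical; the logit and log links are the defaults described for the Binomial and Gamma families.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_log(eta):
    """Inverse of the log link: maps a linear predictor to a positive mean."""
    return math.exp(eta)

# Hypothetical linear predictor values (what `preds[0]` would hold per observation)
preds = [-2.0, 0.0, 1.5]

# Corresponding means (what `mus[0]` would hold) under each link
mus_binomial = [inv_logit(eta) for eta in preds]
mus_gamma = [inv_log(eta) for eta in preds]

print(mus_binomial)
print(mus_gamma)
```

Under the logit link every mean lies strictly between 0 and 1, while under the log link every mean is strictly positive.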

fit(max_outer: int = 200, max_inner: int = None, conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 2, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', restart: bool = False, method: str = 'QR', check_cond: int = 1, progress_bar: bool = True, n_cores: int = 10, offset: float | ndarray | None = None, rho: float | None = None)

Fit the specified model.

Note: Keyword arguments are initialized to maximise stability. For faster configurations (necessary for larger models) see the ‘Big model’ example below.

Examples:

import pandas as pd
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *

########## Big Model ##########
dat = pd.read_csv('https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat.csv')

# mssm requires that the data-type for variables used as factors is 'O'=object
dat = dat.astype({'series': 'O',
                'cond':'O',
                'sub':'O'})

formula = Formula(lhs=lhs("y"), # The dependent variable - here y!
                    terms=[i(), # The intercept, a
                            l(["cond"]), # For cond='b'
                            f(["time"],by="cond",constraint=ConstType.QR), # two-way interaction between time and cond; one smooth over time per cond level
                            f(["x"],by="cond",constraint=ConstType.QR), # two-way interaction between x and cond; one smooth over x per cond level
                            f(["time","x"],by="cond",constraint=ConstType.QR,nk=9), # three-way interaction
                            fs(["time"],rf="sub")], # Random non-linear effect of time - one smooth per level of factor sub
                    data=dat,
                    print_warn=False,find_nested=False)

model = GAMM(formula,Gaussian())

# To speed up estimation, use the following key-word arguments:
model.fit(method="Chol",max_inner=1) # max_inner only matters for Generalized models (i.e., non-Gaussian) - but for those will often be much faster

########## ar1 model (without resets per time-series) ##########
formula = Formula(lhs=lhs("y"),
                    terms=[i(),
                            l(["cond"]),
                            f(["time"],by="cond"),
                            f(["x"],by="cond"),
                            f(["time","x"],by="cond")],
                    data=dat,
                    print_warn=False,
                    series_id=None) # No series identifier passed to formula -> ar1 model does not reset!

model = GAMM(formula,Gaussian())

model.fit(rho=0.99)

# Visualize the un-corrected residuals:
plot_val(model,resid_type="Pearson")

# And the corrected residuals:
plot_val(model,resid_type="ar1")

########## ar1 model (with resets per time-series) ##########
formula = Formula(lhs=lhs("y"),
                    terms=[i(),
                            l(["cond"]),
                            f(["time"],by="cond"),
                            f(["x"],by="cond"),
                            f(["time","x"],by="cond")],
                    data=dat,
                    print_warn=False,
                    series_id='series') # 'series' variable identifies individual time-series -> ar1 model resets per series!

model = GAMM(formula,Gaussian())

model.fit(rho=0.99)

# Visualize the un-corrected residuals:
plot_val(model,resid_type="Pearson")

# And the corrected residuals:
plot_val(model,resid_type="ar1")

Parameters:
  • max_outer (int,optional) – The maximum number of fitting iterations. Defaults to 200.

  • max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step updating the coefficients for Generalized models. Defaults to 500 for non-ar1 models.

  • conv_tol (float,optional) – The relative criterion used to determine convergence: the change in penalized deviance is compared against conv_tol * previous penalized deviance. Defaults to 1e-07.

  • extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.

  • control_lambda (int,optional) – Whether lambda proposals should be checked (and, if necessary, decreased) to ensure that they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update, but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.

  • exclude_lambda (bool,optional) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.

  • extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov’ by default.

  • restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.

  • method (str,optional) – Which method to use to solve for the coefficients. “Chol” relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For maximum numerical stability set this to “QR”. In that case a QR decomposition is used, which is first pivoted to maximize sparsity in the resulting decomposition but then also pivots for stability in order to get an estimate of rank deficiency. This takes substantially longer. This argument is ignored if len(self.formulas[0].file_paths)>0, that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to “QR”.

  • check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). When check_cond=2, an estimate of the condition number will be performed for each new system (at each iteration of the algorithm) and an error will be raised if the condition number is estimated as too high given the chosen method. Is ignored, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to 1.

  • progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.

  • n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.

  • offset (float or np.ndarray,optional) – Mimics the behavior of the offset argument for gam in mgcv in R. If a value is provided here, it is consistently added to the linear predictor during estimation. The value can either be a float or a numpy.array of shape (-1,1). If it is an array, the first dimension has to match the number of observations in the data; NaNs present in the dependent variable will be excluded from the offset vector. The offset will not be used by any other function of the GAMM class (e.g., for prediction). This argument is ignored if len(self.formulas[0].file_paths)>0, that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to None.

  • rho (float,optional) – Optional correlation parameter for an “ar1 residual model”. Essentially mimics the behavior of the rho parameter for the bam function in mgcv. Note, if you want to re-start the ar1 process multiple times (for example because you work with time-series data and have multiple time-series) then you must pass the series_id argument to the Formula used for this model. Defaults to None.
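
To make the effect of rho and of resetting per time-series more concrete, here is a minimal sketch of the textbook AR(1) whitening transform in plain Python. It is an assumption that this mirrors what mssm does internally; the function `ar1_whiten`, its arguments, and the example values are hypothetical and not part of the mssm API.

```python
import math

def ar1_whiten(resid, rho, series_ids=None):
    """Whiten residuals under an AR(1) model with lag-1 correlation `rho`.

    If `series_ids` is given, the process restarts at the first
    observation of every series (no carry-over between series).
    """
    scale = math.sqrt(1.0 - rho ** 2)
    out = []
    for idx, e in enumerate(resid):
        starts_series = idx == 0 or (
            series_ids is not None and series_ids[idx] != series_ids[idx - 1]
        )
        if starts_series:
            # First observation of a series has no predecessor to correct for
            out.append(e)
        else:
            out.append((e - rho * resid[idx - 1]) / scale)
    return out

resid = [1.0, 0.9, 0.81, 0.5]

# Without resets: only the very first residual is left untouched
no_reset = ar1_whiten(resid, rho=0.9)

# With resets: the third observation starts a new series and is left untouched
with_reset = ar1_whiten(resid, rho=0.9, series_ids=["a", "a", "b", "b"])
```

Without a series identifier the correction carries over from the last observation of one series to the first observation of the next, which is usually not intended when the series are independent.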

get_llk(penalized: bool = True, ext_scale: float | None = None) float | None

Get the (penalized) log-likelihood of the estimated model (float or None) given the training data. The log-likelihood can optionally be evaluated for an external scale parameter ext_scale.

Will instead return None if called before fitting.

Parameters:
  • penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True

  • ext_scale (float, optional) – Optionally provide an external scale parameter at which to evaluate the log-likelihood, defaults to None

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.

Returns:

llk score

Return type:

float or None

get_mmat(use_terms: list[int] | None = None) csc_array

Returns exactly the model matrix used for fitting as a scipy.sparse.csc_array. Will throw an error when called for a model for which the model matrix was never formed completely - i.e., when \(\mathbf{X}^T\mathbf{X}\) was formed iteratively for estimation, by setting the file_paths argument of the Formula to a non-empty list.

Optionally, all columns not corresponding to terms for which the indices are provided via use_terms can be zeroed.

Parameters:

use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be created. If this argument is provided, columns corresponding to any term not included in this list will be zeroed, defaults to None

Raises:
  • ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

  • NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.

Returns:

Model matrix \(\mathbf{X}\) used for fitting.

Return type:

scp.sparse.csc_array

get_pars() tuple[ndarray | None, float | None]

Returns a tuple. The first entry is a np.ndarray with all estimated coefficients. The second entry is the estimated scale parameter.

Will instead return (None,None) if called before fitting.

Returns:

Model coefficients and scale parameter that were estimated

Return type:

(np.ndarray,float) or (None, None)

get_reml() float

Gets the (Laplace approximate) REML (Restricted Maximum Likelihood) score (as a float) for the estimated lambda values (see Wood, 2011).

References:

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

Raises:
  • ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

  • NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.

  • TypeError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid(type: str = 'Pearson') ndarray

Get different types of residuals from the estimated model.

By default (type='Pearson') this returns the residuals \(e_i = y_i - \mu_i\) for additive models and the Pearson/working residuals \(w_i^{0.5}*(z_i - \eta_i)\) (see Wood, 2017 sections 3.1.5 & 3.1.7) for generalized additive models. Here \(w_i\) are the Fisher scoring weights, \(z_i\) the pseudo-data point for each observation, and \(\eta_i\) is the linear prediction (i.e., \(g(\mu_i)\) - where \(g()\) is the link function) for each observation.

If type= "Deviance", the deviance residuals are returned, which are equivalent to \(sign(y_i - \mu_i)*D_i^{0.5}\), where \(\sum_{i=1,...N} D_i\) equals the model deviance (see Wood 2017, section 3.1.7). Additionally, if the model was estimated with rho!=None, type="ar1" returns the standardized working residuals corrected for lag1 auto-correlation. These are best compared to the standard working residuals.

Throws an error if called before model was fitted, when requesting an unsupported type, or when requesting ‘ar1’ residuals for a model for which model.rho==None.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

type (str,optional) – The type of residual to return for a Generalized model, “Pearson” by default, but can be set to “Deviance” and (for some models) to “ar1” as well.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed, when requesting an unsupported type, or when requesting ‘ar1’ residuals for a model for which model.rho==None.

Returns:

Empirical residual vector in a numpy array

Return type:

np.ndarray
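
The two residual definitions above can be worked through by hand. The following sketch does so in plain Python for binary data with a logit link (scale parameter fixed to 1). It does not call mssm, and the observations and fitted means are hypothetical values chosen for illustration only.

```python
import math

# Hypothetical binary observations and fitted means from a logit-link model
y = [1, 0, 1, 1, 0]
mu = [0.8, 0.3, 0.6, 0.9, 0.2]

pearson, deviance_resid = [], []
for y_i, mu_i in zip(y, mu):
    var_i = mu_i * (1.0 - mu_i)             # Binomial variance function V(mu)
    g_prime = 1.0 / var_i                   # derivative of the logit link at mu
    eta_i = math.log(mu_i / (1.0 - mu_i))   # linear predictor g(mu)
    z_i = eta_i + (y_i - mu_i) * g_prime    # pseudo-data
    w_i = 1.0 / (var_i * g_prime ** 2)      # Fisher scoring weight
    pearson.append(math.sqrt(w_i) * (z_i - eta_i))

    # Contribution D_i of observation i to the deviance for binary data
    d_i = -2.0 * (y_i * math.log(mu_i) + (1 - y_i) * math.log(1.0 - mu_i))
    deviance_resid.append(math.copysign(math.sqrt(d_i), y_i - mu_i))

# The squared deviance residuals sum to the model deviance
model_deviance = sum(r ** 2 for r in deviance_resid)
```

In this setting \(w_i^{0.5}*(z_i - \eta_i)\) simplifies to \((y_i - \mu_i)/V(\mu_i)^{0.5}\), which is the familiar form of the Pearson residual.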

predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) tuple[ndarray, csc_array, ndarray | None]

Make a prediction using the fitted model for new data n_dat, using only the terms indexed by use_terms.

Importantly, predictions and standard errors are always returned on the scale of the linear predictor. When estimating a Generalized Additive Model, the mean predictions and their CI bounds (often referred to as the ‘response’-scale predictions) can be obtained by applying the inverse of the link function to the predictions and to the CI-bounds on the linear predictor scale. Do not transform the standard error first and then add it to the transformed predictions: the standard error is additive only on the scale of the linear predictor. See the examples below.

Examples:

import numpy as np
import pandas as pd
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma Gam
Gammadat = sim3(500,2,family=Gamma(),seed=0)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Gammadat)

# By default, the Gamma family assumes that the model predictions match log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now make prediction for `f(["x0"])`
new_dat = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":np.linspace(0,1,30),
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

f0,X_f,ci = model.predict([1],new_dat,ci=True)

# Can also use the plot function from mssmViz
plot(model,which=[1])

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:
  • use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se]. Defaults to False.

  • whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [pred - se, pred + se].

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

Returns:

A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci==False, else the standard error se in the prediction.

Return type:

(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
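
As a rough sketch of the advice on response-scale predictions, the plain-Python example below transforms a prediction and its CI bounds for a log link. The values of `pred` and `se` are hypothetical stand-ins for what predict() would return with ci=True; no mssm call is made.

```python
import math
from statistics import NormalDist

alpha = 0.05
# Critical cut-off from N(0,1) at 1 - alpha/2 (roughly 1.96 for alpha = 0.05)
crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Hypothetical values on the scale of the linear predictor (log link)
pred = [0.2, 0.5, 1.1]
se = [0.10, 0.15, 0.30]   # assumed such that the CI is [pred - se, pred + se]

# Transform the prediction and the CI bounds separately with the inverse link
mean = [math.exp(p) for p in pred]
lower = [math.exp(p - s) for p, s in zip(pred, se)]
upper = [math.exp(p + s) for p, s in zip(pred, se)]

# Incorrect alternative: transforming se and adding it to the transformed prediction
wrong_upper = [math.exp(p) + math.exp(s) for p, s in zip(pred, se)]
```

The resulting interval is asymmetric around the mean on the response scale, which is expected because the inverse link is non-linear.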

predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) tuple[ndarray, ndarray]

Get the difference in the predictions for two datasets.

Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.

Examples:

import numpy as np
import pandas as pd
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma Gam
Gammadat = sim3(500,2,family=Gamma(),seed=0)

# Include tensor smooth in model of log(mean)
formula = Formula(lhs("y"),[i(),f(["x0","x1"],te=True),f(["x2"]),f(["x3"])],data=Gammadat)

# By default, the Gamma family assumes that the model predictions match log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now we want to know whether the effect of x0 is different for two values of x1:
new_dat1 = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":[0.25 for _ in range(30)],
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

new_dat2 = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":[0.75 for _ in range(30)],
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

# Now we can get the predicted difference of the effect of x0 for the two values of x1:
pred_diff,se = model.predict_diff(new_dat1,new_dat2,use_terms=[1],par=0)

# mssmViz also has a convenience function to visualize it:
plot_diff(new_dat1,new_dat2,["x0"],model,use=[1],response_scale=False)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

  • get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html

Parameters:
  • dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference between the predictions for dat1 and dat2 will be returned.

  • use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

Returns:

A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference. The difference CI is then [diff - se, diff + se]

Return type:

(np.ndarray,np.ndarray)
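
A difference prediction of this kind is commonly computed from the difference of the two model matrices, as in the get_difference function referenced above. The sketch below shows that calculation in plain Python. It is an assumption that mssm follows the same steps internally, and the matrices, coefficients, and covariance matrix are hypothetical.

```python
import math
from statistics import NormalDist

def matvec(A, v):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

# Hypothetical model matrices for two data sets (2 rows, 2 coefficients)
X1 = [[1.0, 0.2], [1.0, 0.8]]
X2 = [[1.0, 0.5], [1.0, 0.5]]
coef = [0.3, 2.0]                     # hypothetical coefficient estimates
V = [[0.04, 0.01], [0.01, 0.09]]      # hypothetical coefficient covariance matrix

# Difference of the model matrices, row by row
X_diff = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(X1, X2)]

# Predicted difference on the scale of the linear predictor
diff = matvec(X_diff, coef)

# Point-wise variance of the difference: diag(X_diff V X_diff^T)
var = [sum(a * b for a, b in zip(row, matvec(V, row))) for row in X_diff]

# Scale by the N(0,1) cut-off so that the CI is [diff - se, diff + se]
crit = NormalDist().inv_cdf(1.0 - 0.05 / 2.0)
se = [crit * math.sqrt(v) for v in var]
ci = [(d - s, d + s) for d, s in zip(diff, se)]
```

Columns that are identical in both model matrices (here the intercept) cancel out, so they contribute neither to the difference nor to its uncertainty.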

print_parametric_terms()

Prints summary output for linear/parametric terms in the model, not unlike the one returned in R when using the summary function for mgcv models.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows a t-distribution for models in which an additional scale parameter was estimated (e.g., Gaussian, Gamma) and a standardized normal distribution for models in which the scale parameter is known or was fixed (e.g., Binomial). For the former case, the t-statistic, Degrees of freedom of the Null distribution (DoF.), and the p-value are printed as well. For the latter case, only the z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that, un-penalized coefficients that are part of a smooth function are not covered by this function.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
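
For the known-scale case (e.g., Binomial), the z-statistic and two-sided p-value described above can be sketched in a few lines of plain Python. The function `z_test` and the numbers passed to it are hypothetical and only illustrate the test; they are not part of mssm.

```python
from statistics import NormalDist

def z_test(estimate, std_error):
    """Two-sided test of the null hypothesis that the coefficient is zero."""
    z = estimate / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical coefficient estimate and standard error
z_stat, p_value = z_test(0.49, 0.25)
print(z_stat, p_value)
```

For models with an estimated scale parameter, the same statistic would instead be compared against a t-distribution with the reported degrees of freedom.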

print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)

Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:

  • Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:
  • pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2

  • p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False

  • edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models, for which the difference is likely to be marginal anyway, defaults to True

sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) ndarray

Obtain n_ps samples from posterior \([\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}\) is \([\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]^{-1}\phi\) (see Wood, 2017; section 6.10). To obtain samples for \(\boldsymbol{\beta}\), set deviations to False.

See sample_MVN() for more details.

Examples:

import numpy as np
import pandas as pd
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma Gam
Gammadat = sim3(500,2,family=Gamma(),seed=0)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Gammadat)

# By default, the Gamma family assumes that the model predictions match log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now get model matrix for a couple of example covariates
new_dat = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":np.linspace(0,1,30),
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

f0,X_f,ci = model.predict([1],new_dat,ci=True)

# Get `use_post` to only identify coefficients related to `f(["x0"])` - that way we can efficiently sample the
# posterior only for `f(["x0"])`. If you want to sample all coefficients, simply set `use_post=None`.
use_post = X_f.sum(axis=0) != 0
use_post = np.arange(0,X_f.shape[1])[use_post]
print(use_post)

# `use_post` can now be passed to `sample_post`:
post = model.sample_post(10000,use_post,deviations=False,seed=0,par=0)

# Since we set deviations to false post has coefficient samples and can simply be post-multiplied to
# get samples of `f(["x0"])` - importantly, post has a different shape than X_f, so we need to account for that
post_f = X_f[:,use_post] @ post

# Note: samples are also on scale of linear predictor!
plt.plot(new_dat["x0"],f0,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f[:,sidx],alpha=0.2)

plt.show()

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • n_ps (int) – Number of samples to obtain from posterior.

  • use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. By default all coefficients are sampled.

  • deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False

  • seed (int,optional) – A seed to use for the sampling, defaults to None

Returns:

An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. The model matrix \(\mathbf{X}\) (restricted to the columns in use_post) can simply be post-multiplied with this array to generate posterior sample curves/predictions.

Return type:

np.ndarray
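
The difference between deviations=True and deviations=False can be illustrated with a small stand-alone sampler. The sketch below draws from a two-dimensional normal distribution via a Cholesky factor in plain Python. The functions `chol2` and `sample_post_sketch` and all numeric values are hypothetical, and this is not how mssm implements sample_post() or sample_MVN().

```python
import math
import random

def chol2(V):
    """Lower-triangular Cholesky factor L of a 2x2 covariance matrix (V = L L^T)."""
    l11 = math.sqrt(V[0][0])
    l21 = V[1][0] / l11
    l22 = math.sqrt(V[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def sample_post_sketch(n_ps, coef_hat, V, deviations=False, seed=None):
    """Draw n_ps samples; rows index coefficients, columns index samples."""
    rng = random.Random(seed)
    L = chol2(V)
    samples = [[0.0] * n_ps for _ in coef_hat]
    for s in range(n_ps):
        z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        dev = [L[0][0] * z[0], L[1][0] * z[0] + L[1][1] * z[1]]  # dev ~ N(0, V)
        for j, d in enumerate(dev):
            # Deviations are centred on zero; coefficient samples on the estimate
            samples[j][s] = d if deviations else coef_hat[j] + d
    return samples

coef_hat = [1.0, -0.5]                 # hypothetical coefficient estimates
V = [[2.0, 0.6], [0.6, 1.0]]           # hypothetical posterior covariance
post = sample_post_sketch(20000, coef_hat, V, deviations=False, seed=0)
post_dev = sample_post_sketch(20000, coef_hat, V, deviations=True, seed=0)
```

With the same seed, the two results differ only by the estimated coefficients, and the returned layout matches the [len(use_post),n_ps] shape described above.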

class mssm.models.GAMMLSS(formulas: list[Formula], family: GAMLSSFamily)

Bases: GSMM

Class to fit Generalized Additive Mixed Models of Location Scale and Shape (see Rigby & Stasinopoulos, 2005).

Examples:

import numpy as np
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now define the model and fit!
model = GAMMLSS(formulas,family)
model.fit()

# Get total coef vector & split them
coef = model.coef
split_coef = np.split(coef,model.coef_split_idx)

# Get coef associated with the mean
coef_m = split_coef[0]
# and with the scale parameter
coef_s = split_coef[1]

# Similarly, `preds` holds linear predictions for m & s
pred_m = model.preds[0]
pred_s = model.preds[1]

# While `mus` holds the estimated fitted parameters
# (i.e., `preds` after applying the inverse of the link function of each parameter)
mu_m = model.mus[0]
mu_s = model.mus[1]
References:
  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • formulas ([Formula]) – A list of formulas for the GAMMLSS model

  • family (GAMLSSFamily) – A GAMLSSFamily. Currently GAUMLSS, MULNOMLSS, and GAMMALS are supported.

Variables:
  • formulas ([Formula]) – The list of formulas passed to the constructor.

  • lvi (scp.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with None.

  • coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.

  • preds ([[float]]) – The linear predictors for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.

  • mus ([[float]]) – The predicted means for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.

  • hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood (will correspond to hessian - diag*eps if self.info.eps > 0 after fitting). Initialized with None.

  • edf (float) – The model estimated degrees of freedom as a float. Initialized with None.

  • edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.

  • term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.

  • term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.

  • penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.

  • coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of family. See the examples. Initialized after fitting!

  • overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.

  • info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.

  • res (np.ndarray) – The working residuals of the model (If applicable). Initialized with None.
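How coef and coef_split_idx relate can be shown with a small NumPy-only sketch. The values below are hypothetical stand-ins for `model.coef` and `model.coef_split_idx`; only the behavior of `np.split` is demonstrated.

```python
import numpy as np

# Hypothetical stand-in for `model.coef`: 5 coefficients in total, shape (-1,1)
coef = np.arange(5, dtype=float).reshape(-1, 1)

# Hypothetical stand-in for `model.coef_split_idx`: the first 3 coefficients belong
# to the first distribution parameter, the remaining 2 to the second
coef_split_idx = [3]

# np.split cuts the rows of `coef` at the given indices
split_coef = np.split(coef, coef_split_idx)

coef_m = split_coef[0]  # coefficients of the first parameter, shape (3, 1)
coef_s = split_coef[1]  # coefficients of the second parameter, shape (2, 1)
print(coef_m.shape, coef_s.shape)
```

With k distribution parameters the index list would hold k - 1 entries, so `np.split` returns k blocks.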

fit(max_outer: int = 200, max_inner: int = 500, min_inner: int | None = None, conv_tol: float = 1e-07, extend_lambda: bool = False, extension_method_lam: str = 'nesterov2', control_lambda: int = 2, restart: bool = False, method: str = 'QR/Chol', check_cond: int = 1, piv_tol: float = np.float64(0.23651441168139897), should_keep_drop: bool = True, prefit_grad: bool = True, repara: bool = True, progress_bar: bool = True, n_cores: int = 10, seed: int = 0, init_lambda: list[float] | None = None)

Fit the specified model.

Note: Keyword arguments are initialized to maximise stability. For faster estimation set method='Chol'.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now define the model and fit!
model = GAMMLSS(formulas,family)
model.fit()

# Now fit again via Cholesky
model.fit(method="Chol")
Parameters:
  • max_outer (int,optional) – The maximum number of fitting iterations.

  • max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.

  • min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients. By default set to max_inner.

  • conv_tol (float,optional) – The relative convergence criterion: the change in penalized deviance is compared against conv_tol * previous penalized deviance to determine convergence.

  • extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary for models involving heavily penalized functions. Disabled by default.

  • extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov2’ by default.

  • control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.

  • restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.

  • method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). “Chol” relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol” or “LU/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. Defaults to “QR/Chol”.

  • check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). Defaults to 1.

  • piv_tol (float,optional) – Deprecated.

  • should_keep_drop (bool,optional) – Only used when method in ["QR/Chol","LU/Chol","Direct/Chol"]. If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration, which is costly. Defaults to True.

  • prefit_grad (bool,optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients. Defaults to True.

  • repara (bool,optional) – Whether to re-parameterize the model (for every proposed update to the regularization parameters) via the steps outlined in Appendix B of Wood (2011) and suggested by Wood et al., (2016). This greatly increases the stability of the fitting iteration. Defaults to True.

  • progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.

  • n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.

  • seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0

  • init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None
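The relative criterion described for conv_tol can be sketched as follows. This is an illustration of the stated comparison only; the function name `has_converged` and the exact form of the check inside mssm are assumptions.

```python
def has_converged(prev_pen_dev: float, cur_pen_dev: float, conv_tol: float = 1e-7) -> bool:
    """Relative convergence check on the penalized deviance (illustrative only)."""
    # Absolute change in penalized deviance between two iterations
    change = abs(prev_pen_dev - cur_pen_dev)
    # Compare against conv_tol * previous penalized deviance
    return change < conv_tol * abs(prev_pen_dev)

# A change of 1.0 is far too large relative to 1e-7 * 1000 = 1e-4
print(has_converged(1000.0, 999.0))
# A change of about 1e-5 is below the threshold of 1e-4
print(has_converged(1000.0, 1000.0 - 1e-5))
```

Because the threshold scales with the previous penalized deviance, the same conv_tol behaves comparably for small and large data sets.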

get_llk(penalized: bool = True) float | None

Get the (penalized) log-likelihood of the estimated model (float or None) given the training data.

Will instead return None if called before fitting.

Parameters:

penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True

Returns:

llk score

Return type:

float or None
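The difference between the two scores can be sketched under the usual definition of a penalized log-likelihood, namely the log-likelihood minus half the quadratic penalty on the coefficients. Whether mssm computes it in exactly this form is an assumption; `penalized_llk` and `S_lambda` are hypothetical names.

```python
import numpy as np

def penalized_llk(llk: float, coef: np.ndarray, S_lambda: np.ndarray) -> float:
    # Quadratic penalty coef' S_lambda coef, with S_lambda the total penalty matrix
    pen = (coef.T @ S_lambda @ coef).item()
    # Penalized log-likelihood: llk - 0.5 * penalty
    return llk - 0.5 * pen

coef = np.array([[1.0], [2.0]])   # hypothetical coefficient vector, shape (-1,1)
S_lambda = 3.0 * np.eye(2)        # hypothetical total penalty matrix
pllk = penalized_llk(-10.0, coef, S_lambda)
print(pllk)  # -10 - 0.5 * 3 * (1 + 4) = -17.5
```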

get_mmat(use_terms: list[int] | None = None, par: int | None = None) list[csc_array] | csc_array

Returns a list containing exactly the model matrices used for fitting, each as a scipy.sparse.csc_array. Will raise an error when fitting was not completed before calling this function.

Optionally, the model matrix associated with a specific parameter of the log-likelihood can be obtained by setting par to the desired index, instead of None. Additionally, all columns not corresponding to terms for which the indices are provided via use_terms can optionally be zeroed.

Parameters:
  • use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be created. If this argument is provided, columns corresponding to any term not included in this list will be zeroed, defaults to None

  • par (int or None, optional) – The index corresponding to the parameter of the distribution for which to obtain the model matrix. Setting this to None means all matrices are returned in a list, defaults to None.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

Model matrices \(\mathbf{X}\) used for fitting - one per parameter of self.family or a single model matrix for a specific parameter.

Return type:

[scp.sparse.csc_array] or scp.sparse.csc_array
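What "zeroing" the columns of unused terms means can be illustrated on a small sparse matrix. The matrix, the term-to-column mapping `term_cols`, and the helper `zero_unused_terms` are all hypothetical; this is not how mssm builds its matrices, only a sketch of the resulting structure.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical model matrix with 4 columns: column 0 belongs to term 0 (intercept),
# columns 1-3 belong to term 1 (a smooth)
X = sp.csc_array(np.arange(1.0, 13.0).reshape(3, 4))
term_cols = {0: [0], 1: [1, 2, 3]}  # hypothetical term -> column mapping

def zero_unused_terms(X, term_cols, use_terms):
    X_f = X.copy()
    for term, cols in term_cols.items():
        if term in use_terms:
            continue
        for col in cols:
            # In CSC format, indptr[col]:indptr[col+1] addresses the entries of one column
            X_f.data[X_f.indptr[col]:X_f.indptr[col + 1]] = 0.0
    # Drop the explicit zeros so the matrix stays sparse
    X_f.eliminate_zeros()
    return X_f

X_f = zero_unused_terms(X, term_cols, use_terms=[1])
print(X_f.toarray())
```

The shape of the matrix is unchanged, so it can still be multiplied with the full coefficient vector of the corresponding parameter.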

get_pars() ndarray

Returns a list containing all coefficients estimated for the model. Use self.coef_split_idx to split the vector into separate subsets per distribution parameter.

Will return None if called before fitting was completed.

Returns:

Model coefficients - before splitting!

Return type:

[float] or None

get_reml() float

Gets the Laplace approximate REML (Restricted Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).

References:

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid(**kwargs) ndarray

Returns standardized residuals for GAMMLSS models (Rigby & Stasinopoulos, 2005).

The computation of the residual vector will differ between different GAMMLSS models and is thus implemented as a method by each GAMMLSS family. These should be consulted to get more details. In general, if the model is specified correctly, the returned vector should approximately look like what could be expected from taking \(N\) independent samples from \(N(0,1)\).

Additional arguments required by the specific GAMLSSFamily.get_resid() method can be passed along via kwargs.

Note: Families for which no residuals are available can return None.

References:
  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:
  • NotImplementedError – An error is raised in case the residuals are to be computed for a Multinomial GAMMLSS model, which is currently not supported.

  • ValueError – An error is raised in case the residuals are requested before the model has been fit.

Returns:

A np.ndarray of standardized residuals of shape (-1,1) that should be \(\sim N(0,1)\) if the model is correct.

Return type:

np.ndarray
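For a Gaussian location-scale model, one natural standardized residual is the observation minus the mean, divided by the standard deviation. The sketch below simulates data from known (hypothetical) mean and sd functions and checks that this residual behaves like a sample from \(N(0,1)\). That a specific GAMLSSFamily computes its residuals this way is an assumption; the family documentation should be consulted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 1, n)

# Hypothetical true mean and standard deviation as functions of x
mu = 2.0 + np.sin(2 * np.pi * x)
sigma = np.exp(-0.5 + x)
y = rng.normal(mu, sigma)

# Standardized residuals as an array of shape (-1,1)
resid = ((y - mu) / sigma).reshape(-1, 1)

# If mean and sd are specified correctly, these should be close to 0 and 1
print(resid.mean(), resid.std())
```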

predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) tuple[ndarray, csc_array, ndarray | None]

Make a prediction using the fitted model for new data n_dat using only the terms indexed by use_terms and for distribution parameter par.

Importantly, predictions and standard errors are always returned on the scale of the linear predictor. For the Gaussian GAMMLSS model, the predictions for the standard deviation will for example usually (i.e., for the default link choices) reflect the log of the standard deviation. To get the predictions on the standard deviation scale, one could then apply the inverse log-link function to the predictions and the CI-bounds on the scale of the respective linear predictor. See the examples below.

Examples:

from mssm.models import *
from mssmViz.sim import *
import numpy as np
import pandas as pd

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

new_dat = pd.DataFrame({"x0":np.linspace(0,1,30)})

# Mean predictions don't have to be transformed since the Identity link is used for this predictor.
mu_mean,_,b_mean = model.predict(None,new_dat,ci=True)

# These can be used for confidence intervals:
mean_upper_CI = mu_mean + b_mean
mean_lower_CI = mu_mean - b_mean

# Standard deviation predictions do have to be transformed - by default they are on the log-scale.
eta_sd,_,b_sd = model.predict(None,new_dat,ci=True,par=1)
mu_sd = model.family.links[1].fi(eta_sd) # Index to `links` is 1 because the sd is the second parameter!

# These can be used for approximate confidence intervals:
sd_upper_CI = model.family.links[1].fi(eta_sd + b_sd)
sd_lower_CI = model.family.links[1].fi(eta_sd - b_sd)
References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:
  • use_terms (list[int] or None) – The indices corresponding to the terms in the formula of the parameter that should be used to obtain the prediction or None in which case all terms will be used.

  • n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se]

  • whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [pred - se, pred + se]

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • par (int, optional) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean), defaults to 0

Raises:

ValueError – An error is raised in case the standard error is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci==False else the standard error se in the prediction.

Return type:

(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
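The role of alpha and the back-transformation from the linear predictor scale can be sketched with SciPy alone. The numbers below are hypothetical. The sketch assumes that the returned se already includes the critical value, which is what the statement that the CI is [pred - se, pred + se] suggests.

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
# Critical cut-off value: the 1 - (alpha/2) quantile of N(0,1), approximately 1.96
crit = norm.ppf(1 - alpha / 2)

# Hypothetical predictions on the log-sd scale and their unscaled standard errors
eta_sd = np.array([-0.5, 0.0, 0.5])
se_raw = np.array([0.10, 0.08, 0.12])
b_sd = crit * se_raw  # half-width of the interval on the linear predictor scale

# Back-transform with the inverse of the log link (exp), bounds included
sd_hat = np.exp(eta_sd)
sd_lower = np.exp(eta_sd - b_sd)
sd_upper = np.exp(eta_sd + b_sd)
print(crit, sd_lower, sd_hat, sd_upper)
```

Transforming the bounds rather than the standard error keeps the interval inside the valid range of the parameter, at the price of an interval that is no longer symmetric around the estimate.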

predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) tuple[ndarray, ndarray]

Get the difference in the predictions for two datasets and for distribution parameter par. Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simulate 500 data points
GAUMLSSDat = sim9(500,1,seed=20)

# We include a tensor smooth in the model of the mean
formula_m = Formula(lhs("y"),
                    [i(),f(["x0","x1"],te=True)],
                    data=GAUMLSSDat)

# The model of the standard deviation remains the same
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"])],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

# Now we want to know whether the effect of x0 is different for two values of x1:
new_dat1 = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":[0.25 for _ in range(30)]})

new_dat2 = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":[0.75 for _ in range(30)]})

# Now we can get the predicted difference of the effect of x0 for the two values of x1:
pred_diff,se = model.predict_diff(new_dat1,new_dat2,use_terms=[1],par=0)

# mssmViz also has a convenience function to visualize it:
plot_diff(new_dat1,new_dat2,["x0"],model,use=[1],response_scale=False)
References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

  • get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html

Parameters:
  • dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between dat1 and this DataFrame will be returned.

  • use_terms (list[int] or None) – The indices corresponding to the terms in the formula of the parameter that should be used to obtain the prediction or None in which case all terms will be used.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • par (int, optional) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean), defaults to 0

Raises:

ValueError – An error is raised in case the predicted difference is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference. The difference CI is then [diff - se, diff + se]

Return type:

(np.ndarray,np.ndarray)
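One common way to obtain a difference and its standard error is to work with the difference of the two model matrices, as done by the get_difference function cited above. The sketch below uses random stand-ins for the model matrices, coefficients, and covariance matrix; that mssm follows exactly these steps internally is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model matrices built for dat1 and dat2 (30 rows, 4 coefficients)
X1 = rng.normal(size=(30, 4))
X2 = rng.normal(size=(30, 4))
coef = rng.normal(size=(4, 1))

# Hypothetical symmetric positive definite coefficient covariance matrix
A = rng.normal(size=(4, 4))
V = A @ A.T + np.eye(4)

# The predicted difference is linear in the coefficients
X_diff = X1 - X2
diff = X_diff @ coef

# Point-wise standard errors: square root of diag(X_diff V X_diff'),
# computed row-wise without forming the full 30 x 30 matrix
se_raw = np.sqrt(np.sum((X_diff @ V) * X_diff, axis=1))
print(diff.shape, se_raw.shape)
```

Terms that are identical in both data sets cancel in `X_diff`, which is why dat1 and dat2 should differ only in the variable of interest.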

print_parametric_terms()

Prints summary output for linear/parametric terms in the model, separately for each parameter of the family’s distribution.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows approximately a standardized normal distribution. The corresponding z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that un-penalized coefficients that are part of a smooth function are not covered by this function.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
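The test described above can be sketched for a single coefficient. The helper `wald_z_test` and its inputs are hypothetical; the sketch only shows how a z-statistic and a two-sided p-value follow from an estimate and its standard error under a standard normal null distribution.

```python
from scipy.stats import norm

def wald_z_test(estimate: float, std_err: float):
    # z-statistic for the null hypothesis that the coefficient equals 0
    z = estimate / std_err
    # Two-sided p-value under N(0,1); sf is the upper tail probability
    p = 2 * norm.sf(abs(z))
    return z, p

z, p = wald_z_test(0.5, 0.25)
print(z, p)
```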

print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)

Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can be taken as evidence that the smooth has effectively been removed from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:

  • Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:
  • pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2

  • p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False

  • edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal, defaults to True

sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) ndarray

Obtain n_ps samples from posterior \([\boldsymbol{\beta}_m - \hat{\boldsymbol{\beta}}_m] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) (see Wood et al., 2016; Wood 2017, section 6.10), \(\boldsymbol{\beta}_m\) is the set of coefficients in the model of parameter \(m\) of the distribution (see argument par), and \(\mathbf{H}\) is the hessian of the log-likelihood (Wood et al., 2016). To obtain samples for \(\boldsymbol{\beta}\), set deviations to False.

See sample_MVN() for more details.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

new_dat = pd.DataFrame({"x0":np.linspace(0,1,30)})

# Now obtain the estimate for `f(["x0"],nk=10)` and the model matrix corresponding to it!
# Note, that we set `use_terms = [1]` - so all columns in X_f not belonging to `f(["x0"],nk=10)`
# (e.g., the first one, belonging to the offset) are zeroed.
mu_f,X_f,_ = model.predict([1],new_dat,ci=True)

# Now we can sample from the posterior of `f(["x0"],nk=10)` in the model of the mean:
post = model.sample_post(10000,None,deviations=False,seed=0,par=0)

# Since we set deviations to false post has coefficient samples and can simply be post-multiplied to
# get samples of `f(["x0"],nk=10)` 
post_f = X_f @ post

# Plot the estimated effect and 50 posterior samples
plt.plot(new_dat["x0"],mu_f,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f[:,sidx],alpha=0.2)

plt.show()

# In this case, we are not interested in the offset, so we can omit it during the sampling step (i.e., to not sample coefficients
# for it):

# `use_post` identifies only coefficients related to `f(["x0"],nk=10)`
use_post = X_f.sum(axis=0) != 0
use_post = np.arange(0,X_f.shape[1])[use_post]
print(use_post)

# `use_post` can now be passed to `sample_post`:
post2 = model.sample_post(10000,use_post,deviations=False,seed=0,par=0)

# Importantly, post2 now has a different shape - which we have to take into account when multiplying.
post_f2 = X_f[:,use_post] @ post2

plt.plot(new_dat["x0"],mu_f,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f2[:,sidx],alpha=0.2)

plt.show()
References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • n_ps (int) – Number of samples to obtain from posterior.

  • use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. Note: an index of 0 indexes the first coefficient in the model of parameter par, that is indices have to correspond to columns in the parameter-specific model matrix. By default all coefficients are sampled.

  • deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False

  • seed (int,optional) – A seed to use for the sampling, defaults to None

  • par (int, optional) – The index corresponding to the distribution parameter for which to obtain samples (e.g., 0 = mean), defaults to 0

Returns:

An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves.

Return type:

np.ndarray
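Sampling from \(N(0,\mathbf{V})\) with \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) is commonly done via a Cholesky factor of the precision matrix, without forming \(\mathbf{V}\) explicitly. The sketch below uses a small dense, randomly generated matrix `P` as a hypothetical stand-in for \(-\mathbf{H} + \mathbf{S}_{\lambda}\); it illustrates the general technique and is not the sparse implementation used by sample_MVN().

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)

# Hypothetical stand-in for the precision matrix -H + S_lambda (symmetric positive definite)
A = rng.normal(size=(3, 3))
P = A @ A.T + 3 * np.eye(3)
coef_hat = np.array([[1.0], [-2.0], [0.5]])  # hypothetical estimated coefficients

n_ps = 20000
L = cholesky(P, lower=True)          # P = L L'
z = rng.standard_normal((3, n_ps))   # independent N(0,1) draws, one column per sample

# Solving L' d = z yields deviations d with covariance (L L')^{-1} = V
dev = solve_triangular(L.T, z, lower=False)

# Adding the estimate gives coefficient samples, the analogue of deviations=False
post = coef_hat + dev
print(post.shape)
```

The result has one row per coefficient and one column per sample, matching the [len(use_post),n_ps] layout described above.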

class mssm.models.GSMM(formulas: list[Formula], family: GSMMFamily)

Bases: object

Class to fit General Smooth/Mixed Models (see Wood, Pya, & Säfken; 2016). Estimation is possible via an exact Newton method for coefficients or via the L-qEFS update (see Krause et al., (submitted) and example below).

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

class NUMDIFFGENSMOOTHFamily(GSMMFamily):
    # Implementation of the ``GSMMFamily`` class that uses finite differencing to obtain the
    # gradient of the likelihood to estimate a Gaussian GAMLSS via the general smooth code and
    # the L-qEFS update by Krause et al. (submitted).

    # References:
    #    - Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
    #    - Nocedal & Wright (2006). Numerical Optimization. Springer New York.


    def __init__(self, pars: int, links:[Link]) -> None:
        super().__init__(pars, links)

    def llk(self, coef, coef_split_idx, ys, Xs):
        # Likelihood for a Gaussian GAM(LSS) - implemented so
        # that the model can be estimated using the general smooth code.
        y = ys[0]
        split_coef = np.split(coef,coef_split_idx)
        eta_mu = Xs[0]@split_coef[0]
        eta_sd = Xs[1]@split_coef[1]

        mu_mu = self.links[0].fi(eta_mu)
        mu_sd = self.links[1].fi(eta_sd)

        family = GAUMLSS(self.links)
        llk = family.llk(y,mu_mu,mu_sd)
        return llk

# Simulate 500 data points
sim_dat = sim3(500,2,c=1,seed=0,family=Gaussian(),binom_offset = 0, correlate=False)

# We need to model the mean: \mu_i
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                    data=sim_dat)

# And for sd - here constant
formula_sd = Formula(lhs("y"),
                    [i()],
                    data=sim_dat)

# Collect both formulas
formulas = [formula_m,formula_sd]
links = [Identity(),LOGb(-0.001)]

# Now define the general family + model and fit!
gsmm_fam = NUMDIFFGENSMOOTHFamily(2,links)
model = GSMM(formulas=formulas,family=gsmm_fam)

# Fit with SR1
bfgs_opt={"gtol":1e-9,
        "ftol":1e-9,
        "maxcor":30,
        "maxls":200,
        "maxfun":1e7}

model.fit(method='qEFS',bfgs_options=bfgs_opt)

# Extract all coef
coef = model.coef

# Now split them to get separate lists per parameter of the log-likelihood (here mean and scale)
# split_coef[0] then holds the coef associated with the first parameter (here the mean) and so on
split_coef = np.split(coef,model.coef_split_idx)
References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Nocedal & Wright (2006). Numerical Optimization. Springer New York.

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • formulas ([Formula]) – A list of formulas, one per parameter of the likelihood that is to be modeled as a smooth model

  • family (GSMMFamily) – A GSMMFamily family.

Variables:
  • formulas ([Formula]) – The list of formulas passed to the constructor.

  • lvi (scp.sparse.csc_array | None) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix - or None, in case the L-BFGS-B optimizer was used and form_VH was set to False when calling model.fit(). Initialized with None.

  • lvi_linop (scp.sparse.linalg.LinearOperator) – A scipy.sparse.linalg.LinearOperator of the conditional model coefficient covariance matrix (not the root) - or None. Only available in case the L-BFGS-B optimizer was used and form_VH was set to False when calling model.fit().

  • coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.

  • preds ([[float]]) – The linear predictors for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.

  • mus ([[float]]) – The predicted means for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.

  • hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood (will correspond to hessian - diag*eps if self.info.eps > 0 after fitting). Initialized with None.

  • edf (float) – The model estimated degrees of freedom as a float. Initialized with None.

  • edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.

  • term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.

  • term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.

  • penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.

  • coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of family. See the examples. Initialized after fitting!

  • overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.

  • info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.
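The relationship between coef and coef_split_idx can be sketched with toy numbers. The values below are made up and not taken from a fitted model. The layout (one split index for two parameters of the log-likelihood) and the dense matrices are assumptions for illustration only.

```python
import numpy as np

# Toy values (assumption): 3 coefficients for the mean and 2 for the scale
# parameter, stacked into one column vector of shape (-1,1) like model.coef.
coef = np.array([0.5, -1.0, 2.0, 0.1, 0.3]).reshape(-1, 1)
coef_split_idx = [3]  # assumed: split after the third coefficient

# Hypothetical dense model matrices, one per parameter (4 observations each)
Xs = [np.ones((4, 3)), np.ones((4, 2))]

# One block of coefficients per parameter of the log-likelihood
split_coef = np.split(coef, coef_split_idx)

# Linear predictor for every parameter: eta_m = X_m @ beta_m
etas = [X @ b for X, b in zip(Xs, split_coef)]
```

Each entry of etas could then be passed through the inverse of the corresponding link function to obtain the predicted parameter values.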

fit(init_coef: ndarray | None = None, max_outer: int = 200, max_inner: int = 500, min_inner: int | None = None, conv_tol: float = 1e-07, extend_lambda: bool = False, extension_method_lam: str = 'nesterov2', control_lambda: int | None = None, restart: bool = False, optimizer: str = 'Newton', method: str = 'QR/Chol', check_cond: int = 1, piv_tol: float = np.float64(0.23651441168139897), progress_bar: bool = True, n_cores: int = 10, seed: int = 0, drop_NA: bool = True, init_lambda: list[float] | None = None, form_VH: bool = True, use_grad: bool = False, build_mat: list[bool] | None = None, should_keep_drop: bool = True, gamma: float = 1, qEFSH: str = 'SR1', overwrite_coef: bool = True, max_restarts: int = 0, qEFS_init_converge: bool = False, prefit_grad: bool = True, repara: bool = None, init_bfgs_options: dict | None = None, bfgs_options: dict | None = None)

Fit the specified model.

Note: Keyword arguments are initialized to maximise stability. For faster configurations (necessary for larger models) see examples below.

Parameters:
  • init_coef (np.ndarray,optional) – An initial estimate for the coefficients. Must be a numpy array of shape (-1,1). Defaults to None.

  • max_outer (int,optional) – The maximum number of fitting iterations.

  • max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.

  • min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients. By default set to max_inner.

  • conv_tol (float,optional) – The relative convergence criterion: the change in penalized deviance is compared against conv_tol * previous penalized deviance to determine convergence.

  • extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary for models with heavily penalized functions. Disabled by default.

  • extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov2’ by default.

  • control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note, that the gradient is based on quasi-Newton approximations as well and thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed. Set to 2 by default if method != 'qEFS' and otherwise to 1.

  • restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.

  • optimizer (str,optional) – Deprecated. Defaults to “Newton”

  • method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). “Chol” relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol” or “LU/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the hessian. This only requires first-derivative information. Defaults to “QR/Chol”.

  • check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). Defaults to 1.

  • piv_tol (float,optional) – Deprecated.

  • progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.

  • n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.

  • seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0

  • drop_NA (bool,optional) – Whether to drop rows in the model matrices and observations vectors corresponding to NAs in the observation vectors. Set this to False if you want to handle NAs yourself in the likelihood function. Defaults to True.

  • init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None

  • form_VH (bool,optional) – Whether to explicitly form matrix V - the estimated inverse of the negative Hessian of the penalized likelihood - and H - the estimate of the Hessian of the log-likelihood - when using the qEFS method. If set to False, only V is returned - as a scipy.sparse.linalg.LinearOperator - and available in self.lvi_linop. Additionally, self.hessian will then be equal to None. Note, that this will break default prediction/confidence interval methods - so do not call them. Defaults to True

  • use_grad (bool,optional) – Deprecated.

  • build_mat ([bool], optional) – An (optional) list, containing one bool per mssm.src.python.formula.Formula in self.formulas - indicating whether the corresponding model matrix should be built. Useful if multiple formulas specify the same model matrix, in which case only one needs to be built. Only the matrices actually built are then passed down to the likelihood/gradient/hessian function in Xs. Defaults to None, which means all model matrices are built.

  • should_keep_drop (bool,optional) – Only used when method in ["QR/Chol","LU/Chol","Direct/Chol"]. If set to True, any coefficients that are dropped during fitting - are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration - costly! Defaults to True.

  • gamma (float,optional) – Setting this to a value larger than 1 promotes more complex (less smooth) models. Setting this to a value smaller than 1 (but must be > 0) promotes smoother models! Defaults to 1.

  • qEFSH (str,optional) – Should the hessian approximation use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation or the standard BFGS update (qEFSH='BFGS'). Defaults to ‘SR1’.

  • overwrite_coef (bool,optional) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS'. Setting this to False will be useful when passing coefficients from a simpler model to initialize a more complex one. Only has an effect when qEFS_init_converge=True. Defaults to True.

  • max_restarts (int,optional) – How often to shrink the coefficient estimate back to a random vector when convergence is reached and when method='qEFS'. The optimizer might get stuck in local minima so it can be helpful to set this to 1-3. What happens is that if we converge, we shrink the coefficients back to a random vector and then continue optimizing once more. Defaults to 0.

  • qEFS_init_converge (bool,optional) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS'. Defaults to False.

  • prefit_grad (bool,optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients. Defaults to True.

  • repara (bool,optional) – Whether to re-parameterize the model (for every proposed update to the regularization parameters) via the steps outlined in Appendix B of Wood (2011) and suggested by Wood et al., (2016). This greatly increases the stability of the fitting iteration. Defaults to True if method != 'qEFS' else False.

  • init_bfgs_options (dict,optional) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options but passed to the optimizer of the un-penalized problem. If this is None, it will be set to a copy of bfgs_options. Only has an effect when qEFS_init_converge=True. Defaults to None.

  • bfgs_options (dict,optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'. If none are provided, the gtol argument will be initialized to conv_tol. Note also, that in any case the maxiter argument is automatically set to max_inner. Defaults to None.
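The relative convergence check described for conv_tol can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name is hypothetical, and the use of absolute values and a strict inequality are assumptions.

```python
def has_converged(prev_pen_dev: float, new_pen_dev: float,
                  conv_tol: float = 1e-7) -> bool:
    """Relative check (sketch): compare the absolute change in penalized
    deviance against conv_tol * |previous penalized deviance|."""
    return abs(new_pen_dev - prev_pen_dev) < conv_tol * abs(prev_pen_dev)


# A change of 1 on a deviance of 1000 is far above 1e-7 * 1000 = 1e-4 ...
still_moving = has_converged(1000.0, 999.0)
# ... while a change of about 1e-5 falls below that threshold.
settled = has_converged(1000.0, 1000.0 - 1e-5)
```

Because the check is relative, the same conv_tol adapts to the magnitude of the penalized deviance, which grows with the number of observations.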

Raises:

ValueError – Will throw an error when optimizer is not ‘Newton’.

get_llk(penalized: bool = True, drop_NA: bool = True) float | None

Get the (penalized) log-likelihood of the estimated model (float or None) given the training data.

Will instead return None if called before fitting.

Parameters:
  • penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True

  • drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped, defaults to True

Returns:

llk score

Return type:

float or None

get_mmat(use_terms: list[int] | None = None, drop_NA: bool = True, par: int | None = None) list[csc_array] | csc_array

By default, returns a list containing exactly the model matrices used for fitting as a scipy.sparse.csc_array. Will raise an error when fitting was not completed before calling this function.

Optionally, the model matrix associated with a specific parameter of the log-likelihood can be obtained by setting par to the desired index, instead of None. Additionally, all columns not corresponding to terms for which the indices are provided via use_terms are zeroed in case use_terms is not None.

Parameters:
  • use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be created. If this argument is provided columns corresponding to any term not included in this list will be zeroed, defaults to None

  • drop_NA (bool, optional) – Whether rows in the model matrix corresponding to NAs in the dependent variable vector should be dropped, defaults to True

  • par (int or None, optional) – The index corresponding to the parameter of the log-likelihood for which to obtain the model matrix. Setting this to None means all matrices are returned in a list, defaults to None.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

Model matrices \(\mathbf{X}\) used for fitting - one per parameter of self.family or a single model matrix for a specific parameter.

Return type:

[scp.sparse.csc_array] or scp.sparse.csc_array

get_pars() ndarray

Returns a list containing all coefficients estimated for the model. Use self.coef_split_idx to split the vector into separate subsets per parameter of the log-likelihood.

Will return None if called before fitting was completed.

Returns:

Model coefficients - before splitting!

Return type:

[float] or None

get_reml(drop_NA: bool = True) float

Gets the Laplace approximate REML (Restricted Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).

References:

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped when computing the log-likelihood, defaults to True

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid(drop_NA: bool = True, **kwargs) ndarray

The computation of the residual vector will differ between different GSMM models and is thus implemented as a method by each GSMMFamily family. These should be consulted to get more details. In general, if the model is specified correctly, the returned vector should approximately look like what could be expected from taking independent samples from \(N(0,1)\).

Additional arguments required by the specific GSMMFamily.get_resid() method can be passed along via kwargs.

Note: Families for which no residuals are available can return None.

References:
  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped from the model matrices, defaults to True

Raises:

ValueError – An error is raised in case the residuals are requested before the model has been fit.

Returns:

vector of standardized residuals of shape (-1,1). Note, the first axis will not necessarily match the dimension of any of the response vectors (this will depend on the specific Family’s implementation).

Return type:

np.ndarray

predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) tuple[ndarray, csc_array, ndarray | None]

Make a prediction using the fitted model for new data n_dat using only the terms indexed by use_terms and for parameter par of the log-likelihood.

Importantly, predictions and standard errors are always returned on the scale of the linear predictor.

See the GAMMLSS.predict() function for code examples.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:
  • use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se]

  • whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [pred - se, pred + se].

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • par (int, optional) – The index corresponding to the parameter of the log-likelihood for which to make the prediction, defaults to 0

Raises:

ValueError – An error is raised in case the standard error is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci == False else the standard error se in the prediction.

Return type:

(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
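How alpha translates into a critical value and an interval can be sketched with the standard library. The predictions and unscaled standard errors below are made up. That the returned se already includes the critical value is an assumption, suggested by the interval being given as [pred - se, pred + se].

```python
from statistics import NormalDist

alpha = 0.05
# Critical cut-off from N(0,1) at 1 - (alpha/2), roughly 1.96 for alpha = 0.05
crit = NormalDist(mu=0.0, sigma=1.0).inv_cdf(1 - alpha / 2)

# Hypothetical predictions and unscaled standard errors (linear predictor scale)
pred = [0.2, 0.5, 0.9]
raw_se = [0.10, 0.08, 0.12]

# Assumed: the se used for the interval is the raw standard error times crit
se = [crit * s for s in raw_se]
lower = [p - s for p, s in zip(pred, se)]
upper = [p + s for p, s in zip(pred, se)]
```

Since predictions are on the scale of the linear predictor, both interval boundaries would have to be passed through the inverse link function to obtain an interval on the scale of the modeled parameter.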

predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) tuple[ndarray, ndarray]

Get the difference in the predictions for two datasets and for parameter par of the log-likelihood. Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.

See the GAMMLSS.predict_diff() function for code examples.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

  • get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html

Parameters:
  • dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.

  • dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this and dat1 will be returned.

  • use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).

  • whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.

  • n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

  • par (int, optional) – The index corresponding to the parameter of the log-likelihood for which to make the prediction, defaults to 0

Raises:

ValueError – An error is raised in case the predicted difference is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference. The difference CI is then [diff - se, diff + se]

Return type:

(np.ndarray,np.ndarray)

print_parametric_terms()

Prints summary output for linear/parametric terms in the model, separately for each parameter of the family’s distribution.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the test statistic approximately follows a standard normal distribution. The corresponding z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that, un-penalized coefficients that are part of a smooth function are not covered by this function.
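The test described above can be sketched with the standard library. The estimate and standard error are made-up numbers, the function name is hypothetical, and a two-sided test is assumed.

```python
from statistics import NormalDist


def z_test(estimate: float, std_err: float) -> tuple[float, float]:
    """Two-sided z-test of H0: beta = 0 (sketch, assumes a N(0,1) reference)."""
    z = estimate / std_err
    # Probability of observing |Z| >= |z| under the standard normal
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical coefficient estimate and standard error
z, p = z_test(0.42, 0.15)
```

Here z is 2.8, so the p-value falls below the conventional 0.01 level.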

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.

print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)

Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:

  • Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:
  • pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2

  • p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False

  • edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal, defaults to True

sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) ndarray

Obtain n_ps samples from posterior \([\boldsymbol{\beta}_m - \hat{\boldsymbol{\beta}}_m] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) (see Wood et al., 2016; Wood 2017, section 6.10), \(\boldsymbol{\beta}_m\) is the set of coefficients in the model of parameter \(m\) of the log-likelihood (see argument par), and \(\mathbf{H}\) is the hessian of the log-likelihood (Wood et al., 2016). To obtain samples for \(\boldsymbol{\beta}_m\), set deviations to False.

See sample_MVN() for more details and the GAMMLSS.sample_post() function for code examples.

References:

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • n_ps (int,optional) – Number of samples to obtain from posterior.

  • use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. Note: an index of 0 indexes the first coefficient in the model of parameter par, that is indices have to correspond to columns in the parameter-specific model matrix. By default all coefficients are sampled.

  • deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False

  • seed (int,optional) – A seed to use for the sampling, defaults to None

  • par (int, optional) – The index corresponding to the parameter of the log-likelihood for which samples are to be obtained for the coefficients, defaults to 0.

Returns:

An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. If use_post is None, len(use_post) will match the number of coefficients associated with parameter par of the log-likelihood instead. Can simply be post-multiplied with (the subset of columns indicated by use_post of) the model matrix \(\mathbf{X}^m\) associated with the parameter \(m\) of the log-likelihood to generate posterior sample curves.

Return type:

np.ndarray
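The sampling step can be sketched with a small dense covariance matrix. This is only an illustration under stated assumptions: the function name, the toy values for \(\hat{\boldsymbol{\beta}}_m\) and \(\mathbf{V}\), and the dense Cholesky factorization are hypothetical and do not reflect how sample_MVN() works internally.

```python
import numpy as np


def sample_post_sketch(beta_hat, V, n_ps, deviations=False, seed=None):
    """Draw n_ps samples with beta - beta_hat ~ N(0, V) (dense sketch)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(V)                      # V = L @ L.T
    z = rng.standard_normal((V.shape[0], n_ps))    # independent N(0,1) draws
    dev = L @ z                                    # columns are ~ N(0, V)
    return dev if deviations else dev + beta_hat.reshape(-1, 1)


# Toy estimate and covariance matrix (assumptions, not from a fitted model)
beta_hat = np.array([1.0, -0.5, 0.25])
V = np.array([[0.50, 0.10, 0.00],
              [0.10, 0.30, 0.05],
              [0.00, 0.05, 0.20]])

samples = sample_post_sketch(beta_hat, V, n_ps=1000, seed=0)  # shape (3, 1000)

# Hypothetical model matrix with 10 rows: every column of curves is then
# one posterior sample of the linear predictor.
X = np.linspace(0, 1, 30).reshape(10, 3)
curves = X @ samples
```

The result has one row per coefficient and one column per sample, matching the [len(use_post),n_ps] layout described above.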

mssm.src.python.compact_rep module

mssm.src.python.compact_rep.computeH(s: ndarray, y: ndarray, rho: ndarray, H0: csc_array, explicit: bool = True) ndarray | tuple[ndarray, ndarray, ndarray, ndarray]

Computes (explicitly or implicitly) the quasi-Newton approximation to the negative Hessian of the (penalized) likelihood \(\mathbf{H}\) (\(\mathcal{H}\)) from the L-BFGS-B optimizer info.

Relies on equation 2.16 in Byrd, Nocedal & Schnabel (1994).

References:
  • Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:
  • s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set m of update vectors from Byrd, Nocedal & Schnabel (1994).

  • y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set m of update vectors from Byrd, Nocedal & Schnabel (1994).

  • rho (np.ndarray) – flattened numpy.array of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1994).

  • H0 (scipy.sparse.csc_array) – Initial estimate for the hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).

  • explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of four update matrices.

Returns:

H, either as np.ndarray (explicit==True) or represented implicitly via four update vectors (also np.ndarrays)

Return type:

np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]
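A dense sketch of the compact BFGS representation from Byrd, Nocedal & Schnabel can help to see what is being computed. The function names below are hypothetical, the initial matrix is dense rather than a csc_array, and rows of s and y are assumed to be ordered from oldest to newest update. The result is compared against the usual recursive BFGS update.

```python
import numpy as np


def bfgs_compact(s, y, B0):
    """Compact BFGS form; s and y hold one update vector per row, shape (m, p)."""
    S, Y = s.T, y.T                       # columns are the update vectors
    StY = S.T @ Y
    L = np.tril(StY, k=-1)                # strictly lower triangle of S^T Y
    D = np.diag(np.diag(StY))             # diagonal holds s_i^T y_i
    W = np.hstack([B0 @ S, Y])            # shape (p, 2m)
    M = np.block([[S.T @ B0 @ S, L], [L.T, -D]])
    return B0 - W @ np.linalg.solve(M, W.T)


def bfgs_recursive(s, y, B0):
    """Standard BFGS update applied one pair (s_k, y_k) at a time."""
    B = B0.copy()
    for sk, yk in zip(s, y):
        Bs = B @ sk
        B = B - np.outer(Bs, Bs) / (sk @ Bs) + np.outer(yk, yk) / (yk @ sk)
    return B


rng = np.random.default_rng(0)
p, m = 5, 3
A = rng.standard_normal((p, p))
A = A @ A.T + p * np.eye(p)               # SPD, so that s_i^T y_i > 0
s = rng.standard_normal((m, p))
y = s @ A                                 # y_i = A s_i for a quadratic objective
B0 = np.eye(p)                            # multiple of the identity

H_compact = bfgs_compact(s, y, B0)
H_recursive = bfgs_recursive(s, y, B0)
```

The compact form only requires solving a 2m x 2m system, which is what makes the implicit representation attractive when p is large.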

mssm.src.python.compact_rep.computeHSR1(s: ndarray, y: ndarray, rho: ndarray, H0: csc_array, omega: float = 1, make_psd: bool = False, make_pd: bool = False, explicit: bool = True) ndarray | tuple[ndarray, ndarray, ndarray]

Computes (explicitly or implicitly) the symmetric rank one (SR1) approximation of the negative Hessian of the (penalized) likelihood \(\mathbf{H}\) (\(\mathcal{H}\)).

Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1994). Can ensure positive (semi) definiteness of the approximation via an eigen decomposition as shown by Burdakov et al. (2017). This is enforced via the make_psd and make_pd arguments.

References:
  • Burdakov, O., Gong, L., Zikrin, S., & Yuan, Y. (2017). On efficiently combining limited-memory and trust-region techniques. Mathematical Programming Computation, 9(1), 101–134. https://doi.org/10.1007/s12532-016-0109-7

  • Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:
  • s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set m of update vectors from Byrd, Nocedal & Schnabel (1994).

  • y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set m of update vectors from Byrd, Nocedal & Schnabel (1994).

  • rho (np.ndarray) – flattened numpy.array of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1994).

  • H0 (scipy.sparse.csc_array) – Initial estimate for the hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).

  • omega (float, optional) – Multiple of the identity matrix used as initial estimate.

  • make_psd (bool, optional) – Whether to enforce PSD as mentioned in the description. By default set to False.

  • make_pd (bool, optional) – Whether to enforce numeric positive definiteness, not just PSD. Ignored if make_psd=False. By default set to False.

  • explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.

Returns:

H, either as np.ndarray (explicit==True) or represented implicitly via three update vectors (also np.ndarrays)

Return type:

np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]
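The SR1 counterpart can be sketched in the same dense fashion. The function names are hypothetical and the row order of s and y (oldest first) is an assumption. The eigenvalue clipping shown here works on the full dense matrix and only illustrates the idea behind make_psd; Burdakov et al. (2017) exploit the low-rank structure instead.

```python
import numpy as np


def sr1_compact(s, y, B0):
    """Compact SR1 form; s and y hold one update vector per row, shape (m, p)."""
    S, Y = s.T, y.T
    StY = S.T @ Y
    L = np.tril(StY, k=-1)                # strictly lower triangle of S^T Y
    D = np.diag(np.diag(StY))
    Psi = Y - B0 @ S                      # shape (p, m)
    M = D + L + L.T - S.T @ B0 @ S        # shape (m, m)
    return B0 + Psi @ np.linalg.solve(M, Psi.T)


def sr1_recursive(s, y, B0):
    """Standard SR1 update applied one pair (s_k, y_k) at a time."""
    B = B0.copy()
    for sk, yk in zip(s, y):
        r = yk - B @ sk
        B = B + np.outer(r, r) / (r @ sk)
    return B


def clip_to_psd(B, floor=0.0):
    """Dense illustration: replace eigenvalues below floor by floor."""
    w, Q = np.linalg.eigh(0.5 * (B + B.T))
    return (Q * np.maximum(w, floor)) @ Q.T


rng = np.random.default_rng(1)
p, m = 5, 3
A = rng.standard_normal((p, p))
A = A @ A.T + p * np.eye(p)               # SPD toy Hessian
s = rng.standard_normal((m, p))
y = s @ A
B0 = np.eye(p)

B_compact = sr1_compact(s, y, B0)
B_recursive = sr1_recursive(s, y, B0)

# SR1 matrices can be indefinite in general; clipping removes negative eigenvalues
B_psd = clip_to_psd(np.diag([1.0, -2.0, 3.0]))
```

Passing a small positive floor instead of 0.0 would correspond to enforcing numeric positive definiteness rather than only positive semi-definiteness.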

mssm.src.python.compact_rep.computeV(s: ndarray, y: ndarray, rho: ndarray, V0: csc_array, explicit: bool = True) ndarray | tuple[ndarray, ndarray, ndarray]

Computes (explicitly or implicitly) the quasi-Newton approximation to the inverse of the negative Hessian of the (penalized) likelihood \(\mathcal{I}\) (\(\mathbf{V}\)) from the L-BFGS-B optimizer info.

Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1994).

References:
  • Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:
  • s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1994).

  • y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1994).

  • rho (np.ndarray) – flattened np.ndarray of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1994).

  • V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the Hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).

  • explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.

Returns:

V, either as np.ndarray (explicit=True) or represented implicitly via three update vectors (also np.ndarrays)

Return type:

np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]
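To make the difference between the explicit and the implicit form more concrete, the sketch below builds a BFGS inverse-Hessian approximation twice: once by applying the usual recursive update for every pair of update vectors, and once via the low-rank form \(\mathbf{V} = \mathbf{V}_0 + \mathbf{W}\mathbf{M}\mathbf{W}^T\) described by Byrd, Nocedal & Schnabel (1994). This is plain NumPy with a dense V0 and hypothetical function names. It is not the code used by computeV, and the layout of the implicit return value is an assumption made for illustration.

```python
import numpy as np

def bfgs_inverse_recursive(s, y, V0):
    """Apply the textbook BFGS inverse update once per (s_i, y_i) pair."""
    V = V0.copy()
    I = np.eye(V0.shape[0])
    for s_i, y_i in zip(s, y):
        rho_i = 1.0 / (y_i @ s_i)
        A = I - rho_i * np.outer(s_i, y_i)
        V = A @ V @ A.T + rho_i * np.outer(s_i, s_i)
    return V

def bfgs_inverse_compact(s, y, V0, explicit=True):
    """Compact form V = V0 + W @ M @ W.T; s and y have shape (m, p)."""
    S, Y = s.T, y.T                         # update vectors as columns, shape (p, m)
    SY = S.T @ Y                            # entry (i, j) holds s_i^T y_j
    R_inv = np.linalg.inv(np.triu(SY))      # R: upper triangle of S^T Y (with diagonal)
    D = np.diag(np.diag(SY))                # diagonal holds s_i^T y_i = 1 / rho_i
    m = s.shape[0]
    M = np.block([
        [R_inv.T @ (D + Y.T @ V0 @ Y) @ R_inv, -R_inv.T],
        [-R_inv, np.zeros((m, m))],
    ])
    W = np.hstack([S, V0 @ Y])              # shape (p, 2m)
    if explicit:
        return V0 + W @ M @ W.T
    return V0, W, M                         # hypothetical implicit layout

rng = np.random.default_rng(0)
p, m = 6, 3
G = rng.standard_normal((p, p))
A = G @ G.T + 5.0 * np.eye(p)               # toy positive definite "Hessian"
s = rng.standard_normal((m, p))             # toy steps
y = s @ A                                   # matching gradient differences (A is symmetric)
rho = 1.0 / np.sum(s * y, axis=1)           # element-wise 1 / (y_i^T s_i), shape (m,)
V0 = 0.1 * np.eye(p)

# Both constructions should agree up to floating point error
print(np.allclose(bfgs_inverse_compact(s, y, V0), bfgs_inverse_recursive(s, y, V0)))
```

The implicit form only stores a (p, 2m) and a (2m, 2m) matrix in addition to V0, which is why it is attractive when p is large and m is small.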

mssm.src.python.compact_rep.computeVSR1(s: ndarray, y: ndarray, rho: ndarray, V0: csc_array, omega: float = 1, make_psd: bool = False, explicit: bool = True) -> ndarray | tuple[ndarray, ndarray, ndarray]

Computes (explicitly or implicitly) the symmetric rank one (SR1) approximation of the inverse of the negative Hessian of the (penalized) likelihood \(\mathcal{I}\) (\(\mathbf{V}\)).

Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1994). Positive (semi-)definiteness of the approximation can be ensured via an eigendecomposition as shown by Burdakov et al. (2017). This is controlled via the make_psd argument.

References:
  • Burdakov, O., Gong, L., Zikrin, S., & Yuan, Y. (2017). On efficiently combining limited-memory and trust-region techniques. Mathematical Programming Computation, 9(1), 101–134. https://doi.org/10.1007/s12532-016-0109-7

  • Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:
  • s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1994).

  • y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1994).

  • rho (np.ndarray) – flattened np.ndarray of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1994).

  • V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the Hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).

  • omega (float, optional) – Multiple of the identity matrix used as initial estimate.

  • make_psd (bool, optional) – Whether to enforce PSD as mentioned in the description. By default set to False.

  • explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.

Returns:

V, either as np.ndarray (explicit=True) or represented implicitly via three update vectors (also np.ndarrays)

Return type:

np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]
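For intuition, the following sketch compares the recursive SR1 update of an inverse-Hessian approximation with its compact low-rank form, and shows a generic way to force a symmetric matrix to be positive semi-definite by clipping eigenvalues. All function names are hypothetical and V0 is dense here. The eigenvalue clipping is a simple dense stand-in and not the efficient low-rank procedure of Burdakov et al. (2017), so it should not be read as what computeVSR1 does internally.

```python
import numpy as np

def sr1_inverse_recursive(s, y, V0):
    """Apply the textbook SR1 inverse update once per (s_i, y_i) pair."""
    V = V0.copy()
    for s_i, y_i in zip(s, y):
        r = s_i - V @ y_i
        V = V + np.outer(r, r) / (r @ y_i)
    return V

def sr1_inverse_compact(s, y, V0):
    """Compact form V = V0 + Psi @ inv(M) @ Psi.T; s and y have shape (m, p)."""
    S, Y = s.T, y.T                         # update vectors as columns, shape (p, m)
    SY = S.T @ Y                            # entry (i, j) holds s_i^T y_j
    R = np.triu(SY)                         # upper triangle including the diagonal
    D = np.diag(np.diag(SY))
    Psi = S - V0 @ Y                        # shape (p, m)
    M = R + R.T - D - Y.T @ V0 @ Y          # symmetric (m, m) middle matrix
    return V0 + Psi @ np.linalg.solve(M, Psi.T)

def clip_to_psd(V):
    """Generic eigenvalue clipping: symmetrize, then zero out negative eigenvalues."""
    w, U = np.linalg.eigh(0.5 * (V + V.T))
    return (U * np.clip(w, 0.0, None)) @ U.T

rng = np.random.default_rng(0)
p, m = 6, 3
G = rng.standard_normal((p, p))
A = G @ G.T + 5.0 * np.eye(p)               # toy positive definite "Hessian"
s = rng.standard_normal((m, p))
y = s @ A
V0 = 0.01 * np.eye(p)

V = sr1_inverse_compact(s, y, V0)
print(np.allclose(V, sr1_inverse_recursive(s, y, V0)))
print(np.linalg.eigvalsh(clip_to_psd(V)).min() >= -1e-10)
```

Unlike BFGS, the SR1 update does not guarantee a positive definite result, which is why an extra step such as the one controlled by make_psd can be needed.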

mssm.src.python.compare module

mssm.src.python.compare.compare_CDL(model1: GAMM | GAMMLSS | GSMM, model2: GAMM | GAMMLSS | GSMM, correct_V: bool = True, correct_t1: bool | None = None, perform_GLRT: bool = False, nR: int = 250, n_c: int = 1, alpha: int = 0.05, grid: str | None = None, a: float = 1e-07, b: float = 10000000.0, df: int = 40, verbose: bool = False, drop_NA: bool = True, method: str = 'Chol', seed: int | None = None, only_expected_edf: bool | None = None, Vp_fidiff: bool = False, use_importance_weights: bool | None = None, prior: Callable | None = None, recompute_H: bool | None = None, compute_Vcc: bool | None = None, bfgs_options: dict = {}) -> dict

Computes the AIC difference and (optionally) performs an approximate GLRT on twice the difference in unpenalized log-likelihood between models model1 and model2 (see Wood et al., 2016).

For the GLRT to be appropriate model1 should be set to the model containing more effects and model2 should be a nested, simpler, variant of model1. For the degrees of freedom for the test, the expected degrees of freedom (EDF) of each model are used (i.e., this is the conditional test discussed in Wood (2017: 6.12.4)). The difference between the models in EDF serves as DoF for computing the Chi-Square statistic. In addition, correct_t1 should be set to True, when computing the GLRT.

To get the AIC for each model, 2*edf is added to twice the negative (conditional) log-likelihood (see Wood et al., 2016).
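The arithmetic described above can be sketched in a few lines. The helper names and the numbers are made up for illustration, the log-likelihoods and edf would normally come from two fitted models, and the exact rejection rule used by compare_CDL is an assumption here (p below alpha).

```python
from scipy.stats import chi2

def aic(llk, edf):
    """AIC: twice the negative log-likelihood plus 2 * edf."""
    return -2.0 * llk + 2.0 * edf

def glrt(llk1, edf1, llk2, edf2, alpha=0.05):
    """Approximate GLRT of model 1 (more complex) against nested model 2."""
    stat = 2.0 * (llk1 - llk2)          # twice the difference in log-likelihood
    dof = edf1 - edf2                   # difference in (expected) degrees of freedom
    p = chi2.sf(stat, dof)              # upper tail of the Chi-Square distribution
    return {"test_stat": stat, "Res. DOF": dof, "p": p, "H1": bool(p < alpha)}

# Hypothetical values for two nested models
llk1, edf1 = -1050.2, 14.3
llk2, edf2 = -1062.9, 9.1

print(aic(llk1, edf1) - aic(llk2, edf2))   # negative values favor model 1
print(glrt(llk1, edf1, llk2, edf2))
```

Note that the degrees of freedom need not be integers, since they are differences of effective degrees of freedom.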

By default (correct_V=True), mssm will attempt to correct the edf for uncertainty in the estimated \(\lambda\) parameters. Which correction is computed depends on the grid argument. With grid='JJJ1' (the default), an approximation to the analytic correction proposed by Wood, Pya, & Säfken (2016) is computed, which is exact for strictly Gaussian and some canonical generalized additive models. This is too costly for very large sparse multi-level models and not exact for more generic models. The MC-based alternative available via grid='JJJ2' addresses the first problem (important: set use_importance_weights=False and only_expected_edf=True). The second MC-based alternative, available via grid='JJJ3', is most appropriate for more generic models (the prior argument can be used to specify any prior to be placed on \(\boldsymbol{\rho}\); you will also need to set use_importance_weights=True and only_expected_edf=False). For more details consult the mssm.src.python.utils.correct_VB() function, the examples below, and Krause et al. (submitted).

In case any of those correction strategies is too expensive, it might be better to rely on hypothesis tests for individual smooths, confidence intervals, and penalty-based selection approaches instead (see Marra & Wood, 2011 for details on the latter).

In case correct_t1=True, the EDF used for the GLRT will be set to the smoothness bias corrected expected degrees of freedom (t1 in section 6.1.2 of Wood, 2017), which are additionally corrected for smoothness uncertainty in case correct_V=True (based on the recommendation given in section 6.12.4 of Wood, 2017). The AIC (Wood, 2017) of both models will still be based on the regular (smoothness uncertainty corrected) edf.

The computation here is different to the one performed by the compareML function in the R-package itsadug - which rather performs a version of the marginal GLRT (also discussed in Wood, 2017: 6.12.4) - and more similar to the anova.gam implementation provided by mgcv (particularly if grid='JJJ1'). The returned p-value is approximate, and very much so if correct_V=False (this should really never be done). Also, the GLRT should not be used to compare models differing in their random effect structures - the AIC is more appropriate for this (see Wood, 2017: 6.12.4).

Examples:

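import copy
import numpy as np
from mssm.models import *
from mssm.src.python.compare import compare_CDL
from mssmViz.sim import *

### Model comparison and smoothness uncertainty correction for strictly additive model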

# Simulate some data
sim_fit_dat = sim3(n=500,scale=2,c=0.1,family=Gaussian(),seed=21)

# Now fit nested models
sim_fit_formula = Formula(lhs("y"),
                            [i(),f(["x0"],nk=20,rp=1),f(["x1"],nk=20,rp=1),f(["x2"],nk=20,rp=1),f(["x3"],nk=20,rp=1)],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_model = GAMM(sim_fit_formula,Gaussian())
sim_fit_model.fit()

sim_fit_formula2 = Formula(lhs("y"),
                            [i(),f(["x1"],nk=20,rp=1),f(["x2"],nk=20,rp=1),f(["x3"],nk=20,rp=1)],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_model2 = GAMM(sim_fit_formula2,Gaussian())
sim_fit_model2.fit()


# And perform a smoothness uncertainty corrected comparison
cor_result1 = compare_CDL(sim_fit_model,sim_fit_model2,grid='JJJ1',seed=22)

# To perform a GLRT and correct the edf for smoothness bias as well (e.g., Wood, 2017) run:
cor_result2 = compare_CDL(sim_fit_model,sim_fit_model2,grid='JJJ1',seed=22,perform_GLRT=True,correct_t1=True)

### Model comparison and smoothness uncertainty correction for very large strictly additive model

# If the models are quite large (many coefficients) the following (this is the first MC strategy discussed in
# section 5.2 of Krause et al. (submitted)) can be much faster:
nR = 250 # Number of samples to use for the numeric integration
cor_result3 = compare_CDL(sim_fit_model,sim_fit_model2,nR=nR,n_c=10,correct_t1=False,grid='JJJ2',
                          seed=22,only_expected_edf=True,use_importance_weights=False)

### Model comparison and smoothness uncertainty correction for more generic smooth model (GAMM, GAMMLSS, etc.)
# We can still rely on grid='JJJ1' (which is why it is the default) but this will be approximate.
# See section 5.1 in the manuscript by Krause et al. (submitted) for justification, or section 3.4.3 in the book
# by Wood (2017). An alternative is the second MC strategy discussed in section 5.3 of Krause et al. (submitted).
# The code below shows how to get mssm to rely on this strategy:

# Simulate some data
sim_fit_dat = sim3(n=500,scale=2,c=0.1,family=Gamma(),seed=21)

# Now fit nested models
sim_fit_formula = Formula(lhs("y"),
                            [i(),f(["x0"],nk=20,rp=1),f(["x1"],nk=20,rp=1),f(["x2"],nk=20,rp=1),f(["x3"],nk=20,rp=1)],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_formula_sd = Formula(lhs("y"),
                            [i()],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_model = GAMMLSS([sim_fit_formula,copy.deepcopy(sim_fit_formula_sd)],family = GAMMALS([LOG(),LOGb(-0.01)]))
sim_fit_model.fit()

sim_fit_formula2 = Formula(lhs("y"),
                            [i(),f(["x1"],nk=20,rp=1),f(["x2"],nk=20,rp=1),f(["x3"],nk=20,rp=1)],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_model2 = GAMMLSS([sim_fit_formula2,copy.deepcopy(sim_fit_formula_sd)],family = GAMMALS([LOG(),LOGb(-0.01)]))
sim_fit_model2.fit()

# Set up a uniform prior from log(1e-7) to log(1e12) for each regularization parameter
prior = DummyRhoPrior(b=np.log(1e12))

# Now correct for uncertainty in regularization parameters using the second MC strategy discussed by Krause et al. (submitted):
# You can also set prior to ``None`` in which case the proposal distribution (by default a T-distribution with 40 degrees of freedom) is used as prior.
cor_result_gs_1 = compare_CDL(sim_fit_model,sim_fit_model2,n_c=10,grid='JJJ3',seed=22,only_expected_edf=False,use_importance_weights=True,prior=prior,recompute_H=True)
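References:
  • Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis.

  • Wood, S. N., Pya, N., & Säfken, B. (2016). Smoothing Parameter and Model Selection for General Smooth Models. Journal of the American Statistical Association.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
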
Parameters:
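  • model1 (GAMM | GAMMLSS | GSMM) – The first model (a GAMM, GAMMLSS, or GSMM). For the GLRT this should be the more complex model.

  • model2 (GAMM | GAMMLSS | GSMM) – The second model (a GAMM, GAMMLSS, or GSMM). For the GLRT this should be the simpler model nested in model1.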

  • correct_V (bool, optional) – Whether or not to correct for smoothness uncertainty. Defaults to True

  • correct_t1 (bool | None, optional) – Whether or not to also correct the smoothness bias corrected edf for smoothness uncertainty. Defaults to None - meaning that mssm will select an appropriate value.

  • perform_GLRT (bool, optional) – Whether to perform both a GLRT and to compute the AIC or to only compute the AIC. Defaults to False.

  • nR (int, optional) – In case grid!="JJJ1", nR samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250

  • n_c (int, optional) – Number of cores to use during parallel parts of the correction, defaults to 1

  • alpha (float, optional) – alpha level of the GLRT. Defaults to 0.05

  • grid (str | None, optional) – How to compute the smoothness uncertainty correction, defaults to None - meaning that mssm will select an appropriate value.

  • a (float, optional) – Any of the \(\lambda\) estimates obtained from model (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}_{\boldsymbol{\rho}})\) used to sample nR candidates) which are smaller than this are set to this value as well, defaults to 1e-7 the minimum possible estimate

  • b (float, optional) – Any of the \(\lambda\) estimates obtained from model (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}_{\boldsymbol{\rho}})\) used to sample nR candidates) which are larger than this are set to this value as well, defaults to 1e7 the maximum possible estimate

  • df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample/propose the next set of candidates. Setting this to np.inf means a multivariate normal is used for sampling, defaults to 40

  • verbose (bool, optional) – Whether to print progress information or not, defaults to False

  • drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.

  • method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky decomposition is then computed using the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the Hessian. This only requires first derivative information. Defaults to “Chol”.

  • seed (int,optional) – Seed to use for random parts of the correction. Defaults to None

  • only_expected_edf (bool|None, optional) – Whether to compute the edf by explicitly forming the covariance matrix (only_expected_edf=False) or not (only_expected_edf=True). The latter is much more efficient for sparse models at the cost of access to the covariance matrix and the ability to compute an upper bound on the smoothness uncertainty corrected edf. Only makes sense when grid!='JJJ1'. Defaults to None - meaning that mssm will select an appropriate value.

  • Vp_fidiff (bool,optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}_{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)

  • use_importance_weights (bool | None,optional) – Whether to rely on importance weights to compute the numerical integration when grid != 'JJJ1' or on the log-densities of \(\mathbf{V}_{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to None - meaning that mssm will select an appropriate value.

  • prior (any, optional) – An (optional) instance of an arbitrary class that has a .logpdf() method to compute the prior log density of a sampled candidate. If this is set to None, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when use_importance_weights=False. Defaults to None

  • recompute_H (bool | None, optional) – Whether or not to re-compute the Hessian of the log-likelihood at an estimate of the mean of the Bayesian posterior \(\boldsymbol{\beta}|y\) before computing the (uncertainty/bias corrected) edf. Defaults to None - meaning that mssm will select an appropriate value.

  • compute_Vcc (bool | None, optional) – Whether to compute the second correction term when grid='JJJ1' (or when computing the lower-bound for the remaining grids) or only the first one. In contrast to the second one, the first correction term is substantially cheaper to compute - so setting this to False for larger models will speed up the correction considerably. Defaults to None - meaning that mssm will select an appropriate value.

  • bfgs_options (dict,optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'. If none are provided, the gtol argument will be initialized to conv_tol. Note also that in any case the maxiter argument is automatically set to max_inner. Defaults to an empty dictionary.

Raises:
  • ValueError – If the two models are from different families.

  • ValueError – If perform_GLRT=True and model1 has fewer coefficients than model2 - i.e., model1 has to be the nominally more complex one.

Returns:

A dictionary with outcomes of all tests. Key H1 will be a bool indicating whether Null hypothesis was rejected or not, p will be the p-value, test_stat will be the test statistic used, Res. DOF will be the degrees of freedom used by the test, aic1 and aic2 will be the aic scores for both models.

Return type:

dict
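As described for the prior argument, any object exposing a .logpdf() method can act as a prior on \(\boldsymbol{\rho}\). A minimal sketch of such a class is given below. The class name and its defaults are hypothetical, and whether mssm expects element-wise log densities (as returned here) or a single joint value is not stated in this section, so this should be checked against mssm.src.python.utils.correct_VB() before use.

```python
import numpy as np

class UniformRhoPrior:
    """Hypothetical flat prior on rho = log(lambda) between bounds a and b."""

    def __init__(self, a=np.log(1e-7), b=np.log(1e12)):
        if not a < b:
            raise ValueError("Lower bound a must be smaller than upper bound b.")
        self.a, self.b = a, b

    def logpdf(self, rho):
        """Element-wise log density: -log(b - a) inside the bounds, -inf outside."""
        rho = np.asarray(rho, dtype=float)
        inside = (rho >= self.a) & (rho <= self.b)
        return np.where(inside, -np.log(self.b - self.a), -np.inf)

prior = UniformRhoPrior()
print(prior.logpdf([0.0, 5.0, 100.0]))
```

An instance like this could then be handed to compare_CDL via prior=prior together with use_importance_weights=True, mirroring the DummyRhoPrior usage in the examples above.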

mssm.src.python.custom_types module

class mssm.src.python.custom_types.ConstType(*values)

Bases: Enum

Custom Constraint data type used by internal functions.

DIFF = 3
DROP = 1
QR = 2
class mssm.src.python.custom_types.Constraint(Z: ndarray | int | None = None, type: ConstType | None = None)

Bases: object

Constraint storage. Z holds one of three things: the QR-based correction matrix that needs to be multiplied with \(\mathbf{X}\), \(\mathbf{S}\), and \(\mathbf{D}\) (where \(\mathbf{D}\mathbf{D}^T = \mathbf{S}\)) to make terms subject to the conventional sum-to-zero constraints also applied in mgcv (Wood, 2017); the column/row that should be dropped from those matrices (in which case \(\mathbf{X}\) can also no longer take on a constant); or None, indicating that the model should be “difference re-coded” to enable sparse sum-to-zero constraints. The latter two are available in mgcv’s smoothCon function by setting the sparse.cons argument to 1 or 2 respectively.

The QR-based approach is described in detail by Wood (2017) and is similar to just mean centering every basis function involved in the smooth and then dropping one column from the corresponding centered model matrix. The column-dropping approach is self-explanatory. The difference re-coding re-codes basis functions to correspond to differences of basis functions. The resulting basis remains sparser than the alternatives, but this is not a true centering constraint: \(f(x)\) will not necessarily be orthogonal to the intercept, i.e., \(\mathbf{1}^T \mathbf{f(x)}\) will not necessarily be 0. Hence, confidence intervals will usually be wider when using ConstType.DIFF (also when using ConstType.DROP, for the same reason) instead of ConstType.QR (see Wood, 2017, 2020).

A final note regards the use of tensor smooths when te==False. Since the value of any constant estimated for a smooth depends on the type of constraint used, the marginal functions estimated for the “main effects” (\(f(x)\), \(f(z)\)) and “interaction effect” (\(f(x,z)\)) in a model: \(y = a + f(x) + f(z) + f(x,z)\) will differ depending on the type of constraint used. The “Anova-like” decomposition described in detail in Wood (2017) is achievable only when using ConstType.QR.

Thus, ConstType.QR is the default for all mssm functions, and the other two options should be considered experimental.
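A minimal sketch of the QR-based sum-to-zero constraint is shown below, using a dense toy basis and penalty. The function name is hypothetical and the code follows the general recipe in Wood (2017) rather than the sparse implementation in mssm.

```python
import numpy as np

def sum_to_zero_qr(X, S):
    """Absorb the constraint 1^T X b = 0 into the basis via a QR decomposition."""
    C = X.sum(axis=0, keepdims=True)            # constraint row 1^T X, shape (1, k)
    Q, _ = np.linalg.qr(C.T, mode="complete")   # full (k, k) orthogonal factor
    Z = Q[:, 1:]                                # columns orthogonal to C^T, shape (k, k-1)
    return X @ Z, Z.T @ S @ Z, Z

x = np.linspace(0.0, 1.0, 50)
X = np.vander(x, 5, increasing=True)            # toy polynomial basis incl. a constant
D = np.diff(np.eye(5), n=2, axis=0)             # second-order difference operator
S = D.T @ D                                     # toy penalty matrix

Xc, Sc, Z = sum_to_zero_qr(X, S)
print(Xc.shape, Sc.shape)
print(np.abs(Xc.sum(axis=0)).max())             # column sums are numerically zero
```

Because every column of the constrained model matrix sums to zero, any function built from it is orthogonal to the intercept, which is the property the DIFF and DROP variants do not guarantee.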

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Wood, S. N. (2020). Inference and computation with generalized additive models and their extensions. TEST, 29(2), 307–339. https://doi.org/10.1007/s11749-020-00711-5

  • Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655

Z: ndarray | int | None = None
type: ConstType | None = None
class mssm.src.python.custom_types.Fit_info(lambda_updates: int = 0, iter: int = 0, code: int = 1, eps: float | None = None, K2: float | None = None, dropped: list[int] | None = None)

Bases: object

Holds information related to convergence (speed) for GAMMs, GAMMLSS, and GSMMs.

Variables:
  • lambda_updates (int) – The total number of lambda updates computed during estimation. Initialized with 0.

  • iter (int) – The number of outer iterations (a single outer iteration can involve multiple lambda updates) completed during estimation. Initialized with 0.

  • code (int) – Convergence status. Anything above 0 indicates that the model did not converge and estimates should be considered carefully. Initialized with 1.

  • eps (float) – The fraction added to the last estimate of the negative Hessian of the penalized likelihood during GAMMLSS or GSMM estimation. If this is not 0 - the model should not be considered as converged, irrespective of what code indicates. This most likely implies that the model is not identifiable. Initialized with None and ignored for GAMM estimation.

  • K2 (float) – An estimate for the condition number of matrix A, where A.T@A=H and H is the final estimate of the negative Hessian of the penalized likelihood. Only available if check_cond>0 when model.fit() is called for any model (i.e., GAMM, GAMMLSS, GSMM). Initialized with None.

  • dropped ([int]) – The final set of coefficients dropped during GAMMLSS/GSMM estimation when using method in ["QR/Chol","LU/Chol","Direct/Chol"] or None in which case no coefficients were dropped. Initialized with None.
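The rules stated for code and eps can be combined into a small convergence check. The sketch below uses a stand-in dataclass with the same field names so that it runs without mssm; both the stand-in and the helper name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FitInfoStandIn:
    """Stand-in mirroring some of the fields documented for Fit_info."""
    lambda_updates: int = 0
    iter: int = 0
    code: int = 1
    eps: Optional[float] = None

def looks_converged(info):
    """True if code is 0 and no fraction had to be added to the Hessian."""
    if info.code > 0:
        return False
    # eps stays None for GAMM estimation and is then ignored
    return info.eps is None or info.eps == 0

print(looks_converged(FitInfoStandIn(code=0)))
print(looks_converged(FitInfoStandIn(code=0, eps=1e-6)))
```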

K2: float | None = None
code: int = 1
dropped: list[int] | None = None
eps: float | None = None
iter: int = 0
lambda_updates: int = 0
class mssm.src.python.custom_types.LambdaTerm(S_J: csc_array | None = None, S_J_emb: csc_array | None = None, D_J_emb: csc_array | None = None, rep_sj: int = 1, lam: float = 1.1, start_index: int | None = None, frozen: bool = False, type: PenType | None = None, rank: int | None = None, term: int | None = None, clust_series: list[int] | None = None, clust_weights: list[list[float]] | None = None, dist_param: int = 0, rp_idx: int | None = None, S_J_lam: csc_array | None = None)

Bases: object

\(\lambda\) storage term.

Usually model.overall_penalties holds a list of these.

Variables:
  • S_J (scp.sparse.csc_array) – The penalty matrix associated with this lambda term. Note, in case multiple penalty matrices share the same lambda value, the rep_sj argument determines how many diagonal blocks we need to fill with this penalty matrix to get S_J_emb. Initialized with None.

  • S_J_emb (scp.sparse.csc_array) – A zero-embedded version of the penalty matrix associated with this lambda term. Note, this matrix contains rep_sj diagonal sub-blocks each filled with S_J. Initialized with None.

  • D_J_emb (scp.sparse.csc_array) – Root of S_J_emb, so that D_J_emb@D_J_emb.T=S_J_emb. Initialized with None.

  • rep_sj (int) – How many sequential sub-blocks of S_J_emb need to be filled with S_J. Useful if all levels of a categorical variable for which a separate smooth is to be estimated are assumed to share the same lambda value. Initialized with 1.

  • lam (float) – The current estimate for \(\lambda\). Initialized with 1.1.

  • start_index (int) – The first row and column in the overall penalty matrix taken up by S_J. Initialized with None.

  • type (PenType) – The type of this penalty term. Initialized with None.

  • rank (int) – The rank of S_J. Initialized with None.

  • term (int) – The index of the term in a mssm.src.python.formula.Formula with which this penalty is associated. Initialized with None.
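The relationship between S_J, rep_sj, start_index, S_J_emb, and D_J_emb can be illustrated with a toy penalty. The helper below is hypothetical and only mirrors the description given for these variables; it is not how mssm constructs its penalties.

```python
import numpy as np
import scipy.sparse as sp

def embed_blocks(B, rep, start_index, n_coef):
    """Place rep diagonal copies of B into an (n_coef, n_coef) zero matrix."""
    block = sp.block_diag([B] * rep, format="coo")
    return sp.csc_array(
        (block.data, (block.row + start_index, block.col + start_index)),
        shape=(n_coef, n_coef),
    )

D = np.diff(np.eye(4), n=2, axis=0)     # (2, 4) second-order difference operator
S_J = sp.csc_array(D.T @ D)             # toy penalty for 4 coefficients, rank 2
D_J = sp.csc_array(D.T)                 # root of S_J: D_J @ D_J.T equals S_J

# Two sub-blocks sharing one lambda, starting at coefficient 1 of 10
S_J_emb = embed_blocks(S_J, rep=2, start_index=1, n_coef=10)
D_J_emb = embed_blocks(D_J, rep=2, start_index=1, n_coef=10)

print(np.allclose((D_J_emb @ D_J_emb.T).toarray(), S_J_emb.toarray()))
```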

D_J_emb: csc_array | None = None
S_J: csc_array | None = None
S_J_emb: csc_array | None = None
S_J_lam: csc_array | None = None
clust_series: list[int] | None = None
clust_weights: list[list[float]] | None = None
dist_param: int = 0
frozen: bool = False
lam: float = 1.1
rank: int | None = None
rep_sj: int = 1
rp_idx: int | None = None
start_index: int | None = None
term: int | None = None
type: PenType | None = None
class mssm.src.python.custom_types.PenType(*values)

Bases: Enum

Custom Penalty data type used by internal functions.

COEFFICIENTS = 7
CUSTOM = 8
DERIVATIVE = 6
DIFFERENCE = 2
DISTANCE = 3
IDENTITY = 1
NULL = 5
REPARAM1 = 4
class mssm.src.python.custom_types.Reparameterization(Srp: csc_array | None = None, Drp: csc_array | None = None, C: csc_array | None = None, scale: float | None = None, IRrp: csc_array | None = None, rms1: float | None = None, rms2: float | None = None, rank: int | None = None)

Bases: object

Holds information necessary to re-parameterize a smooth term.

Variables:
  • Srp (scp.sparse.csc_array) – The transformed penalty matrix

  • Drp (scp.sparse.csc_array) – The root of the transformed penalty matrix

  • C (scp.sparse.csc_array) – Transformation matrix for model matrix and/or penalty.

C: csc_array | None = None
Drp: csc_array | None = None
IRrp: csc_array | None = None
Srp: csc_array | None = None
rank: int | None = None
rms1: float | None = None
rms2: float | None = None
scale: float | None = None
class mssm.src.python.custom_types.TermType(*values)

Bases: Enum

Custom Term data type used by internal functions.

IRSMOOTH = 1
LINEAR = 3
RANDINT = 4
RANDSLOPE = 5
SMOOTH = 2
class mssm.src.python.custom_types.VarType(*values)

Bases: Enum

Custom variable data type used by internal functions.

FACTOR = 2
NUMERIC = 1

mssm.src.python.exp_fam module

class mssm.src.python.exp_fam.Binomial(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.Logit object>, n: int | list[int] = 1)

Bases: Family

Binomial family. For this implementation we assume that we have collected proportions of success, i.e., the dependent variable specified in the model Formula needs to hold observed proportions and not counts! If we assume that each observation \(y_i\) reflects a single independent draw from a binomial (with \(n=1\), and \(p_i\) being the probability that the result is 1) then the dependent variable should either hold 1 or 0. If we have multiple independent draws from the binomial per observation (i.e., row in our data-frame), then \(n\) will usually differ between observations/rows in our data-frame (i.e., we observe \(k_i\) counts of success out of \(n_i\) draws - so that \(y_i=k_i/n_i\)). In that case, the Binomial() family accepts a vector for argument \(\mathbf{n}\) (which is simply set to 1 by default, assuming binary data), containing \(n_i\) for every observation \(y_i\).

In this implementation, the scale parameter is kept fixed/known at 1.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • link (Link) – The link function to be used by the model of the mean of this family. By default set to the canonical logit link.

  • n (int or [int], optional) – Number of independent draws from a Binomial per observation/row of data-frame. For binary data this can simply be set to 1, which is the default.
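To illustrate the proportion convention, the sketch below converts success counts into proportions and evaluates the binomial log-probability of each proportion with SciPy. The helper name is hypothetical and the code uses the textbook binomial probability mass function; the lp() method of this family may be implemented differently.

```python
import numpy as np
from scipy.stats import binom

k = np.array([3, 0, 7, 10])          # observed successes per row
n = np.array([10, 5, 7, 20])         # number of draws per row
y = k / n                            # proportions: what the model formula should hold

def binomial_lp(y, mu, n=1):
    """Log-probability of each observed proportion y given success probability mu."""
    counts = np.rint(np.asarray(y) * n)      # recover the success counts
    return binom.logpmf(counts, n, mu)

mu = np.array([0.35, 0.1, 0.9, 0.5])  # hypothetical predicted probabilities
print(y)
print(binomial_lp(y, mu, n))
```

With these data, a family instance would be created as Binomial(n=n) (or with a plain list of integers), following the parameter description above.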

D(y: ndarray, mu: ndarray) -> ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

an N-dimensional vector of shape (-1,1) containing the contribution of each observation to the model deviance

Return type:

np.ndarray

V(mu: ndarray) -> ndarray

The variance function (of the mean; see Wood, 2017, 3.1.2) for the Binomial model. Variance is minimal for \(\mu=1\) and \(\mu=0\), maximal for \(\mu=0.5\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted probability for the response distribution corresponding to each observation.

Returns:

an N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean

Return type:

np.ndarray

dVy1(mu: ndarray) -> ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

an N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean

Return type:

np.ndarray
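The shape of the variance function described for V() (zero at \(\mu=0\) and \(\mu=1\), maximal at \(\mu=0.5\)) matches the textbook form \(V(\mu)=\mu(1-\mu)\), with derivative \(1-2\mu\). The sketch below assumes exactly this form; any scaling by the number of draws \(n\) that the family may apply is not covered, and the function names are hypothetical.

```python
import numpy as np

def binomial_V(mu):
    """Textbook binomial variance function mu * (1 - mu)."""
    mu = np.asarray(mu, dtype=float)
    return mu * (1.0 - mu)

def binomial_dV(mu):
    """First derivative of mu * (1 - mu) with respect to mu."""
    mu = np.asarray(mu, dtype=float)
    return 1.0 - 2.0 * mu

mu = np.linspace(0.0, 1.0, 11)
h = 1e-6
# Central finite differences as an independent check of the derivative
fd = (binomial_V(mu + h) - binomial_V(mu - h)) / (2 * h)
print(np.allclose(fd, binomial_dV(mu)))
```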

deviance(y: ndarray, mu: ndarray) -> float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

Deviance of the model

Return type:

float
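The definition 2 * (llk_max - llk_c) * scale can be checked numerically for binomial proportions, where the scale is fixed at 1. The sketch below computes the textbook per-observation deviance contributions and compares their sum against the log-likelihood difference. Function names and data are hypothetical, and mu is assumed to lie strictly between 0 and 1.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import binom

def binomial_unit_deviance(y, mu, n=1):
    """Per-observation deviance contribution for proportions y out of n draws."""
    # xlogy returns 0 for 0 * log(0), which handles y == 0 and y == 1
    return 2.0 * n * (xlogy(y, y / mu) + xlogy(1.0 - y, (1.0 - y) / (1.0 - mu)))

def binomial_deviance(y, mu, n=1):
    """Total deviance: sum of the per-observation contributions."""
    return float(np.sum(binomial_unit_deviance(y, mu, n)))

k = np.array([3, 2, 5])
n = np.array([10, 5, 20])
y = k / n
mu = np.array([0.4, 0.3, 0.2])            # hypothetical predicted probabilities

llk_c = binom.logpmf(k, n, mu).sum()      # log-likelihood of the current model
llk_max = binom.logpmf(k, n, y).sum()     # saturated model: mu equal to y
print(np.isclose(binomial_deviance(y, mu, n), 2.0 * (llk_max - llk_c)))
```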

init_mu(y: ndarray) -> ndarray

Function providing initial \(\boldsymbol{\mu}\) vector for GAMM.

Estimation assumes proportions as dep. variable. According to: https://stackoverflow.com/questions/60526586/ the glm() function in R always initializes \(\mu\) = 0.75 for observed proportions (i.e., elements in \(\mathbf{y}\)) of 1 and \(\mu\) = 0.25 for proportions of zero. This can be achieved by adding 0.5 to the observed proportion of success (and adding one observation).

Parameters:

y (np.ndarray) – A numpy array containing each observation.

Returns:

an N-dimensional vector of shape (-1,1) containing an initial estimate of the probability of success per observation

Return type:

np.ndarray
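The initialization described above (add half a success and one extra observation) can be written out as follows. This is a sketch of the described rule with a hypothetical function name, not a copy of the init_mu() implementation.

```python
import numpy as np

def binomial_init_mu(y, n=1):
    """Initial mean: (successes + 0.5) / (draws + 1), kept away from 0 and 1."""
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    mu0 = (y * n + 0.5) / (n + 1.0)
    return mu0.reshape(-1, 1)

print(binomial_init_mu([0.0, 1.0]).ravel())           # binary data
print(binomial_init_mu([0.3, 0.0], n=[10, 5]).ravel())  # proportions with known n
```

Keeping the starting values strictly inside (0, 1) ensures that the logit link can be evaluated for every observation.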

llk(y: ndarray, mu: ndarray) -> float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

log-likelihood of the model

Return type:

float

lp(y: ndarray, mu: ndarray) -> ndarray

Log-probability of observing every proportion in \(\mathbf{y}\) under their respective binomial with mean = \(\boldsymbol{\mu}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observed proportion.

  • mu (np.ndarray) – A numpy array containing the predicted probability for the response distribution corresponding to each observation.

Returns:

an N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.Family(link: Link, twopar: bool, scale: float = None)

Bases: object

Base class to be implemented by Exp. family member.

Parameters:
  • link (Link) – The link function to be used by the model of the mean of this family.

  • twopar (bool) – Whether the family has two parameters (mean,scale) to be estimated (i.e., whether the likelihood is a function of two parameters), or only a single one (usually the mean).

  • scale (float or None, optional) – Known/fixed scale parameter for this family. Setting this to None means the parameter has to be estimated. Must be set to 1 if the family has no scale parameter (i.e., when twopar = False)

D(y: ndarray, mu: ndarray) ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.

  • mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector of shape (-1,1) containing the contribution of each observation to the overall deviance.

Return type:

np.ndarray

V(mu: ndarray) ndarray

The variance function (of the mean; see Wood, 2017, 3.1.2). Different exponential families allow for different relationships between the variance in our random response variable and the mean of it. For the normal model this is assumed to be constant.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean.

Return type:

np.ndarray

dVy1(mu: ndarray) ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean.

Return type:

np.ndarray
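As an illustration of V() and dVy1(), the sketch below lists variance functions and their derivatives for three families and checks each derivative against a finite difference. The Gaussian and Gamma forms follow the descriptions in this documentation; the binomial form \(\mu(1-\mu)\) is the standard textbook result for proportions. The dictionary `variance_functions` is a hypothetical name, not part of mssm.

```python
import numpy as np

# Variance functions V(mu) paired with their first derivatives dV/dmu
variance_functions = {
    "gaussian": (lambda mu: np.ones_like(mu), lambda mu: np.zeros_like(mu)),
    "gamma": (lambda mu: mu**2, lambda mu: 2.0 * mu),
    "binomial": (lambda mu: mu * (1.0 - mu), lambda mu: 1.0 - 2.0 * mu),
}

mu = np.array([[0.2], [0.5], [0.9]])
eps = 1e-6
max_err = {}
for name, (V, dV) in variance_functions.items():
    # Central finite difference as an independent check of dV/dmu
    numeric = (V(mu + eps) - V(mu - eps)) / (2.0 * eps)
    max_err[name] = float(np.max(np.abs(numeric - dV(mu))))
```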

deviance(y: ndarray, mu: ndarray) float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.

  • mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

Deviance of the model under this family

Return type:

float
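The deviance formula can be checked numerically for a normal model. In the sketch below, `gaussian_lp` is a hypothetical helper and `scale` is assumed to be the variance; under that assumption the deviance reduces to the residual sum of squares.

```python
import numpy as np

def gaussian_lp(y, mu, scale=1.0):
    """Per-observation normal log-density; `scale` is assumed to be the variance."""
    return -0.5 * np.log(2.0 * np.pi * scale) - (y - mu) ** 2 / (2.0 * scale)

y = np.array([[1.0], [2.0], [4.0]])
mu = np.array([[1.5], [2.0], [3.0]])
scale = 2.0

llk_c = gaussian_lp(y, mu, scale).sum()    # log-likelihood of the current model
llk_max = gaussian_lp(y, y, scale).sum()   # saturated model: the mean equals y
deviance = 2.0 * (llk_max - llk_c) * scale
rss = ((y - mu) ** 2).sum()                # residual sum of squares
```

Multiplying by the scale removes the dependence on the variance, which is why the deviance can be computed before the scale is estimated.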

init_mu(y: ndarray) ndarray | None

Convenience function to compute an initial \(\boldsymbol{\mu}\) estimate passed to the GAMM/PIRLS estimation routine.

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.

Returns:

An N-dimensional vector of shape (-1,1) containing an initial estimate of the mean.

Return type:

np.ndarray

llk(y: ndarray, mu: ndarray, **kwargs) float

log-probability of \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\). Essentially sum over all elements in the vector returned by the lp() method.

Families with more than one parameter that needs to be estimated in order to evaluate the model’s log-likelihood (i.e., twopar=True) must accept the scale parameter as a keyword argument with a default value, e.g.:

def llk(self, y, mu, scale=1):
   ...

You can check the implementation of the Gaussian Family for an example.
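A minimal sketch of this convention is shown below. `TwoParSketch` is a hypothetical stand-in that does not inherit from any mssm class, and it assumes that `scale` denotes the variance of a normal distribution.

```python
import numpy as np

class TwoParSketch:
    """Hypothetical stand-in for a two-parameter family (not an mssm class)."""

    def lp(self, y, mu, scale=1):
        # Normal log-density per observation, with `scale` treated as the variance
        return -0.5 * np.log(2 * np.pi * scale) - (y - mu) ** 2 / (2 * scale)

    def llk(self, y, mu, scale=1):
        # The log-likelihood is the sum of the per-observation log-probabilities
        return float(np.sum(self.lp(y, mu, scale=scale)))

fam = TwoParSketch()
y = np.array([[0.0], [1.0]])
mu = np.array([[0.0], [0.0]])

llk_default = fam.llk(y, mu)             # falls back to scale=1
llk_scaled = fam.llk(y, mu, scale=4.0)   # scale passed as keyword argument
```

The default value lets callers that know nothing about the scale parameter evaluate the likelihood with the same call as for a single-parameter family.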

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.

  • mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

log-likelihood of the model under this family

Return type:

float

lp(y: ndarray, mu: ndarray, **kwargs) ndarray

Log-probability of observing every value in \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\).

Families with more than one parameter that needs to be estimated in order to evaluate the model’s log-likelihood (i.e., twopar=True) must accept the scale parameter as a keyword argument with a default value, e.g.:

def lp(self, y, mu, scale=1):
   ...

You can check the implementation of the Gaussian Family for an example.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.

  • mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector of shape (-1,1) containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.GAMLSSFamily(pars: int, links: list[Link])

Bases: object

Base-class to be implemented by families of Generalized Additive Mixed Models of Location, Scale, and Shape (GAMMLSS; Rigby & Stasinopoulos, 2005).

Apart from the required methods, three mandatory attributes need to be defined by the __init__() constructor of implementations of this class. These are required to evaluate the first and second (pure & mixed) derivative of the log-likelihood with respect to any of the log-likelihood’s parameters (alternatively the linear predictors of the parameters - see the description of the d_eta instance variable.). See the variables below.

Optionally, a mean_init_fam attribute can be defined - specifying a Family member that is fitted to the data to get an initial estimate of the mean parameter of the assumed distribution.

References:
  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • pars (int) – Number of parameters of the distribution belonging to the random variables assumed to have generated the observations, e.g., 2 for the Normal: mean and standard deviation.

  • links ([Link]) – Link functions for each of the parameters of the distribution.

Variables:
  • d_eta (bool) – A boolean indicating whether partial derivatives of llk are provided with respect to the linear predictor instead of parameters (i.e., the mean), defaults to False (derivatives are provided with respect to parameters)

  • d1 ([Callable]) – A list holding n_par functions to evaluate the first partial derivatives of llk with respect to each parameter of the llk. Needs to be initialized when calling __init__().

  • d2 ([Callable]) – A list holding n_par functions to evaluate the second (pure) partial derivatives of llk with respect to each parameter of the llk. Needs to be initialized when calling __init__().

  • d2m ([Callable]) – A list holding n_par*(n_par-1)/2 functions to evaluate the second mixed partial derivatives of llk with respect to each pair of parameters of the llk, in order: d2m[0] = \(\partial^2 l/\partial \mu_1 \partial \mu_2\), d2m[1] = \(\partial^2 l/\partial \mu_1 \partial \mu_3\), …, d2m[n_par-2] = \(\partial^2 l/\partial \mu_1 \partial \mu_{n_{par}}\), d2m[n_par-1] = \(\partial^2 l/\partial \mu_2 \partial \mu_3\), d2m[n_par] = \(\partial^2 l/\partial \mu_2 \partial \mu_4\), … . Needs to be initialized when calling __init__().
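The ordering of the mixed derivatives in d2m corresponds to enumerating all parameter pairs (j, k) with j < k, row by row. A short sketch using only the standard library is given below; the helper `mixed_derivative_order` is hypothetical and only illustrates the index layout.

```python
from itertools import combinations

def mixed_derivative_order(n_par: int) -> list[tuple[int, int]]:
    """Return the 1-based (j, k) parameter pairs in the order expected for d2m."""
    return list(combinations(range(1, n_par + 1), 2))

# For four parameters there are 4 * 3 / 2 = 6 mixed derivatives
order = mixed_derivative_order(4)
```

Here `order[i]` names the pair of parameters that the function stored at `d2m[i]` should differentiate with respect to.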

get_resid(y: ndarray, *mus: list[ndarray], **kwargs) ndarray | None

Get standardized residuals for a GAMMLSS model (Rigby & Stasinopoulos, 2005).

Any implementation of this function should return a vector that looks like what could be expected from taking len(y) independent draws from \(N(0,1)\). Any additional arguments required by a specific implementation can be passed along via kwargs.

Note: Families for which no residuals are available can return None.

References:
  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array of shape (-1,1) containing each observed value.

  • mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.

Returns:

a vector of shape (-1,1) containing standardized residuals under the current model or None in case residuals are not readily available.

Return type:

np.ndarray | None

init_coef(models: list[Callable]) ndarray

(Optional) Function to initialize the coefficients of the model.

Can return None, in which case random initialization will be used.

Parameters:

models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s, - each based on one of the formulas provided to a model.

Returns:

A numpy array of shape (-1,1), holding initial values for all model coefficients.

Return type:

np.ndarray

init_lambda(penalties: list[Callable]) list[float]

(Optional) Function to initialize the smoothing parameters of the model.

Can return None, in which case random initialization will be used.

Parameters:

penalties ([mssm.src.python.penalties.LambdaTerm]) – A list of all penalties to be estimated by the model.

Returns:

A list, holding - for each \(\lambda\) parameter to be estimated - an initial value.

Return type:

[float]

llk(y: ndarray, *mus: list[ndarray]) float

log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.

  • mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, *mus: list[ndarray]) ndarray

Log-probability of observing every element in \(\mathbf{y}\) under their respective distribution parameterized by mus.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array of shape (-1,1) containing each observed value.

  • mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.

Returns:

An N-dimensional vector of shape (-1,1) containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.GAMMALS(links: list[Link])

Bases: GAMLSSFamily

Family for a GAMMA GAMMLSS model (Rigby & Stasinopoulos, 2005).

This Family follows the Gamma family, in that we assume: \(Y_i \sim \Gamma(\mu_i,\phi_i)\). The difference to the Gamma family is that we now also model \(\phi\) as an additive combination of smooth variables and other parametric terms. The Gamma distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and rate parameter - called \(\alpha\) and \(\beta\) respectively. Wood (2017) provides \(\alpha = 1/\phi\). With this we can obtain \(\beta = 1/\phi/\mu\) (see the source-code for lp() method of the Gamma family for details).

References:

  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

links ([Link]) – Link functions for the mean and scale parameter. Standard would be links=[LOG(),LOG()].

get_resid(y: ndarray, mu: ndarray, scale: ndarray) ndarray

Get standardized residuals for a Gamma GAMMLSS model (Rigby & Stasinopoulos, 2005).

Essentially, to get a standardized residual vector we first have to account for the mean-variance relationship of our RVs (which we also have to do for the Gamma family) - for this we can simply compute deviance residuals again (see Wood, 2017). These should be \(\sim N(0,\phi_i)\) (where \(\phi_i\) is the element in scale for a specific observation) - so if we divide each of those by the observation-specific scale we can expect the resulting standardized residuals to be \(\sim N(0,1)\) if the model is correct.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.

Returns:

A list of standardized residuals that should be ~ N(0,1) if the model is correct.

Return type:

np.ndarray

init_coef(models: list[Callable]) ndarray

Function to initialize the coefficients of the model.

Fits a GAMM for the mean and initializes all coef. for the scale parameter to 1.

Parameters:

models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s, - each based on one of the formulas provided to a model.

Returns:

A numpy array of shape (-1,1), holding initial values for all model coefficients.

Return type:

np.ndarray

llk(y: ndarray, mu: ndarray, scale: ndarray) float

log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, scale: ndarray) ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective Gamma with mean = \(\boldsymbol{\mu}\) and scale = \(\boldsymbol{\phi}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observed value.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.GAUMLSS(links: list[Link])

Bases: GAMLSSFamily

Family for a Normal GAMMLSS model (Rigby & Stasinopoulos, 2005).

This Family follows the Gaussian family, in that we assume: \(Y_i \sim N(\mu_i,\sigma_i)\). i.e., each of the \(N\) observations is still believed to have been generated from an independent normally distributed RV with observation-specific mean.

The important difference is that the scale parameter, \(\sigma\), is now also observation-specific and modeled as an additive combination of smooth functions and other parametric terms, just like the mean is in a Normal GAM. Note that this explicitly models heteroscedasticity - the residuals are no longer assumed to be i.i.d. samples from \(N(0,\sigma)\), since \(\sigma\) can now differ between residual realizations.

References:

  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

links ([Link]) – Link functions for the mean and standard deviation. Standard would be links=[Identity(),LOG()].

get_resid(y: ndarray, mu: ndarray, sigma: ndarray) float

Get standardized residuals for a Normal GAMMLSS model (Rigby & Stasinopoulos, 2005).

Essentially, each residual should reflect a realization of a normal with mean zero and observation-specific standard deviation. After scaling each residual by their observation-specific standard deviation we should end up with standardized residuals that can be expected to be i.i.d \(\sim N(0,1)\) - assuming that our model is correct.

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.

Returns:

A list of standardized residuals that should be ~ N(0,1) if the model is correct.

Return type:

np.ndarray
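The standardization described above amounts to dividing each raw residual by its own standard deviation. The sketch below simulates heteroscedastic data and applies that rule; `standardized_resid` is a hypothetical helper, and the simulated means and standard deviations are made up for illustration.

```python
import numpy as np

def standardized_resid(y, mu, sigma):
    """Scale raw residuals by the observation-specific standard deviation."""
    return (y - mu) / sigma

rng = np.random.default_rng(0)
n = 5000
mu = np.linspace(-1.0, 1.0, n).reshape(-1, 1)
sigma = np.exp(np.linspace(-1.0, 1.0, n)).reshape(-1, 1)  # sd grows with the index
y = mu + sigma * rng.standard_normal((n, 1))

# If mu and sigma are correct, these should look like draws from N(0,1)
res = standardized_resid(y, mu, sigma)
```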

init_coef(models: list[Callable]) ndarray

Function to initialize the coefficients of the model.

Fits a GAMM for the mean and initializes all coef. for the standard deviation to 1.

Parameters:

models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s, - each based on one of the formulas provided to a model.

Returns:

A numpy array of shape (-1,1), holding initial values for all model coefficients.

Return type:

np.ndarray

llk(y: ndarray, mu: ndarray, sigma: ndarray) float

log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, sigma: ndarray) ndarray

Log-probability of observing every value in y under their respective Normal with observation-specific mean and standard deviation.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observed value.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.GSMMFamily(pars: int, links: list[Link], *llkargs)

Bases: object

Base-class for General Smooth “families” as discussed by Wood, Pya, & Säfken (2016). For estimation of mssm.models.GSMM models via L-qEFS (Krause et al., submitted) it is sufficient to implement llk(). gradient() and hessian() can then simply return None. For exact estimation via Newton’s method, the latter two functions need to be implemented and have to return the gradient and hessian at the current coefficient estimate respectively.

Additional parameters needed for likelihood, gradient, or hessian evaluation can be passed along via the llkargs. They are then made available in self.llkargs.

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Nocedal & Wright (2006). Numerical Optimization. Springer New York.

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • pars (int) – Number of parameters of the likelihood.

  • links ([Link]) – List of Link functions for each parameter of the likelihood, e.g., links=[Identity(),LOG()].

Variables:

extra_coef (int, optional) – Number of extra coefficients required by specific family or None. By default set to None and changed to int by specific families requiring this.

get_resid(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array], **kwargs) ndarray | None

Get standardized residuals for a GSMM model.

Any implementation of this function should return a vector that looks like what could be expected from taking independent draws from \(N(0,1)\). Any additional arguments required by a specific implementation can be passed along via kwargs.

Note: Families for which no residuals are available can return None.

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.

  • ys ([np.ndarray or None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.

  • Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

a vector of shape (-1,1) containing standardized residuals under the current model (Note, the first axis will not necessarily match the dimension of any of the response vectors (this will depend on the specific Family’s implementation)) or None in case residuals are not readily available.

Return type:

np.ndarray | None

gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) ndarray

Function to evaluate the gradient of the llk at current coefficient estimate coef.

By default relies on numerical differentiation as implemented in scipy to approximate the Gradient from the implemented log-likelihood function. See the link in the references for more details.

References:
Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.

  • ys ([np.ndarray or None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.

  • Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The Gradient of the log-likelihood evaluated at coef as numpy array of shape (-1,1).

Return type:

np.ndarray
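The idea behind approximating the gradient numerically from the log-likelihood alone can be sketched as follows. The documentation states that the default relies on scipy; the sketch instead swaps in a hand-written central difference in NumPy, and the names `numeric_gradient` and `toy_llk` are hypothetical.

```python
import numpy as np

def numeric_gradient(llk, coef, eps=1e-6):
    """Central-difference approximation of the gradient of `llk` at `coef`.

    `coef` has shape (-1, 1); the returned gradient has the same shape.
    """
    grad = np.zeros_like(coef, dtype=float)
    for i in range(coef.shape[0]):
        step = np.zeros_like(coef, dtype=float)
        step[i, 0] = eps
        grad[i, 0] = (llk(coef + step) - llk(coef - step)) / (2.0 * eps)
    return grad

# Toy log-likelihood with a known gradient: -0.5 * ||coef - target||^2
target = np.array([[1.0], [-2.0], [0.5]])
toy_llk = lambda coef: float(-0.5 * np.sum((coef - target) ** 2))

coef = np.zeros((3, 1))
grad = numeric_gradient(toy_llk, coef)   # analytic gradient at zero is `target`
```

Each gradient entry costs two likelihood evaluations, so an analytic gradient() is usually preferable when the number of coefficients is large.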

hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) csc_array | None

Function to evaluate the hessian of the llk at current coefficient estimate coef.

Only has to be implemented if full Newton is to be used to estimate coefficients. If the L-qEFS update by Krause et al. (submitted) is to be used instead, this method does not have to be implemented.

References:
Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.

  • ys ([np.ndarray or None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.

  • Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The Hessian of the log-likelihood evaluated at coef.

Return type:

scp.sparse.csc_array

init_coef(models: list[Callable]) ndarray

(Optional) Function to initialize the coefficients of the model.

Can return None, in which case random initialization will be used.

Parameters:

models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s, - each based on one of the formulas provided to a model.

Returns:

A numpy array of shape (-1,1), holding initial values for all model coefficients.

Return type:

np.ndarray

init_lambda(penalties: list[Callable]) list[float]

(Optional) Function to initialize the smoothing parameters of the model.

Can return None, in which case random initialization will be used.

Parameters:

penalties ([mssm.src.python.penalties.LambdaTerm]) – A list of all penalties to be estimated by the model.

Returns:

A list, holding - for each \(\lambda\) parameter to be estimated - an initial value.

Return type:

[float]

llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) float

log-probability of data under given model.

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.

  • ys ([np.ndarray or None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.

  • Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The log-likelihood evaluated at coef.

Return type:

float
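How the four arguments fit together can be sketched with a normal model that has one linear predictor for the mean and one for the log standard deviation. The function `gaussian_ls_llk` is hypothetical, the identity and log links are assumed, and dense NumPy arrays stand in for the sparse model matrices to keep the sketch self-contained.

```python
import numpy as np

def gaussian_ls_llk(coef, coef_split_idx, ys, Xs):
    """Hypothetical llk: normal model with predictors for the mean and log(sd)."""
    y = ys[0]                                    # observations are stored in ys[0]
    split_coef = np.split(coef, coef_split_idx)  # one coefficient block per parameter
    eta_mu, eta_sd = (X @ b for X, b in zip(Xs, split_coef))
    mu = eta_mu                                  # identity link for the mean
    sd = np.exp(eta_sd)                          # log link for the standard deviation
    lp = -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((y - mu) / sd) ** 2
    return float(np.sum(lp))

n = 4
X_mu = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # intercept + slope
X_sd = np.ones((n, 1))                                            # intercept only
coef = np.array([[1.0], [2.0], [0.0]])  # two coefficients for the mean, one for log(sd)
y = X_mu @ coef[:2]                      # data that the mean predicts exactly

llk_val = gaussian_ls_llk(coef, [2], [y, None], [X_mu, X_sd])
```

The second entry of `ys` is None because both formulas share the same response, matching the convention described for the ys parameter.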

class mssm.src.python.exp_fam.Gamma(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>, scale: float = None)

Bases: Family

Gamma Family.

We assume: \(Y_i \sim \Gamma(\mu_i,\phi)\). The Gamma distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and rate parameter - called \(\alpha\) and \(\beta\) respectively. Wood (2017) provides \(\alpha = 1/\phi\). With this we can obtain \(\beta = 1/\phi/\mu\) (see the source-code for lp() method for details).
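The conversion from mean and scale to shape and rate can be sketched as below. `gamma_lp` is a hypothetical helper that uses the standard Gamma density in shape and rate form; it is not the family's lp() source code.

```python
import math
import numpy as np

def gamma_lp(y, mu, scale=1.0):
    """Gamma log-density per observation in terms of mean `mu` and scale `phi`."""
    alpha = 1.0 / scale   # shape, alpha = 1/phi
    beta = alpha / mu     # rate, beta = 1/phi/mu
    return (alpha * np.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * np.log(y) - beta * y)

y = np.array([[0.5], [1.0], [3.0]])
mu = np.array([[2.0], [2.0], [2.0]])

# With scale = 1 the Gamma reduces to an exponential distribution with mean mu
lp_exp = gamma_lp(y, mu, scale=1.0)
```

Because the mean of a Gamma with shape \(\alpha\) and rate \(\beta\) is \(\alpha/\beta\), this choice of \(\beta\) recovers \(\mu\) as the mean.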

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.

  • scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.

D(y: ndarray, mu: ndarray) ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the contribution of each data-point to the overall model deviance.

Return type:

np.ndarray

V(mu: ndarray) ndarray

Variance function for the Gamma family.

The variance of random variable \(Y\) is proportional to its mean raised to the second power.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – An N-dimensional vector of the model prediction/the predicted mean

Returns:

mu raised to the power of 2

Return type:

np.ndarray

dVy1(mu: ndarray) ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean.

Return type:

np.ndarray

deviance(y: ndarray, mu: ndarray) float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The model deviance.

Return type:

float

llk(y: ndarray, mu: ndarray, scale: float = 1) float

log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, scale: float = 1) ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective Gamma with mean = \(\boldsymbol{\mu}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observed value.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray
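
As a rough illustration of what lp() and llk() compute for a Gamma response, the sketch below evaluates the Gamma log-density in the mean/scale parameterization of Wood (2017, table 3.1), assuming the shape is \(\nu = 1/\phi\). The names gamma_lp and gamma_llk are hypothetical helpers and not part of mssm; the library's own implementation may differ.

```python
import math
import numpy as np

def gamma_lp(y, mu, scale=1.0):
    """Element-wise Gamma log-density with mean mu and scale phi (assumed shape nu = 1/phi)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    nu = 1.0 / scale
    # log f(y) = nu*log(nu/mu) + (nu-1)*log(y) - nu*y/mu - log(Gamma(nu))
    return nu * np.log(nu / mu) + (nu - 1.0) * np.log(y) - nu * y / mu - math.lgamma(nu)

def gamma_llk(y, mu, scale=1.0):
    """Total log-likelihood: the sum over the element-wise log-densities."""
    return float(np.sum(gamma_lp(y, mu, scale)))

y = np.array([0.5, 1.0, 2.0])
mu = np.array([1.0, 2.0, 4.0])
lp = gamma_lp(y, mu, scale=1.0)
```

With scale=1 the Gamma reduces to an exponential distribution with mean mu, which gives a simple way to sanity-check the sketch.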

class mssm.src.python.exp_fam.Gaussian(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.Identity object>, scale: float = None)

Bases: Family

Normal/Gaussian Family.

We assume: \(Y_i \sim N(\mu_i,\sigma)\) - i.e., each of the \(N\) observations is generated from a normally distributed RV with observation-specific mean and shared scale parameter \(\sigma\). Equivalent to the assumption that the observed residual vector - the difference between the model prediction and the observed data - should look like what could be expected from drawing \(N\) independent samples from a Normal with mean zero and standard deviation equal to \(\sigma\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • link (Link) – The link function to be used by the model of the mean of this family. By default set to the canonical identity link.

  • scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.

D(y: ndarray, mu: ndarray) ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

A N-dimensional vector containing the contribution of each data-point to the overall model deviance.

Return type:

np.ndarray

V(mu: ndarray) ndarray

Variance function for the Normal family.

Not really a function since the link between variance and mean of the RVs is assumed constant for this model.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – a N-dimensional vector of the model prediction/the predicted mean

Returns:

an N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean, which for this family is a vector of 1s

Return type:

np.ndarray

dVy1(mu: ndarray) ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean

Return type:

np.ndarray

deviance(y: ndarray, mu: ndarray) float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The model deviance.

Return type:

float
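
For the Gaussian family the deviance formula can be checked by hand. The saturated model sets \(\mu_i = y_i\), so 2 * (llk_max - llk_c) * scale collapses to the residual sum of squares, assuming the scale in that formula is the variance \(\sigma^2\). The helper names below are hypothetical and only sketch this relationship; they are not the mssm implementation.

```python
import numpy as np

def gaussian_lp(y, mu, sigma=1.0):
    """Element-wise Normal log-density with mean mu and standard deviation sigma."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - (y - mu) ** 2 / (2.0 * sigma**2)

def gaussian_D(y, mu):
    """Per-observation deviance contribution for the Gaussian case."""
    return (y - mu) ** 2

def gaussian_deviance(y, mu):
    """Model deviance: the sum of the per-observation contributions."""
    return float(np.sum(gaussian_D(y, mu)))

y = np.array([1.0, 2.5, 4.0])
mu = np.array([1.5, 2.0, 3.0])
sigma = 0.7

llk_max = np.sum(gaussian_lp(y, y, sigma))   # saturated model: mu equals y
llk_c = np.sum(gaussian_lp(y, mu, sigma))    # current model
dev_from_llk = 2.0 * (llk_max - llk_c) * sigma**2
```

Both routes give the same number, which does not depend on the value chosen for sigma.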

llk(y: ndarray, mu: ndarray, sigma: float = 1) float

log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • sigma (float, optional) – The (estimated) sigma parameter, defaults to 1

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, sigma: float = 1) ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective Normal with mean = \(\boldsymbol{\mu}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • sigma (float, optional) – The (estimated) sigma parameter, defaults to 1

Returns:

a N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.Identity

Bases: Link

Identity Link function. \(\boldsymbol{\mu}=\boldsymbol{\eta}\) and so this link is trivial.

dy1(mu: ndarray) ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) ndarray

Canonical link for normal distribution with \(\boldsymbol{\eta} = \boldsymbol{\mu}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) ndarray

For the identity link, \(\boldsymbol{\eta} = \boldsymbol{\mu}\), so the inverse is also just the identity. See Faraway (2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.

class mssm.src.python.exp_fam.InvGauss(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>, scale: float | None = None)

Bases: Family

Inverse Gaussian Family.

We assume: \(Y_i \sim IG(\mu_i,\phi)\). The Inverse Gaussian distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and scale parameter - called \(\nu\) and \(\lambda\) respectively (see the scipy implementation). We can simply set \(\nu=\mu\) (compare scipy density to the one in table 3.1 of Wood, 2017). Wood (2017) shows that \(\phi=1/\lambda\), so this provides \(\lambda=1/\phi\).

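References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).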

Parameters:
  • link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.

  • scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.
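
To make the parameterization concrete, the following sketch writes out the inverse Gaussian log-density in terms of the mean \(\mu\) and \(\lambda = 1/\phi\), and then checks numerically that the density integrates to one and has mean \(\mu\). The function inv_gauss_lp and the grid-based check are illustrative assumptions, not the mssm implementation.

```python
import numpy as np

def inv_gauss_lp(y, mu, scale=1.0):
    """Element-wise inverse Gaussian log-density with mean mu and scale phi = 1/lambda."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    lam = 1.0 / scale
    # log f(y) = 0.5*log(lam / (2*pi*y^3)) - lam*(y - mu)^2 / (2*mu^2*y)
    return 0.5 * np.log(lam / (2.0 * np.pi * y**3)) - lam * (y - mu) ** 2 / (2.0 * mu**2 * y)

# Numerical sanity check on a fine grid (simple Riemann sum).
grid = np.linspace(1e-4, 60.0, 600001)
dens = np.exp(inv_gauss_lp(grid, mu=2.0, scale=0.5))
step = grid[1] - grid[0]
total_mass = float(np.sum(dens) * step)        # should be close to 1
mean_est = float(np.sum(grid * dens) * step)   # should be close to mu = 2
```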

D(y: ndarray, mu: ndarray) ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

A N-dimensional vector containing the contribution of each data-point to the overall model deviance.

Return type:

np.ndarray

V(mu: ndarray) ndarray

Variance function for the Inverse Gaussian family.

The variance of random variable \(Y\) is proportional to its mean raised to the third power.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – a N-dimensional vector of the model prediction/the predicted mean

Returns:

mu raised to the power of 3

Return type:

np.ndarray

dVy1(mu: ndarray) ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean

Return type:

np.ndarray
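
The variance functions documented for the Gaussian (constant 1), Poisson (\(\mu\)), and Inverse Gaussian (\(\mu^3\)) families, together with their first derivatives, can be compared against a finite-difference approximation. The dictionary and helper below are hypothetical names used only for this sketch.

```python
import numpy as np

# (V, dV/dmu) pairs per family; the layout is illustrative only.
VARIANCE_FUNCTIONS = {
    "Gaussian": (lambda mu: np.ones_like(mu), lambda mu: np.zeros_like(mu)),
    "Poisson": (lambda mu: mu, lambda mu: np.ones_like(mu)),
    "InvGauss": (lambda mu: mu**3, lambda mu: 3.0 * mu**2),
}

def finite_difference(fun, mu, h=1e-6):
    """Central finite-difference approximation of d fun / d mu."""
    return (fun(mu + h) - fun(mu - h)) / (2.0 * h)

mu = np.array([[0.5], [1.0], [2.0]])  # column vector of shape (-1, 1)

# Largest absolute gap between analytic and numerical derivative per family.
max_errors = {
    name: float(np.max(np.abs(dV(mu) - finite_difference(V, mu))))
    for name, (V, dV) in VARIANCE_FUNCTIONS.items()
}
```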

deviance(y: ndarray, mu: ndarray) float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The model deviance.

Return type:

float

llk(y: ndarray, mu: ndarray, scale: float = 1) float

log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, scale: float = 1) ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective inverse Gaussian with mean = \(\boldsymbol{\mu}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observed value.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

  • scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

a N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.LOG

Bases: Link

Log Link function. \(log(\boldsymbol{\mu}) = \boldsymbol{\eta}\).

dy1(mu: ndarray) ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) ndarray

Non-canonical link for Gamma distribution with \(log(\boldsymbol{\mu}) = \boldsymbol{\eta}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) ndarray

For the log link, \(\boldsymbol{\eta} = log(\boldsymbol{\mu})\), so \(exp(\boldsymbol{\eta})=\boldsymbol{\mu}\). See Faraway (2016).

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.

class mssm.src.python.exp_fam.LOGb(b: float)

Bases: Link

Log + b Link function. \(log(\boldsymbol{\mu} + b) = \boldsymbol{\eta}\).

Parameters:

b (float) – The constant to add to \(\mu\) before taking the log.

dy1(mu: ndarray) ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) ndarray

\(log(\boldsymbol{\mu} + b) = \boldsymbol{\eta}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) ndarray

For the logb link, \(\boldsymbol{\eta} = log(\boldsymbol{\mu} + b)\), so \(exp(\boldsymbol{\eta})-b =\boldsymbol{\mu}\)

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
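
The four methods of the log + b link follow directly from \(\boldsymbol{\eta} = log(\boldsymbol{\mu} + b)\). The stand-alone class below is a minimal sketch of those formulas under the assumption that \(\mu + b > 0\). LogbSketch is a hypothetical name and does not derive from the mssm Link class. Setting b=0 recovers the plain log link.

```python
import numpy as np

class LogbSketch:
    """Illustrative log + b link: eta = log(mu + b)."""

    def __init__(self, b):
        self.b = b

    def f(self, mu):
        return np.log(mu + self.b)

    def fi(self, eta):
        return np.exp(eta) - self.b

    def dy1(self, mu):
        # d/dmu log(mu + b) = 1 / (mu + b)
        return 1.0 / (mu + self.b)

    def dy2(self, mu):
        # second derivative: -1 / (mu + b)^2
        return -1.0 / (mu + self.b) ** 2

link = LogbSketch(b=1.0)
mu = np.array([0.0, 1.0, 7.5])
round_trip = link.fi(link.f(mu))  # should reproduce mu
```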

class mssm.src.python.exp_fam.Link

Bases: object

Link function base class. To be implemented by any link function used for GAMM and GAMMLSS models. Only links used by GAMMLSS models require implementing the dy2 function. Note that care must be taken that every method returns only valid values: no returned element may be numpy.nan or numpy.inf.

dy1(mu: ndarray) ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) ndarray

Link function \(f()\) mapping mean \(\boldsymbol{\mu}\) of an exponential family to the model prediction \(\boldsymbol{\eta}\), so that \(f(\boldsymbol{\mu}) = \boldsymbol{\eta}\). See Wood (2017, 3.1.2) and Faraway (2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) ndarray

Inverse of the link function mapping \(\boldsymbol{\eta} = f(\boldsymbol{\mu})\) to the mean \(fi(\boldsymbol{\eta}) = fi(f(\boldsymbol{\mu})) = \boldsymbol{\mu}\). See Faraway (2016) and the Link.f function.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
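
A custom link has to provide the same four methods and must never return numpy.nan or numpy.inf. The sketch below shows one way this could look for a hypothetical square-root link, using a small lower bound on \(\mu\) to keep the derivatives finite. SqrtLinkSketch, its eps argument, and the clipping strategy are assumptions made for illustration. In real use such a class would presumably derive from Link.

```python
import numpy as np

class SqrtLinkSketch:
    """Hypothetical square-root link, eta = sqrt(mu); not part of mssm."""

    def __init__(self, eps=1e-10):
        self.eps = eps  # lower bound on mu that keeps all derivatives finite

    def _safe(self, mu):
        return np.maximum(np.asarray(mu, dtype=float), self.eps)

    def f(self, mu):
        return np.sqrt(self._safe(mu))

    def fi(self, eta):
        return np.asarray(eta, dtype=float) ** 2

    def dy1(self, mu):
        # d/dmu sqrt(mu) = 1 / (2*sqrt(mu))
        return 0.5 / np.sqrt(self._safe(mu))

    def dy2(self, mu):
        # second derivative: -1 / (4*mu^(3/2))
        return -0.25 * self._safe(mu) ** -1.5

link = SqrtLinkSketch()
mu = np.array([0.0, 0.25, 4.0])  # includes a boundary value on purpose
outputs = [link.f(mu), link.fi(link.f(mu)), link.dy1(mu), link.dy2(mu)]
all_finite = all(bool(np.all(np.isfinite(out))) for out in outputs)
```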

class mssm.src.python.exp_fam.Logit

Bases: Link

Logit Link function, which is canonical for the binomial model. \(\boldsymbol{\eta}\) = log-odds of success.

dy1(mu: ndarray) ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017):

\[ \begin{align}\begin{aligned}f(\mu) = log(\mu / (1 - \mu))\\f(\mu) = log(\mu) - log(1 - \mu)\\\partial f(\mu)/ \partial \mu = 1/\mu + 1/(1 - \mu)\end{aligned}\end{align} \]

Faraway (2016) simplifies this to: \(\partial f(\mu)/ \partial \mu = 1 / (\mu - \mu^2) = 1/ ((1-\mu)\mu)\)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) ndarray

Canonical link for binomial distribution with \(\boldsymbol{\mu}\) holding the probabilities of success, so that the model prediction \(\boldsymbol{\eta}\) is equal to the log-odds.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) ndarray

For the logit link and the binomial model, \(\boldsymbol{\eta}\) = log-odds, so the inverse to go from \(\boldsymbol{\eta}\) to \(\boldsymbol{\mu}\) is \(\boldsymbol{\mu} = exp(\boldsymbol{\eta}) / (1 + exp(\boldsymbol{\eta}))\). See Faraway (2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
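
The logit link and its derivatives can be verified numerically. The sketch below implements the link, its inverse, the first derivative \(1/((1-\mu)\mu)\), and the second derivative obtained by differentiating that expression once more, \((2\mu - 1)/(\mu^2(1-\mu)^2)\), and compares both derivatives to central finite differences. The function names are hypothetical and the code is not the mssm implementation.

```python
import numpy as np

def logit_f(mu):
    return np.log(mu / (1.0 - mu))

def logit_fi(eta):
    # algebraically equal to exp(eta) / (1 + exp(eta))
    return 1.0 / (1.0 + np.exp(-eta))

def logit_dy1(mu):
    return 1.0 / (mu * (1.0 - mu))

def logit_dy2(mu):
    return (2.0 * mu - 1.0) / (mu**2 * (1.0 - mu) ** 2)

mu = np.array([0.1, 0.5, 0.9])
h = 1e-6
fd1 = (logit_f(mu + h) - logit_f(mu - h)) / (2.0 * h)      # approximates dy1
fd2 = (logit_dy1(mu + h) - logit_dy1(mu - h)) / (2.0 * h)  # approximates dy2
```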

class mssm.src.python.exp_fam.MULNOMLSS(pars: int)

Bases: GAMLSSFamily

Family for a Multinomial GAMMLSS model (Rigby & Stasinopoulos, 2005).

This Family assumes that each observation \(y_i\) corresponds to one of \(K\) classes (labeled as 0, …, \(K-1\)) and reflects a realization of an independent RV \(Y_i\) with observation-specific probability mass function defined over the \(K\) classes. These \(K\) probabilities - that \(Y_i\) takes on each of the \(K\) classes - are modeled as additive combinations of smooth functions of covariates and other parametric terms.

As an example, consider a visual search experiment where \(K-1\) distractors are presented on a computer screen together with a single target and subjects are instructed to find the target and fixate it. With a Multinomial model we can estimate how the probability of looking at each of the \(K\) stimuli on the screen changes (smoothly) over time and as a function of other predictor variables of interest (e.g., contrast of stimuli, depending on whether participants are instructed to be fast or accurate).

References:

  • Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

pars (int) – K-1, i.e., the number of classes minus 1, which is also the number of linear predictors.

get_resid(y: ndarray, *mus: list[ndarray]) None

Placeholder function for residuals of a Multinomial model - yet to be implemented.

Parameters:
  • y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.

  • mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.

Returns:

Currently None - since no residuals are implemented

llk(y: ndarray, *mus: list[ndarray])

log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.

  • mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, *mus: list[ndarray]) ndarray

Log-probability of observing class k under current model.

Our DV consists of K classes but we essentially enforce a sum-to-zero constraint on the DV so that we end up modeling only K-1 (non-normalized) probabilities of observing class k (for all k except k==K) as an additive combination of smooth functions of our covariates and other parametric terms. The probability of observing class K as well as the normalized probabilities of observing each other class can readily be computed from these K-1 non-normalized probabilities. This is explained quite well on Wikipedia (see refs).

Specifically, the probability of the outcome being class k is simply:

\(p(Y_i == k) = \mu_k / (1 + \sum_j^{K-1} \mu_j)\) where \(\mu_k\) is the aforementioned non-normalized probability of observing class \(k\) - which is simply set to 1 for class \(K\) (this follows from the sum-to-zero constraint; see Wikipedia).

So, the log-prob of the outcome being class k is:

\(log(p(Y_i == k)) = log(\mu_k) - log(1 + \sum_j^{K-1} \mu_j)\)

References:

Parameters:
  • y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.

  • mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.

Returns:

a N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray
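
The log-probability formula above can be sketched in a few lines of numpy. The example assumes zero-based class labels and treats the last label as the reference class whose non-normalized probability is fixed at 1. Which label mssm uses as the reference is not confirmed here, and multinomial_lp is a hypothetical helper.

```python
import numpy as np

def multinomial_lp(y, *mus):
    """Log-probability of each observed class given K-1 non-normalized probabilities.

    Assumes labels 0..K-1 with the last label as reference class (mu fixed at 1).
    """
    y = np.asarray(y, dtype=int).reshape(-1)
    n = y.shape[0]
    # Matrix of shape (N, K): one column per class, reference column of ones last.
    columns = [np.asarray(m, dtype=float).reshape(-1) for m in mus] + [np.ones(n)]
    mu_mat = np.column_stack(columns)
    log_norm = np.log(np.sum(mu_mat, axis=1))  # log(1 + sum_j mu_j)
    return np.log(mu_mat[np.arange(n), y]) - log_norm

y = np.array([0, 1, 2, 2])            # K = 3 classes
mu0 = np.array([2.0, 1.0, 0.5, 1.0])  # non-normalized probabilities of class 0
mu1 = np.array([1.0, 3.0, 0.5, 1.0])  # non-normalized probabilities of class 1
lp = multinomial_lp(y, mu0, mu1)
probs = np.exp(lp)                    # probabilities of the observed classes
```

For the first observation the normalizer is 1 + 2 + 1 = 4, so the probability of the observed class 0 is 2/4.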

class mssm.src.python.exp_fam.Poisson(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>)

Bases: Family

Poisson Family.

We assume: \(Y_i \sim P(\lambda)\). We can simply set \(\lambda=\mu\) (compare scipy density to the one in table 3.1 of Wood, 2017) and treat the scale parameter of a GAMM (\(\phi\)) as fixed/known at 1.

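References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).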

Parameters:

link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.

D(y: ndarray, mu: ndarray) ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

A N-dimensional vector containing the contribution of each data-point to the overall model deviance.

Return type:

np.ndarray

V(mu: ndarray) ndarray

Variance function for the Poisson family.

The variance of random variable \(Y\) is proportional to its mean.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – a N-dimensional vector of the model prediction/the predicted mean

Returns:

mu

Return type:

np.ndarray

dVy1(mu: ndarray) ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean

Return type:

np.ndarray

deviance(y: ndarray, mu: ndarray) float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The model deviance.

Return type:

float

init_mu(y: ndarray) ndarray

Function providing initial \(\boldsymbol{\mu}\) vector for Poisson GAMM.

We shrink extreme observed counts towards the mean.

Parameters:

y (np.ndarray) – A numpy array containing each observation.

Returns:

an N-dimensional vector of shape (-1,1) containing an initial estimate of the mean of the response variables

Return type:

np.ndarray

llk(y: ndarray, mu: ndarray) float

log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observation.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray) ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective Poisson with mean = \(\boldsymbol{\mu}\).

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – A numpy array containing each observed value.

  • mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray
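
With \(\lambda = \mu\) and the scale fixed at 1, the Poisson log-probability is \(y\,log(\mu) - \mu - log(y!)\). The sketch below computes it with the standard library's math.lgamma for the factorial term. The function names poisson_lp and poisson_llk are hypothetical and the code only illustrates the formula.

```python
import math
import numpy as np

def poisson_lp(y, mu):
    """Element-wise Poisson log-probability with mean mu."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # log(y!) = lgamma(y + 1), evaluated element by element
    log_factorial = np.array([math.lgamma(v + 1.0) for v in y.reshape(-1)]).reshape(y.shape)
    return y * np.log(mu) - mu - log_factorial

def poisson_llk(y, mu):
    """Total log-likelihood: the sum over the element-wise log-probabilities."""
    return float(np.sum(poisson_lp(y, mu)))

y = np.array([0, 1, 3])
mu = np.array([0.5, 1.0, 2.0])
lp = poisson_lp(y, mu)
```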

class mssm.src.python.exp_fam.PropHaz(ut: ndarray, r: ndarray)

Bases: GSMMFamily

Family for proportional Hazard model - a type of General Smooth model as discussed by Wood, Pya, & Säfken (2016).

Based on Supplementary materials G in Wood, Pya, & Säfken (2016). The dependent variable passed to the mssm.src.python.formula.Formula needs to hold delta indicating whether the event was observed or not (i.e., only values in {0,1}).

Examples:

import copy
import numpy as np
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,correlate=False)

# Prep everything for prophaz model
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"),
                        [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                        data=sim_dat)

PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Can plot the estimated effects on the scale of the linear predictor (i.e., log hazard) via mssmViz
plot(model)
References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Parameters:
  • ut (np.ndarray) – Unique event time vector (each time represented as int) as described by Wood, Pya, & Säfken (2016), holding unique event times in decreasing order.

  • r (np.ndarray) – Index vector as described by Wood, Pya, & Säfken (2016), holding for each data-point (i.e., for each row in Xs[0]) the index to its corresponding event time in ut.
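
How ut and r relate to the sorted event times is easiest to see on a tiny example. The sketch below applies the same numpy steps as the example code above to a handful of made-up event times that are already sorted in decreasing order. The times array is purely illustrative.

```python
import numpy as np

# Hypothetical event times, already sorted in decreasing order.
times = np.array([9, 7, 7, 4, 2, 2])

u, inv = np.unique(times, return_inverse=True)  # u holds unique times in increasing order
ut = np.flip(u)                                  # unique event times in decreasing order
r = np.abs(inv - max(inv))                       # for every row: index of its time in ut

# Indexing ut with r recovers the original (sorted) time of every row.
recovered = ut[r]
```

Here ut is [9, 7, 4, 2] and r is [0, 1, 1, 2, 3, 3], so tied event times share one entry in ut.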

get_baseline_hazard(coef: ndarray, delta: ndarray, Xs: list[csc_array]) ndarray

Get the cumulative baseline hazard function as defined by Wood, Pya, & Säfken (2016).

The function is evaluated for all k unique event times that were available in the data.

Examples:

import copy
import numpy as np
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,correlate=False)

# Prep everything for prophaz model
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"),
                        [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                        data=sim_dat)

PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Now get cumulative baseline hazard estimate
H = PropHaz_fam.get_baseline_hazard(model.coef,sim_formula_m.y_flat[sim_formula_m.NOT_NA_flat],model.get_mmat())

# And plot it
plt.plot(ut,H)
plt.xlabel("Time")
plt.ylabel("Cumulative Baseline Hazard")
References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • coef (np.ndarray) – Coefficient vector as numpy array of shape (-1,1).

  • Xs ([scp.sparse.csc_array]) – The list of model matrices (here holding a single model matrix) obtained from mssm.models.GAMMLSS.get_mmat().

  • delta (np.ndarray) – Dependent variable passed to mssm.src.python.formula.Formula(), holds (for each row in Xs[0]) a value in {0,1}, indicating whether for that observation the event was observed or not.

Returns:

numpy array, holding k baseline hazard function estimates

Return type:

np.ndarray

get_resid(coef, coef_split_idx, ys, Xs, resid_type: str = 'Martingale', reorder: ndarray | None = None) ndarray

Get Martingale or Deviance residuals for a proportional Hazard model.

See the PropHaz.get_survival() function for examples.

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.

  • ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.

  • Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.

  • resid_type (str, optional) – The type of residual to compute, supported are “Martingale” and “Deviance”.

  • reorder (np.ndarray) – A flattened np.ndarray containing for each data point the original index in the data-set before sorting. Used to re-order the residual vector into the original order. If this is set to None, the residual vector is not re-ordered and instead returned in the order of the sorted data-frame passed to the model formula.

Returns:

The residual vector of shape (-1,1)

Return type:

np.ndarray
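The two residual types and the role of reorder can be illustrated with a short, dependency-free sketch. It uses the textbook definitions (Martingale residual: delta minus the estimated cumulative hazard of the observation; Deviance residual: a signed transformation of the Martingale residual). Whether mssm computes them in exactly this way is an assumption, and all function names below are hypothetical.

```python
import math

def martingale_resid(delta, cum_hazard):
    # delta_i minus the estimated cumulative hazard of observation i
    return [d - h for d, h in zip(delta, cum_hazard)]

def deviance_resid(delta, cum_hazard):
    out = []
    for d, h in zip(delta, cum_hazard):
        m = d - h
        # delta * log(delta - m) equals delta * log(h); it is 0 for censored rows
        log_term = d * math.log(h) if d > 0 else 0.0
        sign = (m > 0) - (m < 0)
        # max() guards against tiny negative values caused by rounding
        out.append(sign * math.sqrt(max(0.0, -2.0 * (m + log_term))))
    return out

def restore_order(resid_sorted, original_index):
    # original_index[k] is the position row k had before the data was sorted;
    # an argsort of it puts the residuals back into the original order.
    res_idx = sorted(range(len(original_index)), key=original_index.__getitem__)
    return [resid_sorted[k] for k in res_idx]

mart = martingale_resid([1, 0, 1], [1.0, 0.5, 0.5])
dev = deviance_resid([1, 0, 1], [1.0, 0.5, 0.5])
back = restore_order(["a", "b", "c"], [2, 0, 1])
```

The restore_order helper plays the same role as passing the argsort of the original index column as reorder.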

get_survival(coef: ndarray, Xs: list[csc_array], delta: ndarray, t: int, x: ndarray | csc_array, V: csc_array, compute_var: bool = True) tuple[ndarray, ndarray | None]

Compute survival function + variance at time-point t, given k optional covariate vector(s) x as defined by Wood, Pya, & Säfken (2016).

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Explicit imports for the names used below
import copy
import numpy as np
import pandas as pd

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,correlate=False)

# Prep everything for prophaz model

# Create index variable for residual ordering
sim_dat["index"] = np.arange(sim_dat.shape[0])

# Now sort
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))
res_idx = np.argsort(sim_dat["index"].values)

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"),
                        [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                        data=sim_dat)

PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Now get estimate of survival function and see how it changes with x0
new_dat = pd.DataFrame({"x0":np.linspace(0,1,5),
                        "x1":np.linspace(0,1,5),
                        "x2":np.linspace(0,1,5),
                        "x3":np.linspace(0,1,5)})

# Get model matrix using only f0
_,Xt,_ = model.predict(use_terms=[0],n_dat=new_dat)

# Now iterate over all time-points and obtain the predicted survival function + standard error estimate
# for all 5 values of x0:
S = np.zeros((len(ut),Xt.shape[0]))
VS = np.zeros((len(ut),Xt.shape[0]))
for idx,ti in enumerate(ut):

   # Su and VSu are of shape (5,1) here but will generally be of shape (Xt.shape[0],1)
   Su,VSu = PropHaz_fam.get_survival(model.coef,model.get_mmat(),sim_formula_m.y_flat[sim_formula_m.NOT_NA_flat],
                                    ti,Xt,model.lvi.T@model.lvi)
   S[idx,:] = Su.flatten()
   VS[idx,:] = VSu.flatten()

# Now we can plot the estimated survival functions + approximate cis:
for xi in range(Xt.shape[0]):

   plt.fill([*ut,*np.flip(ut)],
            [*(S[:,xi] + 1.96*VS[:,xi]),*np.flip(S[:,xi] - 1.96*VS[:,xi])],alpha=0.5)
   plt.plot(ut,S[:,xi],label=f"x0 = {new_dat['x0'][xi]}")
plt.legend()
plt.xlabel("Time")
plt.ylabel("Survival")
plt.show()

# Note how the main effect of x0 is reflected in the plot above:
plot(model,which=[0])

# Residual plots can be created via `plot_val` from `mssmViz` - by default Martingale residuals are returned (see Wood, 2017)
fig = plt.figure(figsize=(10,3),layout='constrained')
axs = fig.subplots(1,3,gridspec_kw={"wspace":0.2})
# Note the use of `gsmm_kwargs_pred={}` to ensure that the re-ordering is not applied to the plot against predicted values
plot_val(model,gsmm_kwargs={"reorder":res_idx},gsmm_kwargs_pred={},ar_lag=25,axs=axs)

# Can also get Deviance residuals:
fig = plt.figure(figsize=(10,3),layout='constrained')
axs = fig.subplots(1,3,gridspec_kw={"wspace":0.2})

plot_val(model,gsmm_kwargs={"reorder":res_idx,"resid_type":"Deviance"},gsmm_kwargs_pred={"resid_type":"Deviance"},ar_lag=25,axs=axs)
References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • coef (np.ndarray) – Coefficient vector as numpy array of shape (-1,1).

  • Xs ([scp.sparse.csc_array]) – The list of model matrices (here holding a single model matrix) obtained from mssm.models.GAMMLSS.get_mmat().

  • delta (np.ndarray) – Dependent variable passed to mssm.src.python.formula.Formula(), holds (for each row in Xs[0]) a value in {0,1}, indicating whether for that observation the event was observed or not.

  • t (int) – Time-point at which to evaluate the survival function.

  • x (np.ndarray or scp.sparse.csc_array) – Optional vector (or matrix - can also be sparse) of covariate values. Needs to be of shape (k,len(coef)).

  • V (scp.sparse.csc_array) – Estimated Co-variance matrix of posterior for coef

  • compute_var (bool, optional) – Whether to compute the variance estimate of the survival as well. Otherwise None will be returned as the second return value.

Returns:

Two arrays: the first holds k survival function estimates, the second holds the k variance estimates for these survival function estimates. The second return value will be None instead if compute_var = False.

Return type:

tuple[np.ndarray, np.ndarray | None]
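As a rough illustration of what the survival estimate represents, the sketch below evaluates the usual proportional hazards relation S(t, x) = exp(-H0(t) * exp(x'coef)) for several covariate vectors. It assumes the cumulative baseline hazard H0(t) is already available and leaves out the variance computation entirely. The function name survival_at is hypothetical and not part of mssm.

```python
import math

def survival_at(H0_t, x_rows, coef):
    """Hypothetical helper: survival probability at one time-point.

    H0_t:   cumulative baseline hazard at the time-point of interest
    x_rows: k covariate (model matrix) rows, each of length len(coef)
    coef:   coefficient vector as a flat list
    """
    out = []
    for row in x_rows:
        eta = sum(xj * bj for xj, bj in zip(row, coef))
        out.append(math.exp(-H0_t * math.exp(eta)))
    return out

# Two covariate vectors; the second has a larger linear predictor
S = survival_at(math.log(2.0), [[0.0, 0.0], [1.0, 0.0]], [0.5, -0.2])
```

A larger linear predictor implies a higher hazard and therefore a lower survival probability at the same time-point.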

gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) ndarray

Gradient as defined by Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.

  • ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.

  • Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.

Returns:

The Gradient of the log-likelihood evaluated at coef as numpy array of shape (-1,1).

Return type:

np.ndarray

hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) csc_array

Hessian as defined by Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.

  • ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.

  • Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.

Returns:

The Hessian of the log-likelihood evaluated at coef.

Return type:

scp.sparse.csc_array

init_coef(models: list[Callable]) ndarray

Function to initialize the coefficients of the model.

Parameters:

models ([mssm.models.GAMM]) – A list of GAMMs, each based on one of the formulas provided to a model.

Returns:

A numpy array of shape (-1,1), holding initial values for all model coefficients.

Return type:

np.ndarray

llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) float

Log-likelihood function as defined by Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.

  • ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.

  • Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.

Returns:

The log-likelihood evaluated at coef.

Return type:

float
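For orientation, here is a dependency-free sketch of the Cox partial log-likelihood and its gradient, using the Breslow convention for ties. It is a textbook version written with dense Python lists. Whether it matches the exact expressions implemented by llk() and gradient() is an assumption, and the names partial_llk and partial_grad are hypothetical.

```python
import math

def _linear_predictor(coef, X):
    return [sum(x * b for x, b in zip(row, coef)) for row in X]

def partial_llk(coef, X, times, delta):
    eta = _linear_predictor(coef, X)
    llk = 0.0
    for i, d in enumerate(delta):
        if d == 1:
            # Risk set: everyone whose time is at least as large as times[i]
            denom = sum(math.exp(eta[j]) for j in range(len(times)) if times[j] >= times[i])
            llk += eta[i] - math.log(denom)
    return llk

def partial_grad(coef, X, times, delta):
    eta = _linear_predictor(coef, X)
    grad = [0.0] * len(coef)
    for i, d in enumerate(delta):
        if d == 1:
            risk = [j for j in range(len(times)) if times[j] >= times[i]]
            denom = sum(math.exp(eta[j]) for j in risk)
            for p in range(len(coef)):
                # Risk-set weighted average of covariate p
                avg = sum(math.exp(eta[j]) * X[j][p] for j in risk) / denom
                grad[p] += X[i][p] - avg
    return grad

X = [[1.0, 0.5], [0.0, 1.0], [0.5, -1.0], [1.5, 0.2]]
times = [4.0, 3.0, 2.0, 1.0]
delta = [1, 0, 1, 1]
coef = [0.3, -0.2]
value = partial_llk(coef, X, times, delta)
grad = partial_grad(coef, X, times, delta)
```

Comparing such a gradient against central finite differences of the log-likelihood is a cheap sanity check when implementing a custom family.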

mssm.src.python.exp_fam.est_scale(res: ndarray, rows_X: int, total_edf: float) float

Scale estimate from Wood & Fasiolo (2017).

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models.

Parameters:
  • res (np.ndarray) – A numpy array containing the difference between the model prediction and the (pseudo) data.

  • rows_X (int) – The number of observations collected.

  • total_edf (float) – The expected degrees of freedom for the model.
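A minimal sketch of such a scale estimate is the residual sum of squares divided by the residual degrees of freedom (number of observations minus total_edf). That est_scale() computes exactly this ratio is an assumption based on the parameter descriptions above. The name est_scale_sketch is hypothetical.

```python
def est_scale_sketch(res, rows_X, total_edf):
    # Residual sum of squares over residual degrees of freedom
    rss = sum(r * r for r in res)
    return rss / (rows_X - total_edf)

scale = est_scale_sketch([1.0, -1.0, 2.0, -2.0], rows_X=4, total_edf=2.0)
```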

mssm.src.python.file_loading module

mssm.src.python.file_loading.clear_cache(cache_dir: str, should_cache: bool) None

Clear up cache for row-subsets of model matrix.

Parameters:
  • cache_dir (str) – path to cache directory

  • should_cache (bool) – whether or not the directory should actually be created

mssm.src.python.file_loading.read_cor_cov_single(y: str, x: str, file: str, file_loading_kwargs: dict) ndarray

Read values of covariate x from file correcting for NaNs in y.

Parameters:
  • y (str) – name of covariate potentially having NaNs

  • x (str) – covariate name

  • file (str) – file name

  • file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding values in x for which y is not NaN

Return type:

np.ndarray
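The idea of "correcting for NaNs in y" can be sketched without any dependencies: rows in which y is missing are skipped, and only the matching values of x are kept. The sketch below uses the csv module and an in-memory file purely for illustration, whereas mssm passes file_loading_kwargs on to pandas.read_csv. The helper names are hypothetical.

```python
import csv
import io
import math

def to_float(value):
    # Empty or non-numeric cells are treated as missing
    try:
        return float(value) if value.strip() != "" else math.nan
    except ValueError:
        return math.nan

def read_x_where_y_not_nan(y, x, handle):
    reader = csv.DictReader(handle)
    values = []
    for row in reader:
        if not math.isnan(to_float(row[y])):
            values.append(to_float(row[x]))
    return values

text = "y,x\n1.0,10\nnan,20\n,30\n2.0,40\n"
kept = read_x_where_y_not_nan("y", "x", io.StringIO(text))
```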

mssm.src.python.file_loading.read_cov(y: str, x: str, files: list[str], nc: int, file_loading_kwargs: dict) ndarray

Read values of covariate x from files correcting for NaNs in y.

Parameters:
  • y (str) – name of covariate potentially having NaNs

  • x (str) – covariate name

  • files (list[str]) – list of file names

  • nc (int) – Number of cores to use to read in parallel

  • file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding values in x for which y is not NaN

Return type:

np.ndarray

mssm.src.python.file_loading.read_cov_no_cor(x: str, files: list[str], nc: int, file_loading_kwargs: dict) ndarray

Read values of covariate x from files.

Parameters:
  • x (str) – covariate name

  • files (list[str]) – list of file names

  • nc (int) – Number of cores to use to read in parallel

  • file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding values in x

Return type:

np.ndarray

mssm.src.python.file_loading.read_dtype(column: str, file: str, file_loading_kwargs: dict) dtype

Read datatype of variable column in file.

Parameters:
  • column (str) – Name of covariate

  • file (str) – file name

  • file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

Datatype (numpy) of column

Return type:

np.dtype

mssm.src.python.file_loading.read_no_cor_cov_single(x: str, file: str, file_loading_kwargs: dict) ndarray

Read values of covariate x from file.

Parameters:
  • x (str) – covariate name

  • file (str) – file name

  • file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding values in x

Return type:

np.ndarray

mssm.src.python.file_loading.read_unique(x: str, files: list[str], nc: int, file_loading_kwargs: dict) ndarray

Read unique values of covariate x from files.

Parameters:
  • x (str) – covariate name

  • files (list[str]) – list of file names

  • nc (int) – Number of cores to use to read in parallel

  • file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding unique values

Return type:

np.ndarray
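Collecting unique values over several files follows a simple pattern: find the unique values per file, possibly in parallel, then merge the per-file results. The sketch below uses in-memory CSV text and a thread pool from the standard library as a stand-in for however mssm distributes the work over nc cores. All names are hypothetical.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

def unique_single(x, text):
    # Unique values of column x in one (in-memory) CSV file
    reader = csv.DictReader(io.StringIO(text))
    return {row[x] for row in reader}

def unique_across(x, texts, nc=1):
    # One task per file, then merge the per-file sets
    with ThreadPoolExecutor(max_workers=nc) as pool:
        per_file = list(pool.map(lambda t: unique_single(x, t), texts))
    return sorted(set().union(*per_file))

files = ["cond,y\na,1\na,2\n", "cond,y\nb,3\na,4\n"]
levels = unique_across("cond", files, nc=2)
```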

mssm.src.python.file_loading.read_unique_single(x: str, file: str, file_loading_kwargs: dict) ndarray

Read unique values of covariate x from file.

Parameters:
  • x (str) – covariate name

  • file (str) – file name

  • file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding unique values

Return type:

np.ndarray

mssm.src.python.file_loading.setup_cache(cache_dir: str, should_cache: bool) None

Set up cache for row-subsets of model matrix.

Parameters:
  • cache_dir (str) – path to cache directory

  • should_cache (bool) – whether or not the directory should actually be created

Raises:

ValueError – if the directory already exists
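The behaviour described for setup_cache() and clear_cache() can be approximated as follows. This is a sketch under the assumption that should_cache simply gates both operations; the actual implementation may differ, and the function names are hypothetical.

```python
import os
import shutil
import tempfile

def setup_cache_sketch(cache_dir, should_cache):
    if should_cache:
        if os.path.isdir(cache_dir):
            # Refuse to reuse an existing directory
            raise ValueError(f"Cache directory {cache_dir} already exists.")
        os.makedirs(cache_dir)

def clear_cache_sketch(cache_dir, should_cache):
    if should_cache and os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)

with tempfile.TemporaryDirectory() as base:
    cache_dir = os.path.join(base, "cache_demo")
    setup_cache_sketch(cache_dir, should_cache=True)
    created = os.path.isdir(cache_dir)
    try:
        setup_cache_sketch(cache_dir, should_cache=True)
        raised = False
    except ValueError:
        raised = True
    clear_cache_sketch(cache_dir, should_cache=True)
    removed = not os.path.isdir(cache_dir)
```

Raising on an existing directory avoids silently mixing cached row-subsets from different models.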

mssm.src.python.formula module

class mssm.src.python.formula.Formula(lhs: lhs, terms: list[GammTerm], data: DataFrame, series_id: str | None = None, codebook: dict | None = None, print_warn: bool = True, keep_cov: bool = False, find_nested: bool = True, file_paths: list[str] = [], file_loading_nc: int = 1, file_loading_kwargs: dict = {'header': 0, 'index_col': False})

Bases: object

The formula of a regression equation.

Note: The class implements multiple get_* functions to access attributes stored in instance variables. The get functions always return a copy of the instance variable and the results are thus safe to manipulate.

Examples:

from mssm.models import *
from mssmViz.sim import *
import pandas as pd

from mssm.src.python.formula import build_penalties,build_model_matrix

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# Now with a tensor smooth
formula = Formula(lhs("y"),[i(),f(["x0","x1"],te=True),f(["x2"]),f(["x3"])],data=Binomdat)

# Now with a tensor smooth anova style
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x0","x1"]),f(["x2"]),f(["x3"])],data=Binomdat)


######## Stream data from file and set up custom codebook #########

file_paths = [f'https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat_cond_{cond}.csv' for cond in ["a","b"]]

# Set up specific coding for factor 'cond'
codebook = {'cond':{'a': 0, 'b': 1}}

formula = Formula(lhs=lhs("y"), # The dependent variable - here y!
                  terms=[i(), # The intercept, a
                           l(["cond"]), # For cond='b'
                           f(["time"],by="cond"), # two-way interaction between time and cond; one smooth over time per cond level
                           f(["x"],by="cond"), # two-way interaction between x and cond; one smooth over x per cond level
                           f(["time","x"],by="cond"), # three-way interaction
                           fs(["time"],rf="sub")], # Random non-linear effect of time - one smooth per level of factor sub
                  data=None, # No data frame!
                  file_paths=file_paths, # Just a list with paths to files.
                  print_warn=False,
                  codebook=codebook)

# Alternative:
formula = Formula(lhs=lhs("y"),
                        terms=[i(),
                              l(["cond"]),
                              f(["time"],by="cond"),
                              f(["x"],by="cond"),
                              f(["time","x"],by="cond"),
                              fs(["time"],rf="sub")],
                        data=None,
                        file_paths=file_paths,
                        print_warn=False,
                        keep_cov=True, # Keep encoded data structure in memory
                        codebook=codebook)

########## preparing for ar1 model (with resets per time-series) and data type requirements ##########

dat = pd.read_csv('https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat.csv')

# mssm requires that the data-type for variables used as factors is 'O'=object
dat = dat.astype({'series': 'O',
                  'cond':'O',
                  'sub':'O'})

formula = Formula(lhs=lhs("y"),
                  terms=[i(),
                           l(["cond"]),
                           f(["time"],by="cond"),
                           f(["x"],by="cond"),
                           f(["time","x"],by="cond")],
                  data=dat,
                  print_warn=False,
                  series_id='series') # 'series' variable identifies individual time-series
Parameters:
  • lhs – The lhs object defining the dependent variable.

  • terms ([GammTerm]) – A list of the terms which should be added to the model. See mssm.src.python.terms for info on which terms can be added.

  • data (pd.DataFrame or None) – A pandas dataframe (with header!) of the data which should be used to estimate the model. The variable specified for lhs as well as all variables included for a term in terms need to be present in the data, otherwise the call to Formula will throw an error.

  • series_id (str, optional) – A string identifying the individual experimental units. Usually a unique trial identifier. Only necessary if approximate derivative computations are to be utilized for random smooth terms or if you need to estimate an ‘ar1’ model for multiple time-series data.

  • codebook (dict or None) – Codebook - keys should correspond to factor variable names specified in terms. Values should again be a dict, with keys for each of the K levels of the factor and values corresponding to an integer in {0,...,K-1}.

  • print_warn (bool,optional) – Whether warnings should be printed. Useful when fitting models from terminal. Defaults to True.

  • keep_cov (bool,optional) – Whether or not the encoded data structure of all predictor variables should be kept in memory when forming \(\mathbf{X}^T\mathbf{X}\) iteratively instead of forming \(\mathbf{X}\) directly. Can speed up estimation but increases memory footprint. Defaults to False.

  • find_nested (bool,optional) – Whether or not to check for nested smooth terms. This only has an effect if you include at least one smooth term with more than two variables. Additionally, this check is often not necessary if you correctly use the te key-word of smooth terms and ensure that the marginals used to construct ti smooth terms have far fewer basis functions than the “main effect” univariate smooths. Thus, if you know what you’re doing and you’re working with large models, you might want to disable this (i.e., set to False) because this check can get quite expensive for larger models. Defaults to True.

  • file_paths ([str],optional) – A list of paths to .csv files from which \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Setting this to a non-empty list will prevent forming \(\mathbf{X}\) as a whole. data should then be set to None. Defaults to an empty list.

  • file_loading_nc (int,optional) – How many cores to use to a) accumulate \(\mathbf{X}\) in parallel (if data is not None and file_paths is an empty list) or b) to accumulate \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) (and \(\mathbf{\eta}\) during estimation) (if data is None and file_paths is a non-empty list). For case b, this should really be set to the maximum number of cores available. For case a, this only really speeds up accumulating \(\mathbf{X}\) if \(\mathbf{X}\) has a very large number of columns and/or rows. Defaults to 1.

  • file_loading_kwargs (dict,optional) – Any key-word arguments to pass to pandas.read_csv when \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively (if data is None and file_paths is a non-empty list). Defaults to {"header":0,"index_col":False}.

Variables:
  • lhs (lhs) – The left-hand side object of the regression formula passed to the constructor. Initialized at construction.

  • terms ([GammTerm]) – The list of terms passed to the constructor. Initialized at construction.

  • data (pd.DataFrame) – The dataframe passed to the constructor. Initialized at construction.

  • coef_per_term ([int]) – A list containing the number of coefficients corresponding to each term included in terms. Initialized at construction.

  • coef_names ([str]) – A list containing a named identifier (e.g., “Intercept”) for each coefficient estimated by the model. Initialized at construction.

  • n_coef (int) – The number of coefficients estimated by the model in total. Initialized at construction.

  • unpenalized_coef (int) – The number of un-penalized coefficients estimated by the model. Initialized at construction.

  • y_flat (np.ndarray or None) – An array, containing all values on the dependent variable (i.e., specified by lhs.variable) in order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths=None, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.

  • cov_flat (np.ndarray or None) – An array, containing all (encoded, in case of categorical predictors) values on each predictor variable (each column of cov_flat corresponds to a different predictor) included in any of the terms in order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths=None, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.

  • NOT_NA_flat (np.ndarray or None) – An array, containing an indication (as bool) for each value on the dependent variable (i.e., specified by lhs.variable) whether the corresponding value is not a number (“NA”) or not. In order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths=None, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.

encode_data(data: DataFrame, prediction: bool = False) tuple[ndarray | None, ndarray, ndarray | None, list[ndarray] | None, list[ndarray] | None, list[ndarray] | None, ndarray | None]

Encodes data, which needs to be a pd.DataFrame and by default (if prediction==False) builds an index of which rows in data are NA in the column of the dependent variable described by self.lhs.

Parameters:
  • data (pd.DataFrame) – The data to encode.

  • prediction (bool, optional) – Whether or not a NA index and a column for the dependent variable should be generated.

Returns:

A tuple with 7 (optional) entries: the dependent variable described by self.lhs, the encoded predictor variables as a (N,k) array (number of rows matches the number of rows of the first entry returned, the number of columns matches the number of k variables present in the formula), an indication for each row whether the dependent variable described by self.lhs is NA, like the first entry but split into a list of lists by self.series_id, like the second entry but split into a list of lists by self.series_id, like the third entry but split into a list of lists by self.series_id, start and end points for the splits used to split the previous three elements (identifying the start and end point of every level of self.series_id).

Return type:

(np.ndarray|None, np.ndarray, np.ndarray|None, list[np.ndarray]|None, list[np.ndarray]|None, list[np.ndarray]|None, np.ndarray|None)
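The relation between the flat entries and their per-series splits can be sketched in a few lines. The example assumes that the rows belonging to one series are contiguous in the data and returns (start, end) pairs. The exact layout of the start and end points returned by encode_data() is not reproduced here, and the helper name split_by_series is hypothetical.

```python
from itertools import groupby

def split_by_series(values, series_ids):
    # Assumes all rows of one series are stored next to each other
    chunks, bounds = [], []
    pos = 0
    for _, group in groupby(series_ids):
        n = len(list(group))
        chunks.append(values[pos:pos + n])
        bounds.append((pos, pos + n))
        pos += n
    return chunks, bounds

y_flat = [1, 2, 3, 4, 5]
series = ["s1", "s1", "s2", "s3", "s3"]
y_split, bounds = split_by_series(y_flat, series)
```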

get_coding_factors() dict

Get a copy of the factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

get_data() DataFrame

Get a copy of the data specified for this formula.

get_depvar() ndarray

Get a copy of the encoded dependent variable (defined via self.lhs).

get_factor_codings() dict

Get a copy of the factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the levels (str) of the factor and the values to their encoded levels (int).
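get_factor_codings() and get_coding_factors() describe the same mapping in opposite directions. A small sketch of that relation, using the codebook from the Formula example and a hypothetical helper invert_codings:

```python
def invert_codings(factor_codings):
    # level -> code becomes code -> level, separately for every factor
    return {
        var: {code: level for level, code in levels.items()}
        for var, levels in factor_codings.items()
    }

factor_codings = {"cond": {"a": 0, "b": 1}}
coding_factors = invert_codings(factor_codings)
```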

get_factor_levels() dict

Get a copy of the factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

get_has_intercept() bool

Does this formula include an intercept or not.

get_ir_smooth_term_idx() list[int]

Get a copy of the list of indices that identify impulse response terms in self.terms.

get_lhs() lhs

Get a copy of the lhs specified for this formula.

get_linear_term_idx() list[int]

Get a copy of the list of indices that identify linear terms in self.terms.

get_n_coef() int

Get the number of coefficients that are implied by the formula.

get_notNA() ndarray

Get a copy of the encoded ‘not a NA’ vector for the dependent variable (defined via self.lhs).

get_random_term_idx() list[int]

Get a copy of the list of indices that identify random terms in self.terms.

get_smooth_term_idx() list[int]

Get a copy of the list of indices that identify smooth terms in self.terms.

get_subgroup_variables() list

Returns a copy of sub-group variables for factor smooths.

get_term_names() list[str]

Returns a copy of the list with the names of the terms specified for this formula.

get_terms() list[GammTerm]

Get a copy of the terms specified for this formula.

get_var_map() dict

Get a copy of the var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix returned by self.encode_data.

get_var_maxs() dict

Get a copy of the var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on in self.data for continuous variables or None for categorical variables.

get_var_mins() dict

Get a copy of the var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on in self.data for continuous variables or None for categorical variables.

get_var_mins_maxs() tuple[dict, dict]

Get a tuple containing copies of both the mins and maxs dictionaries. See self.get_var_mins and self.get_var_maxs.

get_var_types() dict

Get a copy of the var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.

has_ir_terms() bool

Does this formula include impulse response terms or not.

mssm.src.python.formula.build_model_matrix(formula: Formula, pool: Pool | None = None, use_only: list[int] | None = None, tol: float = 0) csc_array

Function to build the model matrix implied by formula.

Important: A small selection of smooth terms requires that the penalty matrices are built at least once before the model matrix can be built. For this reason, you generally must call build_penalties(formula) before calling build_model_matrix(formula) (internally, mssm checks whether formula.built_penalties==True). See the example below.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

from mssm.src.python.formula import build_penalties,build_model_matrix

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# First extract the penalties
penalties = build_penalties(formula)

# Then the model matrix:
X = build_model_matrix(formula)
Parameters:
  • formula (Formula) – A Formula

  • pool (mp.pool.Pool | None, optional) – An instance of a multiprocessing pool, defaults to None

  • use_only (list[int] | None, optional) – A list of indices corresponding to which terms should actually be built. If None, then all terms are built. Terms not built are set to zero columns, defaults to None

  • tol (float, optional) – Optional tolerance. Absolute values in the model matrix smaller than this are set to actual zeroes, defaults to 0

Raises:
  • ValueError – If formula.built_penalties == False - i.e., it is required that build_penalties(formula) was called before calling build_model_matrix(formula).

  • NotImplementedError – If the formula was set up to read data from file, rather than from a pd.DataFrame.

Returns:

The model matrix implied by a Formula and cov_flat.

Return type:

scp.sparse.csc_array
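The effect of use_only and tol on the resulting matrix can be illustrated on a small dense matrix. This is only a sketch of the documented outcome (zero columns for terms that are not built, small absolute values replaced by exact zeroes). mssm itself returns a sparse csc_array and does not necessarily mask an already built matrix. The helper name and the coef_per_term argument are hypothetical choices for this illustration.

```python
def mask_terms_and_threshold(X, coef_per_term, use_only=None, tol=0.0):
    # Mark the columns that belong to requested terms
    keep = []
    for term_idx, n_coef in enumerate(coef_per_term):
        wanted = use_only is None or term_idx in use_only
        keep.extend([wanted] * n_coef)
    # Zero out unwanted columns and entries with absolute value below tol
    return [
        [v if keep[c] and abs(v) >= tol else 0.0 for c, v in enumerate(row)]
        for row in X
    ]

X = [[1.0, 0.2, 1e-9, 3.0]]
# Term 0 has one column, term 1 has two, term 2 has one
X_sub = mask_terms_and_threshold(X, [1, 2, 1], use_only=[0, 1], tol=1e-6)
```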

mssm.src.python.formula.build_penalties(formula) list[LambdaTerm]

Function to build all penalty matrices required by a Formula.

The function is called whenever it is needed, but the example below shows you how to use it in case you want to extract the penalties directly.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssm.src.python.formula import build_penalties

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# Now extract the penalties
penalties = build_penalties(formula)

print(penalties)
Parameters:

formula (Formula) – A Formula

Raises:
  • KeyError – If an un-penalized irf term is included in the formula after penalized terms.

  • KeyError – If an un-penalized smooth term is included in the formula after penalized terms.

  • ValueError – If no start index has been defined by the formula. For testing only.

Returns:

A list of all penalties (encoded as LambdaTerm) required by the formula

Return type:

list[LambdaTerm]
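To give an impression of what a single penalty matrix can look like, the sketch below builds a second-order difference penalty S = D'D of the kind commonly paired with B-spline bases (P-splines). Whether a given term in a Formula uses this particular penalty is an assumption, and the helper difference_penalty is hypothetical and unrelated to the LambdaTerm class.

```python
def difference_penalty(n_coef, order=2):
    # Start from the identity and difference its rows `order` times
    D = [[1.0 if i == j else 0.0 for j in range(n_coef)] for i in range(n_coef)]
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(n_coef)] for i in range(len(D) - 1)]
    # S = D'D penalizes squared differences of neighbouring coefficients
    return [
        [sum(D[r][i] * D[r][j] for r in range(len(D))) for j in range(n_coef)]
        for i in range(n_coef)
    ]

S = difference_penalty(4, order=2)
```

Every row of S sums to zero, so constant coefficient vectors are not penalized (with order=2 the same holds for linear trends in the coefficients).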

mssm.src.python.formula.build_sparse_matrix_from_formula(terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat: ndarray, cov: ndarray | None, pool: Pool | None = None, use_only: list[int] | None = None, tol: float = 0) csc_array

Build model matrix from formula properties.

This function is used internally to construct model matrices from Formula objects. For greater convenience see the build_model_matrix() function.

Important: make sure to only ever call this when formula.built_penalties==True - see the build_model_matrix() function description.

Parameters:
  • terms (list[GammTerm]) – List of terms of a Formula

  • has_intercept (bool) – Indicator of whether the Formula has an intercept or not

  • ltx (list[int]) – Linear term indices

  • irstx (list[int]) – Impulse response function term indices

  • stx (list[int]) – Smooth term indices

  • rtx (list[int]) – Random term indices

  • var_types (dict) – Dictionary holding variable types

  • var_map (dict) – Dictionary mapping variable names to column indices in the encoded data

  • var_mins (dict) – Dictionary with variable minimums

  • var_maxs (dict) – Dictionary with variable maximums

  • factor_levels (dict) – Dictionary with levels associated with each factor

  • cov_flat (np.ndarray) – Encoded data

  • cov (np.ndarray | None, optional) – Encoded data split by levels of the factor in Formula.series_id

  • pool (mp.pool.Pool | None, optional) – An instance of a multiprocessing pool, defaults to None

  • use_only (list[int] | None, optional) – A list of indices corresponding to which terms should actually be built. If None, then all terms are built. Terms not built are set to zero columns, defaults to None

  • tol (float, optional) – Optional tolerance. Absolute values in the model matrix smaller than this are set to actual zeroes, defaults to 0

Returns:

The model matrix implied by a Formula and cov_flat.

Return type:

scp.sparse.csc_array

class mssm.src.python.formula.lhs(variable: str, f: Callable = None)

Bases: object

The Left-hand side of a regression equation.

See the Formula class for examples.

Parameters:
  • variable (str) – The name of the dependent/response variable in the dataframe passed to a Formula. Can point to continuous and categorical variables. For mssm.models.GSMM models, the variable can also be set to any placeholder variable in the data, since not every Formula will be associated with a particular response variable.

  • f (Callable, optional) – A function that will be applied to the variable before fitting. For example: np.log(). By default no function is applied to the variable.

mssm.src.python.gamm_solvers module

mssm.src.python.gamm_solvers.PIRLS_newton_weights(y: ndarray, mu: ndarray, eta: ndarray, family: Family) tuple[ndarray, ndarray, ndarray]

Internal function. Compute pseudo-data and Newton weights for the Penalized Iteratively Reweighted Least Squares (PIRLS) iteration (Wood, 2017, 6.1.1 and 3.1.2)

Calculation reflects full Newton scoring!

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – vector of observations

  • mu (np.ndarray) – vector of mean estimates

  • eta (np.ndarray) – vector of linear predictors

  • family (Family) – Family of model

Raises:

ValueError – If not a single observation provided information for newton weights.

Returns:

the pseudo-data, weights, and a boolean array indicating invalid weights/pseudo-observations

Return type:

tuple[np.ndarray,np.ndarray,np.ndarray]
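As an illustration of the Newton weights, the following sketch implements the expressions from Wood (2017, section 3.1.2) for a generic link g and variance function V. The function newton_pseudo_data and its arguments (dg, d2g, V, dV) are hypothetical names chosen for this example; they do not reflect how mssm's Family objects expose these quantities.

```python
import numpy as np

def newton_pseudo_data(y, mu, eta, dg, d2g, V, dV):
    """Sketch of pseudo-data z and Newton weights w (Wood, 2017, section 3.1.2)."""
    # alpha(mu) = 1 + (y - mu) * (V'(mu)/V(mu) + g''(mu)/g'(mu))
    alpha = 1.0 + (y - mu) * (dV(mu) / V(mu) + d2g(mu) / dg(mu))
    z = dg(mu) * (y - mu) / alpha + eta
    w = alpha / (dg(mu) ** 2 * V(mu))
    # flag observations for which no usable pseudo-data/weight exists
    invalid = ~(np.isfinite(z) & np.isfinite(w))
    return z, w, invalid

# Poisson response (V(mu) = mu) with a non-canonical square-root link
y = np.array([0.0, 1.0, 4.0, 2.0])
mu = np.array([0.5, 1.5, 3.0, 2.0])
eta = np.sqrt(mu)
z, w, invalid = newton_pseudo_data(
    y, mu, eta,
    dg=lambda m: 0.5 / np.sqrt(m),        # g'(mu)
    d2g=lambda m: -0.25 * m ** -1.5,      # g''(mu)
    V=lambda m: m,
    dV=lambda m: np.ones_like(m),
)
```

For a canonical link the two terms inside alpha cancel, so alpha equals 1 and the Newton weights coincide with the Fisher weights.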

mssm.src.python.gamm_solvers.PIRLS_pdat_weights(y: ndarray, mu: ndarray, eta: ndarray, family: Family) tuple[ndarray, ndarray, ndarray]

Internal function. Compute pseudo-data and weights for the Penalized Iteratively Reweighted Least Squares (PIRLS) iteration (Wood, 2017, 6.1.1)

Calculation is based on a(mu) = 1, so reflects Fisher scoring!

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – vector of observations

  • mu (np.ndarray) – vector of mean estimates

  • eta (np.ndarray) – vector of linear predictors

  • family (Family) – Family of model

Raises:

ValueError – If not a single observation provided information for Fisher weights.

Returns:

the pseudo-data, weights, and a boolean array indicating invalid weights/pseudo-observations

Return type:

tuple[np.ndarray,np.ndarray,np.ndarray]
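To show how Fisher pseudo-data and weights enter the fitting iteration, here is a minimal dense PIRLS loop for a Poisson model with log link. It is a sketch under simplifying assumptions (dense NumPy arrays, a small illustrative ridge penalty S, simulated data) and is not mssm's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, n)])
y = rng.poisson(np.exp(0.5 + 1.0 * X[:, 1])).astype(float)
S = 1e-3 * np.eye(2)  # illustrative penalty matrix (assumption)

coef = np.zeros(2)
for _ in range(25):
    eta = X @ coef
    mu = np.exp(eta)
    # Fisher scoring (alpha(mu) = 1) with log link: g'(mu) = 1/mu, V(mu) = mu
    z = eta + (y - mu) / mu          # pseudo-data
    w = mu                           # Fisher weights 1 / (g'(mu)^2 V(mu))
    # penalized weighted least squares problem for the next coefficients
    new_coef = np.linalg.solve(X.T @ (w[:, None] * X) + S, X.T @ (w * z))
    converged = np.max(np.abs(new_coef - coef)) < 1e-10
    coef = new_coef
    if converged:
        break
```

At convergence the penalized score X.T @ (y - mu) - S @ coef is numerically zero, which is the fixed point the iteration targets.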

mssm.src.python.gamm_solvers.apply_eigen_perm(Pr: list[int], InvCholXXSP: csc_array) csc_array

Internal function. Unpivots columns of InvCholXXSP (usually the inverse of a Cholesky factor) and returns the unpivoted version.

Parameters:
  • Pr (list[int]) – List of column indices

  • InvCholXXSP (scp.sparse.csc_array) – Pivoted matrix

Returns:

Unpivoted matrix

Return type:

scp.sparse.csc_array

mssm.src.python.gamm_solvers.back_track_alpha(coef: ndarray, step: ndarray, llk_fun: Callable, grad_fun: Callable, *llk_args, alpha_max: float = 1, c1: float = 0.0001, max_iter: int = 100) float | None

Simple step-size backtracking function that enforces the Armijo condition (Nocedal & Wright, 2006)

References:
  • Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Parameters:
  • coef (np.ndarray) – coefficient estimate

  • step (np.ndarray) – step to take to update coefficients

  • llk_fun (Callable) – llk function

  • grad_fun (Callable) – function to evaluate gradient of llk

  • alpha_max (float, optional) – Parameter by Nocedal & Wright, defaults to 1

  • c1 (float, optional) – 2nd Parameter by Nocedal & Wright, defaults to 1e-4

  • max_iter (int, optional) – Number of maximum iterations, defaults to 100

Returns:

The step-length meeting the Armijo condition or None in case none such was found

Return type:

float | None
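A generic Armijo backtracking loop can be sketched as follows. The sketch assumes an objective f that is minimized (for example a negative log-likelihood) and a halving schedule for the step length; whether back_track_alpha uses the same sign convention and shrink factor is not stated here, so treat both as assumptions. The name armijo_backtrack is hypothetical.

```python
import numpy as np

def armijo_backtrack(x, step, f, grad, alpha_max=1.0, c1=1e-4, max_iter=100):
    """Return a step length satisfying the Armijo condition, or None."""
    f0 = f(x)
    slope = grad(x) @ step           # directional derivative along the step
    alpha = alpha_max
    for _ in range(max_iter):
        # sufficient decrease: f(x + a*p) <= f(x) + c1 * a * grad(x)'p
        if f(x + alpha * step) <= f0 + c1 * alpha * slope:
            return alpha
        alpha *= 0.5                 # shrink factor is an assumption
    return None

f = lambda x: 0.5 * x @ x
grad = lambda x: x
x0 = np.array([1.0, 1.0])
step = -10.0 * grad(x0)              # descent direction that overshoots badly
alpha = armijo_backtrack(x0, step, f, grad)
```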

mssm.src.python.gamm_solvers.calculate_edf(LP: csc_array | None, Pr: list[int], InvCholXXS: csc_array | LinearOperator | None, penalties: list[LambdaTerm], lgdetDs: list[float] | None, colsX: int, n_c: int, drop: list[int] | None, S_emb: csc_array) tuple[float, list[float], list[csc_array]]

Internal function. Follows steps outlined by Wood & Fasiolo (2017) to compute the total estimated degrees of freedom of the model.

Generates the B matrix also required for the derivative of the log-determinant of X.T@X+S_lambda. This is either done exactly - as described by Wood & Fasiolo (2017) - or approximately. The latter is much faster.

Also implements the L-qEFS trace computations described by Krause et al. (submitted) based on a quasi-newton approximation to the negative hessian of the log-likelihood.

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • LP (scp.sparse.csc_array | None) – Pivoted Cholesky of negative penalized hessian or None

  • Pr (list[int]) – Permutation list of LP

  • InvCholXXS (scp.sparse.csc_array | scp.sparse.linalg.LinearOperator | None) – Unpivoted Inverse of LP, or a quasi-newton approximation of it (for the L-qEFS update), or None

  • penalties (list[LambdaTerm]) – list of penalties

  • lgdetDs (list[float] | None) – list of derivatives of \(log(|\mathbf{H} + S_\lambda|)\) (\(\mathbf{H}\) is negative hessian of penalized llk) with respect to lambda.

  • colsX (int) – Number of columns of model matrix

  • n_c (int) – Number of cores to use for computations

  • drop (list[int]) – List of dropped coefficients - can be None

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

Returns:

A tuple containing the total estimated degrees of freedom, the amount of parameters penalized away by individual penalties in a list, and a list of the aforementioned B matrices

Return type:

tuple[float,list[float],list[scp.sparse.csc_array]]
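The relationship between the total estimated degrees of freedom and the parameters "penalized away" by each penalty can be checked on a small dense example. This sketch assumes a Gaussian model, so that the negative Hessian is X.T @ X, and uses two made-up diagonal penalties; it ignores pivoting, sparsity, and the approximate B matrices mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 6
X = rng.normal(size=(n, p))
# Two illustrative penalties acting on separate coefficient blocks;
# the first coefficient (an intercept, say) is unpenalized.
S1 = np.diag([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
S2 = np.diag([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
lams = [5.0, 50.0]

XX = X.T @ X
S_lam = lams[0] * S1 + lams[1] * S2
V = np.linalg.inv(XX + S_lam)

# Total edf as the trace of (X'X + S_lam)^{-1} X'X ...
edf_trace = np.trace(V @ XX)
# ... which equals p minus the parameters penalized away by each penalty
penalized_away = [lam * np.trace(V @ S) for lam, S in zip(lams, (S1, S2))]
edf_total = p - sum(penalized_away)
# Term-wise edf: coefficients of the term minus what its penalty removes
term_edf = [3 - penalized_away[0], 2 - penalized_away[1]]
```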

mssm.src.python.gamm_solvers.calculate_term_edf(penalties: list[LambdaTerm], param_penalized: list[float]) list[float]

Internal function. Computes the smooth-term (and random term) specific estimated degrees of freedom.

See Wood (2017) for a definition and Wood & Fasiolo (2017) for the computations.

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • penalties (list[LambdaTerm]) – List of penalties

  • param_penalized (list[float]) – List holding the amount of parameters penalized away by individual penalties - obtained from calculate_edf().

Returns:

A list holding the estimated degrees of freedom per smooth/random term in the model

Return type:

list[float]

mssm.src.python.gamm_solvers.check_drop_valid_gammlss(y: ndarray, coef: ndarray, coef_split_idx: list[int], Xs: list[csc_array], S_emb: csc_array, keep: list[int], family: GAMLSSFamily) tuple[bool, float]

Checks whether an identified set of coefficients to be dropped from the model results in a valid log-likelihood.

Parameters:
  • y (np.ndarray) – Vector of response variable

  • coef (np.ndarray) – Vector of coefficients

  • coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution

  • Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • keep (list[int]) – List of coefficients to retain

  • family (GAMLSSFamily) – Model family

Returns:

tuple holding bool indicating if likelihood is valid and penalized log-likelihood under dropped set.

Return type:

tuple[bool,float]

mssm.src.python.gamm_solvers.check_drop_valid_gensmooth(ys: list[ndarray], coef: ndarray, Xs: list[csc_array], S_emb: csc_array, keep: list[int], family: GSMMFamily) tuple[bool, float | None]

Checks whether an identified set of coefficients to be dropped from the model results in a valid log-likelihood.

Parameters:
  • ys (list[np.ndarray]) – List holding vectors of observations

  • coef (np.ndarray) – Vector of coefficients

  • Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter

  • S_emb (scp.sparse.csc_array) – Total Penalty matrix

  • keep (list[int]) – List of coefficients to retain

  • family (GSMMFamily) – Model family

Returns:

tuple holding bool indicating if likelihood is valid and penalized log-likelihood under dropped set.

Return type:

tuple[bool,float|None]

mssm.src.python.gamm_solvers.compute_S_emb_pinv_det(col_S: int, penalties: list[LambdaTerm], pinv: str, root: bool = False) tuple[csc_array, csc_array, csc_array | None, list[bool]]

Internal function. Compute the total embedded penalty matrix, a generalized inverse of the former, optionally a root of the total penalty matrix, and determines for which EFS updates the rank rather than the generalized inverse should be used.

Parameters:
  • col_S (int) – Number of columns of total penalty matrix

  • penalties (list[LambdaTerm]) – List of penalties

  • pinv (str) – Strategy to use to compute the generalized inverse. Set this to ‘svd’.

  • root (bool, optional) – Whether to compute a root of the generalized inverse, defaults to False

Returns:

A tuple holding total embedded penalty matrix, a generalized inverse of the former, optionally a root of the total penalty matrix, and a list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used

Return type:

tuple[scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array|None, list[bool]]

mssm.src.python.gamm_solvers.compute_eigen_perm(Pr: list[int]) csc_array

Internal function. Computes column permutation matrix obtained from Eigen.

Parameters:

Pr (list[int]) – List of column indices

Returns:

Permutation matrix as sparse array

Return type:

scp.sparse.csc_array
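A column permutation matrix can be built from a list of column indices as sketched below. The convention used here (column j of the permutation matrix has its single 1 in row Pr[j], so that A @ P selects columns Pr of A) is an assumption for illustration and may differ from the convention mssm adopts from Eigen; perm_matrix is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp

def perm_matrix(Pr):
    """Sparse permutation matrix P such that (A @ P)[:, j] == A[:, Pr[j]]."""
    n = len(Pr)
    rows = np.asarray(Pr)
    cols = np.arange(n)
    return sp.csc_array((np.ones(n), (rows, cols)), shape=(n, n))

A = sp.csc_array(np.arange(9, dtype=float).reshape(3, 3))
Pr = [2, 0, 1]
P = perm_matrix(Pr)

pivoted = A @ P            # columns reordered according to Pr
unpivoted = pivoted @ P.T  # P is orthogonal, so P.T undoes the pivoting
```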

mssm.src.python.gamm_solvers.compute_lgdetD_bsb(rank: int | None, cLam: float, gInv: csc_array, emb_SJ: csc_array, cCoef: ndarray) tuple[float, float]

Internal function. Computes derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.

See Wood, Li, Shaddick, & Augustin (2017), Wood & Fasiolo (2017), Wood (2017), and Wood (2011)

References:
  • Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744

  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • rank (int | None) – Known rank of penalty matrix or None (should only be set to int for single penalty terms)

  • cLam (float) – Current lambda value

  • gInv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix

  • emb_SJ (scp.sparse.csc_array) – Embedded penalty matrix

  • cCoef (np.ndarray) – coefficient vector

Returns:

Tuple, first element is aforementioned derivative, second is cCoef.T@emb_SJ@cCoef

Return type:

tuple[float,float]
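The two returned quantities can be illustrated on a single dense penalty. The sketch assumes the derivative takes the form tr(S_lambda^- S_j), which for a single penalty reduces to rank/lambda (the shortcut suggested by the rank argument); the second-difference penalty and the coefficient values are made up for the example.

```python
import numpy as np

# Second-difference penalty on 6 coefficients (rank 4)
D = np.diff(np.eye(6), n=2, axis=0)
S_j = D.T @ D
lam = 3.0
coef = np.array([0.1, 0.4, 0.2, -0.3, 0.0, 0.5])

S_lam = lam * S_j                      # total penalty (single term here)
g_inv = np.linalg.pinv(S_lam)          # generalized inverse

lgdet_deriv = np.trace(g_inv @ S_j)    # d log|S_lam|_+ / d lam
bsb = coef @ S_j @ coef                # coef' S_j coef

rank = np.linalg.matrix_rank(S_j)      # known-rank shortcut: rank / lam
```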

mssm.src.python.gamm_solvers.computetrVS3(t1: ndarray | None, t2: ndarray | None, t3: ndarray | None, lTerm: LambdaTerm, V0: csc_array) float

Internal function. Compute tr(V@lTerm.S_j) from linear operator of V obtained from L-BFGS-B optimizer.

Relies on equation 3.13 in Byrd, Nocedal & Schnabel (1994). Adapted to ensure positive semi-definiteness required by EFS update.

References:
  • Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:
  • t1 (np.ndarray or None) – nCoef*2m matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then V is treated like an identity matrix.

  • t2 (np.ndarray or None) – 2m*2m matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then V is treated like an identity matrix.

  • t3 (np.ndarray or None) – 2m*nCoef matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then t1 is treated like an identity matrix.

  • lTerm (LambdaTerm) – Current lambda term for which to compute the trace.

  • V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the hessian of the negative penalized likelihood.

Returns:

trace

Return type:

float
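The reason the trace can be computed cheaply from the compact representation is the cyclic property of the trace. The sketch below assumes V = V0 + t1 @ t2 @ t3 and uses random stand-ins for the three factors; it does not reproduce the adaptation that enforces positive semi-definiteness.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)
n_coef, m = 8, 3
V0 = sp.csc_array(0.5 * np.eye(n_coef))      # initial (sparse) estimate
t1 = rng.normal(size=(n_coef, 2 * m))        # nCoef x 2m
t2 = rng.normal(size=(2 * m, 2 * m))         # 2m x 2m
t3 = rng.normal(size=(2 * m, n_coef))        # 2m x nCoef
D = np.diff(np.eye(n_coef), axis=0)
S_j = sp.csc_array(D.T @ D)                  # illustrative penalty matrix

# tr((V0 + t1 t2 t3) S_j) = tr(V0 S_j) + tr(t2 (t3 S_j t1)):
# only a 2m x 2m product is needed for the low-rank part.
tr_fast = (V0 @ S_j).diagonal().sum() + np.trace(t2 @ (t3 @ (S_j @ t1)))
```

The dense nCoef x nCoef matrix V is never formed, which matters when the number of coefficients is large and m is small.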

mssm.src.python.gamm_solvers.correct_coef_step(coef: ndarray, n_coef: ndarray, dev: float, pen_dev: float, c_dev_prev: float, family: Family, eta: ndarray, mu: ndarray, y: ndarray, X: csc_array, n_pen: float, S_emb: csc_array, formula: Formula, n_c: int, offset: float | ndarray) tuple[float, float, ndarray, ndarray, ndarray]

Internal function. Performs step-length control on the coefficient vector.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • coef (np.ndarray) – Current coefficient estimate

  • n_coef (np.ndarray) – New coefficient estimate

  • dev (float) – new deviance

  • pen_dev (float) – new penalized deviance

  • c_dev_prev (float) – previous penalized deviance

  • family (Family) – Family of model

  • eta (np.ndarray) – vector of linear predictors - under new coefficient estimate

  • mu (np.ndarray) – vector of mean estimates - under new coefficient estimate

  • y (np.ndarray) – vector of observations of the working model

  • X (scp.sparse.csc_array) – Model matrix of working model

  • n_pen (float) – total penalty under new coefficient estimate

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • formula (Formula) – Formula of model

  • n_c (int) – Number of cores

  • offset (float | np.ndarray) – Offset (fixed effect) to add to eta

Returns:

Updated versions of dev,pen_dev,mu,eta,coef

Return type:

tuple[float,float,np.ndarray,np.ndarray,np.ndarray]
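Step-length control on the coefficients is commonly implemented as step halving on the penalized deviance (Wood, 2017). The following sketch shows that idea for a Gaussian deviance; the functions pen_dev and halve_step are hypothetical, and the actual acceptance rule in correct_coef_step may differ.

```python
import numpy as np

def pen_dev(coef, X, y, S):
    """Gaussian deviance plus quadratic penalty."""
    r = y - X @ coef
    return r @ r + coef @ S @ coef

def halve_step(coef, n_coef, X, y, S, max_halvings=30):
    """Pull the proposal n_coef back towards coef until the penalized deviance does not increase."""
    prev = pen_dev(coef, X, y, S)
    for _ in range(max_halvings):
        if pen_dev(n_coef, X, y, S) <= prev:
            break
        n_coef = (coef + n_coef) / 2
    return n_coef

X = np.eye(2)
y = np.array([1.0, 1.0])
S = np.zeros((2, 2))
coef = np.zeros(2)
proposal = np.array([3.0, 3.0])      # overshoots the minimum at (1, 1)
accepted = halve_step(coef, proposal, X, y, S)
```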

mssm.src.python.gamm_solvers.correct_coef_step_gammlss(family: GAMLSSFamily, y: ndarray, Xs: list[csc_array], coef: ndarray, next_coef: ndarray, coef_split_idx: list[int], c_llk: float, S_emb: csc_array, a: float) tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], float, float, float]

Apply step size correction to Newton update for GAMLSS models, as discussed by Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • family (GAMLSSFamily) – Family of model

  • y (np.ndarray) – Vector of observations

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • coef (np.ndarray) – Current coefficient estimate

  • next_coef (np.ndarray) – Updated coefficient estimate

  • coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.

  • c_llk (float) – Current log likelihood

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • a (float) – Step length for gradient descent update

Returns:

A tuple containing the corrected coefficient estimate next_coef, next_coef split via coef_split_idx, next mus, next etas, next llk, next penalized llk, updated step length for next gradient update

Return type:

tuple[np.ndarray,list[np.ndarray],list[np.ndarray],list[np.ndarray],float,float,float]

mssm.src.python.gamm_solvers.correct_coef_step_gen_smooth(family: GSMMFamily, ys: list[ndarray], Xs: list[csc_array], coef: ndarray, next_coef: ndarray, coef_split_idx: list[int], c_llk: float, S_emb: csc_array, a: float) tuple[ndarray, float, float, float]

Apply step size correction to Newton update for general smooth models, as discussed by Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • family (GSMMFamily) – Model family

  • ys (list[np.ndarray]) – List of vectors of observations

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • coef (np.ndarray) – Coefficient estimate

  • next_coef (np.ndarray) – Proposed next coefficient estimate

  • coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.

  • c_llk (float) – Current log likelihood

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • a (float) – Step length for gradient descent update

Returns:

A tuple containing the corrected coefficient estimate next_coef,next llk, next penalized llk, updated step length for next gradient update

Return type:

tuple[np.ndarray,float,float,float]

mssm.src.python.gamm_solvers.correct_lambda_step(y: ndarray, yb: ndarray, z: ndarray, Wr: csc_array, rowsX: int, colsX: int, X: csc_array, Xb: csc_array, coef: ndarray, Lrhoi: csc_array | None, family: Family, col_S: int, S_emb: csc_array, penalties: list[LambdaTerm], was_extended: list[bool], pinv: str, lam_delta: ndarray, extend_by: dict, o_iter: int, dev_check: float, n_c: int, control_lambda: int, extend_lambda: bool, exclude_lambda: bool, extension_method_lam: str, formula: Formula, form_Linv: bool, method: str, offset: float | ndarray, max_inner: int) tuple[ndarray, csc_array, ndarray, csc_array, ndarray, ndarray, ndarray, csc_array, csc_array | None, float, list[float], float, ndarray, ndarray, dict, list[LambdaTerm], list[bool], csc_array, int, list[int] | None, list[int] | None]

Performs step-length control for lambda.

Lambda update is based on EFS update by Wood & Fasiolo (2017), step-length control is partially based on Wood et al. (2017) - Krause et al. (submitted) has the specific implementation.

References:
  • Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744

  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – vector of observations

  • yb (np.ndarray) – vector of observations of the working model

  • z (np.ndarray) – pseudo-data (can have NaNs for invalid observations)

  • Wr (scp.sparse.csc_array) – diagonal sparse matrix holding the root of the Fisher weights

  • rowsX (int) – Rows of model matrix

  • colsX (int) – Cols of model matrix

  • X (scp.sparse.csc_array) – Model matrix

  • Xb (scp.sparse.csc_array) – Model matrix of working model

  • coef (np.ndarray) – Current coefficient estimate

  • Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model

  • family (Family) – Model family

  • col_S (int) – Columns of total penalty matrix

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • penalties (list[LambdaTerm]) – List of penalties

  • was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not

  • pinv (str) – Method to use to compute generalized inverse of total penalty, set to ‘svd’!

  • lam_delta (np.ndarray) – Proposed update to lambda parameters

  • extend_by (dict) – Extension info dictionary

  • o_iter (int) – Outer iteration index

  • dev_check (float) – Multiple of previous deviance used for convergence check

  • n_c (int) – Number of cores to use

  • control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.

  • extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.

  • exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.

  • extension_method_lam (str) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov’ by default.

  • formula (Formula) – Formula of model

  • form_Linv (bool) – Whether to form the inverse of the Cholesky of the negative penalized hessian or not

  • method (str) – Which method to use to solve for the coefficients (“Chol” or “Qr”)

  • offset (float | np.ndarray) – Offset (fixed effect) to add to eta

  • max_inner (int) – Maximum number of iterations to use to update the coefficient estimate

Returns:

Tuple containing updated values for yb, Xb, z, Wr, eta, mu, n_coef, the Cholesky of the penalized hessian CholXXS, the inverse of the former InvCholXXS, total edf, term-wise edfs, updated scale, working residuals, accepted update to lambda, extend_by, penalties, was_extended, updated S_emb, number of lambda updates, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop

Return type:

tuple[np.ndarray, scp.sparse.csc_array, np.ndarray, scp.sparse.csc_array, np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array|None, float, list[float], float, np.ndarray, np.ndarray, dict, list[LambdaTerm], list[bool], scp.sparse.csc_array, int, list[int]|None, list[int]|None]
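The EFS update of Wood & Fasiolo (2017) that produces lam_delta can be sketched for a Gaussian model with a single penalty. This is a dense toy version on simulated data: it omits step-length control, extension (acceleration) of the update, and all sparse linear algebra.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(scale=0.3, size=p) + rng.normal(size=n)
S_j = np.eye(p)        # single ridge-type penalty (illustrative)
lam = 1.0

for _ in range(50):
    S_lam = lam * S_j
    V = np.linalg.inv(X.T @ X + S_lam)
    coef = V @ X.T @ y
    edf = p - lam * np.trace(V @ S_j)
    scale = np.sum((y - X @ coef) ** 2) / (n - edf)
    # EFS update: lam* = scale * [tr(S_lam^- S_j) - tr(V S_j)] / (coef' S_j coef) * lam
    num = np.trace(np.linalg.pinv(S_lam) @ S_j) - np.trace(V @ S_j)
    lam_new = scale * num / (coef @ S_j @ coef) * lam
    lam_delta = lam_new - lam      # proposed update to the lambda parameter
    lam = lam + lam_delta
```

Because the numerator is non-negative and the denominator is a quadratic form, the update keeps lambda positive without any explicit constraint.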

mssm.src.python.gamm_solvers.correct_lambda_step_gamlss(family: GAMLSSFamily, mus: list[ndarray], y: ndarray, Xs: list[csc_array], S_norm: csc_array, n_coef: int, form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], gamlss_pen: list[LambdaTerm], lam_delta: ndarray, extend_by: dict, was_extended: list[bool], c_llk: float, fit_info: Fit_info, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: list[list[int], list[int]] | None, extend_lambda: bool, extension_method_lam: str, control_lambda: int, repara: bool, n_c: int) tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], csc_array, csc_array, csc_array, float, float, float, list[int], list[int], csc_array, list[LambdaTerm], float, list[float], ndarray]

Updates and performs step-length control for the vector of lambda parameters of a GAMMLSS model. Essentially completes the steps described in section 3.3 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • family (GAMLSSFamily) – Family of model

  • mus (list[np.ndarray]) – List of estimated means

  • y (np.ndarray) – Vector of observations

  • Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution

  • S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).

  • n_coef (int) – Number of coefficients

  • form_n_coef (list[int]) – List of number of coefficients per formula

  • form_up_coef (list[int]) – List of un-penalized number of coefficients per formula

  • coef (np.ndarray) – Coefficient estimate

  • coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.

  • gamlss_pen (list[LambdaTerm]) – List of penalties

  • lam_delta (np.ndarray) – Update to vector of lambda parameters

  • extend_by (dict) – Extension info dictionary

  • was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not

  • c_llk (float) – Current llk

  • fit_info (Fit_info) – A Fit_info object

  • outer (int) – Index of outer iteration

  • max_inner (int) – Maximum number of inner iterations

  • min_inner (int) – Minimum number of inner iterations

  • conv_tol (float) – Convergence tolerance

  • method (str) – Method to use to estimate coefficients

  • piv_tol (float) – Deprecated

  • keep_drop (list[list[int],list[int]] | None) – Set of previously dropped coefficients or None

  • extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary

  • extension_method_lam (str) – Which method to use to extend lambda proposals.

  • control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML.

  • repara (bool) – Whether to apply a stabilizing re-parameterization to the model

  • n_c (int) – Number of cores to use

Returns:

coef estimate under corrected lambda, split version of next coef estimate, next mus, next etas, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former, new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop, the new total penalty matrix, the new list of penalties, total edf, term-wise edfs, the update to the lambda vector

Return type:

tuple[np.ndarray, list[np.ndarray], list[np.ndarray], list[np.ndarray], scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array, float, float, float, list[int], list[int], scp.sparse.csc_array, list[LambdaTerm], float, list[float], np.ndarray]

mssm.src.python.gamm_solvers.correct_lambda_step_gen_smooth(family: GSMMFamily, ys: list[ndarray], Xs: list[csc_array], S_norm: csc_array, n_coef: int, form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], smooth_pen: list[LambdaTerm], lam_delta: ndarray, extend_by: dict, was_extended: list[bool], c_llk: float, fit_info: Fit_info, outer: int, max_inner: int, min_inner: int, conv_tol: float, gamma: float, method: str, qEFSH: str, overwrite_coef: bool, qEFS_init_converge: bool, optimizer: str, __old_opt: LinearOperator | None, use_grad: bool, __neg_pen_llk: Callable, __neg_pen_grad: Callable, piv_tol: float, keep_drop: list[list[int], list[int]] | None, extend_lambda: bool, extension_method_lam: str, control_lambda: int, repara: bool, n_c: int, init_bfgs_options: dict, bfgs_options: dict) tuple[ndarray, csc_array | None, csc_array | None, csc_array | LinearOperator, csc_array | None, float, float, LinearOperator | None, list[int], list[int], csc_array, list[LambdaTerm], float, list[float], ndarray]

Updates and performs step-length control for the vector of lambda parameters of a GSMM model. Essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • family (GSMMFamily) – Model family

  • ys (list[np.ndarray]) – List of observation vectors

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).

  • n_coef (int) – Number of coefficients

  • form_n_coef (list[int]) – List of number of coefficients per formula

  • form_up_coef (list[int]) – List of un-penalized number of coefficients per formula

  • coef (np.ndarray) – Coefficient estimate

  • coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.

  • smooth_pen (list[LambdaTerm]) – List of penalties

  • lam_delta (np.ndarray) – Update to vector of lambda parameters

  • extend_by (dict) – Extension info dictionary

  • was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not

  • c_llk (float) – Current llk

  • fit_info (Fit_info) – A Fit_info object

  • outer (int) – Index of outer iteration

  • max_inner (int) – Maximum number of inner iterations

  • min_inner (int) – Minimum number of inner iterations

  • conv_tol (float) – Convergence tolerance

  • gamma (float) – Weight factor determining whether we should look for smoother or less smooth models

  • method (str) – Method to use to estimate coefficients (and lambda parameter)

  • qEFSH (str) – Should the hessian approximation use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation or the standard bfgs update (qEFSH='BFGS')

  • overwrite_coef (bool) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS'. Setting this to False will be useful when passing coefficients from a simpler model to initialize a more complex one. Only has an effect when qEFS_init_converge=True.

  • qEFS_init_converge (bool) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS'.

  • optimizer (str) – Deprecated

  • __old_opt (scp.sparse.linalg.LinearOperator | None) – If the L-qEFS update is used to estimate coefficients/lambda parameters, then this is the previous state of the quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood

  • use_grad (bool) – Deprecated

  • __neg_pen_llk (Callable) – Function to evaluate negative penalized log-likelihood

  • __neg_pen_grad (Callable) – Function to evaluate gradient of negative penalized log-likelihood

  • piv_tol (float) – Deprecated

  • keep_drop (list[list[int],list[int]] | None) – Set of previously dropped coefficients or None

  • extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary

  • extension_method_lam (str) – Which method to use to extend lambda proposals.

  • control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note that the gradient is based on quasi-newton approximations as well and thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed.

  • repara (bool) – Whether to apply a stabilizing re-parameterization to the model

  • n_c (int) – Number of cores to use

  • init_bfgs_options (dict) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options but passed to the optimizer of the un-penalized problem. Only has an effect when qEFS_init_converge=True.

  • bfgs_options (dict) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'.

Returns:

coef estimate under corrected lambda, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former (or another instance of scp.sparse.linalg.LinearOperator representing the new quasi-newton approximation), covariance matrix of coefficients, next llk, next penalized llk, if the L-qEFS update is used to estimate coefficients/lambda parameters a scp.sparse.linalg.LinearOperator holding the previous quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop, new total penalty matrix, new list of penalties, total edf, term-wise edfs, the update to the lambda vector

Return type:

tuple[np.ndarray, scp.sparse.csc_array|None, scp.sparse.csc_array|None, scp.sparse.csc_array|scp.sparse.linalg.LinearOperator, scp.sparse.csc_array|None, float, float, scp.sparse.linalg.LinearOperator|None, list[int], list[int], scp.sparse.csc_array, list[LambdaTerm], float, list[float], np.ndarray]

mssm.src.python.gamm_solvers.deriv_transform_eta_beta(d1eta: list[ndarray], d2eta: list[ndarray], d2meta: list[ndarray], Xs, only_grad=False)

Further transforms derivatives of llk with respect to eta to get derivatives of llk with respect to coefficients. Based on section 3.2 and Appendix A in Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
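
The chain rule behind this transformation can be illustrated for the simplest case of a single linear predictor eta = X @ coef. The sketch below is not the mssm implementation: the helper name eta_to_coef_derivs is hypothetical, dense NumPy arrays stand in for the sparse matrices, and the mixed derivatives needed for several linear predictors are ignored.

```python
import numpy as np

def eta_to_coef_derivs(d1eta, d2eta, X):
    """Hypothetical helper: chain rule for a single predictor eta = X @ coef."""
    grad = X.T @ d1eta                   # gradient of llk w.r.t. coef
    hess = X.T @ (d2eta[:, None] * X)    # X.T @ diag(d2eta) @ X
    return grad, hess

# Toy Gaussian llk with unit variance: llk = -0.5 * sum((y - eta)**2),
# so d llk / d eta = y - eta and d^2 llk / d eta^2 = -1.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
eta = X @ np.zeros(3)
grad, hess = eta_to_coef_derivs(y - eta, -np.ones(50), X)
```

For this toy model the result reduces to the familiar X.T @ (y - eta) and -X.T @ X.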

mssm.src.python.gamm_solvers.deriv_transform_mu_eta(y: ndarray, means: list[ndarray], family: GAMLSSFamily) tuple[list[ndarray], list[ndarray], list[ndarray]]

Compute derivatives (first and second order) of llk with respect to each linear predictor based on their respective mean for all observations following steps outlined by Wood, Pya, & Säfken (2016)

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • y (np.ndarray) – Vector of observations

  • means (list[np.ndarray]) – List holding vectors of mean estimates

  • family (GAMLSSFamily) – Family of the model

Returns:

A tuple containing a list containing the first order partial derivatives with respect to each parameter, the same for pure second derivatives, and a list containing mixed derivatives

Return type:

tuple[list[np.ndarray],list[np.ndarray],list[np.ndarray]]

mssm.src.python.gamm_solvers.drop_terms_S(penalties: list[LambdaTerm], keep: list[int]) list[LambdaTerm]

Zeros out rows and cols of penalty matrices corresponding to dropped terms. Roots are re-computed as well.

Parameters:
  • penalties (list[LambdaTerm]) – List of Lambda terms included in the model formula

  • keep (list[int]) – List of columns/rows to keep.

Returns:

List of updated penalties - a copy is made.

Return type:

list[LambdaTerm]
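
A minimal sketch of the zeroing step, assuming a dense NumPy penalty matrix instead of a LambdaTerm (the helper zero_dropped is hypothetical and the re-computation of the penalty roots is not shown):

```python
import numpy as np

def zero_dropped(S, keep):
    """Hypothetical helper: zero rows/cols of S that are not listed in keep."""
    mask = np.zeros(S.shape[0], dtype=bool)
    mask[keep] = True
    S_new = S.copy()          # work on a copy, leave the input untouched
    S_new[~mask, :] = 0.0     # zero rows of dropped coefficients
    S_new[:, ~mask] = 0.0     # zero columns of dropped coefficients
    return S_new

S = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
S_kept = zero_dropped(S, keep=[0, 2])
```

The matrix keeps its original shape, which is why the corresponding columns of the model matrices have to be handled separately.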

mssm.src.python.gamm_solvers.drop_terms_X(Xs: list[csc_array], keep: list[int]) tuple[list[csc_array], list[int]]

Drops cols of model matrices corresponding to dropped terms.

Parameters:
  • Xs (list[scp.sparse.csc_array]) – List of model matrices included in the model formula.

  • keep (list[int]) – List of columns to keep.

Returns:

Tuple containing a list of updated model matrices - a copy is made - and a new list containing the indices by which to split the coefficient vector.

Return type:

tuple[list[scp.sparse.csc_array],list[int]]
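
The following sketch shows one way such a column drop could work. It assumes that keep indexes the columns of all model matrices stacked side by side, and that the split indices are the interior split points accepted by np.split; both are assumptions made for illustration, as is the helper name drop_columns. Dense arrays are used for brevity.

```python
import numpy as np

def drop_columns(Xs, keep):
    """Hypothetical helper: keep indexes the stacked columns of all matrices in Xs."""
    keep = np.asarray(keep)
    new_Xs, split_idx, offset, n_kept = [], [], 0, 0
    for X in Xs:
        # Indices in keep that fall into this matrix, shifted to local column indices
        local = keep[(keep >= offset) & (keep < offset + X.shape[1])] - offset
        new_Xs.append(X[:, local].copy())
        n_kept += len(local)
        split_idx.append(n_kept)
        offset += X.shape[1]
    return new_Xs, split_idx[:-1]   # interior split points only

Xs = [np.ones((4, 3)), np.ones((4, 2))]
new_Xs, split_idx = drop_columns(Xs, keep=[0, 2, 3])
coef_parts = np.split(np.arange(3), split_idx)
```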

mssm.src.python.gamm_solvers.extend_lambda_step(lti: int, lam: float, dLam: float, extend_by: dict, was_extended: list[bool], method: str) tuple[float, dict, bool]

Internal function. Performs an update to the lambda parameter, ideally extending the step taken without overshooting the objective.

Parameters:
  • lti (int) – Penalty index

  • lam (float) – Current lambda value

  • dLam (float) – The lambda update

  • extend_by (dict) – Extension info dictionary

  • was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not

  • method (str) – Extension method to use.

Raises:

ValueError – If requested method is not implemented

Returns:

Updated values for dLam,extend_by,was_extended

Return type:

tuple[float,dict,bool]

mssm.src.python.gamm_solvers.form_cross_prod_mp(should_cache: bool, cache_dir: str, file: str, fi: int, y_flat: ndarray, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) tuple[csc_array, ndarray]

Computes X.T@X and X.T@y based on the data in file.

Parameters:
  • should_cache (bool) – whether or not the directory should actually be created

  • cache_dir (str) – path to cache directory

  • file (str) – File name

  • fi (int) – File index in all files

  • y_flat (np.ndarray) – Observation vector

  • terms (list[GammTerm]) – List of terms in model formula

  • has_intercept (bool) – Whether the formula has an intercept or not

  • ltx (list[int]) – Linear term indices

  • irstx (list[int]) – Impulse response function term indices

  • stx (list[int]) – Smooth term indices

  • rtx (list[int]) – Random term indices

  • var_types (dict) – Dictionary holding variable types

  • var_map (dict) – Dictionary mapping variable names to column indices in the encoded data

  • var_mins (dict) – Dictionary with variable minimums

  • var_maxs (dict) – Dictionary with variable maximums

  • factor_levels (dict) – Dictionary with levels associated with each factor

  • cov_flat_file (np.ndarray) – Encoded data based on file

  • cov (list[np.ndarray]) – Essentially [cov_flat_file]

Returns:

X.T@X, X.T@y

Return type:

tuple[scp.sparse.csc_array,np.ndarray]

mssm.src.python.gamm_solvers.form_eta_mp(should_cache: bool, cache_dir: str, file: str, fi: int, coef: ndarray, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) ndarray

Computes X@coef, where X is the model matrix for file.

Parameters:
  • should_cache (bool) – whether or not the directory should actually be created

  • cache_dir (str) – path to cache directory

  • file (str) – File name

  • fi (int) – File index in all files

  • coef (np.ndarray) – Current coefficient estimate

  • terms (list[GammTerm]) – List of terms in model formula

  • has_intercept (bool) – Whether the formula has an intercept or not

  • ltx (list[int]) – Linear term indices

  • irstx (list[int]) – Impulse response function term indices

  • stx (list[int]) – Smooth term indices

  • rtx (list[int]) – Random term indices

  • var_types (dict) – Dictionary holding variable types

  • var_map (dict) – Dictionary mapping variable names to column indices in the encoded data

  • var_mins (dict) – Dictionary with variable minimums

  • var_maxs (dict) – Dictionary with variable maximums

  • factor_levels (dict) – Dictionary with levels associated with each factor

  • cov_flat_file (np.ndarray) – Encoded data based on file

  • cov (list[np.ndarray]) – Essentially [cov_flat_file]

Returns:

X@coef for this file

Return type:

np.ndarray

mssm.src.python.gamm_solvers.gd_coef_smooth(coef: ndarray, grad: ndarray, S_emb: csc_array, a: float) ndarray

Follows sections 3.1.2 and 3.14 in Wood, Pya, & Säfken (2016) to update the coefficients of a GAMLSS/GSMM model via a gradient descent (ascent actually) step.

  1. Computes gradient of the penalized likelihood (grad - S_emb@coef)

  2. Uses this to compute the update

  3. Step size control - happens outside

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • coef (np.ndarray) – Current coefficient estimate

  • grad (np.ndarray) – gradient of llk with respect to coef

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • a (float) – Step length for gradient descent update

Returns:

An updated estimate of the coefficients

Return type:

np.ndarray
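
The update described above amounts to a single line of linear algebra. A minimal sketch with dense arrays (the function name gradient_ascent_step is hypothetical, and step size control is left to the caller, as in the description):

```python
import numpy as np

def gradient_ascent_step(coef, grad, S_emb, a):
    """Hypothetical helper: one ascent step on the penalized log-likelihood."""
    pen_grad = grad - S_emb @ coef     # gradient of the penalized llk
    return coef + a * pen_grad         # move uphill with step length a

coef = np.array([1.0, -1.0])
grad = np.array([0.5, 0.5])
S_emb = np.eye(2)
new_coef = gradient_ascent_step(coef, grad, S_emb, a=0.1)
```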

mssm.src.python.gamm_solvers.grad_lambda(lgdet_deriv: float, ldet_deriv: float, bSb: float, scale: float) ndarray

Internal function. Computes gradient of REML criterion with respect to all lambda parameters.

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • lgdet_deriv (float) – Derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.

  • ldet_deriv (float) – Derivative of \(log(|\mathbf{H} + \mathbf{S}_\lambda|)\) (\(\mathbf{H} + \mathbf{S}_\lambda\) is the negative hessian of the penalized llk) with respect to lambda.

  • bSb (float) – cCoef.T@emb_SJ@cCoef where cCoef is current coefficient estimate

  • scale (float) – Optional scale parameter (or 1)

Returns:

The gradient of the reml criterion

Return type:

np.ndarray
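
One plausible way to combine the four inputs follows from differentiating the Laplace-approximate REML criterion as in Wood & Fasiolo (2017). The expression below is an assumption made for illustration and should not be read as the exact mssm code; the name reml_grad_lambda is hypothetical.

```python
def reml_grad_lambda(lgdet_deriv, ldet_deriv, bSb, scale=1.0):
    """Hypothetical sketch: derivative of the approximate REML criterion w.r.t. one lambda."""
    # 0.5 * d log|S_lambda|_+ - 0.5 * d log|H + S_lambda| - coef' S_j coef / (2 * scale)
    return 0.5 * lgdet_deriv - 0.5 * ldet_deriv - bSb / (2.0 * scale)

g = reml_grad_lambda(lgdet_deriv=4.0, ldet_deriv=1.0, bSb=2.0, scale=1.0)
```

Setting this expression to zero and solving for lambda leads to a Fellner-Schall type update.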

mssm.src.python.gamm_solvers.handle_drop_gammlss(family: GAMLSSFamily, y: ndarray, coef: ndarray, keep: list[int], Xs: list[csc_array], S_emb: csc_array) tuple[ndarray, list[ndarray], list[int], list[csc_array], csc_array, list[ndarray], list[ndarray], float, float]

Drop coefficients and make sure this is reflected in the model matrices, total penalty, llk, and penalized llk.

Parameters:
  • family (GAMLSSFamily) – Model family

  • y (np.ndarray) – Vector of observations

  • coef (np.ndarray) – Vector of coefficients

  • keep (list[int]) – List of parameter indices to keep.

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • S_emb (scp.sparse.csc_array) – Total penalty matrix.

Returns:

A tuple holding: reduced coef vector, split version of the reduced coef vector, a new list of indices determining where to split the reduced coef vector, list with reduced model matrices, reduced total penalty matrix, updated etas, mus, llk, and penalized llk

Return type:

tuple[np.ndarray, list[np.ndarray], list[int], list[scp.sparse.csc_array], scp.sparse.csc_array, list[np.ndarray], list[np.ndarray], float, float]

mssm.src.python.gamm_solvers.handle_drop_gsmm(family: GSMMFamily, ys: list[ndarray], coef: ndarray, keep: list[int], Xs: list[csc_array], S_emb: csc_array) tuple[ndarray, list[int], list[csc_array], csc_array, float, float]

Drop coefficients and make sure this is reflected in the model matrices, total penalty, llk, and penalized llk.

Parameters:
  • family (GSMMFamily) – Model family

  • ys (list[np.ndarray]) – List with vector of observations

  • coef (np.ndarray) – Vector of coefficients

  • keep (list[int]) – List of parameter indices to keep.

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • S_emb (scp.sparse.csc_array) – Total penalty matrix.

Returns:

A tuple holding: reduced coef vector, a new list of indices determining where to split the reduced coef vector, list with reduced model matrices, reduced total penalty matrix, updated llk, and penalized llk

Return type:

tuple[np.ndarray, list[int], list[scp.sparse.csc_array], scp.sparse.csc_array, float, float]

mssm.src.python.gamm_solvers.identify_drop(H: csc_array, S_scaled: csc_array, method: str = 'QR') tuple[list[int] | None, list[int] | None]

Routine to (approximately) identify the rank of the scaled negative hessian of the penalized likelihood based on a rank revealing QR decomposition or the methods by Foster (1986) and Gotsman & Toledo (2008).

If method=="QR", a rank revealing QR decomposition is performed for the scaled penalized Hessian. The latter has to be transformed to a dense matrix for this. This is essentially the approach by Wood et al. (2016) and is the most accurate. Alternatively, we can rely on a variant of Foster’s method. This is done when method=="LU" or method=="Direct". method=="LU" requires p LU decompositions - where p is approximately the Kernel size of the matrix. Essentially continues to find vectors forming a basis of the Kernel of the balanced penalized Hessian from the upper matrix of the LU decomposition and successively drops columns corresponding to the maximum absolute value of the Kernel vectors (see Foster, 1986). This is repeated until we can form a cholesky of the scaled penalized hessian which has an acceptable condition number. If method=="Direct", the same procedure is completed, but Kernel vectors are found directly based on the balanced penalized Hessian, which can be less precise.

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Foster (1986). Rank and null space calculations using matrix decomposition without column interchanges.

  • Gotsman & Toledo (2008). On the Computation of Null Spaces of Sparse Rectangular Matrices.

  • mgcv source code, in particular: https://github.com/cran/mgcv/blob/master/R/gam.fit4.r

Parameters:
  • H (scp.sparse.csc_array) – Estimate of the hessian of the log-likelihood.

  • S_scaled (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).

  • method (str, optional) – Which method to use to check for rank deficiency, defaults to ‘QR’

Returns:

A tuple containing lists of the coefficients to keep and to drop, both of which are None when we don’t need to drop any.

Return type:

tuple[list[int]|None,list[int]|None]
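
The idea behind the "QR" option can be sketched with SciPy's column-pivoted QR decomposition. This is only an illustration on a dense matrix: the helper find_dependent_columns and the tolerance used to decide the numerical rank are assumptions, not the values used by mssm.

```python
import numpy as np
from scipy.linalg import qr

def find_dependent_columns(A, tol=None):
    """Hypothetical helper: split column indices of A into keep/drop via pivoted QR."""
    _, R, piv = qr(A, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))              # non-increasing thanks to pivoting
    if tol is None:
        tol = diag.max() * np.sqrt(np.finfo(float).eps)   # assumed threshold
    rank = int(np.sum(diag > tol))
    keep = sorted(piv[:rank].tolist())
    drop = sorted(piv[rank:].tolist())
    return keep, drop

# Toy "Hessian" with one redundant direction: column 2 of X duplicates column 0
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
X[:, 2] = X[:, 0]
A = X.T @ X
keep, drop = find_dependent_columns(A)
```

Which of the two duplicated columns ends up in drop depends on the pivoting order, so only the number of dropped columns is determined here.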

mssm.src.python.gamm_solvers.init_step_gam(y: ndarray, yb: ndarray, mu: ndarray, eta: ndarray, rowsX: int, colsX: int, X: csc_array, Xb: csc_array, family: Family, col_S: int, penalties: list[LambdaTerm], pinv: str, n_c: int, formula: Formula, form_Linv: bool, method: str, offset: float | ndarray, Lrhoi: csc_array | None) tuple[float, float, ndarray, ndarray, ndarray, csc_array, csc_array, float, list[float], float, ndarray, ndarray, csc_array]

Internal function. Gets initial estimates for a GAM model for coefficients and proposes first lambda update.

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • y (np.ndarray) – vector of observations

  • yb (np.ndarray) – vector of observations of the working model

  • mu (np.ndarray) – vector of mean estimates

  • eta (np.ndarray) – vector of linear predictors

  • rowsX (int) – Rows of model matrix

  • colsX (int) – Cols of model matrix

  • X (scp.sparse.csc_array) – Model matrix

  • Xb (scp.sparse.csc_array) – Model matrix of working model

  • family (Family) – Family of model

  • col_S (int) – Cols of penalty matrix

  • penalties (list[LambdaTerm]) – List of penalties

  • pinv (str) – Method to use to compute generalized inverse of total penalty, set to ‘svd’!

  • n_c (int) – Number of cores to use

  • formula (Formula) – Formula of the model

  • form_Linv (bool) – Whether to form the inverse of the cholesky of the negative penalized hessian or not

  • method (str) – Which method to use to solve for the coefficients (“Chol” or “Qr”)

  • offset (float | np.ndarray) – Offset (fixed effect) to add to eta

  • Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model

Returns:

A tuple containing the deviance dev, penalized deviance pen_dev,eta, mu, coef, CholXXS, InvCholXXS, total_edf, term_edfs, scale, wres, lam_delta, S_emb

Return type:

tuple[float, float, np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, list[float], float, np.ndarray, np.ndarray, scp.sparse.csc_array]

mssm.src.python.gamm_solvers.initialize_extension(method: str, penalties: list[LambdaTerm]) dict

Internal function. Initializes a dictionary holding all the necessary information to compute the lambda extensions at every iteration of the fitting iteration.

Parameters:
  • method (str) – Which extension method to use

  • penalties (list[LambdaTerm]) – List of penalties

Returns:

extension info dictionary

Return type:

dict

mssm.src.python.gamm_solvers.keep_XTX(cov_flat: ndarray, y_flat: ndarray, formula: Formula, nc: int, progress_bar: bool) tuple[csc_array, ndarray]

Computes X.T@X and X.T@y in blocks.

Parameters:
  • cov_flat (np.ndarray) – Encoded data as np.array

  • y_flat (np.ndarray) – vector of observations

  • formula (Formula) – Formula of model

  • nc (int) – Number of cores to use

  • progress_bar (bool) – Whether to print progress or not

Returns:

X.T@X, X.T@y

Return type:

tuple[scp.sparse.csc_array,np.ndarray]
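
Accumulating the cross-products block by block means that the full model matrix never has to be held in memory at once. A minimal sketch of the accumulation, assuming dense arrays and a hypothetical helper blockwise_cross_products (here the full X is available only so that the result can be checked):

```python
import numpy as np

def blockwise_cross_products(X, y, n_blocks):
    """Hypothetical helper: accumulate X.T @ X and X.T @ y over row blocks."""
    p = X.shape[1]
    XX = np.zeros((p, p))
    Xy = np.zeros(p)
    for rows in np.array_split(np.arange(X.shape[0]), n_blocks):
        Xb = X[rows, :]          # in practice only this block would be built
        XX += Xb.T @ Xb
        Xy += Xb.T @ y[rows]
    return XX, Xy

rng = np.random.default_rng(0)
X = rng.normal(size=(103, 4))
y = rng.normal(size=103)
XX, Xy = blockwise_cross_products(X, y, n_blocks=5)
```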

mssm.src.python.gamm_solvers.keep_eta(formula: Formula, coef: ndarray, nc: int) ndarray

Computes X@coef in parallel, where X is the overall model matrix and coef is current coefficient estimate.

Parameters:
  • formula (Formula) – Formula of model

  • coef (np.ndarray) – Current coefficient estimate

  • nc (int) – Number of cores to use

Returns:

X@coef

Return type:

np.ndarray

mssm.src.python.gamm_solvers.newton_coef_smooth(coef: ndarray, grad: ndarray, H: csc_array, S_emb: csc_array) tuple[ndarray, csc_array, csc_array, float]

Follows sections 3.1.2 and 3.14 in Wood, Pya, & Säfken (2016) to update the coefficients of a GAMLSS/GSMM model via a newton step.

  1. Computes gradient of the penalized likelihood (grad - S_emb@coef)

  2. Computes negative Hessian of the penalized likelihood (-1*H + S_emb) and its inverse.

  3. Uses these two to compute the Newton step.

  4. Step size control - happens outside

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • coef (np.ndarray) – Current coefficient estimate

  • grad (np.ndarray) – gradient of llk with respect to coef

  • H (scp.sparse.csc_array) – hessian of the llk

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

Returns:

A tuple containing an estimate of the coefficients, the un-pivoted cholesky of the penalized negative hessian, the inverse of the former, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible

Return type:

tuple[np.ndarray,scp.sparse.csc_array,scp.sparse.csc_array,float]
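
The first three steps listed above can be sketched with dense arrays as follows. The helper newton_step is hypothetical; it omits the diagonal inflation used when the negative penalized Hessian is not invertible and returns only the new coefficients and the Cholesky factor.

```python
import numpy as np

def newton_step(coef, grad, H, S_emb):
    """Hypothetical helper: one Newton step on the penalized log-likelihood."""
    pen_grad = grad - S_emb @ coef           # step 1: penalized gradient
    neg_pen_hess = -H + S_emb                # step 2: negative penalized Hessian
    L = np.linalg.cholesky(neg_pen_hess)     # neg_pen_hess = L @ L.T
    step = np.linalg.solve(L.T, np.linalg.solve(L, pen_grad))   # step 3
    return coef + step, L

# Toy Gaussian model: llk = -0.5 * ||y - X @ coef||^2, so H = -X.T @ X
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = rng.normal(size=40)
S_emb = 0.5 * np.eye(3)
coef0 = np.zeros(3)
new_coef, L = newton_step(coef0, X.T @ (y - X @ coef0), -(X.T @ X), S_emb)
```

Because the toy log-likelihood is quadratic, a single step already lands on the penalized least squares solution.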

mssm.src.python.gamm_solvers.read_XTX(file: str, formula: Formula, nc: int) tuple[csc_array, ndarray, int]

Computes X.T@X and X.T@y for this file in parallel, reading data from file.

Parameters:
  • file (str) – File name

  • formula (Formula) – Formula of model

  • nc (int) – Number of cores to use

Returns:

X.T@X, X.T@y

Return type:

tuple[scp.sparse.csc_array,np.ndarray,int]

mssm.src.python.gamm_solvers.read_eta(file, formula: Formula, coef: ndarray, nc: int) ndarray

Computes X@coef in parallel, where X is the model matrix based on this file and coef is the current coefficient estimate.

Parameters:
  • file (str) – File name

  • formula (Formula) – Formula of model

  • coef (np.ndarray) – Current coefficient estimate

  • nc (int) – Number of cores to use

Returns:

X@coef

Return type:

np.ndarray

mssm.src.python.gamm_solvers.read_mmat(should_cache: bool, cache_dir: str, file: str, fi: int, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) csc_array

Creates model matrix for that dataset. The model-matrix is either cached or not. If the former is the case, the matrix is read in on subsequent calls to this function.

Parameters:
  • should_cache (bool) – whether or not the directory should actually be created

  • cache_dir (str) – path to cache directory

  • file (str) – File name

  • fi (int) – File index in all files

  • terms (list[GammTerm]) – List of terms in model formula

  • has_intercept (bool) – Whether the formula has an intercept or not

  • ltx (list[int]) – Linear term indices

  • irstx (list[int]) – Impulse response function term indices

  • stx (list[int]) – Smooth term indices

  • rtx (list[int]) – Random term indices

  • var_types (dict) – Dictionary holding variable types

  • var_map (dict) – Dictionary mapping variable names to column indices in the encoded data

  • var_mins (dict) – Dictionary with variable minimums

  • var_maxs (dict) – Dictionary with variable maximums

  • factor_levels (dict) – Dictionary with levels associated with each factor

  • cov_flat_file (np.ndarray) – Encoded data based on file

  • cov (list[np.ndarray]) – Essentially [cov_flat_file]

Returns:

model matrix associated with this file

Return type:

scp.sparse.csc_array

mssm.src.python.gamm_solvers.restart_coef(coef: ndarray, c_llk: float, c_pen_llk: float, n_coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array], S_emb: csc_array, family: GSMMFamily, outer: int, restart_counter: int) tuple[ndarray, float, float]

Shrink coef towards random vector to restart algorithm if it gets stuck.

Parameters:
  • coef (np.ndarray) – Coefficient estimate

  • c_llk (float) – Current llk

  • c_pen_llk (float) – Current penalized llk

  • n_coef (np.ndarray) – Number of coefficients

  • coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution

  • ys (list[np.ndarray]) – List of observation vectors

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • family (GSMMFamily) – Model family

  • outer (int) – Outer iteration index

  • restart_counter (int) – Number of restarts already handled previously

Returns:

Updates for coef, c_llk, c_pen_llk

Return type:

tuple[np.ndarray, float, float]

mssm.src.python.gamm_solvers.restart_coef_gammlss(coef: ndarray, split_coef: list[ndarray], c_llk: float, c_pen_llk: float, etas: list[ndarray], mus: list[ndarray], n_coef: int, coef_split_idx: list[int], y: ndarray, Xs: list[csc_array], S_emb: csc_array, family: GAMLSSFamily, outer: int, restart_counter: int) tuple[ndarray, list[ndarray], float, float, list[ndarray], list[ndarray]]

Shrink coef towards random vector to restart algorithm if it gets stuck.

Parameters:
  • coef (np.ndarray) – Coefficient estimate

  • split_coef (list[np.ndarray]) – Split of coefficient estimate

  • c_llk (float) – Current llk

  • c_pen_llk (float) – Current penalized llk

  • etas (list[np.ndarray]) – List of linear predictors

  • mus (list[np.ndarray]) – List of estimated means

  • n_coef (int) – Number of coefficients

  • coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution

  • y (np.ndarray) – Vector of observations

  • Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • family (GAMLSSFamily) – Model family

  • outer (int) – Outer iteration index

  • restart_counter (int) – Number of restarts already handled previously

Returns:

Updates for coef, split_coef, c_llk, c_pen_llk, etas, mus

Return type:

tuple[np.ndarray, list[np.ndarray], float, float, list[np.ndarray], list[np.ndarray]]

mssm.src.python.gamm_solvers.solve_gamm_sparse(mu_init: ndarray, y: ndarray, X: csc_array, penalties: list[LambdaTerm], col_S: int, family: Family, maxiter: int = 10, max_inner: int = 100, pinv: str = 'svd', conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 1, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', form_Linv: bool = True, method: str = 'Chol', check_cond: int = 2, progress_bar: bool = False, n_c: int = 10, offset: int = 0, Lrhoi: csc_array | None = None) tuple[ndarray, ndarray, ndarray, csc_array, csc_array, float, csc_array, float, list[float], float, Fit_info]

Estimates a Generalized Additive Mixed model. Implements the algorithms discussed in section 3.2 of the paper by Krause et al. (submitted).

Relies on methods proposed by Wood et al. (2017), Wood & Fasiolo (2017), Wood (2011), and Wood (2017).

References:
  • Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744

  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • mu_init (np.ndarray) – Initial values for means

  • y (np.ndarray) – vector of observations

  • X (scp.sparse.csc_array) – Model matrix

  • penalties (list[LambdaTerm]) – List of penalties

  • col_S (int) – Columns of total penalty matrix

  • family (Family) – Family of model

  • maxiter (int, optional) – Maximum number of iterations for outer algorithm updating lambda, defaults to 10

  • max_inner (int, optional) – Maximum number of iterations for inner algorithm updating coefficients, defaults to 100

  • pinv (str, optional) – Method to use to compute generalized inverse of total penalty, defaults to “svd”

  • conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7

  • extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.

  • control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 1 by default.

  • exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.

  • extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov”

  • form_Linv (bool, optional) – Whether to form the inverse of the cholesky of the negative penalized hessian or not, defaults to True

  • method (str, optional) – Which method to use to solve for the coefficients (“Chol” or “Qr”), defaults to “Chol”

  • check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). When check_cond=2, an estimate of the condition number will be computed for each new system (at each iteration of the algorithm) and an error will be raised if the condition number is estimated as too high given the chosen method. Defaults to 2

  • progress_bar (bool, optional) – Whether to print progress or not, defaults to False

  • n_c (int, optional) – Number of cores to use, defaults to 10

  • offset (int, optional) – Offset (fixed effect) to add to eta, defaults to 0

  • Lrhoi (scp.sparse.csc_array | None, optional) – Optional covariance matrix of an ar1 model, defaults to None

Raises:
  • ArithmeticError – _description_

  • ArithmeticError – _description_

  • ArithmeticError – _description_

  • ArithmeticError – _description_

  • warnings.warn – _description_

Returns:

An estimate of the coefficients coef, the linear predictor eta, the working residuals wres, the root of the Fisher weights as matrix Wr, the matrix with Newton weights at convergence WN, an estimate of the scale parameter, an inverse of the cholesky of the penalized negative hessian InvCholXXS, total edf, term-wise edf, total penalty, a Fit_info object

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, scp.sparse.csc_array, float, list[float], float, Fit_info]
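
To give a feel for the outer lambda updates, the sketch below runs the Fellner-Schall update of Wood & Fasiolo (2017) on a toy problem. It is heavily simplified relative to this function and rests on several assumptions: a Gaussian model with identity link, a single full-rank ridge-type penalty, dense arrays, and no step extension or control. All variable names are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)
S = np.eye(p)       # toy full-rank penalty, so rank(S) = p
lam = 1.0

for _ in range(25):
    A = X.T @ X + lam * S
    coef = np.linalg.solve(A, X.T @ y)                 # inner step: penalized least squares
    edf = np.trace(np.linalg.solve(A, X.T @ X))        # effective degrees of freedom
    scale = np.sum((y - X @ coef) ** 2) / (n - edf)    # scale (residual variance) estimate
    lgdet_deriv = p / lam                              # tr(S_lambda^- S) for S_lambda = lam * S
    ldet_deriv = np.trace(np.linalg.solve(A, S))       # tr((X'X + S_lambda)^{-1} S)
    bSb = coef @ S @ coef
    lam = lam * scale * (lgdet_deriv - ldet_deriv) / bSb   # Fellner-Schall update
```

For non-Gaussian families the inner step becomes an iterative (Newton or Fisher scoring) update of the coefficients, which is where max_inner comes into play.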

mssm.src.python.gamm_solvers.solve_gamm_sparse2(formula: Formula, penalties: list[LambdaTerm], col_S: int, family: Family, maxiter: int = 10, pinv: str = 'svd', conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 1, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', form_Linv: bool = True, progress_bar: bool = False, n_c: int = 10) tuple[ndarray, ndarray, ndarray, csc_array, float, csc_array | None, float, list[float], float, Fit_info]

Estimates an Additive Mixed model. Implements the algorithms discussed in section 3.1 of the paper by Krause et al. (submitted).

Relies on methods proposed by Wood et al. (2017), Wood & Fasiolo (2017), Wood (2011), and Wood (2017). In addition, this function builds the products involving the model matrix only once (iteratively) as described by Wood et al. (2015).

References:
  • Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744

  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Wood, S. N., Goude, Y., & Shaw, S. (2015). Generalized additive models for large data sets. Journal of the Royal Statistical Society: Series C (Applied Statistics), 64(1), 139–155. https://doi.org/10.1111/rssc.12068

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • formula (Formula) – Formula of the model

  • penalties (list[LambdaTerm]) – List of penalties

  • col_S (int) – Columns of total penalty matrix

  • family (Family) – Family of model

  • maxiter (int, optional) – Maximum number of iterations for outer algorithm updating lambda, defaults to 10

  • pinv (str, optional) – Method to use to compute the generalized inverse of the total penalty, defaults to “svd”

  • conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7

  • extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.

  • control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 1 by default.

  • exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.

  • extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov”

  • form_Linv (bool, optional) – Whether to form the inverse of the Cholesky of the negative penalized hessian or not, defaults to True

  • progress_bar (bool, optional) – Whether to print progress or not, defaults to False

  • n_c (int, optional) – Number of cores to use, defaults to 10

Returns:

An estimate of the coefficients coef, the linear predictor eta, the working residuals wres, the negative hessian, the estimated scale, an inverse of the cholesky of the negative penalized hessian, total edf, term-wise edfs, total penalty, a Fit_info object

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, float, scp.sparse.csc_array|None, float, list[float], float, Fit_info]
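
As a rough illustration of the control_lambda=2 behaviour described above, the sketch below halves a proposed lambda step until a criterion increases. The names halve_until_improvement and reml are hypothetical and not part of mssm; the actual solver works with an approximate REML criterion and updates many lambda terms at once.

```python
def halve_until_improvement(lam, d_lam, reml, max_halvings=10):
    """Halve the step d_lam until reml(lam + d_lam) > reml(lam).

    Hypothetical helper: `reml` is any callable returning the criterion
    to be maximized; it is not an mssm function.
    """
    current = reml(lam)
    for _ in range(max_halvings):
        if reml(lam + d_lam) > current:
            return lam + d_lam, d_lam
        d_lam /= 2.0
    # No acceptable step was found: keep the old value.
    return lam, 0.0


# Toy criterion with its maximum at lam = 3.
def toy_reml(lam):
    return -(lam - 3.0) ** 2


new_lam, step = halve_until_improvement(1.0, 8.0, toy_reml)
```

In this toy case the proposals 9 and 5 are rejected, and the twice-halved step of 2 is accepted.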

mssm.src.python.gamm_solvers.solve_gammlss_sparse(family: GAMLSSFamily, y: ndarray, Xs: list[csc_array], form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], gamlss_pen: list[LambdaTerm], max_outer: int = 50, max_inner: int = 30, min_inner: int = 1, conv_tol: float = 1e-07, extend_lambda: bool = True, extension_method_lam: str = 'nesterov2', control_lambda: int = 1, method: str = 'Chol', check_cond: int = 1, piv_tol: float = 0.175, repara: bool = True, should_keep_drop: bool = True, prefit_grad: bool = False, progress_bar: bool = True, n_c: int = 10) tuple[ndarray, list[ndarray], list[ndarray], ndarray, csc_array, csc_array, float, list[float], float, list[LambdaTerm], Fit_info]

Fits a GAMLSS model - essentially completes the steps discussed in section 3.3 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016)

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • family (GAMLSSFamily) – Model family

  • y (np.ndarray) – Vector of observations

  • Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution

  • form_n_coef (list[int]) – List of number of coefficients per formula

  • form_up_coef (list[int]) – List of un-penalized number of coefficients per formula

  • coef (np.ndarray) – Coefficient estimate

  • coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.

  • gamlss_pen (list[LambdaTerm]) – List of penalties

  • max_outer (int, optional) – Maximum number of outer iterations, defaults to 50

  • max_inner (int, optional) – Maximum number of inner iterations, defaults to 30

  • min_inner (int, optional) – Minimum number of inner iterations, defaults to 1

  • conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7

  • extend_lambda (bool, optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary, defaults to True

  • extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov2”

  • control_lambda (int, optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML, defaults to 1

  • method (str, optional) – Method to use to estimate coefficients, defaults to “Chol”

  • check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()), defaults to 1

  • piv_tol (float, optional) – Deprecated, defaults to 0.175

  • repara (bool, optional) – Whether to apply a stabilizing re-parameterization to the model, defaults to True

  • should_keep_drop (bool, optional) – If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations, defaults to True

  • prefit_grad (bool, optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients, defaults to False

  • progress_bar (bool, optional) – Whether progress should be displayed, defaults to True

  • n_c (int, optional) – Number of cores to use, defaults to 10

Returns:

coef estimate, etas, mus, working residuals, the negative hessian of the log-likelihood, inverse of cholesky of negative hessian of the penalized log-likelihood, total edf, term-wise edfs, total penalty, final list of penalties, a Fit_info object

Return type:

tuple[np.ndarray, list[np.ndarray], list[np.ndarray], np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, list[float], float, list[LambdaTerm], Fit_info]
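
The relationship between form_n_coef, coef and coef_split_idx can be sketched as follows. This assumes the split indices follow the semantics of numpy.split (cumulative coefficient counts, excluding the last); the concrete numbers are made up for illustration.

```python
import numpy as np

# Hypothetical setup: two distribution parameters (e.g. mean and scale)
# with 3 and 2 coefficients respectively.
form_n_coef = [3, 2]
coef = np.arange(5, dtype=float)

# Split points are the cumulative coefficient counts, excluding the last.
coef_split_idx = list(np.cumsum(form_n_coef)[:-1])

# One coefficient vector per parameter of the response distribution.
split_coef = np.split(coef, coef_split_idx)
```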

mssm.src.python.gamm_solvers.solve_generalSmooth_sparse(family: GSMMFamily, ys: list[ndarray], Xs: list[csc_array], form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], smooth_pen: list[LambdaTerm], max_outer: int = 50, max_inner: int = 50, min_inner: int = 50, conv_tol: float = 1e-07, extend_lambda: bool = True, extension_method_lam: str = 'nesterov2', control_lambda: int = 1, optimizer: str = 'Newton', method: str = 'Chol', check_cond: int = 1, piv_tol: float = 0.175, repara: bool = True, should_keep_drop: bool = True, form_VH: bool = True, use_grad: bool = False, gamma: float = 1, qEFSH: str = 'SR1', overwrite_coef: bool = True, max_restarts: int = 0, qEFS_init_converge: bool = True, prefit_grad: bool = False, progress_bar: bool = True, n_c: int = 10, init_bfgs_options: dict = {'ftol': 1e-09, 'gtol': 1e-09, 'maxcor': 30, 'maxfun': 10000000.0, 'maxls': 100}, bfgs_options: dict = {'ftol': 1e-09, 'gtol': 1e-09, 'maxcor': 30, 'maxfun': 10000000.0, 'maxls': 100}) tuple[ndarray, csc_array | None, csc_array | LinearOperator, LinearOperator | None, float, list[float], float, list[LambdaTerm], Fit_info]

Fits a general smooth model. Essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016). An even more general version of solve_gammlss_sparse() that can use the L-qEFS update by Krause et al. (submitted) to estimate the coefficients and lambda parameters. The update requires only a function to compute the log-likelihood and a function to compute the gradient of said likelihood with respect to the coefficients. Alternatively, full Newton can be used, which requires a function to compute the hessian as well.

References:

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Nocedal & Wright (2006). Numerical Optimization. Springer New York.

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • family (GSMMFamily) – Model family

  • ys (list[np.ndarray]) – List of observation vectors

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • form_n_coef (list[int]) – List of number of coefficients per formula

  • form_up_coef (list[int]) – List of un-penalized number of coefficients per formula

  • coef (np.ndarray) – Coefficient estimate

  • coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.

  • smooth_pen (list[LambdaTerm]) – List of penalties

  • max_outer (int, optional) – Maximum number of outer iterations, defaults to 50

  • max_inner (int, optional) – Maximum number of inner iterations, defaults to 50

  • min_inner (int, optional) – Minimum number of inner iterations, defaults to 50

  • conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7

  • extend_lambda (bool, optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary, defaults to True

  • extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov2”

  • control_lambda (int, optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note that the gradient is based on quasi-Newton approximations as well and is thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed, defaults to 1

  • optimizer (str, optional) – Deprecated, defaults to “Newton”

  • method (str, optional) – Which method to use to estimate the coefficients (and lambda parameters), defaults to “Chol”

  • check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()), defaults to 1

  • piv_tol (float, optional) – Deprecated, defaults to 0.175

  • repara (bool, optional) – Whether to apply a stabilizing re-parameterization to the model, defaults to True

  • should_keep_drop (bool, optional) – If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations, defaults to True

  • form_VH (bool, optional) – Whether to explicitly form matrix V - the estimated inverse of the negative Hessian of the penalized likelihood - and H - the estimate of the Hessian of the log-likelihood - when using the qEFS method, defaults to True

  • use_grad (bool, optional) – Deprecated, defaults to False

  • gamma (float, optional) – Setting this to a value larger than 1 promotes more complex (less smooth) models. Setting this to a value smaller than 1 (but must be > 0) promotes smoother models, defaults to 1

  • qEFSH (str, optional) – Whether the hessian approximation should use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation, or the standard BFGS update (qEFSH='BFGS'), defaults to ‘SR1’

  • overwrite_coef (bool, optional) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS', defaults to True

  • max_restarts (int, optional) – How often to shrink the coefficient estimate back to a random vector when convergence is reached and when method='qEFS'. The optimizer might get stuck in local minima so it can be helpful to set this to 1-3. What happens is that if we converge, we shrink the coefficients back to a random vector and then continue optimizing once more, defaults to 0

  • qEFS_init_converge (bool, optional) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS', defaults to True

  • prefit_grad (bool, optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients, defaults to False

  • progress_bar (bool, optional) – Whether progress should be printed or not, defaults to True

  • n_c (int, optional) – Number of cores to use, defaults to 10

  • init_bfgs_options (dict, optional) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options but passed to the optimizer of the un-penalized problem, defaults to {"gtol":1e-9,"ftol":1e-9,"maxcor":30,"maxls":100,"maxfun":1e7}

  • bfgs_options (dict, optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS', defaults to {"gtol":1e-9,"ftol":1e-9,"maxcor":30,"maxls":100,"maxfun":1e7}

Returns:

coef estimate, the negative hessian of the log-likelihood, inverse of cholesky of negative hessian of the penalized log-likelihood, if method=='qEFS' an instance of scp.sparse.linalg.LinearOperator representing the new quasi-newton approximation, total edf, term-wise edfs, total penalty, final list of penalties, a Fit_info object

Return type:

tuple[np.ndarray, scp.sparse.csc_array|None, scp.sparse.csc_array|scp.sparse.linalg.LinearOperator, scp.sparse.linalg.LinearOperator|None, float, list[float], float, list[LambdaTerm], Fit_info]
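
To make the role of bfgs_options more concrete, here is a minimal sketch of how such a dictionary is forwarded to scipy.optimize.minimize(). It assumes the L-BFGS-B method, for which these option names are valid; the toy objective and gradient are placeholders, not mssm functions, and maxfun is written as an integer here.

```python
import numpy as np
from scipy.optimize import minimize

# Option names mirror the documented defaults; they are valid for L-BFGS-B.
bfgs_options = {"gtol": 1e-9, "ftol": 1e-9, "maxcor": 30,
                "maxls": 100, "maxfun": int(1e7)}


def neg_llk(coef):
    # Toy objective standing in for a negative (penalized) log-likelihood.
    return 0.5 * np.sum((coef - 1.0) ** 2)


def neg_grad(coef):
    # Gradient of the toy objective with respect to the coefficients.
    return coef - 1.0


res = minimize(neg_llk, np.zeros(3), jac=neg_grad,
               method="L-BFGS-B", options=bfgs_options)
```

Only the objective and its gradient are supplied, which matches the requirement stated above for the quasi-Newton route.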

mssm.src.python.gamm_solvers.step_fellner_schall_sparse(lgdet_deriv: float, ldet_deriv: float, bSb: float, cLam: float, scale: float) float

Internal function. Compute a generalized Fellner Schall update step for a lambda term. This update rule is discussed in Wood & Fasiolo (2017).

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • lgdet_deriv (float) – Derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.

  • ldet_deriv (float) – Derivative of \(log(|\mathbf{H} + \mathbf{S}_\lambda|)\) (where \(\mathbf{H} + \mathbf{S}_\lambda\) is the negative hessian of the penalized llk) with respect to lambda.

  • bSb (float) – cCoef.T@emb_SJ@cCoef where cCoef is current coefficient estimate

  • cLam (float) – Current lambda value

  • scale (float) – Optional scale parameter (or 1)

Returns:

The additive update to cLam

Return type:

float
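
A minimal sketch of such an update, assuming the multiplicative form given by Wood & Fasiolo (2017), in which the new lambda equals scale * (lgdet_deriv - ldet_deriv) / bSb * cLam. The function name efs_step and the floor on the numerator are assumptions of this sketch, not a copy of the mssm implementation.

```python
def efs_step(lgdet_deriv, ldet_deriv, bSb, cLam, scale=1.0):
    """Additive generalized Fellner-Schall step for one lambda (sketch)."""
    # In exact arithmetic the numerator is non-negative; flooring it at
    # zero is an assumption of this sketch to guard against rounding error.
    num = max(lgdet_deriv - ldet_deriv, 0.0)
    new_lam = scale * num / bSb * cLam
    # Return the additive change so that cLam + step gives the new lambda.
    return new_lam - cLam


step = efs_step(lgdet_deriv=5.0, ldet_deriv=3.0, bSb=4.0, cLam=2.0)
```

With these toy inputs the proposed lambda is 1.0, so the additive step relative to cLam=2.0 is negative.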

mssm.src.python.gamm_solvers.test_SR1(sk: ndarray, yk: ndarray, rho: ndarray, sks: ndarray, yks: ndarray, rhos: ndarray) bool

Test whether SR1 update is well-defined for both V and H.

Relies on steps discussed by Byrd, Nocedal & Schnabel (1994).

References:
  • Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:
  • sk (np.ndarray) – New update vector sk

  • yk (np.ndarray) – New update vector yk

  • rho (np.ndarray) – New rho

  • sks (np.ndarray) – Previous update vectors sk

  • yks (np.ndarray) – Previous update vectors yk

  • rhos (np.ndarray) – Previous rhos

Returns:

Whether the SR1 update is well-defined for both V and H.

Return type:

bool
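
For orientation, the textbook safeguard for a single dense SR1 update (see Nocedal & Wright, 2006) can be sketched as below. This is not the limited-memory check performed by test_SR1; the explicit matrix B, the function name and the threshold r are assumptions made for illustration.

```python
import numpy as np


def sr1_is_well_defined(B, s, y, r=1e-8):
    """Return True if the SR1 denominator s^T (y - B s) is safely non-zero."""
    v = y - B @ s
    return abs(s @ v) >= r * np.linalg.norm(s) * np.linalg.norm(v)


B = np.eye(2)
s = np.array([1.0, 0.0])
ok = sr1_is_well_defined(B, s, np.array([2.0, 0.0]))    # denominator is 1
skip = sr1_is_well_defined(B, s, np.array([1.0, 1.0]))  # denominator is 0
```

When the check fails, the usual remedy is to skip the update for that pair of vectors.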

mssm.src.python.gamm_solvers.undo_extension_lambda_step(lti: int, lam: float, dLam: float, extend_by: dict, was_extended: list[bool], method: str, family: Family) tuple[float, float]

Internal function. Deals with resetting any extension terms.

Parameters:
  • lti (int) – Penalty index

  • lam (float) – Current lambda value

  • dLam (float) – The lambda update

  • extend_by (dict) – Extension info dictionary

  • was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not

  • method (str) – Extension method to use.

  • family (Family) – model family

Raises:

ValueError – If requested method is not implemented

Returns:

Updated values for lam and dLam

Return type:

tuple[float,float]

mssm.src.python.gamm_solvers.update_PIRLS(y: ndarray, yb: ndarray, mu: ndarray, eta: ndarray, X: csc_array, Xb: csc_array, family: Family, Lrhoi: csc_array | None) tuple[ndarray, csc_array, ndarray | None, csc_array | None]

Internal function. Updates the pseudo-weights and observation vector yb and model matrix Xb of the working model.

Note: Dimensions of yb and Xb might not match those of y and X since rows of invalid pseudo-data observations are dropped here.

Parameters:
  • y (np.ndarray) – vector of observations

  • yb (np.ndarray) – vector of observations of the working model

  • mu (np.ndarray) – vector of mean estimates

  • eta (np.ndarray) – vector of linear predictors

  • X (scp.sparse.csc_array) – Model matrix

  • Xb (scp.sparse.csc_array) – Model matrix of working model

  • family (Family) – Family of model

  • Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model

Returns:

Updated observation vector yb and model matrix Xb of the working model, pseudo-weights, and a diagonal sparse matrix holding the root of the Fisher weights. Latter two are None for strictly additive models.

Return type:

tuple[np.ndarray,scp.sparse.csc_array,np.ndarray|None,scp.sparse.csc_array|None]
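
The pseudo-data and weights of a PIRLS iteration follow the standard construction in Wood (2017). The sketch below works this out for a binary model with logit link and drops rows with non-finite pseudo-data, as the note above describes. The function name and the dense NumPy arrays are assumptions for illustration and do not reproduce the mssm internals.

```python
import numpy as np


def pirls_pseudo_data(y, eta):
    """Pseudo-data z and Fisher weights w for a binary logit model (sketch)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        mu = 1.0 / (1.0 + np.exp(-eta))   # inverse logit link
        dg = 1.0 / (mu * (1.0 - mu))      # derivative of the logit link, g'(mu)
        var = mu * (1.0 - mu)             # Bernoulli variance function V(mu)
        z = eta + (y - mu) * dg           # pseudo-data
        w = 1.0 / (var * dg ** 2)         # Fisher weights
    # Rows with unusable pseudo-data are dropped.
    valid = np.isfinite(z) & np.isfinite(w)
    return z[valid], w[valid], valid


y = np.array([0.0, 1.0, 1.0])
eta = np.zeros(3)
z, w, valid = pirls_pseudo_data(y, eta)
Wr_diag = np.sqrt(w)   # diagonal of the root of the Fisher weights
```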

mssm.src.python.gamm_solvers.update_coef(yb: ndarray, X: csc_array, Xb: csc_array, family: Family, S_emb: csc_array, S_root: csc_array | None, n_c: int, formula: Formula | None, offset: float | ndarray) tuple[ndarray, ndarray, ndarray, list[int], csc_array, csc_array, list[int] | None, list[int] | None]

Internal function. Estimates the coefficients of the model and updates the linear predictor and mean estimates.

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • yb (np.ndarray) – vector of observations of the working model

  • X (scp.sparse.csc_array) – Model matrix

  • Xb (scp.sparse.csc_array) – Model matrix of working model

  • family (Family) – Family of Model

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • S_root (scp.sparse.csc_array | None) – Root of total penalty matrix or None

  • n_c (int) – Number of cores

  • formula (Formula | None) – Formula of model or None

  • offset (float | np.ndarray) – Offset (fixed effect) to add to eta

Returns:

A tuple containing the linear predictor eta, the estimated means mu, the estimated coefficients, the column permutation indices Pr, the column permutation matrix P, the Cholesky of the pivoted penalized negative hessian, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, list[int], scp.sparse.csc_array, scp.sparse.csc_array, list[int]|None, list[int]|None]
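
The core of the coefficient update is a penalized least squares solve. A small dense sketch for a working model with identity link is shown below; mssm itself works with sparse matrices and a pivoted Cholesky factor, so the dense SciPy calls and the toy data are assumptions of this example.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
Xb = rng.normal(size=(50, 4))           # model matrix of the working model
yb = rng.normal(size=50)                # observations of the working model
S_emb = np.diag([0.0, 1.0, 1.0, 1.0])   # toy total penalty (intercept unpenalized)

# Negative penalized hessian (up to scale) and its Cholesky factor.
nH = Xb.T @ Xb + S_emb
coef = cho_solve(cho_factor(nH), Xb.T @ yb)

# Updated linear predictor; with an identity link this is also mu.
eta = Xb @ coef
```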

mssm.src.python.gamm_solvers.update_coef_and_scale(y: ndarray, yb: ndarray, z: ndarray, Wr: csc_array, rowsX: int, colsX: int, X: csc_array, Xb: csc_array, Lrhoi: csc_array | None, family, S_emb: csc_array, S_root: csc_array | None, S_pinv: csc_array, FS_use_rank: list[bool], penalties: list[LambdaTerm], n_c: int, formula: Formula, form_Linv: bool, offset: float | ndarray) tuple[ndarray, ndarray, ndarray, csc_array | None, list[float], list[float], float, list[float], list[csc_array], float, ndarray, list[int] | None, list[int] | None]

Internal function to update the coefficients and (optionally) scale parameter of the model.

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – vector of observations

  • yb (np.ndarray) – vector of observations of the working model

  • z (np.ndarray) – vector of pseudo-data (can contain NaNs for invalid observations)

  • Wr (scp.sparse.csc_array) – diagonal sparse matrix holding the root of the Fisher weights

  • rowsX (int) – Rows of model matrix

  • colsX (int) – Cols of model matrix

  • X (scp.sparse.csc_array) – Model matrix

  • Xb (scp.sparse.csc_array) – Model matrix of working model

  • Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model

  • family (Family) – Family of model

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • S_root (scp.sparse.csc_array | None) – Root of total penalty matrix or None

  • S_pinv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix

  • FS_use_rank (list[bool]) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used

  • penalties (list[LambdaTerm]) – List of penalties

  • n_c (int) – Number of cores

  • formula (Formula) – Formula of the model

  • form_Linv (bool) – Whether to form the inverse of the Cholesky of the negative penalized hessian or not

  • offset (float | np.ndarray) – Offset (fixed effect) to add to eta

Returns:

A tuple containing the linear predictor eta, the estimated means mu, the estimated coefficients, the unpivoted cholesky of the penalized negative hessian, the inverse of the former (optional), derivative of \(log(|\mathbf{S}_\lambda|_+)\) with respect to lambdas, cCoef.T@emb_SJ@cCoef for each SJ, total edf, termwise edf, Bs, scale estimate, working residuals, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array|None, list[float], list[float], float, list[float], list[scp.sparse.csc_array], float, np.ndarray, list[int]|None, list[int]|None]

mssm.src.python.gamm_solvers.update_coef_gammlss(family: GAMLSSFamily, mus: list[ndarray], y: ndarray, Xs, coef: ndarray, coef_split_idx: list[int], S_emb: csc_array, S_norm: csc_array, S_pinv: csc_array, FS_use_rank: list[bool], gammlss_penalties: list[LambdaTerm], c_llk: float, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: list[list[int], list[int]] | None) tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], csc_array, csc_array, csc_array, float, float, float, list[int] | None, list[int] | None]

Repeatedly perform Newton update with step length control to the coefficient vector - essentially implements algorithm 3 from the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016). Checks for rank deficiency when method != "Chol".

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • family (GAMLSSFamily) – Family of model

  • mus (list[np.ndarray]) – List of estimated means

  • y (np.ndarray) – Vector of observations

  • Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution

  • coef (np.ndarray) – Coefficient estimate

  • coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • S_norm (scp.sparse.csc_array) – Total penalty matrix - normalized/scaled for rank checks

  • S_pinv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix

  • FS_use_rank (list[bool]) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used

  • gammlss_penalties (list[LambdaTerm]) – List of penalties

  • c_llk (float) – Current llk

  • outer (int) – Index of outer iteration

  • max_inner (int) – Maximum number of inner iterations

  • min_inner (int) – Minimum number of inner iterations

  • conv_tol (float) – Convergence tolerance

  • method (str) – Method to use to estimate coefficients

  • piv_tol (float) – Deprecated

  • keep_drop (list[list[int],list[int]] | None) – Set of previously dropped coefficients or None

Returns:

A tuple containing an estimate of all coefficients, a split version of the former, updated values for mus, etas, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former, new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop

Return type:

tuple[np.ndarray, list[np.ndarray], list[np.ndarray], list[np.ndarray], scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array, float, float, float, list[int] | None, list[int] | None]
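
A single Newton update with step length control can be sketched as follows. The callables pen_llk, grad and neg_hess are hypothetical stand-ins for the penalized log-likelihood, its gradient and its negative hessian; the dense solve replaces the sparse Cholesky machinery used in practice.

```python
import numpy as np


def newton_step_halving(coef, pen_llk, grad, neg_hess, max_halvings=30):
    """One Newton update that halves the step until pen_llk does not decrease."""
    step = np.linalg.solve(neg_hess(coef), grad(coef))
    current = pen_llk(coef)
    for _ in range(max_halvings):
        if pen_llk(coef + step) >= current:
            return coef + step
        step = step / 2.0
    # No acceptable step was found: return the old coefficients.
    return coef


# Toy concave objective with its maximum at coef = 2.
def pen_llk(c):
    return -0.5 * np.sum((c - 2.0) ** 2)


def grad(c):
    return 2.0 - c


def neg_hess(c):
    return np.eye(c.size)


new_coef = newton_step_halving(np.zeros(2), pen_llk, grad, neg_hess)
```

In the documented function this kind of update is repeated between min_inner and max_inner times, or until conv_tol is met.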

mssm.src.python.gamm_solvers.update_coef_gen_smooth(family: GSMMFamily, ys: list[ndarray], Xs: list[csc_array], coef: ndarray, coef_split_idx: list[int], S_emb: csc_array, S_norm: csc_array, S_pinv: csc_array, FS_use_rank: list[bool], smooth_pen: list[LambdaTerm], c_llk: float, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: list[list[int], list[int]] | None, opt_raw: LinearOperator | None) tuple[ndarray, csc_array | None, csc_array | None, csc_array | LinearOperator, float, float, float, list[int] | None, list[int] | None]

Repeatedly perform Newton/Gradient/L-qEFS update with step length control to the coefficient vector - essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:
  • family (GSMMFamily) – Model family

  • ys (list[np.ndarray]) – List of observation vectors

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • coef (np.ndarray) – Coefficient estimate

  • coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter of llk.

  • S_emb (scp.sparse.csc_array) – Total penalty matrix

  • S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).

  • S_pinv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix

  • FS_use_rank (list[bool]) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used

  • smooth_pen (list[LambdaTerm]) – List of penalties

  • c_llk (float) – Current llk

  • outer (int) – Index of outer iteration

  • max_inner (int) – Maximum number of inner iterations

  • min_inner (int) – Minimum number of inner iterations

  • conv_tol (float) – Convergence tolerance

  • method (str) – Method to use to estimate coefficients

  • piv_tol (float) – Deprecated

  • keep_drop (list[list[int],list[int]] | None) – Set of previously dropped coefficients or None

  • opt_raw (scp.sparse.linalg.LinearOperator | None) – If the L-qEFS update is used to estimate coefficients/lambda parameters, then this is the previous state of the quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood

Returns:

A tuple containing an estimate of all coefficients, the negative hessian of the log-likelihood, Cholesky of negative hessian of the penalized log-likelihood, inverse of the former (or another instance of scp.sparse.linalg.LinearOperator representing the new quasi-Newton approximation), new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop

Return type:

tuple[np.ndarray, scp.sparse.csc_array|None, scp.sparse.csc_array|None, scp.sparse.csc_array|scp.sparse.linalg.LinearOperator, float, float, float, list[int]|None, list[int]|None]

mssm.src.python.gamm_solvers.update_scale_edf(y: ndarray, z: ndarray, eta: ndarray, Wr: csc_array, rowsX: int, colsX: int, LP: csc_array | None, InvCholXXSP: csc_array | None, Pr: list[int], lgdetDs: list[float], Lrhoi: csc_array | None, family: Family, penalties: list[LambdaTerm], keep: list[int] | None, drop: list[int], n_c: int) tuple[ndarray, csc_array | None, float, list[float], list[csc_array], float]

Internal function. Updates the scale of the model. The edf are computed for this as well and are also returned, because they are needed for the lambda step.

References:
  • Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • y (np.ndarray) – vector of observations

  • z (np.ndarray) – vector of pseudo-data (can contain NaNs for invalid observations)

  • eta (np.ndarray) – vector of linear predictors

  • Wr (scp.sparse.csc_array) – diagonal sparse matrix holding the root of the Fisher weights

  • rowsX (int) – Rows of model matrix

  • colsX (int) – Cols of model matrix

  • LP (scp.sparse.csc_array | None) – Pivoted Cholesky of negative penalized hessian or None

  • InvCholXXSP (scp.sparse.csc_array | None) – Inverse of LP, or None

  • Pr (list[int]) – Permutation list of LP

  • lgdetDs (list[float]) – List of derivatives of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambdas.

  • Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model

  • family (Family) – Family of model

  • penalties (list[LambdaTerm]) – List of penalties

  • keep (list[int] | None) – List of coefficients to keep, can be None -> keep all

  • drop (list[int]) – List of coefficients to drop

  • n_c (int) – Number of cores to use

Returns:

a tuple containing the working residuals, optionally the unpivoted inverse of LP, total edf, term-wise edf, Bs, scale estimate

Return type:

tuple[np.ndarray, scp.sparse.csc_array|None, float, list[float], list[scp.sparse.csc_array], float]
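
A minimal sketch of the scale step, assuming the residual-based estimator from Wood (2017): the squared norm of the weighted working residuals divided by the number of valid observations minus the total edf. The helper scale_sketch and its dense inputs are hypothetical; the actual function works on sparse matrices, also computes the edf, and may count observations differently.

```python
import numpy as np

def scale_sketch(z, eta, w_root, total_edf):
    """Hypothetical helper: residual-based scale estimate.

    z: pseudo-data (may contain NaN), eta: linear predictor,
    w_root: square roots of the Fisher weights (diagonal of Wr),
    total_edf: total effective degrees of freedom.
    """
    valid = ~np.isnan(z)                            # skip invalid observations
    wres = w_root[valid] * (z[valid] - eta[valid])  # weighted working residuals
    n = int(valid.sum())
    scale = float(wres @ wres) / (n - total_edf)
    return wres, scale

z = np.array([1.0, 2.0, np.nan, 4.0])
eta = np.array([0.5, 2.5, 3.0, 3.0])
wres, scale = scale_sketch(z, eta, np.ones(4), total_edf=1.0)
```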

mssm.src.python.matrix_solvers module

mssm.src.python.matrix_solvers.compute_B(L: csc_array, P: csc_array, lTerm: LambdaTerm, n_c: int = 10, drop: list[int] | None = None) float | tuple[float, float]

Solves L @ B = P @ lTerm.D_J_emb for B, then returns B.power(2).sum() or two approximations of this (for very big factor smooth models).

Parameters:
  • L (scp.sparse.csc_array) – Lower triangular sparse matrix

  • P (scp.sparse.csc_array) – Permutation matrix

  • lTerm (LambdaTerm) – Current penalty term

  • n_c (int, optional) – Number of cores, defaults to 10

  • drop (list[int] | None, optional) – Any parameters (columns/rows of lTerm.D_J_emb) to drop, defaults to None

Returns:

B.power(2).sum() or, for very big factor smooth models, sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights), with cluster weights obtained from mssm.src.python.formula.__cluster_discretize().

Return type:

float | tuple[float, float]
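
The following dense sketch illustrates the quantity computed in the exact (non-clustered) case. It assumes that L is the Cholesky factor of a pivoted matrix, P @ A @ P.T = L @ L.T, and that D plays the role of lTerm.D_J_emb with S = D @ D.T; all variable names are illustrative. Under these assumptions the sum of squares of B equals trace(inv(A) @ S). A general dense solve stands in for the sparse forward solve.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5
M = rng.normal(size=(k, k))
A = M @ M.T + k * np.eye(k)          # symmetric positive definite matrix
D = rng.normal(size=(k, 2))          # root of a penalty, S = D @ D.T

Pr = np.array([2, 0, 4, 1, 3])       # arbitrary pivot order (assumption)
P = np.eye(k)[Pr]                    # permutation matrix, P @ v == v[Pr]
L = np.linalg.cholesky(P @ A @ P.T)  # pivoted Cholesky: P @ A @ P.T = L @ L.T

B = np.linalg.solve(L, P @ D)        # solves L @ B = P @ D
sum_sq = float((B ** 2).sum())       # dense analogue of B.power(2).sum()

# Reference value computed without any factorization tricks.
trace_ref = float(np.trace(np.linalg.solve(A, D @ D.T)))
```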

mssm.src.python.matrix_solvers.compute_Linv(L: csc_array, n_c: int = 10) csc_array

Solves L @ inv(L) = I for inv(L) optionally parallelizing over column blocks of I.

Parameters:
  • L (scp.sparse.csc_array) – Lower triangular sparse matrix

  • n_c (int, optional) – Number of cores to use, defaults to 10

Returns:

inv(L)

Return type:

scp.sparse.csc_array
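
As a rough illustration of the column-block idea, the sketch below inverts a small dense lower triangular matrix by solving against blocks of identity columns. The helper linv_by_blocks is hypothetical and runs sequentially; in a parallel setting each block could be handed to a separate worker.

```python
import numpy as np

def linv_by_blocks(L, n_blocks=2):
    """Hypothetical dense helper: invert L block-column by block-column."""
    k = L.shape[0]
    I = np.eye(k)
    blocks = np.array_split(np.arange(k), n_blocks)
    # Each block of identity columns is an independent sub-problem.
    cols = [np.linalg.solve(L, I[:, idx]) for idx in blocks]
    return np.hstack(cols)

L = np.tril(np.arange(1.0, 17.0).reshape(4, 4))
Linv = linv_by_blocks(L, n_blocks=2)
```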

mssm.src.python.matrix_solvers.compute_block_B_shared(address_dat: str, address_ptr: str, address_idx: str, shape_dat: tuple, shape_ptr: tuple, rows: int, cols: int, nnz: int, T: csc_array) float

Solves L @ B = T for B via forward solving and based on shared memory for L, then computes and returns B.power(2).sum().

Parameters:
  • address_dat (str) – Address to data array of L

  • address_ptr (str) – Address to pointer array of L

  • address_idx (str) – Address to indices array of L

  • shape_dat (tuple) – Shape of data array of L

  • shape_ptr (tuple) – Shape of pointer array of L

  • rows (int) – Number of rows of L

  • cols (int) – Number of cols of L

  • nnz (int) – Number of non-zero elements in L

  • T (scp.sparse.csc_array) – Target matrix

Returns:

B.power(2).sum()

Return type:

float

mssm.src.python.matrix_solvers.compute_block_B_shared_cluster(address_dat: str, address_ptr: str, address_idx: str, shape_dat: tuple, shape_ptr: tuple, rows: int, cols: int, nnz: int, T: csc_array, cluster_weights: list[float]) tuple[float, float]

Solves L @ B = T for B via forward solving and based on shared memory for L, then computes and returns sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights).

Parameters:
  • address_dat (str) – Address to data array of L

  • address_ptr (str) – Address to pointer array of L

  • address_idx (str) – Address to indices array of L

  • shape_dat (tuple) – Shape of data array of L

  • shape_ptr (tuple) – Shape of pointer array of L

  • rows (int) – Number of rows of L

  • cols (int) – Number of cols of L

  • nnz (int) – Number of non-zero elements in L

  • T (scp.sparse.csc_array) – Target matrix

  • cluster_weights (list[float]) – Cluster weights obtained from mssm.src.python.formula.__cluster_discretize().

Returns:

sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights)

Return type:

tuple[float,float]

mssm.src.python.matrix_solvers.compute_block_linv_shared(address_dat: str, address_ptr: str, address_idx: str, shape_dat: tuple, shape_ptr: tuple, rows: int, cols: int, nnz: int, T: csc_array) csc_array

Solves L@B = T where L is available in shared memory and T is a column subset of the identity matrix.

Parameters:
  • address_dat (str) – Address to data array of L

  • address_ptr (str) – Address to pointer array of L

  • address_idx (str) – Address to indices array of L

  • shape_dat (tuple) – Shape of data array of L

  • shape_ptr (tuple) – Shape of pointer array of L

  • rows (int) – Number of rows of L

  • cols (int) – Number of cols of L

  • nnz (int) – Number of non-zero elements in L

  • T (scp.sparse.csc_array) – Target matrix

Returns:

B

Return type:

scp.sparse.csc_array

mssm.src.python.matrix_solvers.cpp_backsolve_tr(A: csc_array, C: csc_array) csc_array

Solves A@B=C, where A is sparse and upper triangular. This can be utilized to obtain B = inv(A), when C is the identity.

Parameters:
  • A (scp.sparse.csc_array) – Upper triangular sparse matrix

  • C (scp.sparse.csc_array) – Sparse potentially rectangular matrix

Returns:

B

Return type:

scp.sparse.csc_array

mssm.src.python.matrix_solvers.cpp_chol(A: csc_array) tuple[csc_array, int]

Computes Cholesky of A.

Parameters:

A (scp.sparse.csc_array) – Some square symmetric matrix

Returns:

Returns Cholesky and code indicating success

Return type:

tuple[scp.sparse.csc_array,int]

mssm.src.python.matrix_solvers.cpp_cholP(A: csc_array) tuple[csc_array, list[int], int]

Computes pivoted Cholesky of A.

Parameters:

A (scp.sparse.csc_array) – Some square symmetric matrix

Returns:

Returns pivoted Cholesky, pivoted column order, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],int]
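
To make the role of the pivot order concrete, here is a small dense sketch. It assumes the convention P @ A @ P.T = LP @ LP.T, where P is the permutation matrix built from the pivot list; the pivot order below is arbitrary, whereas a sparse solver would typically choose it to limit fill-in.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Pr = [2, 0, 1]                        # hypothetical pivot order
P = np.eye(3)[Pr]                     # permutation matrix built from Pr
LP = np.linalg.cholesky(P @ A @ P.T)  # factor of the pivoted matrix

# Undo the pivoting: P.T @ LP is a (non-triangular) root of A.
root = P.T @ LP
```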

mssm.src.python.matrix_solvers.cpp_dqrr(A: ndarray) tuple[list[int], int]

Computes pivoted QR decomposition of dense matrix A.

Parameters:

A (np.ndarray) – Some matrix

Returns:

column pivot order for rank estimation, estimated rank

Return type:

tuple[list[int],int]

mssm.src.python.matrix_solvers.cpp_qr(A: csc_array) tuple[csc_array, csc_array, list[int], int]

Computes pivoted QR decomposition of A.

Parameters:

A (scp.sparse.csc_array) – Some matrix

Returns:

Matrices Q, R, pivoted column order, and code indicating success

Return type:

tuple[scp.sparse.csc_array,scp.sparse.csc_array,list[int],int]

mssm.src.python.matrix_solvers.cpp_qrr(A: csc_array) tuple[csc_array, list[int], int, int]

Computes pivoted QR decomposition of A and returns a rank estimate.

Parameters:

A (scp.sparse.csc_array) – Some matrix

Returns:

Matrix R, pivoted column order, estimated rank, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],int,int]

mssm.src.python.matrix_solvers.cpp_solve_L(X: csc_array, S: csc_array) tuple[csc_array, list[int], int]

Solves (X.T@X + S)@B=I for B, where (X.T@X + S) is sparse, symmetric, and full rank and I is an identity matrix of suitable dimension via Cholesky decomposition.

Parameters:
  • X (scp.sparse.csc_array) – Some rectangular sparse matrix

  • S (scp.sparse.csc_array) – Sparse square matrix

Returns:

B (inverse of pivoted X.T@X + S), list of pivot indices, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],int]

mssm.src.python.matrix_solvers.cpp_solve_LXX(A: csc_array) tuple[csc_array, list[int], int]

Solves A@B=I for B, where A is sparse, symmetric, and full rank and I is an identity matrix of suitable dimension via Cholesky decomposition.

Parameters:

A (scp.sparse.csc_array) – Some sparse symmetric matrix

Returns:

B (inverse of pivoted A), list of pivot indices, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],int]

mssm.src.python.matrix_solvers.cpp_solve_am(y: ndarray, X: csc_array, S: csc_array) tuple[csc_array, list[int], ndarray, int]

Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition and computes inverse of pivoted Cholesky of X.T@X + S.

Parameters:
  • y (np.ndarray) – vector of observations

  • X (scp.sparse.csc_array) – Some rectangular sparse matrix

  • S (scp.sparse.csc_array) – Sparse square matrix

Returns:

Inverse of pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],np.ndarray,int]

mssm.src.python.matrix_solvers.cpp_solve_coef(y: ndarray, X: csc_array, S: csc_array) tuple[csc_array, list[int], ndarray, int]

Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition.

Parameters:
  • y (np.ndarray) – vector of observations

  • X (scp.sparse.csc_array) – Some rectangular sparse matrix

  • S (scp.sparse.csc_array) – Sparse square matrix

Returns:

Pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],np.ndarray,int]
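
The penalized normal equations can be sketched with dense NumPy arrays as follows. This is an illustration only: it omits pivoting and sparsity, and the ridge-type penalty is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)
S = 0.5 * np.eye(4)                  # simple ridge-type penalty

L = np.linalg.cholesky(X.T @ X + S)  # X.T @ X + S = L @ L.T
u = np.linalg.solve(L, X.T @ y)      # forward step: L @ u = X.T @ y
b = np.linalg.solve(L.T, u)          # backward step: L.T @ b = u
```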

mssm.src.python.matrix_solvers.cpp_solve_coefXX(Xy: ndarray, XXS: csc_array) tuple[csc_array, list[int], ndarray, int]

Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition with (X.T@X + S) and X.T@y pre-computed.

Parameters:
  • Xy (np.ndarray) – Holds X.T@y

  • XXS (scp.sparse.csc_array) – Holds (X.T@X + S)

Returns:

Pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],np.ndarray,int]

mssm.src.python.matrix_solvers.cpp_solve_coef_pqr(y: ndarray, X: csc_array, E: csc_array) tuple[csc_array, list[int], list[int], ndarray, int, int]

Solves (X.T@X + S)@b = X.T@y for b via sparse QR decomposition, where E.T@E=S.

Does not form X.T@X + S for solve. Potentially pivots twice - once for sparsity (always) and then once more whenever algorithm detects a diagonal element that is too small.

Examples:

# Solve
RP,Pr1,Pr2,coef,rank,code = cpp_solve_coef_pqr(yb,Xb,S_root.T.tocsc())

# Need to get overall pivot...
P1 = compute_eigen_perm(Pr1)
P2 = compute_eigen_perm(Pr2)
P = P2.T@P1.T

# Need to insert zeroes in case of rank deficiency - first insert nans so that we
# can then easily find dropped coefs.
if rank < S_emb.shape[1]:
   coef = np.concatenate((coef,[np.nan for _ in range(S_emb.shape[1]-rank)]))

# Can now unpivot coef
coef = coef @ P

# And identify which coef was dropped
idx = np.arange(len(coef))
drop = idx[np.isnan(coef)]
keep = idx[~np.isnan(coef)]

# Now actually set dropped ones to zero
coef[drop] = 0

# Convert R so that rest of code can just continue as with Chol (i.e., L)
LP = RP.T.tocsc()

# Keep only columns of Pr/P that belong to identifiable params. So P.T@LP is Cholesky of negative penalized Hessian
# of model without unidentifiable coef. Important: LP and Pr/P no longer match dimensions of embedded penalties
# after this! So we need to keep track of that in the appropriate functions (i.e., `calculate_edf` which calls
# `compute_B` when called with only LP and not Linv).
P = P[:,keep]
_,Pr,_ = translate_sparse(P.tocsc())
P = compute_eigen_perm(Pr)
Parameters:
  • y (np.ndarray) – vector of observations

  • X (scp.sparse.csc_array) – Some rectangular sparse matrix

  • E (scp.sparse.csc_array) – Sparse root of the penalty matrix, so that E.T@E=S

Returns:

Pivoted Cholesky of X.T@X + S, first column pivot indices in a list, second column pivot indices in a list, b, estimated rank, and code indicating success.

Return type:

tuple[scp.sparse.csc_array,list[int],list[int],np.ndarray,int,int]
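
The reason X.T@X + S never has to be formed can be seen from a dense sketch: stacking X on top of the penalty root E and taking a QR decomposition yields an R factor with R.T@R = X.T@X + E.T@E. The example below is an unpivoted dense illustration with made-up data, not the sparse, pivoted, rank-revealing routine itself.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = rng.normal(size=40)
E = np.diag([1.0, 0.5, 0.0])         # root of the penalty, S = E.T @ E

X_aug = np.vstack([X, E])            # stack model matrix on penalty root
y_aug = np.concatenate([y, np.zeros(E.shape[0])])

Q, R = np.linalg.qr(X_aug)           # R.T @ R = X.T @ X + E.T @ E
b = np.linalg.solve(R, Q.T @ y_aug)  # solves R @ b = Q.T @ y_aug
```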

mssm.src.python.matrix_solvers.cpp_solve_qr(A: csc_array) tuple[csc_array, int, int]

Solves A@B=I for B, where A is sparse, square, and full rank and I is an identity matrix of suitable dimension via QR decomposition.

Parameters:

A (scp.sparse.csc_array) – Some sparse square matrix

Returns:

B (inverse of A), estimated rank, and code indicating success

Return type:

tuple[scp.sparse.csc_array,int,int]

mssm.src.python.matrix_solvers.cpp_solve_tr(A: csc_array, C: csc_array) csc_array

Solves A@B=C, where A is sparse and lower triangular. This can be utilized to obtain B = inv(A), when C is the identity.

Parameters:
  • A (scp.sparse.csc_array) – Lower triangular sparse matrix

  • C (scp.sparse.csc_array) – Sparse potentially rectangular matrix

Returns:

B

Return type:

scp.sparse.csc_array
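
For intuition, forward substitution on a small dense lower triangular matrix might look like the sketch below. The helper forward_solve is hypothetical and ignores sparsity; passing the identity as C yields inv(A), as noted above.

```python
import numpy as np

def forward_solve(A, C):
    """Hypothetical dense forward substitution for lower triangular A."""
    n = A.shape[0]
    B = np.zeros((n, C.shape[1]))
    for i in range(n):
        # Row i only depends on the rows of B solved so far.
        B[i] = (C[i] - A[i, :i] @ B[:i]) / A[i, i]
    return B

A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
A_inv = forward_solve(A, np.eye(3))   # C = identity yields inv(A)
```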

mssm.src.python.matrix_solvers.cpp_symqr(A: csc_array, tol: float) tuple[csc_array, list[int], list[int], int, int]

Computes pivoted QR decomposition of symmetric matrix A.

Parameters:
  • A (scp.sparse.csc_array) – Some symmetric matrix

  • tol (float) – tolerance for rank estimation

Returns:

Matrix R, column pivot order for sparsity, column pivot order for rank estimation, rank estimate, code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],list[int],int,int]

mssm.src.python.matrix_solvers.est_condition(L: csc_array, Linv: csc_array, seed: int | None = 0, verbose: bool = True) tuple[float, float, float, int]

Estimate the condition number K - the ratio of the largest to smallest singular values - of matrix A, where A.T@A = L@L.T.

L and Linv can either be obtained by Cholesky decomposition, i.e., A.T@A = L@L.T or by QR decomposition A=Q@R where R=L.T.

If verbose=True (default), separate warnings will be issued in case K>(1/(0.5*sqrt(epsilon))) and K>(1/(0.5*epsilon)). If the former warning is raised, this indicates that computing L via a Cholesky decomposition is likely unstable and should be avoided. If the second warning is raised as well, obtaining L via QR decomposition (of A) is also likely to be unstable (see Golub & Van Loan, 2013).

References:
  • Cline et al. (1979). An Estimate for the Condition Number of a Matrix.

  • Golub & Van Loan (2013). Matrix computations, 4th edition.

Parameters:
  • L (scp.sparse.csc_array) – Cholesky or any other root of A.T@A as a sparse matrix.

  • Linv (scp.sparse.csc_array) – Inverse of Cholesky (or any other root) of A.T@A.

  • seed (int or None or numpy.random.Generator) – The seed to use for the random parts of the singular value decomposition. Defaults to 0.

  • verbose (bool) – Whether or not warnings should be printed. Defaults to True.

Returns:

A tuple, containing the estimate of condition number K, an estimate of the largest singular value of A, an estimate of the smallest singular value of A, and a code. The latter will be zero in case no warning was raised, 1 in case the first warning described above was raised, and 2 if the second warning was raised as well.

Return type:

tuple[float,float,float,int]
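
The two warning thresholds can be illustrated with a dense stand-in. Because A.T@A = L@L.T, A and L share their singular values, so the sketch below computes K from an exact SVD of L; the actual function estimates K from L and Linv instead. The helper condition_code is hypothetical.

```python
import numpy as np

def condition_code(L):
    """Hypothetical dense helper mirroring the warning thresholds above."""
    sv = np.linalg.svd(L, compute_uv=False)   # singular values, descending
    K = sv[0] / sv[-1]
    eps = np.finfo(float).eps
    code = 0
    if K > 1 / (0.5 * np.sqrt(eps)):
        code = 1          # Cholesky-based computation likely unstable
    if K > 1 / (0.5 * eps):
        code = 2          # QR-based computation likely unstable as well
    return K, sv[0], sv[-1], code

K_good, _, _, code_good = condition_code(np.diag([1.0, 0.5]))
K_bad, _, _, code_bad = condition_code(np.diag([1.0, 1e-9]))
K_worse, _, _, code_worse = condition_code(np.diag([1.0, 1e-17]))
```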

mssm.src.python.matrix_solvers.map_csc_to_eigen(X: csc_array) tuple[int, int, int, ndarray, ndarray, ndarray]

Pybind11 comes with copy overhead for sparse matrices, so instead of passing the sparse matrix to C++, I pass the data, indices, and indptr arrays as buffers to C++. see: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html.

An Eigen mapping can then be used to refer to these, without requiring an extra copy. see: https://eigen.tuxfamily.org/dox/classEigen_1_1Map_3_01SparseMatrixType_01_4.html

The mapping needs to assume compressed storage, since then we can use the data, indices, and indptr arrays directly for the valuePtr, innerIndexPtr, and outerIndexPtr fields of the sparse array map constructor. see: https://eigen.tuxfamily.org/dox/group__TutorialSparse.html (section sparse matrix format).

I got this idea from the NumpyEigen project, which also uses such a map! see: https://github.com/fwilliams/numpyeigen/blob/master/src/npe_sparse_array.h#L74

Parameters:

X (scp.sparse.csc_array) – Some sparse matrix

Returns:

Number of rows in X, Number of cols in X, Number of non-zero elements in X, X.data, X.indptr.astype(np.int64), X.indices.astype(np.int64)

Return type:

tuple[int,int,int,np.ndarray,np.ndarray,np.ndarray]
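
A minimal sketch of the values listed above, assuming SciPy's csc_array is available. The helper csc_buffers is a hypothetical stand-in that simply follows the return description; it does not touch the C++ side.

```python
import numpy as np
import scipy.sparse as sp

def csc_buffers(X):
    """Hypothetical stand-in following the return description above."""
    rows, cols = X.shape
    return (rows, cols, X.nnz, X.data,
            X.indptr.astype(np.int64), X.indices.astype(np.int64))

X = sp.csc_array(np.array([[1.0, 0.0],
                           [0.0, 2.0],
                           [3.0, 0.0]]))
rows, cols, nnz, data, indptr, indices = csc_buffers(X)
```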

mssm.src.python.matrix_solvers.map_csr_to_eigen(X: csr_array) tuple[int, int, int, ndarray, ndarray, ndarray]

see: map_csc_to_eigen()

Parameters:

X (scp.sparse.csr_array) – Some sparse matrix

Returns:

Number of rows in X, Number of cols in X, Number of non-zero elements in X, X.data, X.indptr.astype(np.int64), X.indices.astype(np.int64)

Return type:

tuple[int,int,int,np.ndarray,np.ndarray,np.ndarray]

mssm.src.python.matrix_solvers.translate_sparse(mat: csc_array) tuple[ndarray, ndarray, ndarray]

Translate canonical sparse csc matrix representation into data, row, col representation.

See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_array.html#scipy.sparse.csc_array

Parameters:

mat (scp.sparse.csc_array) – sparse matrix

Returns:

data, rows, cols of sparse matrix

Return type:

tuple[np.ndarray,np.ndarray,np.ndarray]
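
The translation from the compressed column format to explicit (data, row, col) triplets can be sketched with plain NumPy. The three input arrays below are written out by hand; for a SciPy csc_array they correspond to its data, indices, and indptr attributes.

```python
import numpy as np

# CSC arrays of the 3 x 2 matrix [[1, 0], [0, 2], [3, 0]].
data = np.array([1.0, 3.0, 2.0])
indices = np.array([0, 2, 1])        # row index of every stored value
indptr = np.array([0, 2, 3])         # column j spans indptr[j]:indptr[j+1]

n_cols = len(indptr) - 1
rows = indices
# Repeat each column index once per value stored in that column.
cols = np.repeat(np.arange(n_cols), np.diff(indptr))

# Rebuild the dense matrix from the triplets as a sanity check.
dense = np.zeros((3, n_cols))
dense[rows, cols] = data
```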

mssm.src.python.penalties module

class mssm.src.python.penalties.DifferencePenalty

Bases: Penalty

Difference Penalty class. Generates penalty matrices for smooth terms.

Variables:

pen_type (PenType.DIFFERENCE) – Type of the penalty matrix.

constructor(n: int, constraint: ConstType | None, m: int = 2) tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

Creates difference (order=m) n*n penalty matrix + root of the penalty. Based on code in Eilers & Marx (1996) and Wood (2017).

References:
  • Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • n (int) – Dimension of square penalty matrix

  • constraint (ConstType|None) – Any constraint to absorb by the penalty or None if no constraint is required

  • m (int, optional) – Differencing order to apply to the identity matrix to get the penalty (this will also be the dimension of the penalty’s kernel), defaults to 2

Returns:

penalty data,penalty row indices,penalty column indices,root of penalty data,root of penalty row indices,root of penalty column indices,rank of penalty

Return type:

tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]
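
A dense sketch of an order-m difference penalty in the spirit of Eilers & Marx (1996), without any constraint absorbed. Here the root D is taken as the m-th order difference of the identity and the penalty as S = D.T @ D; the orientation of the root used internally may differ, so treat the layout as an assumption.

```python
import numpy as np

n, m = 6, 2
D = np.diff(np.eye(n), n=m, axis=0)  # (n - m) x n difference matrix
S = D.T @ D                          # n x n penalty matrix
rank = np.linalg.matrix_rank(S)      # n - m, so the kernel has dimension m

# Constant and linear coefficient sequences are not penalized for m = 2.
const = np.ones(n)
linear = np.arange(n, dtype=float)
```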

class mssm.src.python.penalties.IdentityPenalty(pen_type: PenType)

Bases: Penalty

Identity Penalty class. Generates penalty matrices for smooth terms and random terms.

Parameters:

pen_type (PenType) – Type of the penalty matrix

Variables:

pen_type (PenType) – Type of the penalty matrix passed to init method.

constructor(n: int, constraint: ConstType | None, f: Callable | None = None) tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

Creates identity matrix penalty + root in case f is None.

Note: This penalty never absorbs marginal constraints. It always returns an identity matrix but just decreases n by 1 if constraint is not None to ensure that the returned penalty matrix is of suitable dimensions.

Parameters:
  • n (int) – Dimension of square penalty matrix

  • constraint (ConstType|None) – Any constraint to absorb by the penalty or None if no constraint is required

  • f (Callable|None, optional) – Any kind of function to apply to the diagonal elements of the penalty, defaults to None

Returns:

penalty data,penalty row indices,penalty column indices,root of penalty data,root of penalty row indices,root of penalty column indices,rank of penalty

Return type:

tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]

class mssm.src.python.penalties.Penalty(pen_type: PenType)

Bases: object

Penalty base-class. Generates penalty matrices for smooth terms.

Parameters:

pen_type (PenType) – Type of the penalty matrix

Variables:

pen_type (PenType) – Type of the penalty matrix passed to the init method.

constructor(n: int, constraint: ConstType | None, *args, **kwargs) tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

Creates penalty matrix + root of the penalty and returns both in list form (data, row indices, col indices).

Parameters:
  • n (int) – Dimension of square penalty matrix

  • constraint (ConstType | None) – Any constraint to absorb by the penalty or None if no constraint is required

Returns:

penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty

Return type:

tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

mssm.src.python.penalties.TP_pen(S_j: csc_array, D_j: csc_array, j: int, ks: list[int], constraint: ConstType | None) tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

Computes a tensor smooth penalty + root as defined in section 5.6 of Wood (2017) based on marginal penalty matrix S_j.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • S_j (scp.sparse.csc_array) – Marginal penalty matrix

  • D_j (scp.sparse.csc_array) – Root of marginal penalty matrix

  • j (int) – Index for current marginal

  • ks (list[int]) – List of number of basis functions of all marginals

  • constraint (ConstType | None) – Any constraint to absorb by the final penalty or None if no constraint is required

Returns:

penalty data,penalty row indices,penalty column indices,root of penalty data,root of penalty row indices,root of penalty column indices,rank of penalty

Return type:

tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]
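
The tensor smooth penalty for marginal j in section 5.6 of Wood (2017) is a Kronecker product in which S_j takes the j-th position and identity matrices take all others. The dense sketch below shows this structure; the helper tp_pen_sketch is hypothetical, ignores constraints and the penalty root, and the ordering of the Kronecker factors is an assumption.

```python
import numpy as np

def tp_pen_sketch(S_j, j, ks):
    """Hypothetical dense helper: I (x) ... (x) S_j (x) ... (x) I."""
    out = np.eye(1)
    for i, k in enumerate(ks):
        out = np.kron(out, S_j if i == j else np.eye(k))
    return out

ks = [4, 3]                           # basis functions per marginal
D0 = np.diff(np.eye(ks[0]), n=2, axis=0)
S0 = D0.T @ D0                        # marginal penalty of rank 2
S_te = tp_pen_sketch(S0, 0, ks)       # equals kron(S0, I_3)
```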

mssm.src.python.penalties.adjust_pen_drop(dat: list[float], rows: list[int], cols: list[int], drop: list[int], offset: int = 0) tuple[list[float], list[int], list[int], int]

Adjusts penalty matrix (represented via dat, rows, and cols) by dropping rows and columns indicated by drop.

Optionally, offset is added to the elements in rows and cols, which is useful when indices in drop do not start at zero.

Parameters:
  • dat ([float]) – List of elements in penalty matrix.

  • rows ([int]) – List of row indices of penalty matrix.

  • cols ([int]) – List of column indices of penalty matrix.

  • drop ([int]) – Rows and columns to drop from penalty matrix. Might actually contain indices corresponding to rows + offset and cols + offset, which can be corrected for via the offset argument.

  • offset (int, optional) – An optional offset to add to rows and cols to adjust for the indexing in drop, defaults to 0

Returns:

A tuple with 4 elements: the data, rows, and cols of the adjusted penalty matrix excluding dropped elements and the number of excluded elements.

Return type:

tuple[list[float],list[int],list[int],int]

mssm.src.python.penalties.embed_in_S_sparse(pen_data: list[float], pen_rows: list[int], pen_cols: list[int], S_emb: csc_array | None, S_col: int, SJ_col: int, cIndex: int) tuple[csc_array, int]

Embed a term-specific penalty matrix SJ (provided as three lists: pen_data, pen_rows and pen_cols) into the total penalty matrix S_emb (see Wood, 2017).

Parameters:
  • pen_data (list[float]) – Data of SJ

  • pen_rows (list[int]) – Row indices of SJ

  • pen_cols (list[int]) – Column indices of SJ

  • S_emb (scp.sparse.csc_array | None) – Total penalty matrix or None in case S_emb will be initialized by the function.

  • S_col (int) – Columns of total penalty matrix

  • SJ_col (int) – Columns of SJ

  • cIndex (int) – Current row and column index indicating the top left cell of the (SJ_col * SJ_col) block SJ should take up in S_emb

Returns:

S_emb with SJ embedded, the updated cIndex (i.e., cIndex + SJ_col)

Return type:

tuple[scp.sparse.csc_array,int]
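
The embedding step amounts to shifting the row and column indices of SJ by cIndex. A dense sketch of this idea follows; the helper embed_dense is hypothetical, uses a dense array instead of a csc_array, and adds the block to whatever S_emb already holds, which is an assumption about the actual behavior.

```python
import numpy as np

def embed_dense(pen_data, pen_rows, pen_cols, S_emb, S_col, SJ_col, cIndex):
    """Hypothetical dense stand-in for the sparse embedding step."""
    if S_emb is None:
        S_emb = np.zeros((S_col, S_col))
    rows = np.asarray(pen_rows) + cIndex   # shift block to its offset
    cols = np.asarray(pen_cols) + cIndex
    np.add.at(S_emb, (rows, cols), pen_data)
    return S_emb, cIndex + SJ_col

# 2 x 2 block with entries [[1, -1], [-1, 1]] placed at row/column 1.
S_emb, cIndex = embed_dense([1.0, -1.0, -1.0, 1.0], [0, 0, 1, 1],
                            [0, 1, 0, 1], None, 4, 2, 1)
```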

mssm.src.python.penalties.embed_in_Sj_sparse(pen_data: list[float], pen_rows: list[int], pen_cols: list[int], Sj: csc_array | None, SJ_col: int) csc_array

Parameterize a term-specific penalty matrix SJ (provided as three lists: pen_data, pen_rows and pen_cols).

Parameters:
  • pen_data (list[float]) – Data of SJ

  • pen_rows (list[int]) – Row indices of SJ

  • pen_cols (list[int]) – Column indices of SJ

  • Sj (scp.sparse.csc_array | None) – A sparse matrix or None. In the latter case, SJ is simply initialized by the function. If not, then the function returns SJ + Sj. The latter is useful if a term penalty is a sum of individual penalty matrices.

  • SJ_col (int) – Columns of SJ

Returns:

SJ which might actually be SJ + Sj.

Return type:

scp.sparse.csc_array

mssm.src.python.penalties.embed_shared_penalties(shared_penalties: list[list[LambdaTerm]], formulas: list, extra_coef: int) list[LambdaTerm]

Embed penalties from individual formulas into overall penalties for GAMMLSS/GSMM models.

Parameters:
  • shared_penalties (list[list[LambdaTerm]]) – Nested list, with the inner one containing the penalties associated with an individual formula in formulas.

  • formulas (list) – List of mssm.src.python.formula.Formula objects

  • extra_coef (int) – Number of extra coefficients required by the model’s family. Will result in the shared penalties being padded by an extra block of extra_coef zeroes.

Returns:

A list of the embedded penalties required by a GAMMLSS or GSMM model.

Return type:

list[LambdaTerm]

mssm.src.python.repara module

mssm.src.python.repara.reparam(X: csc_array | None, S: list[LambdaTerm], cov: ndarray | None, option: int = 1, n_bins: int = 30, QR: bool = False, identity: bool = False, scale: bool = False) tuple

Options 1-3 implement the natural reparameterization discussed in Wood (2017; 5.4.2), with different strategies for the QR computation of \(\mathbf{X}\). Option 4 helps with stabilizing the REML computation and is based on Appendix B of Wood (2011) and section 6.2.7 in Wood (2017):

  1. Form complete matrix \(\mathbf{X}\) based on entire covariate.

  2. Form matrix \(\mathbf{X}\) only based on unique covariate values.

  3. Form matrix \(\mathbf{X}\) on a sample of values making up covariate. Covariate is split up into n_bins equally wide bins. The number of covariate values per bin is then calculated. Subsequently, the ratio relative to minimum bin size is computed and each ratio is rounded to the nearest integer. Then ratio samples are obtained from each bin. That way, imbalance in the covariate is approximately preserved when forming the QR.

  4. Transform term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) based on Appendix B of Wood (2011) and section 6.2.7 in Wood (2017) so that they are full-rank and their log-determinant can be computed safely. In that case, only S needs to be provided and has to be a list holding the penalties to be transformed. If the transformation is to be applied to model matrices, coefficients, Hessian, and covariance matrices, X should be set to something other than None (it does not matter what; it can, for example, be the first model matrix). The mssm.src.python.repara.reparam_model() function can be used to apply the transformation and also returns the transformation matrices required to reverse it.

For Options 1-3:

If QR==True then \(\mathbf{X}\) is decomposed into \(\mathbf{Q}\mathbf{R}\) directly via QR decomposition. Alternatively, we first form \(\mathbf{X}^T\mathbf{X}\) and then compute the Cholesky factor \(\mathbf{L}\) of this product - note that \(\mathbf{L}^T = \mathbf{R}\). Overall the latter strategy is much faster (in particular if option==1), but the increased loss of precision in \(\mathbf{L}^T = \mathbf{R}\) might not be acceptable for some applications.

After transformation S only contains elements on its diagonal and \(\mathbf{X}\) holds the transformed functions. As discussed in Wood (2017), the transformed functions are decreasingly flexible - so the elements on the diagonal of \(\mathbf{S}\) become smaller and eventually zero for elements that are in the kernel of the original \(\mathbf{S}\) (un-penalized == not flexible).

For a similar transformation (based solely on \(\mathbf{S}\)), Wood et al. (2013) show how to further reduce the diagonally transformed \(\mathbf{S}\) to an even simpler identity penalty. As also discussed in Wood (2017), the same behavior of decreasing flexibility can only be maintained when all entries on the diagonal of \(\mathbf{S}\) are 1 if the transformed functions are multiplied by a weight related to their wiggliness. Specifically, more flexible functions need to become smaller in amplitude - so that for the same level of penalization they are removed earlier than less flexible ones. To achieve this, Wood et al. (2013) further post-multiply the transformed matrix \(\mathbf{X}'\) with a matrix that contains on its diagonal the reciprocal of the square root of the transformed penalty matrix (and 1s in the last cells corresponding to the kernel). This is done here if identity=True.

In mgcv the transformed model matrix and penalty can optionally be scaled by the root mean square value of the transformed model matrix (see the nat.param function in mgcv). This is done here if scale=True.

For Option 4:

Option 4 enforces re-parameterization of term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) based on Appendix B of Wood (2011) and section 6.2.7 in Wood (2017). In mssm multiple penalties can be placed on individual terms (i.e., tensor terms, random smooths, kernel penalty), but it is not always the case that the term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) - i.e., the sum over all those individual penalties multiplied with their \(\lambda\) parameters - is of full rank. If we need to form the inverse of the term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) this is problematic. It is also problematic, as discussed by Wood (2011), if the different \(\lambda\) are all of different magnitude, in which case forming the term-specific \(log(|\mathbf{S}_{\boldsymbol{\lambda}}|_+)\) becomes numerically difficult.

The re-parameterization implemented by option 4, based on Appendix B in Wood (2011), solves these issues. After this re-parameterization a term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) has been formed that is full rank. And \(log(|\mathbf{S}_{\boldsymbol{\lambda}}|)\) - no longer just a generalized determinant - can be computed without running into numerical problems.

The strategy by Wood (2011) could be applied to form an overall - not just term-specific - \(\mathbf{S}_{\boldsymbol{\lambda}}\) with these properties. However, this does not work for general smooth models as defined by Wood et al. (2016). Hence, mssm opts for the blockwise strategy. However, in mssm penalties currently cannot overlap, so this is not necessary at the moment.

References:
  • Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.

  • Wood, S. N., Scheipl, F., & Faraway, J. J. (2013). Straightforward intermediate rank tensor product smoothing in mixed models.

  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • mgcv source code (accessed 2024). smooth.R file, nat.param function.

Parameters:
  • X (scp.sparse.csc_array | None) – Model/Term matrix or None

  • S (list[LambdaTerm]) – List of penalties

  • cov (np.ndarray | None) – covariate array associated with a specific term or None

  • option (int, optional) – Which re-parameterization to compute, defaults to 1

  • n_bins (int, optional) – Number of bins to use as part of option 3, defaults to 30

  • QR (bool, optional) – Whether to rely on a QR decomposition or not (then a Cholesky is used) as part of options 1-3, defaults to False

  • identity (bool, optional) – Whether the penalty matrix should be transformed to identity as part of options 1-3, defaults to False

  • scale (bool, optional) – Whether the penalty matrix and term matrix should be scaled as part of options 1-3, defaults to False

Returns:

The content of the returned object depends on option but will usually hold the information needed to apply/undo the required re-parameterization as well as already re-parameterized objects.

Return type:

tuple
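
The natural reparameterization behind options 1-3 can be sketched in dense form as follows, loosely following Wood (2017; 5.4.2) with a full model matrix (as in option 1). The data, the penalty, and all variable names are illustrative assumptions, and the sketch omits the identity and scale variants.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 5
X = rng.normal(size=(n, k))
D = np.diff(np.eye(k), n=2, axis=0)
S = D.T @ D                              # rank-deficient penalty

Q, R = np.linalg.qr(X)                   # QR == True route
R_inv = np.linalg.inv(R)
lam, U = np.linalg.eigh(R_inv.T @ S @ R_inv)
order = np.argsort(lam)[::-1]            # most heavily penalized first
lam, U = lam[order], U[:, order]

X_rp = Q @ U                             # transformed model matrix
S_rp = np.diag(lam)                      # diagonal transformed penalty

# Cholesky route: L.T matches R up to the sign of each row.
L = np.linalg.cholesky(X.T @ X)
```

Coefficients map to the new parameterization via U.T @ R @ beta, which leaves both the fitted values and the penalty value unchanged.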

mssm.src.python.repara.reparam_model(dist_coef: list[int], dist_up_coef: list[int], coef: ndarray, split_coef_idx: list[int], Xs: list[csc_array], penalties: list[LambdaTerm], form_inverse: bool = True, form_root: bool = True, form_balanced: bool = True, n_c: int = 1) tuple[ndarray, list[csc_array], list[LambdaTerm], csc_array, csc_array | None, csc_array | None, csc_array | None, csc_array, list[csc_array]]

Relies on the transformation strategy from Appendix B of Wood (2011) to re-parameterize the model.

Coefficients, model matrices, and penalties are all transformed. The transformation is applied to each term separately as explained by Wood et al. (2016).

References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:
  • dist_coef ([int]) – List of number of coefficients per formula/linear predictor/distribution parameter of model.

  • dist_up_coef ([int]) – List of number of unpenalized (i.e., fixed effects, linear predictors/parameters) coefficients per formula/linear predictor/distribution parameter of model.

  • coef (numpy.array) – Vector of coefficients (numpy.array of dim (-1,1)).

  • split_coef_idx ([int]) – List with indices to split coef vector into separate versions per linear predictor.

  • Xs ([scp.sparse.csc_array]) – List of model matrices obtained for example via model.get_mmat().

  • penalties ([LambdaTerm]) – List of penalties for model.

  • form_inverse (bool, optional) – Whether or not an inverse of the transformed penalty matrices should be formed. Useful for computing the EFS update, defaults to True

  • form_root (bool, optional) – Whether or not to form a root of the total penalty, defaults to True

  • form_balanced (bool, optional) – Whether or not to form the “balanced” penalty as described by Wood et al. (2016) after the re-parameterization, defaults to True

  • n_c (int, optional) – Number of cores to use to compute the inverse when form_inverse=True, defaults to 1

Raises:

ValueError – Raises a value error if one of the inverse computations fails.

Returns:

A tuple with 9 elements: the re-parameterized coefficient vector, a list with the re-parameterized model matrices, a list of the penalties after re-parameterization, the total re-parameterized penalty matrix, optionally the balanced version of the former, optionally a root of the re-parameterized total penalty matrix, optionally the inverse of the re-parameterized total penalty matrix, the transformation matrix Q so that Q.T@S_emb@Q = S_emb_rp where S_emb and S_emb_rp are the total penalty matrix before and after re-parameterization, a list of transformation matrices QD so that XD@QD=XD_rp where XD and XD_rp are the model matrix of the Dth linear predictor before and after re-parameterization.

Return type:

tuple[np.ndarray, list[scp.sparse.csc_array], list[LambdaTerm], scp.sparse.csc_array, scp.sparse.csc_array | None, scp.sparse.csc_array | None, scp.sparse.csc_array | None, scp.sparse.csc_array, list[scp.sparse.csc_array]]
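
The two identities in the return description can be checked on a toy problem. The sketch below does not implement the Appendix B strategy of Wood (2011). It only assumes, for illustration, an orthogonal Q taken from an eigen-decomposition of a small penalty, and shows that fitted values and the penalty term are unchanged when coefficients, model matrix, and penalty are transformed together. All names are hypothetical and dense NumPy arrays stand in for the sparse matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 6

X = rng.normal(size=(n, k))          # toy model matrix
coef = rng.normal(size=(k, 1))       # toy coefficient vector of dim (-1,1)

D = np.diff(np.eye(k), n=2, axis=0)  # second-order difference operator
S_emb = D.T @ D                      # toy total penalty matrix

# Orthogonal Q (illustrative choice only): eigenvectors of the penalty
_, Q = np.linalg.eigh(S_emb)

X_rp = X @ Q                         # XD @ QD = XD_rp
S_emb_rp = Q.T @ S_emb @ Q           # Q.T @ S_emb @ Q = S_emb_rp
coef_rp = Q.T @ coef                 # valid because Q is orthogonal here

# Fitted values and penalty value are invariant under the transformation
same_fit = np.allclose(X @ coef, X_rp @ coef_rp)
same_pen = np.allclose(coef.T @ S_emb @ coef, coef_rp.T @ S_emb_rp @ coef_rp)
# With this particular Q the transformed penalty is diagonal
is_diag = np.allclose(S_emb_rp, np.diag(np.diag(S_emb_rp)))
```
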

mssm.src.python.smooths module

mssm.src.python.smooths.B_spline_basis(cov: ndarray, event_onset: int | None, nk: int, min_c: float | None = None, max_c: float | None = None, drop_outer_k: bool = False, convolve: bool = False, deg: int = 3) ndarray

Computes B-spline basis of degree deg given knots.

Based on code and definitions in “Splines, Knots, and Penalties” by Eilers & Marx (2010) and adapted to allow for convolving B-spline bases.

References:
  • Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125

Parameters:
  • cov (np.ndarray) – Flattened covariate array (i.e., of shape (-1,))

  • event_onset (int | None) – Sample on which to place a dirac delta with which the B-spline bases should be convolved - ignored if convolve==False.

  • nk (int) – Number of basis functions to create

  • min_c (float | None, optional) – Minimum covariate value, defaults to None

  • max_c (float | None, optional) – Maximum covariate value, defaults to None

  • drop_outer_k (bool, optional) – Deprecated, defaults to False

  • convolve (bool, optional) – Whether basis functions should be convolved (i.e., time-shifted) with an impulse response function triggered at event_onset, defaults to False

  • deg (int, optional) – Degree of basis, defaults to 3

Returns:

An array of shape (-1,nk) holding the nk basis functions evaluated over cov and optionally convolved with an impulse response function triggered at event_onset

Return type:

np.ndarray

mssm.src.python.smooths.TP_basis_calc(cTP: ndarray, nB: ndarray) ndarray

Computes the row-wise Kronecker product between cTP and nB. Useful to create a tensor smooth basis.

See Wood (2017), sections 5.6.1 and B.4.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • cTP (np.ndarray) – Marginal basis or partially accumulated tensor smooth basis

  • nB (np.ndarray) – Marginal basis to include in the tensor smooth

Returns:

The row-wise Kronecker product between cTP and nB

Return type:

np.ndarray
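
As a minimal sketch of what a row-wise Kronecker product computes (not the library's own code; the helper name rowwise_kron is hypothetical and the column ordering used by mssm is an assumption), each row of the result is the Kronecker product of the matching rows of the two inputs:

```python
import numpy as np

def rowwise_kron(A, B):
    """Row-wise Kronecker product: row i of the result is kron(A[i], B[i])."""
    n = A.shape[0]
    # Outer product per row, then flatten the two basis dimensions into one
    return np.einsum("ij,ik->ijk", A, B).reshape(n, -1)

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # stand-in for a marginal basis with 2 functions
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])   # stand-in for a marginal basis with 3 functions

T = rowwise_kron(A, B)            # tensor basis with 2 * 3 = 6 columns
```

Calling the helper repeatedly, with the partially accumulated result as the first argument, would build a tensor basis over more than two marginals.
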

mssm.src.python.smooths.bbase(x: ndarray, knots: ndarray, dx: float, deg: int) ndarray

Computes B-spline basis of degree deg given knots and interval spacing dx.

Function taken from “Splines, Knots, and Penalties” by Eilers & Marx (2010)

References:
  • Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125

Parameters:
  • x (np.ndarray) – Covariate

  • knots (np.ndarray) – knot location vector

  • dx (float) – Interval spacing (xr-xl) / ndx where xr and xl are max and min of x and ndx=nk-deg where nk is the number of basis functions.

  • deg (int) – Degree of basis

Returns:

numpy.array of shape (-1,nk)

Return type:

np.ndarray
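
The construction described by Eilers & Marx (2010) builds B-splines as scaled differences of truncated power functions. The following is a rough NumPy transcription of that idea, assuming equally spaced knots that extend deg intervals beyond the covariate range. The names tpower_sketch and bbase_sketch are hypothetical and the code is not taken from mssm.

```python
import numpy as np
from math import factorial

def tpower_sketch(x, t, p):
    # Truncated p-th power function: (x - t)^p where x > t, else 0
    return np.power(x - t, p) * (x > t)

def bbase_sketch(x, knots, dx, deg):
    # Truncated power functions evaluated at every (x, knot) pair
    P = tpower_sketch(x[:, None], knots[None, :], deg)
    n = P.shape[1]
    # Differences of order deg + 1 turn truncated powers into B-splines
    D = np.diff(np.eye(n), n=deg + 1, axis=0) / (factorial(deg) * dx**deg)
    return (-1) ** (deg + 1) * P @ D.T

nk, deg = 9, 3
x = np.linspace(0.0, 1.0, 25)
ndx = nk - deg                          # number of intervals, as in the dx description
dx = (x.max() - x.min()) / ndx
knots = np.linspace(x.min() - deg * dx, x.max() + deg * dx, ndx + 2 * deg + 1)

B = bbase_sketch(x, knots, dx, deg)     # shape (len(x), nk)
row_sums = B.sum(axis=1)                # B-splines sum to one at every x
```
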

mssm.src.python.smooths.convolve_event(f: ndarray, pulse_location: int) ndarray

Convolution of function f with a Dirac delta spike centered on sample pulse_location.

Based on code by Wierda et al. (2012).

References:
  • Wierda, S. M., van Rijn, H., Taatgen, N. A., & Martens, S. (2012). Pupil dilation deconvolution reveals the dynamics of attention at high temporal resolution. https://doi.org/10.1073/pnas.1201858109

Parameters:
  • f (np.ndarray) – Function evaluated over some samples

  • pulse_location (int) – Location of spike (in sample)

Returns:

Convolved function as array

Return type:

np.ndarray
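
Convolving with a unit spike amounts to a time shift. The sketch below illustrates that with plain NumPy. It assumes the output is truncated to the length of f, which is an assumption about the behavior rather than something stated above, and the helper name shift_by_delta is hypothetical.

```python
import numpy as np

def shift_by_delta(f, pulse_location):
    """Convolve f with a unit spike at pulse_location, truncated to len(f)."""
    delta = np.zeros(len(f))
    delta[pulse_location] = 1.0
    # Full convolution has length 2 * len(f) - 1; keep the first len(f) samples
    return np.convolve(f, delta)[: len(f)]

f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
shifted = shift_by_delta(f, 2)   # f delayed by two samples, zero-padded at the start
```
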

mssm.src.python.smooths.tpower(x: ndarray, t: ndarray, p: int) ndarray

Computes the truncated p-th power function of x.

Function taken from “Splines, Knots, and Penalties” by Eilers & Marx (2010)

References:
  • Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125

Parameters:
  • x (np.ndarray) – Covariate

  • t (np.ndarray) – knot location vector

  • p (int) – degree of spline basis

Returns:

np.power(x - t,p) * (x > t)

Return type:

np.ndarray

mssm.src.python.terms module

class mssm.src.python.terms.GammTerm(variables: list[str], type: TermType, is_penalized: bool, penalty: list[Penalty], pen_kwargs: list[dict])

Bases: object

Base-class implemented by the terms passed to mssm.src.python.formula.Formula.

Parameters:
  • variables ([str]) – List of variables as strings.

  • type (TermType) – Type of term as enum

  • is_penalized (bool) – Whether the term is penalized/can be penalized or not

  • penalty ([Penalty]) – The default penalties associated with a term.

  • pen_kwargs ([dict]) – A list of dictionaries, each with key-word arguments passed to the construction of the corresponding Penalty in penalty.

build_matrix(*args, **kwargs)

Builds the design/term/model matrix associated with this term and returns it represented as a list of values, a list of row indices, and a list of column indices.

This method is implemented by every implementation of the GammTerm class. The returned lists can then be used to create a sparse matrix for this term. Also returns the number of additional columns that would be added to the total model matrix by this term.
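
To illustrate how such lists of values, row indices, and column indices can be turned into a sparse matrix, here is a small sketch using scipy.sparse.csc_array. The two toy terms and the bookkeeping variable ci are assumptions made for illustration. This is not how any particular GammTerm builds its entries.

```python
import numpy as np
from scipy import sparse

n_rows = 4
x = np.array([0.5, 0.0, 2.0, -1.0])   # toy covariate

vals, rows, cols = [], [], []
ci = 0                                 # current column index in the total model matrix

# Hypothetical intercept term: one column of ones
vals += [1.0] * n_rows
rows += list(range(n_rows))
cols += [ci] * n_rows
ci += 1                                # this term added one column

# Hypothetical linear term: store only the non-zero entries of x
nz = np.flatnonzero(x)
vals += x[nz].tolist()
rows += nz.tolist()
cols += [ci] * len(nz)
ci += 1                                # this term added one column as well

# Assemble the sparse model matrix from the accumulated triplets
X = sparse.csc_array((vals, (rows, cols)), shape=(n_rows, ci))
```
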

build_penalty(penalties: list[LambdaTerm], cur_pen_idx: int, *args, **kwargs) tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:
  • penalties ([LambdaTerm]) – List of previously created penalties.

  • cur_pen_idx (int) – Index of the last element in penalties.

Returns:

Updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(*args, **kwargs)

Returns the total number of coefficients associated with this term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

This method is implemented by every implementation of the GammTerm class.

mssm.src.python.terms.build_ir_smooth_series(irsterm: irf, s_cov: ndarray, s_event: int, var_map: dict, var_mins: dict, var_maxs: dict, by_levels: ndarray | None) ndarray

Function to build the impulse response matrix for a single time-series.

Parameters:
  • irsterm (irf) – Impulse response smooth term

  • s_cov (np.ndarray) – covariate array associated with irsterm

  • s_event (int) – Onset of impulse response function

  • var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.

  • var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or None for categorical variables.

  • var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or None for categorical variables.

  • by_levels (np.ndarray | None) – Numpy array holding the levels of the factor associated with the irsterm term (via irsterm.by) or None

Returns:

The term matrix associated with the particular event at s_event

Return type:

np.ndarray

mssm.src.python.terms.build_linear_term(lTerm: l | rs, has_intercept: bool, ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with a linear/random term and returns it represented as a list of values, a list of row indices, and a list of column indices.

Parameters:
  • lTerm – Linear or random slope term

  • has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.

  • ci (int) – Current column index.

  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.

  • var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • ridx (np.ndarray) – Array of non NAN rows in the data.

  • cov_flat (np.ndarray) – An array containing the values (encoded, in the case of categorical predictors) of every predictor variable included in any of the terms, in the order of the data-frame passed to the Formula. Each column of cov_flat corresponds to a different predictor.

  • use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

class mssm.src.python.terms.f(variables: list, by: str = None, by_cont: str = None, binary: tuple[str, str] | None = None, id: int = None, nk: int | list[int] = None, te: bool = False, rp: int = 0, constraint: ~mssm.src.python.custom_types.ConstType = ConstType.QR, identifiable: bool = True, basis: ~collections.abc.Callable = <function B_spline_basis>, basis_kwargs: dict = {}, is_penalized: bool = True, penalize_null: bool = False, penalty: list[~mssm.src.python.penalties.Penalty] | None = None, pen_kwargs: list[dict] | None = None)

Bases: GammTerm

A univariate or tensor interaction smooth term. If variables only contains a single variable \(x\), this term will represent a univariate \(f(x)\) in a model:

\[\mu_i = a + f(x_i)\]

For example, the model below in mgcv:

bam(y ~ s(x,k=10) + s(z,k=20))

would be expressed as follows in mssm:

GAMM(Formula(lhs("y"),[i(),f(["x"],nk=9),f(["z"],nk=19)]),Gaussian())

If variables contains two variables \(x\) and \(z\), then this term will either represent the tensor interaction \(f(x,z)\) in model:

\[\mu_i = a + f(x_i) + f(z_i) + f(x_i,z_i)\]

or in model:

\[\mu_i = a + f(x_i,z_i)\]

The first behavior is achieved by setting te=False. In that case it is necessary to add ‘main effect’ f terms for \(x\) and \(z\). In other words, the behavior then mimics the ti() term available in mgcv (Wood, 2017). If te=True, the term instead behaves like a te() term in mgcv, so no separate smooth effects for the main effects need to be included.

For example, the model below in mgcv:

bam(y ~ te(x,z,k=10))

would be expressed as follows in mssm:

GAMM(Formula(lhs("y"),[i(),f(["x","z"],nk=9,te=True)]),Gaussian())

In addition, the model below in mgcv:

bam(y ~ s(x,k=10) + s(z,k=20) + ti(x,z,k=10))

would be expressed as follows in mssm:

GAMM(Formula(lhs("y"),[i(),f(["x"],nk=9),f(["z"],nk=19),f(["x","z"],nk=9,te=False)]),Gaussian())

By default a B-spline basis is used with nk=9 basis functions (after removing identifiability constraints). This is equivalent to mgcv’s default behavior of using 10 basis functions (before removing identifiability constraints). In case variables contains more than one variable, nk can either be set to a single value or to a list containing the number of basis functions that should be used to set up the spline matrix for every variable. The former implies that the same number of coefficients should be used for all variables. Keyword arguments that change the computation of the spline basis can be passed along via a dictionary to the basis_kwargs argument. Importantly, if multiple variables are present and a list is passed to nk, a list of dictionaries with keyword arguments of the same length needs to be passed to basis_kwargs as well.

Multiple penalties can be placed on every term by adding Penalty objects to the penalty argument. In case variables contains multiple variables, a separate tensor penalty (see Wood, 2017) will be created for every penalty included in penalty. Again, key-word arguments that alter the behavior of the penalty creation need to be passed as dictionaries to pen_kwargs for every penalty included in penalty. By default, a univariate term is penalized with a difference penalty of order 2 (Eilers & Marx, 2010).
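
As a small illustration of what a difference penalty of order 2 does (a NumPy sketch following Eilers & Marx, 2010, not mssm's internal construction), the penalty matrix can be formed from a second-order difference operator. Constant and linear coefficient sequences then receive zero penalty, while a rapidly alternating sequence is penalized heavily.

```python
import numpy as np

k = 9                                  # number of coefficients
D = np.diff(np.eye(k), n=2, axis=0)    # second-order difference operator, shape (k-2, k)
S = D.T @ D                            # difference penalty matrix

const = np.ones(k)                     # constant coefficient sequence
linear = np.arange(k, dtype=float)     # linearly increasing coefficient sequence
wiggly = (-1.0) ** np.arange(k)        # alternating coefficient sequence

pen_const = const @ S @ const          # zero: constants are not penalized
pen_linear = linear @ S @ linear       # zero: linear trends are not penalized
pen_wiggly = wiggly @ S @ wiggly       # large: wiggliness is penalized
rank_S = np.linalg.matrix_rank(S)      # k - 2, so the null space is two-dimensional
```
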

References:

  • Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125

  • Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372–2387. https://doi.org/10.1016/j.csda.2011.02.004

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in data passed to Formula. Need to be continuous.

  • by (str, optional) – A string corresponding to a factor in data passed to Formula. Separate f(variables) (and smoothness penalties) will be estimated per level of by.

  • by_cont (str, optional) – A string corresponding to a numerical variable in data passed to Formula. The model matrix for the estimated smooth term f(variables) will be multiplied by the column of this variable. Can be used to estimate ‘varying coefficient’ models but also to set up binary smooths or to only estimate a smooth term for specific levels of a factor (i.e., what is possible for ordered factors in R & mgcv).

  • binary ([str,str], optional) – A list containing two strings. The first string corresponds to a factor in data passed to Formula. A separate f(variables) will be estimated for the level of this factor corresponding to the second string.

  • id (int, optional) – Only useful in combination with specifying a by variable. If id is set to any integer the penalties placed on the separate f(variables) will share a single smoothness penalty.

  • nk (int or list[int], optional) – Number of basis functions to use. Even if identifiable is true, this number will reflect the final number of basis functions for this term (i.e., mssm acts as if you had asked for 10 basis functions if nk=9 and identifiable=True; the default).

  • te (bool, optional) – For tensor interaction terms only. If set to false, the term mimics the behavior of ti() in mgcv (Wood, 2017). Otherwise, the term behaves like a te() term in mgcv - i.e., the marginal basis functions are not removed from the interaction.

  • rp (int, optional) – Experimental - will currently break for tensor smooths or in case by is provided. Whether or not to re-parameterize the term - see mssm.src.python.formula.reparam() for details. Defaults to no re-parameterization.

  • constraint (mssm.src.constraints.ConstType, optional) – What kind of identifiability constraints should be absorbed by the terms (if they are to be identifiable). Either QR-based constraints (default, well-behaved), by means of column-dropping (no infill, not so well-behaved), or by means of difference re-coding (little infill, not so well behaved either).

  • identifiable (bool, optional) – Whether or not the constant should be removed from the space of functions this term can fit. Achieved by enforcing that \(\mathbf{1}^T \mathbf{X} = 0\) (\(\mathbf{X}\) here is the spline matrix computed for the observed data; see Wood, 2017 for details). Necessary in most cases to keep the model identifiable.

  • basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in mssm.src.python.smooths.B_spline_basis().

  • basis_kwargs (dict, optional) – A list containing one or multiple dictionaries specifying how the basis should be computed. Consult the docstring of the function computing the basis you want. For the default B-spline basis, for example, see the mssm.src.python.smooths.B_spline_basis() function. The default arguments set by any basis function should work for most cases though.

  • is_penalized (bool, optional) – Should the term be left unpenalized or not. There are rarely good reasons to set this to False.

  • penalize_null (bool, optional) – Should a separate Null-space penalty (Marra & Wood, 2011) be placed on the term. By default, the term here will leave a linear f(variables) un-penalized! Thus, there is no option for the penalty to achieve f(variables) = 0 even if that would be supported by the data. Adding a Null-space penalty provides the penalty with that power. This can be used for model selection instead of Hypothesis testing and is the preferred way in mssm (see Marra & Wood, 2011 for details).

  • penalty (list[Penalty], optional) – A list of penalty types to be placed on the term.

  • pen_kwargs (list[dict], optional) – A list containing one or multiple dictionaries specifying how the penalty should be created. Consult the docstring of the Penalty.constructor() method of the specific Penalty you want to use for details.
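
The identifiability condition \(\mathbf{1}^T \mathbf{X} = 0\) described for identifiable can be absorbed with a QR decomposition, as discussed by Wood (2017). The sketch below shows the general idea on a random stand-in matrix. Whether mssm's QR-based constraint follows exactly these steps is an assumption. Note how one column is lost, which matches the remark that nk=9 corresponds to requesting 10 basis functions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 10
X = rng.uniform(size=(n, k))               # stand-in for an unconstrained spline matrix

C = X.T @ np.ones((n, 1))                  # constraint vector: column sums of X, shape (k, 1)
Q, _ = np.linalg.qr(C, mode="complete")    # full k x k orthogonal factor
Z = Q[:, 1:]                               # basis of the space orthogonal to C

X_c = X @ Z                                # constrained matrix with k - 1 columns
col_sums = np.ones(n) @ X_c                # all (numerically) zero
```
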

absorb_repara(rpidx, X, cov)

Computes all terms necessary to absorb a re-parameterization into the term and penalty matrix.

Parameters:
  • rpidx (int) – Index to specific reparam. object. There must be a 1 to 1 relationship between reparam. objects and the number of marginals required by this smooth (i.e., the number of variables).

  • X (scipy.sparse.csc_array) – Design matrix associated with this term.

  • cov (np.ndarray) – The covariate this term is a function of as a flattened numpy array.

Raises:

ValueError – If this method is called with rpidx exceeding the number of this term’s RP objects (i.e., when rpidx > (len(self.RP) - 1)) or if self.rp is equal to a value for which no reparameterisation is implemented.

build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: list[int], cov_flat: ndarray, use_only: list[int], tol: int = 0) tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix for this smooth term.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • ci (int) – Current column index.

  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.

  • var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or None for categorical variables.

  • var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or None for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • ridx (np.ndarray) – Array of non NAN rows in the data.

  • cov_flat (np.ndarray) – An array containing the values (encoded, in the case of categorical predictors) of every predictor variable included in any of the terms, in the order of the data-frame passed to the Formula. Each column of cov_flat corresponds to a different predictor.

  • use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

  • tol (int, optional) – A tolerance that can be used to prune the term matrix from values close to zero rather than absolutely zero. Defaults to strictly zero.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this smooth term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:
  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • penalties ([LambdaTerm]) – List of previously created penalties.

  • cur_pen_idx (int) – Index of the last element in penalties.

  • penid (int) – If a term is subjected to multiple penalties, then penid indexes which of those penalties is currently implemented. Otherwise can be set to zero.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • n_coef (int) – Number of coefficients associated with this term.

  • col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(factor_levels: dict) tuple[int, int, list[str]]

Returns the total number of coefficients associated with this smooth term, the number of unpenalized coefficients associated with this smooth term, and a list with names for each of the coefficients associated with this smooth term.

Parameters:

factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

class mssm.src.python.terms.fs(variables: list, rf: str = None, nk: int = 9, m: int = 1, rp: int = 1, by_cont: str | None = None, by_subgroup: tuple[str, str] | None = None, approx_deriv: dict | None = None, basis: ~collections.abc.Callable = <function B_spline_basis>, basis_kwargs: dict = {})

Bases: f

Essentially an f term with by=rf, id != None, penalize_null=True, pen_kwargs = [{"m":1}], and rp=1.

This term approximates the “factor-smooth interaction” basis “fs” with m= 1 available in mgcv (Wood, 2017). For example, the term below from mgcv:

s(x,sub,bs="fs")

would approximately correspond to the following term in mssm:

fs(["x"],rf="sub")

They are however not equivalent (mgcv by default uses a different basis for which the m key-word has a different functionality).

Specifically, here m= 1 implies that the only function left unpenalized by the default (difference) penalty is the constant (Eilers & Marx, 2010). Thus, a linear basis is penalized by the same default penalty that also penalizes smoothness (and not by a separate penalty as is the case in mgcv when m=1 for the default basis)! Any constant basis is penalized by the null-space penalty (in both mgcv and mssm; see Marra & Wood, 2011) - the term thus shrinks towards zero (Wood, 2017).
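
The interplay between the m=1 difference penalty and a null-space penalty can be sketched numerically. The construction below follows the general idea of Marra & Wood (2011), building an extra penalty from the null space of the difference penalty. The exact form and scaling used by mssm are assumptions. With both penalties in place, no coefficient direction is left unpenalized, which is why the term can shrink towards zero.

```python
import numpy as np

k = 10
D1 = np.diff(np.eye(k), n=1, axis=0)     # first-order difference operator (m=1)
S = D1.T @ D1                            # difference penalty: only constants unpenalized

eigval, eigvec = np.linalg.eigh(S)
U0 = eigvec[:, eigval < 1e-10]           # eigenvectors spanning the null space of S
S_null = U0 @ U0.T                       # null-space penalty acting on the constant

const = np.ones(k)
rank_S = np.linalg.matrix_rank(S)                # k - 1
rank_total = np.linalg.matrix_rank(S + S_null)   # k: nothing left unpenalized
pen_const_S = const @ S @ const                  # zero under the difference penalty
pen_const_null = const @ S_null @ const          # positive under the null-space penalty
```
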

The factor smooth basis in mgcv allows the penalty to differ between levels of an additional factor (by additionally specifying the by argument for a smooth with basis “fs”). I.e.,

s(Time,Subject,by='condition',bs='fs')

in mgcv would estimate a non-linear random smooth of “time” per level of the “subject” & “condition” interaction - with the same penalty being placed on all random smooth terms within the same “condition” level.

This can be achieved in mssm by adding multiple fs terms to the Formula and utilising the by_subgroup argument. This needs to be set to a list where the first element identifies the additional factor variable (e.g., “condition”) and the second element corresponds to a level of said factor variable. E.g., to approximate the aforementioned mgcv term we have to add:

*[fs(["Time"],rf="subject_cond",by_subgroup=["cond",cl]) for cl in np.unique(dat["cond"])]

to the Formula terms list. Importantly, “subject_cond” is the interaction of “subject” and “condition” - not just the “subject” variable in the data.
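
One way to create such an interaction variable before building the Formula is to paste the two factor columns together. In the sketch below a plain dict of NumPy arrays stands in for the data frame. The column names and the "_" separator are assumptions chosen for illustration.

```python
import numpy as np

# Stand-in for the data passed to Formula (hypothetical columns and values)
dat = {
    "subject": np.array(["s1", "s1", "s2", "s2"]),
    "cond": np.array(["a", "b", "a", "b"]),
}

# Interaction of subject and condition: one level per subject/condition pair
dat["subject_cond"] = np.array(
    [f"{s}_{c}" for s, c in zip(dat["subject"], dat["cond"])]
)

# Levels over which the list comprehension of fs terms would iterate
levels = np.unique(dat["cond"])
```
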

Model estimation can become quite expensive for fs terms when the factor variable for rf has many levels (> 10000). In that case, approximate derivative evaluation can speed things up considerably. To enforce this, the approx_deriv argument needs to be specified with a dict having the following structure: {"no_disc":[str],"excl":[str],"split_by":[str],"restarts":int,"seed":None or int}. “no_disc” should usually be set to an empty list, and should in general only contain names of continuous variables included in the formula. Any variable mentioned here will not be discretized before clustering - this will make the approximation a bit more accurate but also require more time. Similarly, “excl” specifies any continuous variables that should be excluded from clustering. “split_by” should generally be set to a list containing all categorical variables present in the formula. “restarts” indicates the number of times to re-produce the clustering (40 seems to be a good number). “seed” can either be set to None or to an integer - in the latter case, the random cluster initialization will use that seed, ensuring that the clustering outcome (and hence model fit) is replicable.
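
A dictionary following the structure described above might look like this. The variable name "cond" and the seed value are hypothetical. The remaining values follow the recommendations in the text.

```python
approx_deriv = {
    "no_disc": [],          # continuous variables to leave undiscretized (usually empty)
    "excl": [],             # continuous variables to exclude from clustering
    "split_by": ["cond"],   # hypothetical categorical variable present in the formula
    "restarts": 40,         # number of times the clustering is repeated
    "seed": 20,             # fixed seed so that the clustering is replicable
}
# It would then be passed to the term, e.g.: fs(["x"], rf="sub", approx_deriv=approx_deriv)
```
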

References:

  • Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125

  • Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372–2387. https://doi.org/10.1016/j.csda.2011.02.004

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.). Chapman and Hall/CRC.

Parameters:
  • variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in data passed to Formula. Need to be continuous.

  • rf (str, optional) – A string corresponding to a (random) factor in data passed to Formula. Separate f(variables) (but a shared smoothness penalty!) will be estimated per level of rf.

  • nk (int or list[int], optional) – Number of basis functions - 1 to use. I.e., if nk=9 (the default), the term will use 10 basis functions. By default f() has identifiability constraints applied and we act as if nk + 1 coefficients were requested. The fs() term needs no identifiability constraints, so if the same number of coefficients used for a f() term is requested (the desired approach), one coefficient is added to compensate for the lack of identifiability constraints. This is the opposite of how this is handled in mgcv: specifying k=10 for “fixed” univariate smooths results in 9 basis functions being available. However, for a smooth in mgcv with bs="fs", 10 basis functions will remain available.

  • basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in mssm.src.smooths.B_spline_basis().

  • basis_kwargs (dict, optional) – A list containing one or multiple dictionaries specifying how the basis should be computed. For the B-spline basis the following arguments (with default values) are available: convolve=False, min_c=None, max_c=None, deg=3. See mssm.src.smooths.B_spline_basis() for details.

  • by_cont (str, optional) – A string corresponding to a numerical variable in data passed to Formula. The model matrix for the estimated smooth term will be multiplied by the column of this variable. Can be used as an alternative to estimate separate random smooth terms per level of another factor (which is also possible with by_subgroup).

  • by_subgroup ([str,str], optional) – List including a factor variable and specific level of said variable. Allows for separate penalties as described above.

  • approx_deriv (dict, optional) – Dict holding important info for the clustering algorithm. Structure: {"no_disc":[str],"excl":[str],"split_by":[str],"restarts":int,"seed":None or int}

build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int], tol: int = 0) tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix for this factor smooth term.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • ci (int) – Current column index.

  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.

  • var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or None for categorical variables.

  • var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or None for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • ridx (np.ndarray) – Array of non NAN rows in the data.

  • cov_flat (np.ndarray) – An array containing the values (encoded, in the case of categorical predictors) of every predictor variable included in any of the terms, in the order of the data-frame passed to the Formula. Each column of cov_flat corresponds to a different predictor.

  • use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

  • tol (int, optional) – A tolerance that can be used to prune the term matrix from values close to zero rather than absolutely zero. Defaults to strictly zero.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]
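The role of tol can be illustrated on the triplet representation (values, row indices, column indices) returned by build_matrix. This is only a sketch under the assumption that pruning means dropping entries whose absolute value does not exceed tol; the helper prune_triplets is hypothetical and not part of mssm.

```python
def prune_triplets(data, rows, cols, tol=0):
    """Drop all entries with abs(value) <= tol from a triplet representation."""
    kept = [(v, r, c) for v, r, c in zip(data, rows, cols) if abs(v) > tol]
    if not kept:
        return [], [], []
    new_data, new_rows, new_cols = (list(part) for part in zip(*kept))
    return new_data, new_rows, new_cols


# Four stored entries: one exact zero and one value that is merely close to zero.
data = [1.0, 0.0, 1e-12, -0.5]
rows = [0, 0, 1, 2]
cols = [0, 1, 1, 2]

strict = prune_triplets(data, rows, cols)            # tol=0 removes only exact zeros
loose = prune_triplets(data, rows, cols, tol=1e-9)   # also removes the 1e-12 entry
```

A larger tol yields a sparser matrix at the cost of discarding very small basis function evaluations.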

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this factor smooth term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:
  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • penalties ([LambdaTerm]) – List of previously created penalties.

  • cur_pen_idx (int) – Index of the last element in penalties.

  • penid (int) – If a term is subjected to multiple penalties, then penid indexes which of those penalties is currently implemented. Otherwise can be set to zero.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]
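The bookkeeping contract described above (append each new penalty and increment cur_pen_idx once per appended penalty) might be sketched as follows. PenaltyRecord is a hypothetical stand-in and NOT mssm's LambdaTerm, and starting cur_pen_idx at -1 for an empty list is an assumption derived from its description as the index of the last element.

```python
from dataclasses import dataclass


@dataclass
class PenaltyRecord:
    """Hypothetical placeholder for a penalty entry (not mssm's LambdaTerm)."""
    term: int    # position of the owning term in the formula (ti)
    penid: int   # which of the term's penalties this record represents
    n_cols: int  # number of columns of the total penalty matrix (col_S)


def add_penalties(ti, penalties, cur_pen_idx, n_new, col_S):
    """Append n_new penalties for term ti and keep cur_pen_idx in sync."""
    for penid in range(n_new):
        penalties.append(PenaltyRecord(term=ti, penid=penid, n_cols=col_S))
        cur_pen_idx += 1  # one increment per penalty added
    return penalties, cur_pen_idx


# Assumed convention: an empty list has "last index" -1.
penalties, cur_pen_idx = add_penalties(ti=0, penalties=[], cur_pen_idx=-1, n_new=2, col_S=10)
penalties, cur_pen_idx = add_penalties(ti=1, penalties=penalties, cur_pen_idx=cur_pen_idx, n_new=1, col_S=10)
```

After both calls cur_pen_idx again points at the last element of penalties, so the next term can continue from there.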

get_coef_info(factor_levels: dict) tuple[int, int, list[str]]

Returns the total number of coefficients associated with this factor smooth term, the number of unpenalized coefficients associated with this factor smooth term, and a list with names for each of the coefficients associated with this factor smooth term.

Parameters:

factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

mssm.src.python.terms.get_linear_coef_info(lTerm: l | rs, has_intercept: bool, var_types: dict, factor_levels: dict, coding_factors: dict) tuple[int, int, list[str]]

Returns the total number of coefficients associated with a linear or random term, the number of unpenalized coefficients associated with that term, and a list with names for each of the coefficients associated with that term.

Parameters:
  • lTerm – Linear or random slope term

  • has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.

  • var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

class mssm.src.python.terms.i

Bases: GammTerm

An intercept/offset term. In a model

\[\mu_i = a + f(x_i)\]

it reflects \(a\).

build_matrix(ci: int, ti: int, ridx: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix for an intercept term.

Parameters:
  • ci (int) – Current column index.

  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • ridx (np.ndarray) – Array of non NAN rows in the data.

  • use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

get_coef_info() tuple[int, int, list[str]]

Returns the total number of coefficients associated with this term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

class mssm.src.python.terms.irf(variables: list[str], event_onset: list[int], basis_kwargs: list[dict], by: str = None, id: int = None, nk: int = 10, basis: ~collections.abc.Callable = <function B_spline_basis>, is_penalized: bool = True, penalty: list[~mssm.src.python.penalties.Penalty] | None = None, pen_kwargs: list[dict] | None = None)

Bases: GammTerm

A simple impulse response term, designed to correct for events with overlapping responses in multi-level time-series modeling.

The idea (see Ehinger & Dimigen; 2019 for a detailed introduction to this kind of deconvolution analysis) is that some kind of event happens during each recorded time-series (e.g., stimulus onset, distractor display, mask onset, etc.) which is assumed to affect the recorded signal in the next X ms in some way. The moment of event onset can differ between recorded time-series. In other words, the event is believed to act like an impulse which triggers a delayed response on the signal. This term class can be used to estimate the shape of this impulse response. Multiple irf terms can be included in a Formula if multiple events happen, potentially with overlapping responses.
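The overlap problem that motivates this term can be made concrete with a toy series. The sketch below is plain Python and does not use mssm; the response shapes, onsets, and the helper impulse_response_signal are invented for illustration. Each event contributes a copy of its response starting at its onset, and the recorded signal is the sum of the contributions.

```python
def impulse_response_signal(n_samples, onset, response):
    """Return a series of length n_samples containing `response` starting at index `onset`."""
    signal = [0.0] * n_samples
    for lag, value in enumerate(response):
        if onset + lag < n_samples:  # ignore the part of the response past the series end
            signal[onset + lag] += value
    return signal


# Two hypothetical events with different response shapes and onsets.
response1 = [0.0, 1.0, 2.0, 1.0]
response2 = [0.0, -1.0, -1.0]

s1 = impulse_response_signal(8, onset=1, response=response1)
s2 = impulse_response_signal(8, onset=3, response=response2)

# The observed signal is the sum of both (here noise-free) contributions.
observed = [a + b for a, b in zip(s1, s2)]
```

At index 4 the two responses cancel in the observed series, so neither shape can be read off the raw signal directly. Recovering the individual shapes from such sums is what the deconvolution approach is meant to achieve.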

Example:

# Simulate time-series based on two events that elicit responses which vary in their overlap.
# The summed responses + a random intercept + noise is then the signal.
overlap_dat,onsets1,onsets2 = sim7(100,1,2,seed=20)

# Model below tries to recover the shape of the two responses in the 200 ms after event onset (max_c=200) + the random intercepts:
overlap_formula = Formula(lhs("y"),[irf(["time"],onsets1,nk=15,basis_kwargs=[{"max_c":200,"min_c":0,"convolve":True}]),
                                    irf(["time"],onsets2,nk=15,basis_kwargs=[{"max_c":200,"min_c":0,"convolve":True}]),
                                    ri("factor")],
                                    data=overlap_dat,
                                    series_id="series") # For models with irf terms, the column in the data identifying unique series needs to be specified.

model = GAMM(overlap_formula,Gaussian())
model.fit()

Note that care needs to be taken when predicting for models including irf terms, because the onset of events can differ between time-series. Hence, model predictions + standard errors should first be obtained for the entire data-set that was also used to train the model. Series-specific predictions can then be extracted from the model matrix as follows:

# Get model matrix for entire data-set but only based on the estimated shape for first irf term:
_,pred_mat,ci_b = model.predict([0],overlap_dat,ci=True)

# Now extract the prediction + approximate ci boundaries for a single series:
s = 8
s_pred = pred_mat[overlap_dat["series"] == s,:]@model.coef
s_ci = ci_b[overlap_dat["series"] == s]

# Now the estimated response following the onset of the first event can be visualized + an approximate CI:
from matplotlib import pyplot as plt
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred,color='blue')
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred+s_ci,color='blue',linestyle='dashed')
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred-s_ci,color='blue',linestyle='dashed')

References:

  • Ehinger, B. V., & Dimigen, O. (2019). Unfold: an integrated toolbox for overlap correction, non-linear modeling, and regression-based EEG analysis. https://doi.org/10.7717/peerj.7838

Parameters:
  • variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in data passed to Formula. Need to be continuous.

  • event_onset ([int]) – A np.array containing, for each individual time-series, the index corresponding to the sample/time-point at which the event eliciting the response to be estimated by this term happened.

  • basis_kwargs (list[dict]) – A list containing one or multiple dictionaries specifying how the basis should be computed. For irf terms, the convolve argument has to be set to True! Also, min_c and max_c must be specified. min_c corresponds to the assumed min. delay of the response after event onset and can usually be set to 0. max_c corresponds to the assumed max. delay of the response (in ms) after which the response is believed to have returned to a zero base-line.

  • by (str, optional) – A string corresponding to a factor in data passed to Formula. Separate irf(variables) (and smoothness penalties) will be estimated per level of by.

  • id (int, optional) – Only useful in combination with specifying a by variable. If id is set to any integer, the separate irf(variables) will share a single smoothness penalty.

  • nk (int, optional) – Number of basis functions to use. I.e., if nk=10 (the default), the term will use 10 basis functions (Note that these terms are not made identifiable by absorbing any kind of constraint).

  • basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in src.smooths.B_spline_basis.

  • is_penalized (bool, optional) – Should the term be left unpenalized or not. There are rarely good reasons to set this to False.

  • penalty (list[Penalty], optional) – A list of penalty types to be placed on the term.

  • pen_kwargs (list[dict], optional) – A list containing one or multiple dictionaries specifying how the penalty should be created. For the default difference penalty (Eilers & Marx, 2010) the only keyword argument (with default value) available is: m=2. This reflects the order of the difference penalty. Note, that while a higher m permits penalizing towards smoother functions it also leads to an increased dimensionality of the penalty Kernel (the set of f[variables] which will not be penalized). In other words, increasingly more complex functions will be left un-penalized for higher m (except if penalize_null is set to True). m=2 is usually a good choice and thus the default but see Eilers & Marx (2010) for details.
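The statement that a higher m enlarges the penalty kernel can be checked numerically. The following sketch builds a generic m-th order difference matrix D in plain Python, so that the penalty on a coefficient vector is the sum of squared m-th order differences. It is a textbook construction and is not taken from mssm's source; the helper names are assumptions.

```python
def difference_matrix(nk, m=2):
    """Return the m-th order difference matrix D with shape (nk - m, nk)."""
    # Start from the identity matrix and difference neighbouring rows m times.
    D = [[1.0 if r == c else 0.0 for c in range(nk)] for r in range(nk)]
    for _ in range(m):
        D = [[D[r + 1][c] - D[r][c] for c in range(nk)] for r in range(len(D) - 1)]
    return D


def penalty_value(coef, D):
    """Evaluate coef' D'D coef, i.e. the sum of squared differences of coef."""
    return sum(sum(d * b for d, b in zip(row, coef)) ** 2 for row in D)


D1 = difference_matrix(5, m=1)
D2 = difference_matrix(5, m=2)
D3 = difference_matrix(5, m=3)

constant = [1.0, 1.0, 1.0, 1.0, 1.0]
linear = [1.0, 2.0, 3.0, 4.0, 5.0]
quadratic = [0.0, 1.0, 4.0, 9.0, 16.0]
```

With m=1 only constant coefficient sequences are unpenalized, with m=2 linear sequences escape the penalty as well, and with m=3 quadratic sequences do too. This is the growth of the kernel referred to in the description of pen_kwargs.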

build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: ndarray, cov: list[ndarray], use_only: list[int], pool, tol: int = 0) tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with this impulse response smooth term and returns it represented as a list of values, a list of row indices, and a list of column indices.

This method is implemented by every implementation of the GammTerm class. The returned lists can then be used to create a sparse matrix for this term. Also returns an updated ci column index, reflecting how many additional columns would be added to the total model matrix.

Parameters:
  • ci (int) – Current column index.

  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.

  • var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or None for categorical variables.

  • var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or None for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • ridx (np.ndarray) – Array of non NAN rows in the data.

  • cov ([np.ndarray]) – A list containing a separate array per time-series included in the data and indicated to the formula. Each array contains, for that particular time-series, all values (encoded, in case of categorical predictors) on each predictor variable included in any of the terms, in order of the data-frame passed to the Formula. Each column of the array corresponds to a different predictor.

  • use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

  • pool (Any) – A multiprocessing pool used to parallelize parts of the matrix construction.

  • tol (int, optional) – A tolerance that can be used to prune the term matrix from values close to zero but not absolutely zero. Defaults to strictly zero.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this impulse response smooth term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:
  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • penalties ([LambdaTerm]) – List of previously created penalties.

  • cur_pen_idx (int) – Index of the last element in penalties.

  • penid (int) – If a term is subjected to multiple penalties, then penid indexes which of those penalties is currently implemented. Otherwise can be set to zero.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(ti: int, factor_levels: dict) tuple[int, int, list[str]]

Returns the total number of coefficients associated with this impulse response smooth term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Parameters:
  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

class mssm.src.python.terms.l(variables: list)

Bases: GammTerm

Adds a parametric (linear) term to the model formula. The model \(\mu_i = a + b*x_i\) can for example be achieved by adding [i(), l(['x'])] to the term argument of a Formula. The coefficient \(b\) estimated for the term will then correspond to the slope of \(x\). This class can also be used to add predictors for categorical variables. If the formula includes an intercept, binary coding will be utilized to add reference-level adjustment coefficients for the remaining k-1 levels of any additional factor variable.

If more than one variable is included in variables, the model will only add the len(variables)-way interaction to the model! Lower-order interactions and main effects will not be included by default (see the li() function instead, which automatically includes all lower-order interactions and main effects).

Example: The interaction effect of factor variable “cond”, with two levels “1” and “2”, and a continuous variable “x” on the dependent variable “y” is of interest. To estimate such a model, the following formula can be used:

formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"]),l(["cond","x"])])

This formula will estimate the following model:

\[\mu_i = a + b_1*c_i + b_2*x_i + b_3*c_i*x_i\]

Here, \(c\) is a binary predictor variable created so that it is 1 if “cond”=2 else 0 and \(b_3\) is the coefficient that is added because l(["cond","x"]) is included in the terms (i.e., the interaction effect).
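The binary coding described above can be spelled out for a single observation. The sketch below is plain Python and independent of mssm. The helper design_row and the choice of “1” as reference level are assumptions matching the example, with columns ordered as \(a\), \(b_1\), \(b_2\), \(b_3\).

```python
def design_row(cond, x, reference="1"):
    """Return the row [1, c, x, c*x] of the model matrix for one observation."""
    c = 0.0 if cond == reference else 1.0  # 1 only for the non-reference level
    return [1.0, c, x, c * x]


def linear_predictor(row, coef):
    """Compute mu_i as the inner product of a design row and the coefficients."""
    return sum(value * beta for value, beta in zip(row, coef))


# Hypothetical coefficients a, b_1, b_2, b_3.
coef = [1.0, 0.5, 2.0, -1.0]

row_ref = design_row("1", 3.0)   # reference level: the interaction column is zero
row_alt = design_row("2", 3.0)   # level "2": the slope of x becomes b_2 + b_3

mu_ref = linear_predictor(row_ref, coef)
mu_alt = linear_predictor(row_alt, coef)
```

For the reference level the prediction only involves \(a\) and \(b_2\), while for level “2” both the offset \(b_1\) and the slope adjustment \(b_3\) enter.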

To get a model with only main effects for “cond” and “x”, the following formula could be used:

formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"])])

This formula will estimate:

\[\mu_i = a + b_1*c_i + b_2*x_i\]
Parameters:

variables ([str]) – A list of the variables (strings) for which linear predictors should be included

build_matrix(has_intercept: bool, ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with this linear term and returns it represented as a list of values, a list of row indices, and a list of column indices.

This method is implemented by every implementation of the GammTerm class. The returned lists can then be used to create a sparse matrix for this term. Also returns an updated ci column index, reflecting how many additional columns would be added to the total model matrix.

Parameters:
  • has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.

  • ci (int) – Current column index.

  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.

  • var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • ridx (np.ndarray) – Array of non NAN rows in the data.

  • cov_flat (np.ndarray) – An array containing all values (encoded, in case of categorical predictors) on each predictor variable included in any of the terms, in order of the data-frame passed to the Formula. Each column of cov_flat corresponds to a different predictor.

  • use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

get_coef_info(has_intercept: bool, var_types: dict, factor_levels: dict, coding_factors: dict) tuple[int, int, list[str]]

Returns the total number of coefficients associated with this linear term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Parameters:
  • has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.

  • var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

mssm.src.python.terms.li(variables: list[str])

Behaves like the l class but automatically includes all lower-order interactions and main effects.

Example: The interaction effect of factor variable “cond”, with two levels “1” and “2”, and a continuous variable “x” on the dependent variable “y” is of interest. To estimate such a model, the following formula can be used:

formula = Formula(lhs("y"),terms=[i(),*li(["cond","x"])])

Note the use of the * operator to unpack the individual terms returned from li!

This formula will still (see l) estimate the following model:

\[\mu_i = a + b_1*c_i + b_2*x_i + b_3*c_i*x_i\]

with: \(c\) corresponding to a binary predictor variable created so that it is 1 if “cond”=2 else 0.
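Which variable combinations are covered by "all lower-order interactions and main effects" can be enumerated with itertools. This sketch only lists the variable subsets. It does not call mssm, and the order in which li actually returns its terms is an assumption here (main effects first, highest-order interaction last).

```python
from itertools import combinations


def lower_order_subsets(variables):
    """List every non-empty subset of `variables`, ordered by interaction order."""
    subsets = []
    for order in range(1, len(variables) + 1):
        subsets.extend(list(combo) for combo in combinations(variables, order))
    return subsets


two_way = lower_order_subsets(["cond", "x"])
three_way = lower_order_subsets(["fact", "x", "time"])
```

For two variables this gives the two main effects plus the two-way interaction, matching the three non-intercept coefficients of the model above. For three variables there are seven subsets: three main effects, three two-way interactions, and the three-way interaction.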

To get a model with only main effects for “cond” and “x” li cannot be used and l needs to be used instead:

formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"])])

This formula will estimate:

\[\mu_i = a + b_1*c_i + b_2*x_i\]
Parameters:

variables (list[str]) – A list of the variables (strings) for which linear predictors should be included

class mssm.src.python.terms.ri(variable: str)

Bases: GammTerm

Adds a random intercept for the factor variable to the model. The random intercepts \(b_i\) are assumed to be i.i.d. \(b_i \sim N(0,\sigma_b)\), i.e., normally distributed around zero. This is the simplest random effect supported by mssm.

Thus, this term achieves exactly what is achieved in mgcv by adding the term:

s(variable,bs="re")

The variable needs to identify a factor-variable in the data (i.e., the .dtype of the variable has to be equal to ‘O’). If you want to add more complex random effects to the model (e.g., random slopes for continuous variable “x” per level of factor variable) use the rs class.

Parameters:

variable (str) – The name (string) of a factor variable. For every level of this factor a random intercept will be estimated. The random intercepts are assumed to follow a normal distribution centered around zero.

build_matrix(ci: int, ti: int, var_map: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with this random intercept term and returns it represented as a list of values, a list of row indices, and a list of column indices.

This method is implemented by every implementation of the GammTerm class. The returned lists can then be used to create a sparse matrix for this term. Also returns an updated ci column index, reflecting how many additional columns would be added to the total model matrix.

Parameters:
  • ci (int) – Current column index.

  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • ridx (np.ndarray) – Array of non NAN rows in the data.

  • cov_flat (np.ndarray) – An array containing all values (encoded, in case of categorical predictors) on each predictor variable included in any of the terms, in order of the data-frame passed to the Formula. Each column of cov_flat corresponds to a different predictor.

  • use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]
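For a random intercept the term matrix is an indicator matrix with one column per factor level. The sketch below shows how such a matrix could be expressed in the triplet form returned by build_matrix, including the use_only behaviour described above. It is not mssm's code: the helper name, the assumption that factor levels are encoded as integers 0..n_levels-1, and the assumption that an excluded term still reports its number of columns are all illustrative.

```python
def random_intercept_triplets(factor_codes, n_levels, ci, ti, use_only=None):
    """Return (data, rows, cols, added_columns) for an indicator matrix.

    factor_codes: integer-encoded factor level per observation (assumed 0-based).
    ci:           index of the first column this term occupies in the full matrix.
    ti:           position of the term in the formula.
    use_only:     term indices to include, or None to include every term.
    """
    data, rows, cols = [], [], []
    if use_only is None or ti in use_only:
        for row, code in enumerate(factor_codes):
            data.append(1.0)        # each observation belongs to exactly one level
            rows.append(row)
            cols.append(ci + code)  # shift by the current column index
    # An excluded term contributes no entries, i.e. a zero matrix.
    return data, rows, cols, n_levels


included = random_intercept_triplets([0, 1, 0, 2], n_levels=3, ci=1, ti=2)
excluded = random_intercept_triplets([0, 1, 0, 2], n_levels=3, ci=1, ti=2, use_only=[0])
```

The three lists could be handed to a sparse matrix constructor that accepts coordinate-format input to materialise the term matrix.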

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this random intercept term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:
  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • penalties ([LambdaTerm]) – List of previously created penalties.

  • cur_pen_idx (int) – Index of the last element in penalties.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(factor_levels: dict, coding_factors: dict) tuple[int, int, list[str]]

Returns the total number of coefficients associated with this random intercept term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Parameters:
  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

class mssm.src.python.terms.rs(variables: list[str], rf: str)

Bases: GammTerm

Adds random slopes for the effects of variables for each level of the random factor rf. The type of random slope created depends on the content of variables.

If len(variables)==1, and the string in variables identifies a categorical variable in the data, then a random offset adjustment (for every level of the categorical variable, so without binary coding!) will be estimated for every level of the random factor rf.

Example: The factor variable “cond”, with two levels “1” and “2” is assumed to have a general effect on the DV “y”. However, data was collected from multiple subjects (random factor rf = “subject”) and it is reasonable to assume that the effect of “cond” is slightly different for every subject (it is also assumed that all subjects took part in both conditions identified by “cond”). A model that accounts for this is estimated via:

formula = Formula(lhs("y"),terms=[i(),l(["cond"]),rs(["cond"],rf="subject")])

This formula will estimate the following model:

\[\mu_i = a + b_1*c_i + a_{j(i),cc(i)}\]

Here, \(c\) is again a binary predictor variable created so that it is 1 if “cond”=2 for observation \(i\) else 0, \(cc(i)\) indexes the level of “cond” at observation \(i\), \(j(i)\) indexes the level of “subject” at observation \(i\), and \(a_{j(i),cc(i)}\) identifies the random offset estimated for subject \(j(i)\) at the level of “cond” indicated by \(cc(i)\). The \(a_{j(i),cc(i)}\) are assumed to be i.i.d. \(\sim N(0,\sigma_a)\). Note that the fixed effect structure uses binary coding but the random effect structure does not!
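The difference between the binary-coded fixed effect and the fully coded random offsets can be seen by evaluating the model for a few observations. This is a plain-Python sketch with made-up coefficient values. The helper name and the dictionary keyed by (subject, cond) are assumptions for illustration and say nothing about how mssm stores coefficients.

```python
def predict_random_offsets(cond, subject, a, b1, offsets, reference="1"):
    """Evaluate mu_i = a + b1*c_i + a_{j(i),cc(i)} for every observation."""
    mu = []
    for cond_i, subj_i in zip(cond, subject):
        c_i = 0.0 if cond_i == reference else 1.0  # binary coding for the fixed effect
        # One random offset per (subject, cond) combination, no reference level.
        mu.append(a + b1 * c_i + offsets[(subj_i, cond_i)])
    return mu


# Hypothetical random offsets for two subjects and both levels of "cond".
offsets = {("s1", "1"): 0.25, ("s1", "2"): -0.25,
           ("s2", "1"): 0.0, ("s2", "2"): 0.5}

mu = predict_random_offsets(cond=["1", "2", "1", "2"],
                            subject=["s1", "s1", "s2", "s2"],
                            a=1.0, b1=0.5, offsets=offsets)
```

The fixed part uses a single coefficient for “cond”, whereas the random part holds one offset for every level of “cond” within every subject.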

Hence, rs(["cond"],rf="subject") in mssm corresponds to adding the term below to a mgcv model:

s(cond,subject,bs="re")

If all the strings in variables identify continuous variables in the data, then a random slope for the len(variables)-way interaction (will simplify to a slope for a single continuous variable if len(variables) == 1) will be estimated for every level of the random factor rf.

Example: The continuous variable “x” is assumed to have a general effect on the DV “y”. However, data was collected from multiple subjects (random factor rf = “subject”) and it is reasonable to assume that the effect of “x” is slightly different for every subject. A model that accounts for this is estimated via:

formula = Formula(lhs("y"),terms=[i(),l(["x"]),rs(["x"],rf="subject")])

This formula will estimate the following model:

\[\mu_i = a + b*x_i + b_{j(i)} * x_i\]

Here, \(j(i)\) again indexes the level of “subject” at observation \(i\), \(b_{j(i)}\) identifies the random slope (the subject-specific slope adjustment for \(b\)) for variable “x” estimated for subject \(j(i)\), and the \(b_{j(i)}\) are again assumed to be i.i.d. from a single \(N(0,\sigma_b)\).
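Since the random slope multiplies the same \(x_i\) as the fixed slope, each subject effectively receives its own slope \(b + b_{j(i)}\). A short plain-Python illustration with invented values follows. The helper and the dictionary of slope adjustments are assumptions for illustration and are unrelated to mssm's internals.

```python
def predict_random_slopes(x, subject, a, b, slope_adjustments):
    """Evaluate mu_i = a + b*x_i + b_{j(i)}*x_i for every observation."""
    return [a + b * x_i + slope_adjustments[s_i] * x_i
            for x_i, s_i in zip(x, subject)]


# Hypothetical subject-specific adjustments to the shared slope b.
slope_adjustments = {"s1": 0.5, "s2": -0.5}

mu = predict_random_slopes(x=[1.0, 2.0, 1.0, 2.0],
                           subject=["s1", "s1", "s2", "s2"],
                           a=1.0, b=2.0,
                           slope_adjustments=slope_adjustments)

# Effective slope per subject: change in mu for a one-unit change in x.
slope_s1 = mu[1] - mu[0]
slope_s2 = mu[3] - mu[2]
```

Subject “s1” ends up with a steeper slope than “s2”, while both share the intercept \(a\) because no ri term is part of this model.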

Note, lower-order interaction slopes (as well as main effects) are not pulled in by default! Consider the following formula:

formula = Formula(lhs("y"),terms=[i(),*li(["x","z"]),rs(["x","z"],rf="subject")])

with another continuous variable “z”. This corresponds to the model:

\[\mu_i = a + b_1*x_i + b_2*z_i + b_3*x_i*z_i + b_{j(i)}*x_i*z_i\]

With \(j(i)\) again indexing the level of “subject” at observation i, \(b_{j(i)}\) identifying the random slope (the subject-specific slope adjustment for \(b_3\)) for the interaction of variables \(x\) and \(z\) estimated for subject \(j\). The \(b_{j(i)}\) are again assumed to be i.i.d from a single \(\sim N(0,\sigma_b)\).

To add random slopes for the main effects of either \(x\) or \(z\) as well as an additional random intercept, additional rs and ri terms would have to be added to the formula:

formula = Formula(lhs("y"),terms=[i(),*li(["x","z"]),
                                 ri("subject"),
                                 rs(["x"],rf="subject"),
                                 rs(["z"],rf="subject"),
                                 rs(["x","z"],rf="subject")])

If len(variables) > 1 and at least one string in variables identifies a categorical variable in the data then random slopes for the len(variables)-way interaction will be estimated for every level of the random factor rf. Separate distribution parameters (the \(\sigma\) of the Normal) will be estimated for every level of the resulting interaction.

Example: The continuous variable “x” and the factor variable “cond”, with two levels “1” and “2”, are assumed to have a general interaction effect on the DV “y”. However, data was collected from multiple subjects (random factor rf = “subject”) and it is reasonable to assume that their interaction effect is slightly different for every subject. A model that accounts for this is estimated via:

formula = Formula(lhs("y"),terms=[i(),*li(["x","cond"]),rs(["x","cond"],rf="subject")])

This formula will estimate the following model:

\[\mu_i = a + b_1*c_i + b_2*x_i + b_3*x_i*c_i + b_{j(i),cc(i)}*x_i\]

Here, \(c\) corresponds to a binary predictor variable created so that it is 1 if “cond”=2 for observation \(i\) else 0, \(cc(i)\) corresponds to the level of “cond” at observation \(i\), \(j(i)\) corresponds to the level of “subject” at observation \(i\), and \(b_{j(i),cc(i)}\) identifies the random slope for variable \(x\) at “cond” = \(cc(i)\) estimated for subject \(j(i)\). That is, the \(b_{j(i),cc(i)}\) where \(cc(i)=1\) are assumed to be i.i.d. realizations from a normal distribution \(N(0,\sigma_{b_1})\) and the \(b_{j(i),cc(i)}\) where \(cc(i)=2\) are assumed to be i.i.d. realizations from a separate normal distribution \(N(0,\sigma_{b_2})\).

Hence, adding rs(["x","cond"],rf="subject") to an mssm model is equivalent to adding the term below to an mgcv model:

s(x,subject,by=cond,bs="re")

Correlations between random effects cannot be taken into account by means of parameters (this is possible for example in lme4).
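The random-slope part of the model above can be illustrated with a small NumPy sketch. It does not call mssm; the toy data, the column ordering (levels of “cond” as the outer loop) and the two standard deviations are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical toy data: 6 observations, 2 subjects, factor "cond" with levels "1" and "2".
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
subject = np.array(["s1", "s1", "s1", "s2", "s2", "s2"])
cond = np.array(["1", "2", "1", "2", "1", "2"])

subj_levels = np.unique(subject)
cond_levels = np.unique(cond)

# One column per (cond level, subject) pair. The ordering is an assumption
# of this sketch and not necessarily the one mssm uses internally.
Z = np.zeros((len(x), len(cond_levels) * len(subj_levels)))
for row, (xi, sj, ci) in enumerate(zip(x, subject, cond)):
    c_idx = int(np.where(cond_levels == ci)[0][0])
    s_idx = int(np.where(subj_levels == sj)[0][0])
    Z[row, c_idx * len(subj_levels) + s_idx] = xi

# Random slopes: a separate standard deviation for each level of "cond".
rng = np.random.default_rng(0)
sigma_b = np.array([0.5, 2.0])  # hypothetical sigma_{b_1} and sigma_{b_2}
b = np.concatenate([rng.normal(0.0, s, size=len(subj_levels)) for s in sigma_b])

# b_{j(i),cc(i)} * x_i for every observation i
random_part = Z @ b
```

Every row of Z holds a single non-zero entry (the value of x), placed in the column of the slope that applies to that observation's subject and “cond” level.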

Parameters:
  • variables ([str]) – A list of variables. Can point to continuous and categorical variables.

  • rf (str) – A factor variable. Identifies the random factor in the data.

build_matrix(ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with this random slope term and returns it represented as a list of values, a list of row indices, and a list of column indices.

This method is implemented by every implementation of the GammTerm class. The returned lists can then be used to create a sparse matrix for this term. Also returns an updated ci column index, reflecting how many additional columns would be added to the total model matrix.

Parameters:
  • ci (int) – Current column index.

  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.

  • var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • ridx (np.ndarray) – Array of non NAN rows in the data.

  • cov_flat (np.ndarray) – An array containing all values (encoded, in case of categorical predictors) of each predictor variable included in any of the terms, in the order of the data-frame passed to the Formula. Each column of cov_flat corresponds to a different predictor.

  • use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]
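As a rough illustration of how such value/row/column lists can be combined, the sketch below accumulates the triplets of two hypothetical terms into one matrix while advancing a running column index. It uses a dense NumPy array to stay dependency-free; the helper assemble_dense and the toy triplets are assumptions, not part of mssm.

```python
import numpy as np

def assemble_dense(term_triplets, n_rows, n_cols):
    """Accumulate (values, rows, cols) triplets of several terms into one matrix."""
    X = np.zeros((n_rows, n_cols))
    for values, rows, cols in term_triplets:
        np.add.at(X, (rows, cols), values)
    return X

# Hypothetical output of two terms for 3 observations. Each term writes into
# its own column(s) and the running column index ci is advanced afterwards.
ci = 0
intercept = ([1.0, 1.0, 1.0], [0, 1, 2], [ci, ci, ci])
ci += 1
slope = ([0.2, 0.4, 0.6], [0, 1, 2], [ci, ci, ci])
ci += 1

X = assemble_dense([intercept, slope], n_rows=3, n_cols=ci)

# With SciPy available, the same triplets could instead be passed to
# scipy.sparse.csc_array((values, (rows, cols)), shape=(n_rows, n_cols)).
```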

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this random slope term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:
  • ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.

  • penalties ([LambdaTerm]) – List of previously created penalties.

  • cur_pen_idx (int) – Index of the last element in penalties.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]
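The bookkeeping convention described above (append a penalty, increment cur_pen_idx once per added penalty) can be sketched as follows. PenaltyStub is a hypothetical stand-in for LambdaTerm that holds only what the sketch needs, and the identity penalty is the usual ridge penalty of i.i.d. normal random effects, used here as an assumption about what a random slope term would add.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class PenaltyStub:
    """Hypothetical stand-in for mssm's LambdaTerm."""
    start_index: int   # first coefficient column covered by this penalty
    S: np.ndarray      # penalty matrix block
    lam: float = 1.0   # smoothing/regularization parameter

def add_identity_penalty(penalties, cur_pen_idx, start_index, n_coef):
    """Append one identity penalty and return the updated list and index."""
    penalties.append(PenaltyStub(start_index=start_index, S=np.eye(n_coef)))
    cur_pen_idx += 1
    return penalties, cur_pen_idx

# Start from a list that already holds one penalty, so that cur_pen_idx
# is the index of its last element as described in the parameter list.
penalties = [PenaltyStub(start_index=1, S=np.eye(3))]
cur_pen_idx = 0

# A random slope with a two-level factor: one penalty per factor level,
# each covering the slopes of 3 hypothetical subjects.
for level in range(2):
    penalties, cur_pen_idx = add_identity_penalty(
        penalties, cur_pen_idx, start_index=4 + level * 3, n_coef=3
    )
```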

get_coef_info(var_types: dict, factor_levels: dict, coding_factors: dict) tuple[int, int, list[str]]

Returns the total number of coefficients associated with this random slope term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Parameters:
  • var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.

  • factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

  • coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

mssm.src.python.utils module

class mssm.src.python.utils.DummyRhoPrior(a=np.float64(-16.11809565095832), b=np.float64(16.11809565095832))

Bases: RhoPrior

Simple uniform prior for rho - the log-smoothing penalty parameters

logpdf(rho: ndarray) ndarray

Returns an array holding zeroes for all log(lambda) parameters within self.a and self.b, otherwise -np.inf.

Parameters:

rho (np.ndarray) – Array of log(lambda) parameters

Returns:

Log-density array as described above

Return type:

np.ndarray
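A minimal sketch of such a uniform log-prior, written as a free function rather than as a subclass of RhoPrior. The function name and the inclusive treatment of the bounds are assumptions; the default bounds correspond to log(1e-7) and log(1e7), which match the default values of a and b shown in the class signature.

```python
import numpy as np

def uniform_log_prior(rho, a=np.log(1e-7), b=np.log(1e7)):
    """Return 0 for every log(lambda) inside [a, b] and -inf otherwise."""
    rho = np.asarray(rho, dtype=float)
    inside = (rho >= a) & (rho <= b)
    return np.where(inside, 0.0, -np.inf)

# Two candidate vectors (rows) for three log-smoothing parameters (columns).
rho = np.array([[0.0, 5.0, -3.0],
                [20.0, 1.0, -17.0]])
log_dens = uniform_log_prior(rho)
```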

class mssm.src.python.utils.GAMLSSGSMMFamily(pars: int, gammlss_family: GAMLSSFamily)

Bases: GSMMFamily

Implementation of the GSMMFamily class that uses only information about the likelihood to estimate any implemented GAMMLSS model.

Allows estimating any GAMMLSS as a GSMM via the L-qEFS & Newton update. Example:

# Simulate 500 data points
sim_dat = sim3(500,2,c=1,seed=0,family=Gaussian(),binom_offset = 0, correlate=False)

# We need to model the mean: mu_i
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                    data=sim_dat)

# And for sd - here constant
formula_sd = Formula(lhs("y"),
                    [i()],
                    data=sim_dat)

# Collect both formulas
formulas = [formula_m,formula_sd]
links = [Identity(),LOG()]

# Now define the general family + model
gsmm_fam = GAMLSSGSMMFamily(2,GAUMLSS(links))
model = GSMM(formulas=formulas,family=gsmm_fam)

# Fit with SR1
bfgs_opt={"gtol":1e-9,
        "ftol":1e-9,
        "maxcor":30,
        "maxls":200,
        "maxfun":1e7}

model.fit(init_coef=None,method='qEFS',extend_lambda=False,
        control_lambda=0,max_outer=200,max_inner=500,min_inner=500,
        seed=0,qEFSH='SR1',max_restarts=5,overwrite_coef=False,qEFS_init_converge=False,prefit_grad=True,
        progress_bar=True,**bfgs_opt)

################### Or for a multinomial model: ###################

formulas = [Formula(lhs("y"),
                [i(),f(["x0"])],
                data=sim5(1000,seed=91)) for k in range(4)]

# Create family - again specifying K-1 pars - here 4!
family = MULNOMLSS(4)

# Collect the links defined by the family
links = family.links

# Now again define the general family + model
gsmm_fam = GAMLSSGSMMFamily(4,family)
model = GSMM(formulas=formulas,family=gsmm_fam)

# And fit with SR1
bfgs_opt={"gtol":1e-9,
        "ftol":1e-9,
        "maxcor":30,
        "maxls":200,
        "maxfun":1e7}

model.fit(init_coef=None,method='qEFS',extend_lambda=False,
        control_lambda=0,max_outer=200,max_inner=500,min_inner=500,
        seed=0,qEFSH='SR1',max_restarts=0,overwrite_coef=False,qEFS_init_converge=False,prefit_grad=True,
        progress_bar=True,**bfgs_opt)
References:
  • Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Parameters:
  • pars (int) – Number of parameters of the likelihood.

  • gammlss_family (GAMLSSFamily) – Any implemented member of the GAMLSSFamily class. Available in self.llkargs[0].

gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) ndarray

Function to evaluate gradient of GAMM(LSS) model when estimated via GSMM.

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.

  • ys ([np.ndarray or None]) – List containing the vectors of observations passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.

  • Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The gradient of the log-likelihood evaluated at coef as a numpy array of shape (-1,1).

Return type:

np.ndarray

hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) csc_array

Function to evaluate Hessian of GAMM(LSS) model when estimated via GSMM.

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.

  • ys ([np.ndarray or None]) – List containing the vectors of observations passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.

  • Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The Hessian of the log-likelihood evaluated at coef.

Return type:

scp.sparse.csc_array

llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) float

Function to evaluate log-likelihood of GAMM(LSS) model when estimated via GSMM.

Parameters:
  • coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).

  • coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.

  • ys ([np.ndarray or None]) – List containing the vectors of observations passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.

  • Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The log-likelihood evaluated at coef.

Return type:

float
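To make the roles of coef, coef_split_idx and Xs more concrete, the sketch below evaluates the log-likelihood of a Gaussian location-scale model (identity link for the mean, log link for the standard deviation) directly in NumPy. The function gaussian_ls_llk and the toy data are assumptions; dense arrays replace the sparse model matrices that mssm passes around.

```python
import numpy as np

def gaussian_ls_llk(coef, coef_split_idx, y, Xs):
    """Log-likelihood of a Gaussian model with separate predictors for mean and sd."""
    # Split the stacked (-1, 1) coefficient vector into one block per parameter.
    split_coef = np.split(coef, coef_split_idx)
    mu = Xs[0] @ split_coef[0]            # identity link for the mean
    sd = np.exp(Xs[1] @ split_coef[1])    # log link for the standard deviation
    resid = (y - mu) / sd
    return float(np.sum(-0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * resid**2))

n = 5
X_mean = np.column_stack([np.ones(n), np.linspace(0, 1, n)])  # intercept + slope
X_sd = np.ones((n, 1))                                        # constant sd

# Two coefficients for the mean, one for log(sd): split after index 2.
coef = np.array([[1.0], [2.0], [np.log(0.5)]])
coef_split_idx = [2]

# Observations that the mean fits perfectly, kept as a column vector.
y = X_mean @ coef[:2]

llk = gaussian_ls_llk(coef, coef_split_idx, y, [X_mean, X_sd])
```

Because the residuals are zero here, the result reduces to n times the log-density of a normal distribution with standard deviation 0.5 evaluated at its mean.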

mssm.src.python.utils.REML(llk: float, nH: csc_array, coef: ndarray, scale: float, penalties: list[LambdaTerm], keep: list[int] | None = None) float | ndarray

Based on Wood (2011). Exact REML for Gaussian GAM, Laplace approximate (Wood, 2016) for everything else. Evaluated after applying stabilizing reparameterization discussed by Wood (2011).

Important: the dimension of the output depends on the shape of coef. If coef is flattened, then the output will be a float. If coef is of shape (-1,1), the output will be [[float]].

References:
  • Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.

  • Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models

Parameters:
  • llk (float) – log-likelihood of model

  • nH (scp.sparse.csc_array) – negative hessian of log-likelihood of model

  • coef (np.ndarray) – Estimated vector of coefficients of shape (-1,1)

  • scale (float) – (Estimated) scale parameter - can be set to 1 for GAMLSS or GSMMs.

  • penalties ([LambdaTerm]) – List of penalties that were part of the model.

  • keep (list[int]|None, optional) – Optional List of indices corresponding to identifiable coefficients. Coefficients not in this list (not identifiable) are dropped from the negative hessian of the penalized log-likelihood. Can also be set to None (default) in which case all coefficients are treated as identifiable.

Returns:

(Approximate) REML score

Return type:

float|np.ndarray
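The shape behaviour noted above follows from ordinary NumPy rules for quadratic forms. The sketch below shows this for a quadratic penalty term of the kind that enters a REML criterion; the matrix S and the coefficient values are made up for illustration and REML() itself is not called.

```python
import numpy as np

S = np.diag([0.0, 1.0, 1.0])                 # toy penalty matrix
coef_col = np.array([[0.5], [1.0], [-2.0]])  # shape (3, 1)
coef_flat = coef_col.flatten()               # shape (3,)

pen_col = coef_col.T @ S @ coef_col    # 2-D array of shape (1, 1)
pen_flat = coef_flat @ S @ coef_flat   # NumPy scalar

# Extract a plain float from the (1, 1) result if needed.
pen_value = float(pen_col[0, 0])
```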

class mssm.src.python.utils.RhoPrior(*args, **kwargs)

Bases: object

Base class to demonstrate the functionality that any prior passed to the correct_VB function has to implement.

logpdf(rho: ndarray)

Compute log density for log smoothing penalty parameters included in rho under this prior.

Parameters:

rho (np.ndarray) – Numpy array of shape (nR,nrho) containing nR proposed candidate vectors for the nrho log-smoothing parameters.

mssm.src.python.utils.adjust_CI(model, n_ps: int, b: ndarray, predi_mat: csc_array, use_terms: list[int] | None, alpha: float, seed: int | None, par: int = 0) ndarray

Internal function to adjust point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016):

model.coef +- b gives point-wise interval, and for the interval to cover the whole-function, 1-alpha % of posterior samples should be expected to fall completely within these boundaries.

From section 6.10 in Wood (2017) we have that \(\boldsymbol{\beta} | \mathbf{y}, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}},\mathbf{V})\). \(\mathbf{V}\) is the covariance matrix of this conditional posterior, and can be obtained by evaluating model.lvi.T @ model.lvi * model.scale (model.scale should be set to 1 for mssm.models.GAMMLSS and mssm.models.GSMM).

The implication of this result is that we can also expect the deviations \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\) to follow \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}} | \mathbf{y}, \boldsymbol{\lambda} \sim N(0,\mathbf{V})\). In line with the whole-function interval definition above, 1-alpha % of predi_mat@[*coef - coef] (where [*coef - coef] represents the deviations \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) should fall within [b,-b]. Wood (2017) suggests to find a so that [a*b,a*-b] achieves this.

To do this, we find a for every predi_mat@[*coef - coef] and then select the final one so that 1-alpha % of samples had an equal or lower one. The consequence: 1-alpha % of samples drawn should fall completely within the modified boundaries.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:
  • model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to adjust the point-wise CI.

  • n_ps (int) – Number of samples to obtain from posterior.

  • b (np.ndarray) – CI boundary of point-wise CI.

  • predi_mat (scp.sparse.csc_array) – Model matrix for a particular smooth term or additive combination of parameters evaluated usually at a representative sample of predictor variables.

  • use_terms (list[int] | None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.

  • alpha (float) – The alpha level to use for the whole-function interval adjustment calculation as outlined above.

  • seed (int | None) – Can be used to provide a seed for the posterior sampling.

  • par (int, optional) – The index corresponding to the parameter of the log-likelihood for which samples are to be obtained for the coefficients, defaults to 0.

Returns:

The adjusted vector b

Return type:

np.ndarray
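The adjustment described above can be sketched with plain NumPy. The function whole_function_multiplier, the polynomial prediction matrix and the covariance matrix V are assumptions chosen for illustration; the sketch samples deviations from N(0, V) directly instead of going through a fitted mssm model.

```python
import numpy as np

def whole_function_multiplier(predi_mat, V, b, alpha=0.05, n_ps=2000, seed=0):
    """Find a so that about 1-alpha of sampled deviations stay within [-a*b, a*b]."""
    rng = np.random.default_rng(seed)
    # Deviations beta - beta_hat, one posterior sample per row.
    dev = rng.multivariate_normal(np.zeros(V.shape[0]), V, size=n_ps)
    fit_dev = dev @ predi_mat.T                    # deviations of the fitted function
    # Smallest multiplier for which a sample lies completely within the band.
    a_per_sample = np.max(np.abs(fit_dev) / b, axis=1)
    return float(np.quantile(a_per_sample, 1 - alpha)), fit_dev

# Toy setup: cubic polynomial basis evaluated at 50 points.
x = np.linspace(0, 1, 50)
predi_mat = np.vander(x, 4, increasing=True)
V = 0.1 * np.eye(4)

# Point-wise 95% boundary from the standard errors of the prediction.
se = np.sqrt(np.sum((predi_mat @ V) * predi_mat, axis=1))
b = 1.96 * se

a, fit_dev = whole_function_multiplier(predi_mat, V, b)
b_adj = a * b

# Share of samples that fall completely within the widened band.
coverage = np.mean(np.all(np.abs(fit_dev) <= b_adj, axis=1))
```

The multiplier a exceeds 1 here because a band that has to contain the whole function is wider than one that only has to hold at each point separately.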

mssm.src.python.utils.approx_smooth_p_values(model, par: int = 0, n_sel: int = 100000.0, edf1: bool = True, force_approx: bool = False, seed: int = 0) tuple[list[float], list[float]]

Function to compute approximate p-values for smooth terms, testing whether \(\mathbf{f}=\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\) based on the algorithm by Wood (2013).

Wood (2013, 2017) generalize the \(\boldsymbol{\beta}_j^T\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\boldsymbol{\beta}_j\) test-statistic for parametric terms (computed by function mssm.models.print_parametric_terms()) to the coefficient vector \(\boldsymbol{\beta}_j\) parameterizing smooth functions. \(\mathbf{V}\) here is the covariance matrix of the posterior distribution for \(\boldsymbol{\beta}\) (see Wood, 2017). The idea is to replace \(\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\) with a rank \(r\) pseudo-inverse (smooth blocks in \(\mathbf{V}\) are usually rank deficient). Wood (2013, 2017) suggest to base \(r\) on the estimated degrees of freedom for the smooth term in question - but that \(r\) is usually not integer.

They provide a generalization that addresses the realness of \(r\), resulting in a test statistic \(T_r\), which follows a weighted Chi-square distribution under the Null. Following the recommendation in Wood (2013) we here approximate the reference distribution under the Null by means of the computations outlined in the paper by Davies (1980). If this fails, we fall back on a Gamma distribution with \(\alpha=r/2\) and \(\phi=2\).

In case of a two-parameter distribution (i.e., estimated scale parameter \(\phi\)), the Chi-square reference distribution needs to be corrected, again resulting in a weighted chi-square distribution which should behave something like a F distribution with DoF1 = \(r\) and DoF2 = \(\epsilon_{DoF}\) (i.e., the residual degrees of freedom), which would be the reference distribution for \(T_r/r\) if \(r\) were integer and \(\mathbf{V}_{\boldsymbol{\beta}_j}\) full rank. We again follow the recommendations by Wood (2013) and rely on the methods by Davies (1980) to compute the p-value under this reference distribution. If this fails, we approximate the reference distribution for \(T_r/r\) with a Beta distribution, with \(\alpha=r/2\) and \(\beta=\epsilon_{DoF}/2\) (see Wikipedia for the specific transformation applied to \(T_r/r\) so that the resulting transformation is approximately beta distributed) - which is similar to the Gamma approximation used for the Chi-square distribution in the no-scale parameter case.

Warning: The resulting p-values are approximate. They should only be treated as indicative.

Note: Just like in mgcv, the returned p-value is an average: two p-values are computed because of an ambiguity in forming \(T_r\) and averaged to get the final one. For \(T_r\) we return the max of the two alternatives.

References:
  • Davies, R. B. (1980). Algorithm AS 155: The Distribution of a Linear Combination of χ2 Random Variables.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Wood, S. N. (2013). On p-values for smooth components of an extended generalized additive model.

  • testStat function in mgcv, see: https://github.com/cran/mgcv/blob/master/R/mgcv.r#L3780

Parameters:
  • model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to compute p values.

  • par (int, optional) – Distribution parameter for which to compute p-values. Ignored when model is a GAMM. Defaults to 0

  • n_sel (int, optional) – Maximum number of rows of model matrix. For models with more observations a random sample of n_sel rows is obtained. Defaults to 1e5

  • edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal. Defaults to True

  • force_approx (bool, optional) – Whether or not the p-value should be forced to be approximated based on a Gamma/Beta distribution. Only use for testing - in practice you want to keep this at False. Defaults to False

  • seed (int, optional) – Random seed determining the random sample computation. Defaults to 0

Returns:

Tuple containing two lists: first list holds approximate p-values for all smooth terms, second list holds test statistic.

Return type:

tuple[list[float],list[float]]
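For intuition about the rank \(r\) pseudo-inverse, the sketch below computes the quadratic-form statistic for the simplified case of an integer \(r\), keeping only the \(r\) largest eigenvalues of the covariance block. It is not the algorithm of Wood (2013): the handling of non-integer \(r\), the averaging of two alternatives and the p-value computation via Davies (1980) are all omitted, and the function name rank_r_wald is hypothetical.

```python
import numpy as np

def rank_r_wald(beta, V, r):
    """beta^T V^{r-} beta, with V^{r-} the rank-r pseudo-inverse of V (integer r)."""
    evals, evecs = np.linalg.eigh(V)
    keep = np.argsort(evals)[::-1][:r]    # indices of the r largest eigenvalues
    proj = evecs[:, keep].T @ beta        # coefficients in the retained eigenbasis
    return float(np.sum(proj**2 / evals[keep]))

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
V = A @ A.T + 0.1 * np.eye(4)   # toy positive definite covariance block
beta = rng.normal(size=4)       # toy coefficient vector of one smooth term

T_full = rank_r_wald(beta, V, r=4)   # full rank: ordinary Wald statistic
T_2 = rank_r_wald(beta, V, r=2)      # truncated to rank 2
```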

mssm.src.python.utils.computeAr1Chol(formula: Formula, rho: float) tuple[csc_array, float]

Computes the inverse of the cholesky of the (scaled) variance matrix of an ar1 model.

Parameters:
  • formula (Formula) – Formula of the model

  • rho (float) – ar1 weight.

Returns:

Tuple, containing banded inverse Cholesky as a scipy array and the correction needed to get the likelihood of the ar1 model.

Return type:

tuple[scp.sparse.csc_array,float]
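The banded structure mentioned above can be seen in a small dense example. For a single stationary AR(1) series with unit marginal variance and correlation \(\rho^{|i-j|}\), the inverse Cholesky factor is lower bidiagonal. The sketch below is an illustration under those assumptions: it ignores any series boundaries encoded in the formula, and the exact scaling, transposition and form of the likelihood correction used by mssm may differ.

```python
import numpy as np

def ar1_inverse_cholesky(n, rho):
    """Lower bidiagonal B with B @ Sigma @ B.T = I for Sigma[i, j] = rho**|i - j|."""
    B = np.zeros((n, n))
    s = np.sqrt(1.0 - rho**2)
    B[0, 0] = 1.0
    for t in range(1, n):
        B[t, t] = 1.0 / s
        B[t, t - 1] = -rho / s
    return B

n, rho = 6, 0.7
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # AR(1) correlation matrix

B = ar1_inverse_cholesky(n, rho)

# Log-determinant of B: a term of this kind appears when the likelihood of
# the whitened data is converted back to the likelihood of the original data.
log_det_B = np.sum(np.log(np.diag(B)))
```

Multiplying the observations and the model matrix by B removes the AR(1) correlation, which is why only two bands of the matrix need to be stored.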

mssm.src.python.utils.compute_REML_candidate_GSMM(family: GAMLSSFamily | GSMMFamily, y: ndarray | list[ndarray], Xs: list[csc_array], penalties: list[LambdaTerm], coef: ndarray, n_coef: int, coef_split_idx: list[int], method: str = 'Chol', conv_tol: float = 1e-07, n_c: int = 10, bfgs_options: dict = {}, origNH: csc_array | None = None) tuple[float, csc_array, csc_array, ndarray, float, float]

Evaluates the REML criterion (e.g., Wood, 2011; Wood et al., 2016) efficiently for a set of lambda values for a GSMM or GAMMLSS.

Internal function used for computing the correction applied to the edf for the GLRT - based on Wood (2017) and Wood et al., (2016).

See REML() function for more details.

Parameters:
  • family (GAMLSSFamily | GSMMFamily) – Model Family

  • y (np.ndarray | list[np.ndarray]) – Vector of observations or list of vectors (for GSMM)

  • Xs (list[scp.sparse.csc_array]) – List of model matrices

  • penalties (list[LambdaTerm]) – List of penalties

  • coef (np.ndarray) – Final coefficient estimate obtained from estimation - used to initialize

  • n_coef (int) – Number of coefficients

  • coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.

  • method (str, optional) – Method to use to solve for the coefficients (lambda parameters in case this is set to ‘qEFS’), defaults to “Chol”

  • conv_tol (float, optional) – Tolerance, defaults to 1e-7

  • n_c (int, optional) – Number of cores to use, defaults to 10

  • bfgs_options (dict, optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS', defaults to {}

  • origNH (scp.sparse.csc_array | None, optional) – Optional external hessian matrix, defaults to None

Returns:

REML criterion, conditional covariance matrix of coefficients for this lambda, un-pivoted inverse of the pivoted Cholesky of the negative hessian of the penalized llk, coefficients, total edf, llk

Return type:

tuple[float, scp.sparse.csc_array, scp.sparse.csc_array, np.ndarray, float, float]

mssm.src.python.utils.compute_Vb_corr_WPS(Vbr: csc_array, Vpr, Vr, H: csc_array, S_emb: csc_array, penalties: list[LambdaTerm], coef: ndarray, scale: float = 1) tuple[ndarray, ndarray]

Computes both correction terms for Vb or \(\mathbf{V}_{\boldsymbol{\beta}}\), which is the co-variance matrix for the conditional posterior of \(\boldsymbol{\beta}\) so that \(\boldsymbol{\beta} | y, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}},\mathbf{V}_{\boldsymbol{\beta}})\), described by Wood, Pya, & Säfken (2016).

References:
  • Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.

  • Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • Vbr (scp.sparse.csc_array) – Transpose of root for the estimate for the (unscaled) covariance matrix of \(\boldsymbol{\beta} | y, \boldsymbol{\lambda}\) - the coefficients estimated by the model.

  • Vpr (np.ndarray) – A (regularized) estimate of the covariance matrix of \(\boldsymbol{\rho}\) - the log smoothing penalties.

  • Vr (np.ndarray) – Transpose of root of un-regularized covariance matrix of \(\boldsymbol{\rho}\) - the log smoothing penalties.

  • H (scp.sparse.csc_array) – The Hessian of the log-likelihood

  • S_emb (scp.sparse.csc_array) – The weighted penalty matrix.

  • penalties ([LambdaTerm]) – A list holding the LambdaTerms estimated for the model.

  • coef (np.ndarray) – An array holding the estimated regression coefficients. Has to be of shape (-1,1)

  • scale (float) – Any scale parameter estimated as part of the model. Can be omitted for more generic models beyond GAMMs. Defaults to 1.

Raises:

ArithmeticError – Will throw an error when the negative Hessian of the penalized likelihood is ill-scaled so that a Cholesky decomposition fails.

Returns:

A tuple containing: Vc and Vcc. Vbr.T@Vbr*scale + Vc + Vcc is then approximately the correction devised by WPS (2016).

Return type:

tuple[np.ndarray, np.ndarray]

mssm.src.python.utils.compute_Vp_WPS(Vbr: csc_array, H: csc_array, S_emb: csc_array, penalties: list[LambdaTerm], coef: ndarray, scale: float = 1) tuple[ndarray, ndarray, ndarray, ndarray, ndarray, ndarray]

Computes the inverse of what is approximately the negative Hessian of the Laplace approximate REML criterion with respect to the log smoothing penalties.

The derivatives computed are only exact for Gaussian additive models and canonical generalized additive models. For all other models they are inexact in that they assume that the hessian of the log-likelihood does not depend on \(\lambda\) (or \(log(\lambda)\)), so they are essentially the PQL derivatives of Wood et al. (2017). The inverse computed here acts as an approximation to the covariance matrix of the log smoothing parameters.

References:
  • Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.

  • Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data.

Parameters:
  • Vbr (scp.sparse.csc_array) – Transpose of root for the estimate for the (unscaled) covariance matrix of \(\boldsymbol{\beta} | y, \boldsymbol{\lambda}\) - the coefficients estimated by the model.

  • H (scp.sparse.csc_array) – The Hessian of the log-likelihood

  • S_emb (scp.sparse.csc_array) – The weighted penalty matrix.

  • penalties ([LambdaTerm]) – A list holding the LambdaTerms estimated for the model.

  • coef (np.ndarray) – An array holding the estimated regression coefficients. Has to be of shape (-1,1)

  • scale (float) – Any scale parameter estimated as part of the model. Can be omitted for more generic models beyond GAMMs. Defaults to 1.

Returns:

Generalized inverse of negative hessian of approximate REML criterion, regularized version of the former, root of generalized inverse, root of regularized generalized inverse, hessian of approximate REML criterion, np.array of shape ((len(coef),len(penalties))) containing in each row the partial derivative of the coefficients with respect to an individual lambda parameter

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]
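For the Gaussian additive case with unit scale, the derivative of the coefficients with respect to a log smoothing parameter has the closed form \(\partial\hat{\boldsymbol{\beta}}/\partial\rho_j = -\lambda_j(\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda})^{-1}\mathbf{S}_j\hat{\boldsymbol{\beta}}\). The sketch below checks this against central finite differences on made-up data. The helper names are hypothetical, derivatives are taken with respect to \(\rho=log(\lambda)\), and each derivative is stored as one column of the result.

```python
import numpy as np

def fit_coef(X, y, S_list, rho):
    """Penalized least squares estimate for log smoothing parameters rho."""
    A = X.T @ X + sum(np.exp(r) * S for r, S in zip(rho, S_list))
    return np.linalg.solve(A, X.T @ y), A

def dbeta_drho(X, y, S_list, rho):
    """Analytic derivative of the coefficients w.r.t. each rho_j, one per column."""
    beta, A = fit_coef(X, y, S_list, rho)
    cols = [-np.exp(r) * np.linalg.solve(A, S @ beta) for r, S in zip(rho, S_list)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = rng.normal(size=40)
S_list = [np.diag([0.0, 1.0, 1.0, 0.0]), np.diag([0.0, 0.0, 0.0, 1.0])]
rho = np.array([0.5, -1.0])

J = dbeta_drho(X, y, S_list, rho)   # shape (number of coefficients, number of penalties)

# Central finite differences as an independent check.
h = 1e-5
J_fd = np.zeros_like(J)
for j in range(len(rho)):
    step = np.zeros_like(rho)
    step[j] = h
    beta_up, _ = fit_coef(X, y, S_list, rho + step)
    beta_down, _ = fit_coef(X, y, S_list, rho - step)
    J_fd[:, j] = (beta_up - beta_down) / (2 * h)
```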

mssm.src.python.utils.compute_bias_corrected_edf(model, overwrite: bool = False) None

This function computes and assigns smoothing bias corrected (term-wise) estimated degrees of freedom.

For a definition of smoothing bias-corrected estimated degrees of freedom see Wood (2017).

Note: This function modifies model, setting edf1 and term_edf1 attributes.

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to compute the bias corrected edf.

  • overwrite (bool, optional) – Whether previously computed bias corrected edf should be overwritten. Otherwise this function immediately terminates if model.edf1 is not None, defaults to False

Return type:

None
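As a pointer to what is being computed: for a Gaussian additive model, a commonly used form of the bias corrected degrees of freedom (reported as edf1 by mgcv) is \(2tr(\mathbf{F}) - tr(\mathbf{F}\mathbf{F})\) with \(\mathbf{F} = (\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda})^{-1}\mathbf{X}^T\mathbf{X}\). The sketch below evaluates this on toy data; whether mssm uses exactly this expression for every model class is an assumption, and the function name is hypothetical.

```python
import numpy as np

def edf_and_edf1(X, S_lambda):
    """Per-coefficient edf (diagonal of F) and the corrected 2*diag(F) - diag(F @ F)."""
    XtX = X.T @ X
    F = np.linalg.solve(XtX + S_lambda, XtX)
    edf = np.diag(F)
    edf1 = 2 * edf - np.sum(F * F.T, axis=1)  # row sums of F * F.T give diag(F @ F)
    return edf, edf1

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
S_lambda = 10.0 * np.diag([0.0, 1.0, 1.0, 1.0, 1.0])  # first coefficient unpenalized

edf, edf1 = edf_and_edf1(X, S_lambda)
total_edf, total_edf1 = edf.sum(), edf1.sum()
```

The corrected total lies between the uncorrected total and the number of coefficients, since the eigenvalues of F lie between 0 and 1.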

mssm.src.python.utils.compute_reml_candidate_GAMM(family: Family, y: ndarray, X: csc_array, penalties: list[LambdaTerm], n_c: int = 10, offset: float | ndarray = 0, init_eta: ndarray | None = None, method: str = 'Chol', compute_inv: bool = False, origNH: float | None = None) tuple[float, csc_array | None, csc_array, list[int], ndarray, float, float, float]

Evaluates the REML criterion (e.g., Wood, 2011; Wood et al., 2016) efficiently for a set of lambda values for a GAMM.

Internal function used for computing the correction applied to the edf for the GLRT - based on Wood (2017) and Wood et al., (2016).

See REML() function for more details.

Parameters:
  • family (Family) – Family of the model

  • y (np.ndarray) – vector of observations

  • X (scp.sparse.csc_array) – Model matrix

  • penalties (list[LambdaTerm]) – List of penalties

  • n_c (int, optional) – Number of cores to use, defaults to 10

  • offset (float | np.ndarray, optional) – Fixed offset to add to eta, defaults to 0

  • init_eta (np.ndarray | None, optional) – Initial vector for linear predictor, defaults to None

  • method (str, optional) – Method to use to solve for coefficients, defaults to ‘Chol’

  • compute_inv (bool, optional) – Whether to compute the inverse of the pivoted Cholesky of the negative hessian of the penalized llk, defaults to False

  • origNH (float | None, optional) – Optional external scale parameter, defaults to None

Returns:

reml criterion, un-pivoted inverse of the pivoted Cholesky of the negative hessian of the penalized llk, pivoted Cholesky, pivot column indices, coefficients, estimated scale, total edf, llk

Return type:

tuple[float, scp.sparse.csc_array|None, scp.sparse.csc_array, list[int], np.ndarray, float, float, float]

mssm.src.python.utils.correct_VB(model, nR: int = 250, grid_type: str = 'JJJ1', a: float = 1e-07, b: float = 10000000.0, df: int = 40, n_c: int = 10, form_t1: bool = False, verbose: bool = False, drop_NA: bool = True, method: str = 'Chol', only_expected_edf: bool = False, Vp_fidiff: bool = False, use_importance_weights: bool = True, prior: Callable | None = None, recompute_H: bool = False, seed: int | None = None, compute_Vcc: bool = True, **bfgs_options) tuple[csc_array | None, csc_array | None, ndarray | None, ndarray | None, ndarray | None, float | None, ndarray | None, float | None, float, ndarray]

Estimate \(\tilde{\mathbf{V}}\), the covariance matrix of the marginal posterior \(\boldsymbol{\beta} | y\) to account for smoothness uncertainty.

Wood et al. (2016) and Wood (2017) show that when basing conditional versions of model selection criteria or hypothesis tests on \(\mathbf{V}\), which is the co-variance matrix for the normal approximation to the conditional posterior of \(\boldsymbol{\beta}\) so that \(\boldsymbol{\beta} | y, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}},\mathbf{V})\), the tests are severely biased. To correct for this they show that uncertainty in \(\boldsymbol{\lambda}\) needs to be accounted for. Hence they suggest to base these tests on \(\tilde{\mathbf{V}}\), the covariance matrix of the normal approximation to the marginal posterior \(\boldsymbol{\beta} | y\). They show how to obtain an estimate of \(\tilde{\mathbf{V}}\), but this requires \(\mathbf{V}^{\boldsymbol{\rho}}\) - an estimate of the covariance matrix of the normal approximation to the posterior of \(\boldsymbol{\rho}=log(\boldsymbol{\lambda})\). Computing \(\mathbf{V}^{\boldsymbol{\rho}}\) requires derivatives that are not available when using the efs update.

This function implements multiple strategies to approximately correct for smoothing parameter uncertainty, based on the proposals by Wood et al. (2016) and Greven & Scheipl (2017). The most straightforward strategy (grid_type = 'JJJ1') is to obtain a PQL or finite difference approximation for \(\mathbf{V}^{\boldsymbol{\rho}}\) and to then compute approximately the Wood et al. (2016) correction assuming that higher-order derivatives of the llk are zero (this will be exact for Gaussian additive or canonical Generalized models). This is too costly for large sparse multi-level models and not exact for more generic models. The MC based alternative available via grid_type = 'JJJ2' addresses the first problem (Important, set: use_importance_weights=False and only_expected_edf=True.). The second MC based alternative available via grid_type = 'JJJ3' is most appropriate for more generic models (The prior argument can be used to specify any prior to be placed on \(\boldsymbol{\rho}\) also you will need to set: use_importance_weights=True and only_expected_edf=False). Both strategies use a PQL or finite difference approximation to \(\mathbf{V}^{\boldsymbol{\rho}}\) to obtain nR samples from the (normal approximation) to the posterior of \(\boldsymbol{\rho}\). From these samples mssm then estimates \(\tilde{\mathbf{V}}\) as described in more detail by Krause et al. (in preparation).

Note: If you set only_expected_edf=True, only the last two output arguments will be non-zero.
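
The idea behind the MC strategies can be illustrated with the law of total covariance: the unconditional covariance of \(\boldsymbol{\beta}\) is the expected conditional covariance plus the covariance of the conditional means, both taken over the posterior of \(\boldsymbol{\rho}\). The following is a minimal sketch on a toy ridge-penalized Gaussian model, not the computation performed by mssm; the model, the values of `rho_hat` and `v_rho`, and the helper `conditional_posterior` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Gaussian model with a single ridge-type penalty S = I (illustrative only)
n, p, phi = 200, 5, 1.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(scale=np.sqrt(phi), size=n)
S = np.eye(p)

def conditional_posterior(rho):
    """Mean and covariance of beta | y, lambda = exp(rho) for the toy model."""
    V = np.linalg.inv(X.T @ X + np.exp(rho) * S) * phi
    beta_hat = V @ X.T @ y / phi
    return beta_hat, V

# Assumed normal approximation to the posterior of rho (values are made up)
rho_hat, v_rho = np.log(2.0), 0.5
rho_samples = rng.normal(rho_hat, np.sqrt(v_rho), size=250)

betas, Vs = zip(*(conditional_posterior(r) for r in rho_samples))
betas = np.array(betas)                              # shape (nR, p)
V_mean = np.mean(Vs, axis=0)                         # E_rho[Cov(beta | y, rho)]
V_between = np.cov(betas, rowvar=False, bias=True)   # Cov_rho(E[beta | y, rho])

# Law of total covariance: MC estimate of the unconditional covariance
V_tilde = V_mean + V_between
```

Because `V_between` is positive semi-definite, the resulting estimate is never smaller than the averaged conditional covariance, which is the sense in which ignoring smoothing parameter uncertainty understates the uncertainty in the coefficients.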

Example:

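# Imports following the other mssm examples (numpy is needed for np.log below)
from mssm.models import *
from mssmViz.sim import *
import numpy as np

# Simulate some data for a Gaussian model
sim_fit_dat = sim3(n=500,scale=2,c=1,family=Gaussian(),seed=21)

# Now specify and fit the model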
sim_fit_formula = Formula(lhs("y"),
                            [i(),
                             f(["x0"],nk=20),
                             f(["x1"],nk=20),
                             f(["x2"],nk=20),
                             f(["x3"],nk=20)],
                            data=sim_fit_dat,
                            print_warn=False)

model = GAMM(sim_fit_formula,Gaussian())
model.fit(exclude_lambda=False,progress_bar=False,max_outer=100)


# Compute correction from Wood et al. (2016) - will be approximate for more generic models
# V will be approximate covariance matrix of marginal posterior of coefficients
# LV is Cholesky of the former
# Vp is approximate covariance matrix of log regularization parameters
# Vpr is regularized version of the former
# edf is vector of estimated degrees of freedom (uncertainty corrected) per coefficient
# total_edf is sum of former (but subjected to upper bounds so might not be exactly the same)
# edf2 is optionally smoothness bias corrected version of edf
# total_edf2 is optionally bias corrected version of total_edf (subjected to upper bounds)
# expected_edf is None here but for MC strategies (i.e., ``grid_type!="JJJ1"``) will be an estimate
# of total_edf (**without being subjected to upper bounds**) that does not require forming
# V (only computed when ``only_expected_edf=True``). 
# mean_coef is None here but for MC strategies will be an estimate of the mean of the
# marginal posterior of coefficients, only computed when setting ``recompute_H=True``

V,LV,Vp,Vpr,edf,total_edf,edf2,total_edf2,expected_edf,mean_coef = correct_VB(model,
                                                                              grid_type="JJJ1",
                                                                              verbose=True,
                                                                              seed=20)

# Compute MC estimate for generic model and given prior
prior = DummyRhoPrior(b=np.log(1e12)) # Set up uniform prior
V_MC,LV_MC,Vp_MC,Vpr_MC,edf_MC,total_edf_MC,edf2_MC,total_edf2_MC,expected_edf_MC,mean_coef_MC = correct_VB(model,
                                                                             grid_type="JJJ3",
                                                                             verbose=True,
                                                                             seed=20,
                                                                             df=10,
                                                                             prior=prior,
                                                                             recompute_H=True)
References:
  • Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.

  • Wood, S. N., Pya, N., & Saefken, B. (2016). Smoothing Parameter and Model Selection for General Smooth Models.

  • Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models.

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:
  • model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GAMM, GAMMLSS, or GSMM model (which has been fitted) for which to estimate \(\mathbf{V}\)

  • nR (int, optional) – In case grid_type!="JJJ1", nR samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250

  • grid_type (str, optional) – How to compute the smoothness uncertainty correction - see above for details, defaults to ‘JJJ1’

  • a (float, optional) – Lower bound for the \(\lambda\) estimates obtained from model: any estimate smaller than this value is set to this value. The (bounded) estimates define the mean of the posterior approximation \(\boldsymbol{\rho}|y \sim N(\log(\hat{\boldsymbol{\lambda}}),\mathbf{V}^{\boldsymbol{\rho}})\) used to sample nR candidates. Defaults to 1e-7, the minimum possible estimate

  • b (float, optional) – Upper bound for the \(\lambda\) estimates obtained from model: any estimate larger than this value is set to this value. The (bounded) estimates define the mean of the posterior approximation \(\boldsymbol{\rho}|y \sim N(\log(\hat{\boldsymbol{\lambda}}),\mathbf{V}^{\boldsymbol{\rho}})\) used to sample nR candidates. Defaults to 1e7, the maximum possible estimate

  • df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample the next set of candidates. Setting this to np.inf means a multivariate normal is used for sampling, defaults to 40

  • n_c (int, optional) – Number of cores to use during parallel parts of the correction, defaults to 10

  • form_t1 (bool, optional) – Whether or not the smoothness uncertainty + smoothness bias corrected edf should be computed, defaults to False

  • verbose (bool, optional) – Whether to print progress information or not, defaults to False

  • drop_NA (bool, optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.

  • method (str, optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case a QR decomposition is used, which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky is then computed using the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the Hessian. This only requires first derivative information. Defaults to “Chol”.

  • only_expected_edf (bool, optional) – Whether to compute the edf. without explicitly forming the covariance matrix (only_expected_edf=True) or by forming it (only_expected_edf=False). The former is much more efficient for sparse models, at the cost of access to the covariance matrix and the ability to compute an upper bound on the smoothness uncertainty corrected edf. Only makes sense when grid_type!='JJJ1'. Defaults to False

  • Vp_fidiff (bool, optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}^{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)

  • use_importance_weights (bool, optional) – Whether to rely on importance weights to compute the numerical integration when grid_type != 'JJJ1' or on the log-densities of \(\mathbf{V}^{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to True (Importance weights are used)

  • prior (Callable|None, optional) – An (optional) instance of an arbitrary class that has a .logpdf() method to compute the prior log density of a sampled candidate. If this is set to None, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when use_importance_weights=False. Defaults to None

  • recompute_H (bool, optional) – Whether or not to re-compute the Hessian of the log-likelihood at an estimate of the mean of the Bayesian posterior \(\boldsymbol{\beta}|y\) before computing the (uncertainty/bias corrected) edf. Defaults to False

  • compute_Vcc (bool, optional) – Whether to compute the second correction term when grid_type='JJJ1' (or when computing the lower-bound for the remaining strategies) or only the first one. The first correction term is substantially cheaper to compute than the second one, so setting this to False for larger models will speed up the correction considerably. Defaults to True

  • seed (int|None,optional) – Seed to use for random parts of the correction. Defaults to None

  • bfgs_options (key=value,optional) – Any additional keyword arguments that should be passed on to the call of scipy.optimize.minimize(). If none are provided, the gtol argument will be initialized to 1e-3. Note also, that in any case the maxiter argument is automatically set to 100. Defaults to None.
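
The role of use_importance_weights and prior can be illustrated with self-normalized importance sampling: candidates are drawn from a proposal, and each candidate is weighted by target density times prior density divided by proposal density. The sketch below is a toy illustration under stated assumptions and not mssm's implementation; the class `UniformRhoPrior` (a stand-in for any object with a `.logpdf()` method), the functions `normal_logpdf` and `log_target`, and all numeric values are hypothetical.

```python
import numpy as np

class UniformRhoPrior:
    """Hypothetical prior: uniform on [-b, b]; only .logpdf() is required."""
    def __init__(self, b):
        self.b = b

    def logpdf(self, rho):
        rho = np.asarray(rho, dtype=float)
        return np.where(np.abs(rho) <= self.b, -np.log(2 * self.b), -np.inf)

def normal_logpdf(x, mean, sd):
    """Log-density of a univariate normal (the proposal in this toy example)."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

def log_target(rho):
    """Toy unnormalized log-posterior of rho (stand-in for a REML score)."""
    return -0.5 * ((rho - 1.0) / 0.5) ** 2

rng = np.random.default_rng(20)
prop_mean, prop_sd = 0.0, 2.0
rho = rng.normal(prop_mean, prop_sd, size=50_000)   # candidates from the proposal
prior = UniformRhoPrior(b=np.log(1e12))

# Log importance weights: target + prior - proposal
log_w = log_target(rho) + prior.logpdf(rho) - normal_logpdf(rho, prop_mean, prop_sd)
log_w -= log_w.max()        # stabilize before exponentiating
w = np.exp(log_w)
w /= w.sum()                # self-normalized importance weights

post_mean = np.sum(w * rho)
post_var = np.sum(w * (rho - post_mean) ** 2)
```

With a flat prior over the sampled range, the weighted mean and variance recover the moments of the toy target (mean 1, variance 0.25) even though the proposal is centered elsewhere.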

Returns:

A tuple containing: V - an estimate of the unconditional covariance matrix, LV - the Cholesky of the former, Vp - an estimate of the covariance matrix for \(\boldsymbol{\rho}\), Vpr - a regularized version of the former, edf - smoothness uncertainty corrected coefficient-wise edf, total_edf - smoothness uncertainty corrected total (i.e., model) edf, edf2 - smoothness uncertainty + smoothness bias corrected coefficient-wise edf, total_edf2 - smoothness uncertainty + smoothness bias corrected total (i.e., model) edf, expected_edf - an optional estimate of total_edf that does not require forming V, mean_coef - an optional estimate of the mean of the posterior of the coefficients

Return type:

tuple[scp.sparse.csc_array|None, scp.sparse.csc_array|None, np.ndarray|None ,np.ndarray|None, np.ndarray|None, float|None, np.ndarray|None, float|None, float, np.ndarray]

mssm.src.python.utils.estimateVp(model, nR: int = 250, grid_type: str = 'JJJ1', a: float = 1e-07, b: float = 10000000.0, df: int = 40, n_c: int = 10, drop_NA: bool = True, method: str = 'Chol', Vp_fidiff: bool = False, use_importance_weights: bool = True, prior: Callable | None = None, seed: int | None = None, **bfgs_options) tuple[ndarray, ndarray, ndarray, ndarray, ndarray]

Estimate covariance matrix \(\mathbf{V}^{\boldsymbol{\rho}}\) of posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\).

\(\mathbf{V}^{\boldsymbol{\rho}}\) is either based on a finite difference approximation or on a PQL approximation (see the grid_type parameter), or it is estimated via numerical integration, similar to what is done in the correct_VB() function (this is done when grid_type=='JJJ2'; see the aforementioned function for details).

Example:

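# Imports following the other mssm examples (numpy is needed for np.log below)
from mssm.models import *
from mssmViz.sim import *
import numpy as np

# Simulate some data for a Gaussian model
sim_fit_dat = sim3(n=500,scale=2,c=1,family=Gaussian(),seed=21)

# Now specify and fit the model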
sim_fit_formula = Formula(lhs("y"),
                            [i(),f(["x0"],nk=20,rp=0),f(["x1"],nk=20,rp=0),f(["x2"],nk=20,rp=0),f(["x3"],nk=20,rp=0)],
                            data=sim_fit_dat,
                            print_warn=False)

model = GAMM(sim_fit_formula,Gaussian())
model.fit(exclude_lambda=False,progress_bar=False,max_outer=100)

# Estimate covariance matrix of log regularization parameters (cf. Wood et al., 2016) - will be approximate for more generic models
# Vp is approximate covariance matrix of log regularization parameters
# Vpr is regularized version of the former
# Ri is a root of covariance matrix of log regularization parameters
# Rir is a root of regularized version of covariance matrix of log regularization parameters
# ep will be an estimate of the mean of the marginal posterior of log regularization parameters (for ``grid_type="JJJ1"`` this will simply be the log of the estimated regularization parameters)
# Note: estimateVp has no verbose argument; extra keywords are passed on as bfgs_options
Vp, Vpr, Ri, Rir, ep = estimateVp(model,grid_type="JJJ1",seed=20)


# Compute MC estimate for generic model and given prior
prior = DummyRhoPrior(b=np.log(1e12)) # Set up uniform prior
Vp_MC, Vpr_MC, Ri_MC, Rir_MC, ep_MC = estimateVp(model,grid_type="JJJ2",seed=20,use_importance_weights=True,prior=prior)
Parameters:
  • model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GAMM, GAMMLSS, or GSMM model (which has been fitted) for which to estimate \(\mathbf{V}\)

  • nR (int, optional) – In case grid_type!="JJJ1", nR samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250

  • grid_type (str, optional) – How to compute the smoothness uncertainty correction. Setting grid_type="JJJ1" means a PQL or finite difference approximation is obtained. Setting grid_type="JJJ2" means numerical integration is performed - see correct_VB() for details, defaults to ‘JJJ1’

  • a (float, optional) – Lower bound for the \(\lambda\) estimates obtained from model: any estimate smaller than this value is set to this value. The (bounded) estimates define the mean of the posterior approximation \(\boldsymbol{\rho}|y \sim N(\log(\hat{\boldsymbol{\lambda}}),\mathbf{V}^{\boldsymbol{\rho}})\) used to sample nR candidates. Defaults to 1e-7, the minimum possible estimate

  • b (float, optional) – Upper bound for the \(\lambda\) estimates obtained from model: any estimate larger than this value is set to this value. The (bounded) estimates define the mean of the posterior approximation \(\boldsymbol{\rho}|y \sim N(\log(\hat{\boldsymbol{\lambda}}),\mathbf{V}^{\boldsymbol{\rho}})\) used to sample nR candidates. Defaults to 1e7, the maximum possible estimate

  • df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample the next set of candidates. Setting this to np.inf means a multivariate normal is used for sampling, defaults to 40

  • n_c (int, optional) – Number of cores to use during parallel parts of the correction, defaults to 10

  • drop_NA (bool, optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.

  • method (str, optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case a QR decomposition is used, which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky is then computed using the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the Hessian. This only requires first derivative information. Defaults to “Chol”.

  • Vp_fidiff (bool, optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}^{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)

  • use_importance_weights (bool, optional) – Whether to rely on importance weights to compute the numerical integration when grid_type != 'JJJ1' or on the log-densities of \(\mathbf{V}^{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to True (Importance weights are used)

  • prior (Callable|None, optional) – An (optional) instance of an arbitrary class that has a .logpdf() method to compute the prior log density of a sampled candidate. If this is set to None, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when use_importance_weights=False. Defaults to None

  • recompute_H (bool, optional) – Whether or not to re-compute the Hessian of the log-likelihood at an estimate of the mean of the Bayesian posterior \(\boldsymbol{\beta}|y\) before computing the (uncertainty/bias corrected) edf. Defaults to False

  • seed (int|None,optional) – Seed to use for random parts of the correction. Defaults to None

  • bfgs_options (key=value,optional) – Any additional keyword arguments that should be passed on to the call of scipy.optimize.minimize. If none are provided, the gtol argument will be initialized to 1e-3. Note also, that in any case the maxiter argument is automatically set to 100. Defaults to None.
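
How the a, b, df, and nR arguments interact can be sketched as follows: the \(\lambda\) estimates are bounded to [a, b], their logs define the proposal mean, and nR candidates are drawn from a multivariate t distribution with df degrees of freedom (a multivariate normal when df is infinite). This is a minimal sketch of that sampling scheme, not the code used by mssm; the function `sample_rho_candidates` and the inputs `lam_hat` and `V_rho` are hypothetical.

```python
import numpy as np

def sample_rho_candidates(lam_hat, V_rho, nR=250, a=1e-7, b=1e7, df=40, seed=None):
    """Draw nR candidates for rho = log(lambda) from a multivariate t (normal if df is inf)."""
    rng = np.random.default_rng(seed)
    # Bound the estimates to [a, b] before taking logs to define the proposal mean
    mean = np.log(np.clip(lam_hat, a, b))
    R = np.linalg.cholesky(V_rho)                 # V_rho = R R^T
    z = rng.standard_normal((nR, len(mean)))
    if np.isinf(df):
        scale = np.ones((nR, 1))
    else:
        # Dividing normal draws by sqrt(chi2_df / df) yields multivariate t draws
        scale = np.sqrt(rng.chisquare(df, size=(nR, 1)) / df)
    return mean + (z @ R.T) / scale

lam_hat = np.array([1e-9, 3.0, 1e9])              # first and last estimates get bounded
V_rho = np.diag([0.5, 0.2, 0.5])
cands = sample_rho_candidates(lam_hat, V_rho, nR=20_000, seed=20)
```

Smaller values of df give heavier tails, so the candidates cover more of the region in which the normal approximation to the posterior of \(\boldsymbol{\rho}\) may be poor.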

Returns:

A tuple with 5 elements: an estimate of the covariance matrix of the posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\), a regularized version of the former, a root of the covariance matrix, a root of the regularized covariance matrix, and an estimate of the mean of the posterior

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]

mssm.src.python.utils.print_parametric_terms(model, par: int = 0) None

Prints summary output for linear/parametric terms in the model of a specific parameter, not unlike the one returned in R when using the summary function for mgcv models.

If the model has not been estimated yet, it prints the term names instead.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows a t-distribution for models in which an additional scale parameter was estimated (e.g., Gaussian, Gamma) and a standardized normal distribution for models in which the scale parameter is known or was fixed (e.g., Binomial). For the former case, the t-statistic, Degrees of freedom of the Null distribution (DoF.), and the p-value are printed as well. For the latter case, only the z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that un-penalized coefficients that are part of a smooth function are not covered by this function.
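
For the known-scale case, the printed p-value corresponds to a two-sided z-test. The arithmetic can be sketched with the standard library alone; the function `z_test_p_value` and the numbers used are hypothetical and only illustrate the calculation, not the internals of mssm. For models with an estimated scale parameter, the normal reference distribution would be replaced by a t-distribution with the appropriate degrees of freedom.

```python
import math

def z_test_p_value(estimate, std_error):
    """Two-sided p-value for H0: beta = 0 under a standard normal null."""
    z = estimate / std_error
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function
    return z, math.erfc(abs(z) / math.sqrt(2))

# Example: coefficient estimate 0.5 with standard error 0.2 gives z = 2.5
z, p = z_test_p_value(0.5, 0.2)
```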

References:
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

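Parameters:
  • model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GSMM, GAMMLSS, or GAMM model

  • par (int, optional) – Distribution parameter for which to print the parametric terms. Defaults to 0
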
Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.

Return type:

None

mssm.src.python.utils.print_smooth_terms(model, par: int = 0, pen_cutoff: float = 0.2, ps: list[float] | None = None, Trs: list[float] | None = None) None

Prints the names of the smooth terms included in the model of a given parameter.

After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:
  • Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:
  • model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GSMM, GAMMLSS, or GAMM model

  • par (int, optional) – Distribution parameter for which to compute p-values. Ignored when model is a GAMM. Defaults to 0

  • pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2

  • ps ([float], optional) – Optional list of p-values per smooth term if these should be printed, defaults to None

  • Trs ([float], optional) – Optional list of test statistics (based on which the ps were computed) per smooth term if these should be printed, defaults to None

Return type:

None

mssm.src.python.utils.sample_MVN(n: int, mu: int | ndarray, scale: float, P: csc_array | None, L: csc_array | None, LI: csc_array | None = None, use: list[int] | None = None, seed: int | None = None) ndarray

Draw n samples from multivariate normal with mean \(\boldsymbol{\mu}\) (mu) and covariance matrix \(\boldsymbol{\Sigma}\).

\(\boldsymbol{\Sigma}\) does not need to be provided. Rather the function expects either L (\(\mathbf{L}\) in what follows) or LI (\(\mathbf{L}^{-1}\) in what follows) and scale (\(\phi\) in what follows). These relate to \(\boldsymbol{\Sigma}\) so that \(\boldsymbol{\Sigma}/\phi = \mathbf{L}^{-T}\mathbf{L}^{-1}\) or \(\mathbf{L}\mathbf{L}^T = [\boldsymbol{\Sigma}/\phi]^{-1}\) so that \(\mathbf{L}*(1/\phi)^{0.5}\) is the Cholesky of the precision matrix of \(\boldsymbol{\Sigma}\).

Notably, for models available in mssm L (and LI) have usually been computed for a permuted matrix, e.g., \(\mathbf{P}[\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]\mathbf{P}^T\) (see Wood & Fasiolo, 2017). Hence for sampling we often need to correct for permutation matrix \(\mathbf{P}\) (P). If LI is provided, then P can be omitted and is assumed to have been used to un-pivot LI already.

Used, for example, to sample from the uncorrected posterior \(\boldsymbol{\beta} | \mathbf{y}, \boldsymbol{\lambda} \sim N(\boldsymbol{\mu} = \hat{\boldsymbol{\beta}},[\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]^{-1}\phi)\) for a GAMM (see Wood, 2017). Based on section 7.4 in Gentle (2009), and assuming \(\boldsymbol{\Sigma}\) is \(p \times p\) and the covariance matrix of the uncorrected posterior, samples \(\boldsymbol{\beta}\) are then obtained by computing:

\[\boldsymbol{\beta} = \hat{\boldsymbol{\beta}} + [\mathbf{P}^T \mathbf{L}^{-T}*\phi^{0.5}]\mathbf{z}\ \text{where}\ z_i \sim N(0,1)\ \forall i = 1,...,p\]

Alternatively, relying on the equivalent statement that:

\[[\mathbf{L}^T*(1/\phi)^{0.5}]\mathbf{P}[\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] = \mathbf{z}\]

we can first solve for \(\mathbf{y}\) in:

\[[\mathbf{L}^T*(1/\phi)^{0.5}] \mathbf{y} = \mathbf{z}\]

followed by computing:

\[ \begin{align}\begin{aligned}\mathbf{y} = \mathbf{P}[\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}]\\\boldsymbol{\beta} = \hat{\boldsymbol{\beta}} + \mathbf{P}^T\mathbf{y}\end{aligned}\end{align} \]

The latter avoids forming \(\mathbf{L}^{-1}\) (which unlike \(\mathbf{L}\) might not benefit from the sparsity preserving permutation \(\mathbf{P}\)). If LI is None, L will thus be used for sampling as outlined in these alternative steps.
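
The alternative steps above can be sketched with dense NumPy arrays. This is a toy illustration, not the sparse implementation in mssm: the matrix `A`, the permutation `perm`, and all dimensions are assumptions, and a general solver is used where a triangular (sparse) solver would be preferable in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
p, phi = 4, 2.0

# Toy penalized cross-product matrix A = X^T X + S_lambda (dense here for simplicity)
X = rng.normal(size=(50, p))
A = X.T @ X + np.eye(p)
Sigma = np.linalg.inv(A) * phi          # target covariance, only formed for checking
beta_hat = np.arange(p, dtype=float)

# Permutation matrix P and Cholesky factor L of the permuted matrix: L L^T = P A P^T
perm = np.array([2, 0, 3, 1])
P = np.eye(p)[perm]
L = np.linalg.cholesky(P @ A @ P.T)

# Solve [L^T (1/phi)^0.5] y = z for y, then undo the permutation
n = 200_000
z = rng.standard_normal((p, n))
y = np.linalg.solve(L.T, z) * np.sqrt(phi)
samples = beta_hat[:, None] + P.T @ y   # shape (p, n), one sample per column
```

The empirical covariance of the columns of `samples` approximates `Sigma` without \(\mathbf{L}^{-1}\) ever being formed explicitly.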

Often we care only about a handful of elements in mu (e.g., the first ones corresponding to “fixed effects” in a GAMM). In that case we can generate samples only for this sub-set of interest by only using a sub-block of rows of \(\mathbf{L}\) or \(\mathbf{L}^{-1}\) (all columns remain). Argument use can be a np.array containing the indices of elements in mu that should be sampled. Because this only works efficiently when LI is available, an error is raised when use is not None and LI is None.

If mu is set to any integer (i.e., not a Numpy array/list) it is automatically treated as 0. For mssm.models.GAMMLSS or mssm.models.GSMM models, scale can be set to 1.
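
Sampling only a sub-set of coefficients, as described for the use argument, amounts to keeping only the matching rows of a root of the covariance matrix. The sketch below assumes an un-pivoted dense `LI` satisfying \(\boldsymbol{\Sigma}/\phi = \mathbf{L}^{-T}\mathbf{L}^{-1}\), so that the rows of `LI.T` form the root; the orientation of the factor and all names are assumptions for illustration and may differ from what sample_MVN() does internally.

```python
import numpy as np

rng = np.random.default_rng(1)
p, phi = 5, 1.5
X = rng.normal(size=(40, p))
A = X.T @ X + np.eye(p)
Sigma = np.linalg.inv(A) * phi
mu = np.zeros(p)

# Un-pivoted inverse factor: Sigma / phi = LI^T LI (no permutation in this toy case)
LI = np.linalg.inv(np.linalg.cholesky(A))

use = np.array([0, 1])                      # e.g. indices of the "fixed effects"
root_rows = LI.T[use, :] * np.sqrt(phi)     # rows of the root for the selected coefficients
z = rng.standard_normal((p, 100_000))
samples = mu[use][:, None] + root_rows @ z  # shape (len(use), n)
```

All columns of the root are kept, so the samples for the selected coefficients retain the correct marginal covariance `Sigma[use][:, use]`.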

References:

  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

  • Gentle, J. (2009). Computational Statistics.

Parameters:
  • n (int) – Number of samples to generate

  • mu (int | np.ndarray) – mean of normal distribution as described above

  • scale (float) – scaling parameter of covariance matrix as described above

  • P (scp.sparse.csc_array | None) – Permutation matrix or None.

  • L (scp.sparse.csc_array | None) – Cholesky of precision of scaled covariance matrix as described above.

  • LI (scp.sparse.csc_array | None, optional) – Inverse of cholesky factor of precision of scaled covariance matrix as described above.

  • use (list[int] | None, optional) – Indices of parameters in mu for which to generate samples, defaults to None in which case all parameters will be sampled

  • seed (int | None, optional) – Seed to use for random sample generation, defaults to None

Returns:

Samples from multi-variate normal distribution. In case use is not provided, the returned array will be of shape (p,n) where p==LI.shape[1]. Otherwise, the returned array will be of shape (len(use),n).

Return type:

np.ndarray

mssm.src.python.utils.updateVp(ep: ndarray, ws: ndarray, rGrid: ndarray) ndarray

Update covariance matrix of posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\). REML scores are used to approximate expectation, similar to what was suggested by Greven & Scheipl (2016).

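References:
  • Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models.
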
Parameters:
  • ep (np.ndarray) – Model estimate log(lambda), i.e., the expectation over rGrid

  • ws (np.ndarray) – weight associated with each log(lambda) value used for numerical integration

  • rGrid (np.ndarray) – A 2d array, holding all lambda samples considered so far. Each row is one sample

Returns:

An estimate of the covariance matrix of log(lambda) - 2d array of shape len(ep)*len(ep).

Return type:

np.ndarray
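
The kind of update described for updateVp() can be illustrated as a weighted covariance of the sampled values around their expectation. The sketch below is a hedged illustration using the same argument names; the function `weighted_cov`, the weight normalization, and the toy inputs are assumptions, and the actual function may differ in detail.

```python
import numpy as np

def weighted_cov(ep, ws, rGrid):
    """Weighted covariance of the rows of rGrid around ep (weights are normalized first)."""
    w = np.asarray(ws, dtype=float)
    w = w / w.sum()
    dev = np.asarray(rGrid) - np.asarray(ep)   # deviations, one row per sample
    return (dev * w[:, None]).T @ dev

rng = np.random.default_rng(3)
rGrid = rng.normal(size=(500, 2))              # toy samples, one per row
ws = np.ones(500)                              # equal weights for illustration
ep = rGrid.mean(axis=0)
Vp = weighted_cov(ep, ws, rGrid)
```

With equal weights and `ep` set to the sample mean, this reduces to the ordinary (biased) sample covariance; unequal weights, such as those derived from REML scores, shift the estimate towards the more plausible samples.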