api

mssm.models module

class mssm.models.GAMM(formula: Formula, family: Family)

Bases: GAMMLSS

Class to fit Generalized Additive Mixed Models.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

#### Binomial model example ####
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# By default, the Binomial family assumes binary data and uses the logit link.
# Count data is also possible though - see the `Binomial` family.
model = GAMM(formula,Binomial())
model.fit()

# Plot estimated effects on scale of the log-odds
plot(model)

#### Gaussian model with tensor smooth and p-values ####
sim_dat = sim3(n=500,scale=2,c=0,seed=20)

formula = Formula(lhs("y"),[i(),f(["x0","x3"],te=True,nk=9),f(["x1"]),f(["x2"])],
                  data=sim_dat)
model = GAMM(formula,Gaussian())

model.fit()
model.print_smooth_terms(p_values=True)


#### Standard linear (mixed) models are also possible ####
# *li() with three variables: three-way interaction
sim_dat,_ = sim1(100,random_seed=100)

# Specify formula with three-way linear interaction and random intercept term
formula = Formula(lhs("y"),[i(),*li(["fact","x","time"]),ri("sub")],data=sim_dat)

# ... and model
model = GAMM(formula,Gaussian())

# then fit
model.fit()

# get estimates for linear terms
model.print_parametric_terms()

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

formula (Formula) – A formula for the GAMM model
family (Family) – A distribution implementing the Family class.

Variables:

formulas ([Formula]) – A list including the formula passed to the constructor.
lvi (scp.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with None.
coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.
preds ([[float]]) – The first index corresponds to the linear predictors for the mean of the family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
mus ([[float]]) – The first index corresponds to the estimated value of the mean of the family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood used during fitting - will be the expected hessian for non-canonical models. Initialized with None.
edf (float) – The model estimated degrees of freedom as a float. Initialized with None.
edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.
term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.
overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.
info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.
res (np.ndarray) – The working residuals of the model (If applicable). Initialized with None.
Wr (scp.sparse.csc_array) – For generalized models a diagonal matrix holding the root of the Fisher weights at convergence. Initialized with None.
WN (scp.sparse.csc_array) – For generalized models a diagonal matrix holding the Newton weights at convergence. Initialized with None.
hessian_obs (scp.sparse.csc_array) – Observed hessian of the log-likelihood at final coefficient estimate. Not updated for strictly additive models (i.e., Gaussian with identity link). Initialized with None.
rho (float) – Optional auto-correlation at lag 1 parameter used during estimation. Initialized with None.
res_ar (np.ndarray) – Holding the working residuals of the model corrected for any auto-correlation parameter used during estimation. Initialized with None.

fit(max_outer: int = 200, max_inner: int = None, conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 2, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', restart: bool = False, method: str = 'QR', check_cond: int = 1, progress_bar: bool = True, n_cores: int = 10, offset: float | ndarray | None = None, rho: float | None = None)

Fit the specified model.

Note: Keyword arguments are initialized to maximise stability. For faster configurations (necessary for larger models) see the ‘Big model’ example below.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *

########## Big Model ##########
dat = pd.read_csv(
    'https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat.csv'
    )

# mssm requires that the data-type for variables used as factors is 'O'=object
dat = dat.astype({'series': 'O',
                'cond':'O',
                'sub':'O'})

formula = Formula(lhs=lhs("y"), # The dependent variable - here y!
                    terms=[i(), # The intercept, a
                            l(["cond"]), # For cond='b'
                             # two-way interaction between time and cond;
                             # one smooth over time per cond level
                            f(["time"],by="cond",constraint=ConstType.QR),
                             # two-way interaction between x and cond;
                             # one smooth over x per cond level
                            f(["x"],by="cond",constraint=ConstType.QR),
                             # three-way interaction
                            f(["time","x"],by="cond",constraint=ConstType.QR,nk=9),
                             # Random non-linear effect of time - one smooth per level
                             # of factor sub
                            fs(["time"],rf="sub")],
                    data=dat,
                    print_warn=False,find_nested=False)

model = GAMM(formula,Gaussian())

# To speed up estimation, use the following key-word arguments:
# max_inner only matters for Generalized models (i.e., non-Gaussian)
# but for those will often be much faster
model.fit(method="Chol",max_inner=1)

########## ar1 model (without resets per time-series) ##########

# No series identifier passed to formula -> ar1 model does not reset!
formula = Formula(lhs=lhs("y"),
                    terms=[i(),
                            l(["cond"]),
                            f(["time"],by="cond"),
                            f(["x"],by="cond"),
                            f(["time","x"],by="cond")],
                    data=dat,
                    print_warn=False,
                    series_id=None)

model = GAMM(formula,Gaussian())

model.fit(rho=0.99)

# Visualize the un-corrected residuals:
plot_val(model,resid_type="Pearson")

# And the corrected residuals:
plot_val(model,resid_type="ar1")

########## ar1 model (with resets per time-series) ##########

# 'series' variable identifies individual time-series -> ar1 model resets per series!
formula = Formula(lhs=lhs("y"),
                    terms=[i(),
                            l(["cond"]),
                            f(["time"],by="cond"),
                            f(["x"],by="cond"),
                            f(["time","x"],by="cond")],
                    data=dat,
                    print_warn=False,
                    series_id='series')

model = GAMM(formula,Gaussian())

model.fit(rho=0.99)

# Visualize the un-corrected residuals:
plot_val(model,resid_type="Pearson")

# And the corrected residuals:
plot_val(model,resid_type="ar1")

Parameters:

max_outer (int,optional) – The maximum number of fitting iterations. Defaults to 200.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step updating the coefficients for Generalized models. Defaults to 500 for non ar1 models.
conv_tol (float, optional) – The relative (change in penalized deviance is compared against conv_tol * previous penalized deviance) criterion used to determine convergence.
extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.
control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.
exclude_lambda (bool,optional) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov’ by default.
restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.
method (str,optional) – Which method to use to solve for the coefficients. “Chol” relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but then also pivots for stability in order to get an estimate of rank deficiency. This takes substantially longer. This argument is ignored if len(self.formulas[0].file_paths)>0, that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to “QR”.
check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). When check_cond=2, an estimate of the condition number will be performed for each new system (at each iteration of the algorithm) and an error will be raised if the condition number is estimated as too high given the chosen method. Is ignored, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to 1.
progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
offset (float or np.ndarray,optional) – Mimics the behavior of the offset argument for gam in mgcv in R. If a value is provided here (can either be a float or a numpy.array of shape (-1,1) - if it is an array, then the first dimension has to match the number of observations in the data. NANs present in the dependent variable will be excluded from the offset vector.) then it is consistently added to the linear predictor during estimation. It will not be used by any other function of the GAMM class (e.g., for prediction). This argument is ignored if len(self.formulas[0].file_paths)>0 that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to None.
rho (float,optional) – Optional correlation parameter for an “ar1 residual model”. Essentially mimics the behavior of the rho parameter for the bam function in mgcv. Note, if you want to re-start the ar1 process multiple times (for example because you work with time-series data and have multiple time-series) then you must pass the series_id argument to the Formula used for this model. Defaults to None.
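The effect of rho, and of resetting the ar1 process per series, can be illustrated with a minimal sketch that does not depend on mssm. It assumes the usual ar1 whitening, in which each residual has rho times its predecessor subtracted and is then rescaled, while the first residual of every series is left unchanged. The function `ar1_correct` and its `series_start` argument are hypothetical names used only for this illustration; they are not part of the mssm API.

```python
import math

def ar1_correct(resid, rho, series_start=None):
    """Whiten residuals under an assumed ar1 model (illustrative only).

    resid: residuals in temporal order.
    rho: assumed lag-1 auto-correlation, with -1 < rho < 1.
    series_start: optional list of booleans, True where a new series begins.
        None means the process never resets after the first observation.
    """
    if series_start is None:
        series_start = [idx == 0 for idx in range(len(resid))]
    scale = math.sqrt(1.0 - rho ** 2)
    corrected = []
    for idx, e in enumerate(resid):
        if series_start[idx]:
            corrected.append(e)  # no predecessor within this series
        else:
            corrected.append((e - rho * resid[idx - 1]) / scale)
    return corrected

# Two short series of three observations each (hypothetical values)
resid = [1.0, 0.9, 0.8, -1.0, -0.9, -0.8]
no_reset = ar1_correct(resid, rho=0.9)
with_reset = ar1_correct(
    resid, rho=0.9, series_start=[True, False, False, True, False, False]
)
```

Without a reset, the first residual of the second series is corrected using the last residual of the first series, which inflates it. With a reset it is left as is, which is the reason to pass a series identifier when the data contain multiple time-series.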

get_llk(penalized: bool = True, ext_scale: float | None = None) → float | None

Get the (penalized) log-likelihood of the estimated model (float or None) given the training data. LLK can optionally be evaluated for an external scale parameter ext_scale.

Will instead return None if called before fitting.

Parameters:

penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True
ext_scale (float, optional) – Optionally provide an external scale parameter at which to evaluate the log-likelihood, defaults to None

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.

Returns:

llk score

Return type:

float or None

get_mmat(use_terms: list[int] | None = None) → csc_array

Returns exactly the model matrix used for fitting as a scipy.sparse.csc_array. Will throw an error when called for a model for which the model matrix was never formed completely - i.e., when \(\mathbf{X}^T\mathbf{X}\) was formed iteratively for estimation, by setting the file_paths argument of the Formula to a non-empty list.

Optionally, all columns not corresponding to terms for which the indices are provided via use_terms can be zeroed.

Parameters:

use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be created. If this argument is provided, columns corresponding to any term not included in this list will be zeroed, defaults to None

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.

Returns:

Model matrix \(\mathbf{X}\) used for fitting.

Return type:

scp.sparse.csc_array
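What "zeroing" the columns of unselected terms means can be sketched on a small dense matrix. The helper `zero_unused_terms` and the `term_cols` mapping from term index to column indices are hypothetical and only serve this illustration; mssm works with sparse matrices and keeps track of the column layout internally.

```python
def zero_unused_terms(X, term_cols, use_terms=None):
    """Return a copy of X in which columns of unselected terms are zero."""
    if use_terms is None:
        return [row[:] for row in X]  # all terms are kept
    keep = set()
    for term_idx in use_terms:
        keep.update(term_cols[term_idx])
    return [[v if j in keep else 0.0 for j, v in enumerate(row)] for row in X]

# Hypothetical layout: term 0 = intercept, term 1 = smooth with two columns,
# term 2 = another term with a single column
X = [[1.0, 0.2, 0.4, 0.7],
     [1.0, 0.5, 0.1, 0.3]]
term_cols = {0: [0], 1: [1, 2], 2: [3]}

X_sub = zero_unused_terms(X, term_cols, use_terms=[1])
```

The matrix keeps its shape, so it can still be multiplied with the full coefficient vector; only the selected terms then contribute to the product.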

get_pars() → tuple[ndarray | None, float | None]

Returns a tuple. The first entry is a np.ndarray with all estimated coefficients. The second entry is the estimated scale parameter.

Will instead return (None,None) if called before fitting.

Returns:

Model coefficients and scale parameter that were estimated

Return type:

(np.ndarray,float) or (None, None)

get_reml() → float

Gets the (Laplace approximate) REML (Restricted Maximum Likelihood) score (as a float) for the estimated lambda values (see Wood, 2011).

References:

Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
TypeError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid(type: str = 'Pearson') → ndarray

Get different types of residuals from the estimated model.

By default (type='Pearson') this returns the residuals \(e_i = y_i - \mu_i\) for additive models and the Pearson/working residuals \(w_i^{0.5}*(z_i - \eta_i)\) (see Wood, 2017 sections 3.1.5 & 3.1.7) for generalized additive models. Here \(w_i\) are the Fisher scoring weights, \(z_i\) the pseudo-data point for each observation, and \(\eta_i\) is the linear prediction (i.e., \(g(\mu_i)\) - where \(g()\) is the link function) for each observation.

If type= "Deviance", the deviance residuals are returned, which are equivalent to \(sign(y_i - \mu_i)*D_i^{0.5}\), where \(\sum_{i=1,...N} D_i\) equals the model deviance (see Wood 2017, section 3.1.7). Additionally, if the model was estimated with rho!=None, type="ar1" returns the standardized working residuals corrected for lag1 auto-correlation. These are best compared to the standard working residuals.

Throws an error if called before model was fitted, when requesting an unsupported type, or when requesting ‘ar1’ residuals for a model for which model.rho==None.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

type (str,optional) – The type of residual to return for a Generalized model, “Pearson” by default, but can be set to “Deviance” and (for some models) to “ar1” as well.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed, when requesting an unsupported type, or when requesting ‘ar1’ residuals for a model for which model.rho==None.

Returns:

Empirical residual vector in a numpy array

Return type:

np.ndarray
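The definition of the deviance residuals can be checked numerically with a small sketch. A Poisson response is used purely as an example because its per-observation deviance contribution \(D_i\) has a simple closed form. The function names are hypothetical, and the code is not meant to mirror how mssm computes residuals internally.

```python
import math

def poisson_unit_deviance(y, mu):
    """Contribution D_i of one observation to the Poisson deviance."""
    term = y * math.log(y / mu) if y > 0 else 0.0  # y*log(y/mu) -> 0 for y = 0
    return 2.0 * (term - (y - mu))

def deviance_residuals(ys, mus):
    """Compute sign(y_i - mu_i) * sqrt(D_i) for each observation."""
    out = []
    for y, mu in zip(ys, mus):
        d = poisson_unit_deviance(y, mu)
        # max() guards against tiny negative values caused by rounding
        out.append(math.copysign(math.sqrt(max(d, 0.0)), y - mu))
    return out

ys = [0, 2, 5, 1]             # hypothetical observed counts
mus = [0.5, 2.0, 3.5, 1.8]    # hypothetical fitted means

res = deviance_residuals(ys, mus)
deviance = sum(poisson_unit_deviance(y, mu) for y, mu in zip(ys, mus))
```

The squared deviance residuals sum to the model deviance, and each residual carries the sign of \(y_i - \mu_i\).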

predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, csc_array, ndarray | None]

Make a prediction using the fitted model for new data n_dat, using only the terms indexed by use_terms.

Importantly, predictions and standard errors are always returned on the scale of the linear predictor. When estimating a Generalized Additive Model, the mean predictions and CI bounds (often referred to as ‘response’-scale predictions) can be obtained by applying the inverse link function to the predictions and to the CI bounds on the linear predictor scale (DO NOT transform the standard error first and then add it to the transformed predictions - only on the scale of the linear predictor is the standard error additive). See examples below.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma Gam
Gammadat = sim3(500,2,family=Gamma(),seed=0)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Gammadat)

# By default, the Gamma family assumes that the model predictions match
# log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now make prediction for `f(["x0"])`
new_dat = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":np.linspace(0,1,30),
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

f0,X_f,ci = model.predict([1],new_dat,ci=True)

# Can also use the plot function from mssmViz
plot(model,which=[1])

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:

use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.
n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se]
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). The CI is then [pred - se, pred + se]. Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

Returns:

A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci == False else the standard error se in the prediction multiplied by the critical value determined by alpha (e.g., ~ 1.96 if alpha = 0.05). If you want the function to return just the standard error, set alpha = 2 * (1 - scp.stats.norm.cdf(1)).

Return type:

(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
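The following sketch illustrates two points made above using only the Python standard library: how alpha maps to the critical value, and how to move a prediction and its interval to the response scale. It assumes a log link, as in the Gamma example. The numbers in `pred` and `b` are made up and stand in for the first and third entries returned by predict().

```python
import math
from statistics import NormalDist

def critical_value(alpha):
    """Critical value of a N(0,1) at 1 - (alpha/2)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Hypothetical values on the scale of the linear predictor (log scale)
pred = [0.10, 0.35, 0.60]   # stands in for the returned prediction
b = [0.20, 0.15, 0.25]      # stands in for the returned se * critical value

# Apply the inverse link (exp for a log link) to the prediction and to the
# CI bounds; do not transform `b` on its own
mean = [math.exp(p) for p in pred]
lower = [math.exp(p - w) for p, w in zip(pred, b)]
upper = [math.exp(p + w) for p, w in zip(pred, b)]

# Alpha for which the critical value equals 1, so that the plain
# standard error would be returned
alpha_se = 2.0 * (1.0 - NormalDist().cdf(1.0))
```

On the response scale the interval is no longer symmetric around the mean, which is why the bounds have to be formed before the inverse link is applied.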

predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, ndarray]

Get the difference in the predictions for two datasets.

Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma Gam
Gammadat = sim3(500,2,family=Gamma(),seed=0)

# Include tensor smooth in model of log(mean)
formula = Formula(lhs("y"),[i(),f(["x0","x1"],te=True),f(["x2"]),f(["x3"])],
                  data=Gammadat)

# By default, the Gamma family assumes that the model predictions match
# log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now we want to know whether the effect of x0 is different for two values of x1:
new_dat1 = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":[0.25 for _ in range(30)],
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

new_dat2 = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":[0.75 for _ in range(30)],
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

# Now we can get the predicted difference of the effect of x0 for the two values of x1:
pred_diff,se = model.predict_diff(new_dat1,new_dat2,use_terms=[1],par=0)

# mssmViz also has a convenience function to visualize it:
plot_diff(new_dat1,new_dat2,["x0"],model,use=[1],response_scale=False)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Simpson, G. (2016). Simultaneous intervals for smooths revisited.

get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html

Parameters:

dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this and dat1 will be returned.
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.

Returns:

A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference multiplied by the critical value determined by alpha (e.g., ~ 1.96 if alpha = 0.05). If you want the function to return just the standard error, set alpha = 2 * (1 - scp.stats.norm.cdf(1)). The difference CI is then [diff - se, diff + se]

Return type:

(np.ndarray,np.ndarray)
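One common way to obtain such a difference and its point-wise standard error is to work with the difference of the two model matrices. The sketch below assumes that approach, i.e., the difference is \((\mathbf{X}_1 - \mathbf{X}_2)\boldsymbol{\beta}\) and its variance is the diagonal of \((\mathbf{X}_1 - \mathbf{X}_2)\mathbf{V}(\mathbf{X}_1 - \mathbf{X}_2)^T\). It uses small dense lists and hypothetical names, and is not a description of the internals of predict_diff().

```python
import math

def diff_with_se(X1, X2, coef, V):
    """Difference X1 @ coef - X2 @ coef and its point-wise standard error."""
    diffs, ses = [], []
    n_coef = len(coef)
    for r1, r2 in zip(X1, X2):
        d = [a - b for a, b in zip(r1, r2)]  # one row of X1 - X2
        diffs.append(sum(dj * cj for dj, cj in zip(d, coef)))
        Vd = [sum(V[j][k] * d[k] for k in range(n_coef)) for j in range(n_coef)]
        ses.append(math.sqrt(sum(dj * vj for dj, vj in zip(d, Vd))))
    return diffs, ses

# Hypothetical model matrices: intercept, factor-level dummy, covariate
X1 = [[1.0, 0.0, 0.5], [1.0, 0.0, 1.0]]
X2 = [[1.0, 1.0, 0.5], [1.0, 1.0, 1.0]]
coef = [2.0, -0.5, 1.5]
V = [[0.04, 0.0, 0.0],
     [0.0, 0.09, 0.0],
     [0.0, 0.0, 0.01]]

diff, se = diff_with_se(X1, X2, coef, V)
```

Columns that are identical in both data sets cancel in the difference, so neither their coefficients nor their uncertainty enter the result. This is why dat1 and dat2 should differ only in the variable of interest.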

print_parametric_terms()

Prints summary output for linear/parametric terms in the model, not unlike the one returned in R when using the summary function for mgcv models.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows a t-distribution for models in which an additional scale parameter was estimated (e.g., Gaussian, Gamma) and a standardized normal distribution for models in which the scale parameter is known or was fixed (e.g., Binomial). For the former case, the t-statistic, Degrees of freedom of the Null distribution (DoF.), and the p-value are printed as well. For the latter case, only the z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that, un-penalized coefficients that are part of a smooth function are not covered by this function.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
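For models with a known or fixed scale parameter, the reported test reduces to a two-sided z-test, which can be reproduced with the standard library. The helper `z_test` and the numbers passed to it are hypothetical. For models with an estimated scale parameter the same statistic would instead be compared against a t-distribution (for example via scipy.stats.t), which is not shown here.

```python
from statistics import NormalDist

def z_test(estimate, se):
    """Two-sided test of the null hypothesis that a coefficient is zero."""
    z = estimate / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical coefficient estimate and standard error
z, p = z_test(0.8, 0.25)
```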

print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)

Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:

Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:

pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2
p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False
edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal, defaults to True
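The selection rule described above amounts to a simple comparison of each term's estimated degrees of freedom against pen_cutoff. A minimal sketch, with a hypothetical helper and made-up term names and edf values:

```python
def flag_removed_terms(term_names, term_edf, pen_cutoff=0.2):
    """Names of smooth terms whose edf falls below the cut-off."""
    return [name for name, edf in zip(term_names, term_edf) if edf < pen_cutoff]

# Hypothetical per-term edf values, e.g. as stored in a list like term_edf
names = ['f(["x0"])', 'f(["x1"])', 'f(["x2"])']
edf = [3.4, 0.05, 1.0]

removed = flag_removed_terms(names, edf)
```

As noted above, this reading of a small edf is only meaningful when the terms carry the extra penalties that allow them to shrink to a constant zero.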

sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) → ndarray

Obtain n_ps samples from posterior \([\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}\) is \([\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]^{-1}\phi\) (see Wood, 2017; section 6.10). To obtain samples for \(\boldsymbol{\beta}\), set deviations to False.

See sample_MVN() for more details.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma Gam
Gammadat = sim3(500,2,family=Gamma(),seed=0)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Gammadat)

# By default, the Gamma family assumes that the model predictions match
# log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now get model matrix for a couple of example covariates
new_dat = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":np.linspace(0,1,30),
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

f0,X_f,ci = model.predict([1],new_dat,ci=True)

# Get `use_post` to only identify coefficients related to `f(["x0"])` - that way we
# can efficiently sample the posterior only for `f(["x0"])`. If you want to sample all
# coefficients, simply set `use_post=None`.
use_post = X_f.sum(axis=0) != 0
use_post = np.arange(0,X_f.shape[1])[use_post]
print(use_post)

# `use_post` can now be passed to `sample_post`:
post = model.sample_post(10000,use_post,deviations=False,seed=0,par=0)

# Since we set deviations to false post has coefficient samples and can simply be
# post-multiplied to get samples of `f(["x0"])` - importantly, post has a different
# shape than X_f, so we need to account for that
post_f = X_f[:,use_post] @ post

# Note: samples are also on scale of linear predictor!
plt.plot(new_dat["x0"],f0,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f[:,sidx],alpha=0.2)

plt.show()

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

n_ps (int) – Number of samples to obtain from posterior.
use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. By default all coefficients are sampled.
deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False
seed (int,optional) – A seed to use for the sampling, defaults to None

Returns:

An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves/predictions.

Return type:

np.ndarray
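The sampling step itself can be sketched for a tiny dense covariance matrix using a Cholesky factor: if \(\mathbf{V} = \mathbf{L}\mathbf{L}^T\) and \(\mathbf{z} \sim N(0,\mathbf{I})\), then \(\mathbf{L}\mathbf{z} \sim N(0,\mathbf{V})\). This is a generic textbook construction with hypothetical function names and made-up values for `coef` and `V`. It is not the sparse implementation used by mssm.

```python
import math
import random

def cholesky(V):
    """Lower-triangular L with L @ L.T == V for a small dense SPD matrix."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(V[i][i] - s)
            else:
                L[i][j] = (V[i][j] - s) / L[j][j]
    return L

def sample_mvn(coef, V, n_ps, deviations=False, seed=None):
    """Return a len(coef) x n_ps nested list of samples from N(coef, V)."""
    rng = random.Random(seed)
    L = cholesky(V)
    n = len(coef)
    samples = [[0.0] * n_ps for _ in range(n)]
    for s in range(n_ps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for i in range(n):
            dev = sum(L[i][k] * z[k] for k in range(i + 1))
            # deviations=True returns beta - beta_hat instead of beta
            samples[i][s] = dev if deviations else coef[i] + dev
    return samples

coef = [1.0, -2.0]                   # hypothetical coefficient estimates
V = [[0.04, 0.01], [0.01, 0.09]]     # hypothetical covariance matrix

post = sample_mvn(coef, V, n_ps=2000, seed=0)
post_dev = sample_mvn(coef, V, n_ps=2000, deviations=True, seed=0)
```

The result has one row per coefficient and one column per sample, matching the [len(use_post),n_ps] layout described above. With the same seed, the two calls differ only by the estimated coefficients.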

class mssm.models.GAMMLSS(formulas: list[Formula], family: GAMLSSFamily)

Bases: GSMM

Class to fit Generalized Additive Mixed Models of Location Scale and Shape (see Rigby & Stasinopoulos, 2005).

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now define the model and fit!
model = GAMMLSS(formulas,family)
model.fit()

# Get total coef vector & split them
coef = model.coef
split_coef = np.split(coef,model.coef_split_idx)

# Get coef associated with the mean
coef_m = split_coef[0]
# and with the scale parameter
coef_s = split_coef[1]

# Similarly, `preds` holds linear predictions for m & s
pred_m = model.preds[0]
pred_s = model.preds[1]

# While `mus` holds the estimated fitted parameters
# (i.e., `preds` after applying the inverse of the link function of each parameter)
mu_m = model.mus[0]
mu_s = model.mus[1]

References:

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

formulas ([Formula]) – A list of formulas for the GAMMLSS model
family (GAMLSSFamily) – A GAMLSSFamily. Currently GAUMLSS, MULNOMLSS, and GAMMALS are supported.

Variables:

formulas ([Formula]) – The list of formulas passed to the constructor.
lvi (scp.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with None.
coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.
preds ([[float]]) – The linear predictors for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
mus ([[float]]) – The predicted means for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood (will correspond to hessian - diag*eps if self.info.eps > 0 after fitting). Initialized with None.
edf (float) – The model estimated degrees of freedom as a float. Initialized with None.
edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.
term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.
coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of family. See the examples. Initialized after fitting!
overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.
info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.
res (np.ndarray) – The working residuals of the model (If applicable). Initialized with None.
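
How coef, coef_split_idx, preds, and mus relate to each other can be sketched with plain NumPy. All values below (the coefficient vector, the split index, and the two small model matrices) are made up for illustration; only the use of `np.split` mirrors the class examples.

```python
import numpy as np

# Hypothetical stacked coefficient vector of shape (-1, 1): three coefficients for
# the mean followed by two for the standard deviation
coef = np.array([0.5, 1.0, -1.0, 0.2, 0.1]).reshape(-1, 1)
coef_split_idx = [3]  # index at which the second parameter's coefficients start

coef_m, coef_s = np.split(coef, coef_split_idx)

# Hypothetical model matrices, one per distribution parameter
X_m = np.array([[1.0, 0.0, 1.0],
                [1.0, 1.0, 0.0]])
X_s = np.array([[1.0, 0.5],
                [1.0, -0.5]])

# Linear predictors (the quantity stored in `preds`)
eta_m = X_m @ coef_m
eta_s = X_s @ coef_s

# Fitted parameters (the quantity stored in `mus`): inverse link applied per parameter
mu_m = eta_m          # identity link for the mean
mu_s = np.exp(eta_s)  # inverse of the log link for the standard deviation
```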

fit(max_outer: int = 200, max_inner: int = 500, min_inner: int | None = None, conv_tol: float = 1e-07, extend_lambda: bool = False, extension_method_lam: str = 'nesterov2', control_lambda: int = 2, restart: bool = False, method: str = 'QR/Chol', check_cond: int = 1, piv_tol: float = np.float64(0.23651441168139897), should_keep_drop: bool = True, prefit_grad: bool = True, repara: bool = True, progress_bar: bool = True, n_cores: int = 10, seed: int = 0, init_lambda: list[float] | None = None)

Fit the specified model.

Note: Keyword arguments are initialized to maximise stability. For faster estimation set method='Chol'.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now define the model and fit!
model = GAMMLSS(formulas,family)
model.fit()

# Now fit again via Cholesky
model.fit(method="Chol")

Parameters:

max_outer (int,optional) – The maximum number of fitting iterations.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.
min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients. By default set to max_inner.
conv_tol (float,optional) – The relative criterion used to determine convergence: the change in penalized deviance is compared against conv_tol * previous penalized deviance.
extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary for models involving heavily penalized functions. Disabled by default.
extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov2’ by default.
control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) to ensure that they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.
restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). “Chol” relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol” or “LU/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. Defaults to “QR/Chol”.
check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). Defaults to 1.
piv_tol (float,optional) – Deprecated.
should_keep_drop (bool,optional) – Only used when method in ["QR/Chol","LU/Chol","Direct/Chol"]. If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations. If set to False, the coefficients to drop are determined anew at every iteration, which is costly. Defaults to True.
prefit_grad (bool,optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients. Defaults to True.
repara (bool,optional) – Whether to re-parameterize the model (for every proposed update to the regularization parameters) via the steps outlined in Appendix B of Wood (2011) and suggested by Wood et al., (2016). This greatly increases the stability of the fitting iteration. Defaults to True.
progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0
init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None
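
The relative convergence criterion controlled by conv_tol can be sketched as follows. This is a hypothetical helper, not a function of mssm, and the use of absolute values is an assumption about how such a check is typically written.

```python
def has_converged(prev_pen_dev: float, cur_pen_dev: float, conv_tol: float = 1e-7) -> bool:
    """Relative check: the absolute change in penalized deviance must fall below
    conv_tol times the magnitude of the previous penalized deviance."""
    return abs(cur_pen_dev - prev_pen_dev) < conv_tol * abs(prev_pen_dev)

# A change of 1.0 on a deviance of 1000 is far above 1e-7 * 1000
print(has_converged(1000.0, 999.0))  # False

# With a looser tolerance, a change of 1e-5 is below 1e-7 * 1000 = 1e-4
print(has_converged(1000.0, 1000.00001))  # True
```

Because the threshold scales with the previous penalized deviance, the same conv_tol behaves comparably for small and large data sets.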

get_llk(penalized: bool = True) → float | None

Get the (penalized) log-likelihood of the estimated model (float or None) given the training data.

Will instead return None if called before fitting.

Parameters:

penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True

Returns:

llk score

Return type:

float or None
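
The difference between the two scores can be sketched under the usual penalized likelihood setup of Wood (2017), in which the penalized log-likelihood equals the log-likelihood minus half the quadratic penalty \(\boldsymbol{\beta}^T\mathbf{S}_{\lambda}\boldsymbol{\beta}\). That mssm computes the score in exactly this form is an assumption here, and `penalized_llk` is a hypothetical helper.

```python
import numpy as np

def penalized_llk(llk: float, coef: np.ndarray, S_lambda: np.ndarray) -> float:
    """Log-likelihood minus half the quadratic penalty coef' S_lambda coef."""
    coef = coef.reshape(-1, 1)
    penalty = float((coef.T @ S_lambda @ coef).item())
    return llk - 0.5 * penalty

# Made-up values: two coefficients and a ridge-like penalty with lambda = 3
coef = np.array([1.0, 2.0])
S_lambda = 3.0 * np.eye(2)

print(penalized_llk(-10.0, coef, S_lambda))  # -17.5
```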

get_mmat(use_terms: list[int] | None = None, par: int | None = None) → list[csc_array] | csc_array

Returns a list containing exactly the model matrices used for fitting, each as a scipy.sparse.csc_array. Will raise an error when fitting was not completed before calling this function.

Optionally, the model matrix associated with a specific parameter of the log-likelihood can be obtained by setting par to the desired index, instead of None. Additionally, all columns not corresponding to terms for which the indices are provided via use_terms can optionally be zeroed.

Parameters:

use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be included. If this argument is provided, columns corresponding to any term not included in this list will be zeroed, defaults to None
par (int or None, optional) – The index corresponding to the parameter of the distribution for which to obtain the model matrix. Setting this to None means all matrices are returned in a list, defaults to None.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

Model matrices \(\mathbf{X}\) used for fitting - one per parameter of self.family or a single model matrix for a specific parameter.

Return type:

[scp.sparse.csc_array] or scp.sparse.csc_array
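
The effect of use_terms can be sketched with a small dense matrix. The mapping from term indices to columns (`term_columns`) and the helper `zero_unused_terms` are hypothetical and only illustrate the zeroing; mssm works with sparse matrices and tracks this mapping internally.

```python
import numpy as np

# Hypothetical model matrix: an intercept in column 0 and one smooth term
# spanning columns 1-3
X = np.arange(1.0, 13.0).reshape(3, 4)
term_columns = {0: [0], 1: [1, 2, 3]}  # term index -> column indices (illustrative)

def zero_unused_terms(X, term_columns, use_terms):
    """Return a copy of X in which columns of terms not in use_terms are zero."""
    keep = sorted(c for t in use_terms for c in term_columns[t])
    X_zeroed = np.zeros_like(X)
    X_zeroed[:, keep] = X[:, keep]
    return X_zeroed

# Keep only the smooth term: the shape is unchanged, the intercept column is zero
X_f = zero_unused_terms(X, term_columns, use_terms=[1])
print(X_f.shape)  # (3, 4)
```

Since the zeroed matrix keeps its original shape, it can still be multiplied with the full coefficient vector of the corresponding parameter.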

get_pars() → ndarray

Returns a list containing all coefficients estimated for the model. Use self.coef_split_idx to split the vector into separate subsets per distribution parameter.

Will return None if called before fitting was completed.

Returns:

Model coefficients - before splitting!

Return type:

[float] or None

get_reml() → float

Gets the Laplace approximate REML (Restricted Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).

References:

Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid(**kwargs) → ndarray

Returns standardized residuals for GAMMLSS models (Rigby & Stasinopoulos, 2005).

The computation of the residual vector will differ between different GAMMLSS models and is thus implemented as a method by each GAMMLSS family. These should be consulted to get more details. In general, if the model is specified correctly, the returned vector should approximately look like what could be expected from taking \(N\) independent samples from \(N(0,1)\).

Additional arguments required by the specific GAMLSSFamily.get_resid() method can be passed along via kwargs.

Note: Families for which no residuals are available can return None.

References:

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – An error is raised in case the residuals are to be computed for a Multinomial GAMMLSS model, which is currently not supported.
ValueError – An error is raised in case the residuals are requested before the model has been fit.

Returns:

A np.ndarray of standardized residuals of shape (-1,1) that should be \(\sim N(0,1)\) if the model is correct.

Return type:

np.ndarray
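
For a Gaussian location-scale model, a natural standardized residual is \((y_i - \mu_i)/\sigma_i\). The sketch below simulates data with a known mean and standard deviation to show that such residuals look like draws from \(N(0,1)\) when the model is correct. It uses plain NumPy; that the Gaussian family in mssm computes its residuals in exactly this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulate observations whose mean and standard deviation both depend on x
x = rng.uniform(0, 1, n)
mu = 2.0 + np.sin(2 * np.pi * x)
sigma = np.exp(-0.5 + x)
y = rng.normal(mu, sigma)

# Standardize with the (here: true) mean and standard deviation per observation
resid = ((y - mu) / sigma).reshape(-1, 1)

# Mean should be close to 0 and standard deviation close to 1
print(resid.mean(), resid.std())
```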

predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, csc_array, ndarray | None]

Make a prediction using the fitted model for new data n_dat using only the terms indexed by use_terms and for distribution parameter par.

Importantly, predictions and standard errors are always returned on the scale of the linear predictor. For the Gaussian GAMMLSS model, the predictions for the standard deviation will for example usually (i.e., for the default link choices) reflect the log of the standard deviation. To get the predictions on the standard deviation scale, one could then apply the inverse log-link function to the predictions and the CI-bounds on the scale of the respective linear predictor. See the examples below.

Examples:

from mssm.models import *
from mssmViz.sim import *
import numpy as np
import pandas as pd

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

new_dat = pd.DataFrame({"x0":np.linspace(0,1,30)})

# Mean predictions don't have to be transformed since the Identity link is
# used for this predictor.
mu_mean,_,b_mean = model.predict(None,new_dat,ci=True)

# These can be used for confidence intervals:
mean_upper_CI = mu_mean + b_mean
mean_lower_CI = mu_mean - b_mean

# Standard deviation predictions do have to be transformed - by default they
# are on the log-scale.
eta_sd,_,b_sd = model.predict(None,new_dat,ci=True,par=1)

# Index to `links` is 1 because the sd is the second parameter!
mu_sd = model.family.links[1].fi(eta_sd)

# These can be used for approximate confidence intervals:
sd_upper_CI = model.family.links[1].fi(eta_sd + b_sd)
sd_lower_CI = model.family.links[1].fi(eta_sd - b_sd)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:

use_terms (list[int] or None) – The indices corresponding to the terms in the formula of the parameter that should be used to obtain the prediction or None in which case all terms will be used.
n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
ci (bool, optional) – Whether the standard error se needed to compute a credible interval (CI; see Wood, 2017) should be returned. The CI is then [pred - se, pred + se]
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
par (int, optional) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean), defaults to 0
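
The adjustment requested via whole_interval can be sketched along the lines of the simulation approach described by Simpson (2016): sample coefficient deviations from the posterior, record the largest absolute standardized deviation of the predicted curve per sample, and use the 1 - alpha quantile of these maxima as the critical value. The matrices `X` and `V` below are made up, and the exact steps taken by mssm are an assumption.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n_obs, n_coef, n_ps, alpha = 30, 5, 10000, 0.05

# Hypothetical model matrix for the new data and coefficient covariance matrix
X = rng.normal(size=(n_obs, n_coef))
A = rng.normal(size=(n_coef, n_coef))
V = A @ A.T / n_coef + 0.1 * np.eye(n_coef)

# Point-wise standard error of X @ beta: square root of diag(X V X')
se = np.sqrt(np.sum((X @ V) * X, axis=1))

# Posterior samples of deviations beta - beta_hat ~ N(0, V); one column per sample
dev = rng.multivariate_normal(np.zeros(n_coef), V, size=n_ps).T

# Largest absolute standardized deviation over all prediction points, per sample
max_abs = np.max(np.abs(X @ dev) / se[:, None], axis=0)

# Critical values: simultaneous (whole-function) versus point-wise
crit_simultaneous = np.quantile(max_abs, 1 - alpha)
crit_pointwise = NormalDist().inv_cdf(1 - alpha / 2)

print(crit_pointwise, crit_simultaneous)
```

The simultaneous critical value exceeds the point-wise one, so the resulting interval is wider at every prediction point.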

Raises:

ValueError – An error is raised in case the standard error is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci == False else the standard error se in the prediction multiplied by the critical value determined by alpha (e.g., ~ 1.96 if alpha = 0.05). If you want the function to return just the standard error, set alpha = 2 * (1 - scp.stats.norm.cdf(1)).

Return type:

(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
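
Why alpha = 2 * (1 - scp.stats.norm.cdf(1)) yields the plain standard error can be checked with the standard library alone. The sketch below uses `statistics.NormalDist` instead of SciPy, and `critical_value` is a hypothetical helper that only mirrors the rule stated above (the 1 - alpha/2 quantile of N(0,1)).

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0,1)

def critical_value(alpha: float) -> float:
    """Quantile of N(0,1) at 1 - alpha/2."""
    return std_normal.inv_cdf(1 - alpha / 2)

# Conventional 95% interval
crit_95 = critical_value(0.05)

# Choose alpha such that the critical value is 1: the multiplier then has no effect
alpha_se = 2 * (1 - std_normal.cdf(1))
crit_se = critical_value(alpha_se)

print(round(crit_95, 2))  # 1.96
print(round(crit_se, 6))  # 1.0
```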

predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, ndarray]

Get the difference in the predictions for two datasets and for distribution parameter par. Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim9(500,1,seed=20)

# We include a tensor smooth in the model of the mean
formula_m = Formula(lhs("y"),
                    [i(),f(["x0","x1"],te=True)],
                    data=GAUMLSSDat)

# The model of the standard deviation remains the same
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"])],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

# Now we want to know whether the effect of x0 is different for two values of x1:
new_dat1 = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":[0.25 for _ in range(30)]})

new_dat2 = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":[0.75 for _ in range(30)]})

# Now we can get the predicted difference of the effect of x0 for the two values of x1:
pred_diff,se = model.predict_diff(new_dat1,new_dat2,use_terms=[1],par=0)

# mssmViz also has a convenience function to visualize it:
plot_diff(new_dat1,new_dat2,["x0"],model,use=[1],response_scale=False)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html

Parameters:

dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this and dat1 will be returned.
use_terms (list[int] or None) – The indices corresponding to the terms in the formula of the parameter that should be used to obtain the prediction or None in which case all terms will be used.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
par (int, optional) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean), defaults to 0

Raises:

ValueError – An error is raised in case the predicted difference is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference multiplied by the critical value determined by alpha (e.g., ~ 1.96 if alpha = 0.05). If you want the function to return just the standard error, set alpha = 2 * (1 - scp.stats.norm.cdf(1)). The difference CI is then [diff - se, diff + se]

Return type:

(np.ndarray,np.ndarray)
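
The computation behind a predicted difference can be sketched as follows, in the spirit of the itsadug get_difference function: the two model matrices are subtracted, the result is multiplied with the coefficients, and the standard error follows from the coefficient covariance matrix. All inputs below are made up, and the order of subtraction (dat1 minus dat2) is an assumption.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n_obs, n_coef = 30, 4

# Hypothetical model matrices for dat1 and dat2, coefficients, and covariance
X1 = rng.normal(size=(n_obs, n_coef))
X2 = rng.normal(size=(n_obs, n_coef))
coef = rng.normal(size=(n_coef, 1))
A = rng.normal(size=(n_coef, n_coef))
V = A @ A.T / n_coef + 0.1 * np.eye(n_coef)

# Difference in predictions on the scale of the linear predictor
X_diff = X1 - X2
diff = X_diff @ coef

# Standard error of the difference: square root of diag(X_diff V X_diff')
se_raw = np.sqrt(np.sum((X_diff @ V) * X_diff, axis=1)).reshape(-1, 1)

# Scale by the critical value for alpha = 0.05 to obtain the CI half-width
crit = NormalDist().inv_cdf(1 - 0.05 / 2)
se = crit * se_raw

lower, upper = diff - se, diff + se
```

Points at which the interval [lower, upper] excludes zero indicate where the two predictions differ.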

print_parametric_terms()

Prints summary output for linear/parametric terms in the model, separately for each parameter of the family’s distribution.

For each coefficient, the named identifier and estimated value are printed. In addition, a p-value is printed for each coefficient, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the test statistic approximately follows a standard normal distribution. The corresponding z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that un-penalized coefficients that are part of a smooth function are not covered by this function.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
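
The test described above can be sketched with the standard library: the z-statistic is the estimate divided by its standard error, and the two-sided p-value follows from the standard normal distribution. `z_test` is a hypothetical helper and the numbers are made up.

```python
from statistics import NormalDist

def z_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Two-sided z-test of the null-hypothesis that the coefficient is zero."""
    z = estimate / std_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_test(0.98, 0.5)
print(round(z, 2), round(p, 2))  # 1.96 0.05
```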

print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)

Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:

Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:

pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2
p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False
edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal, defaults to True

sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) → ndarray

Obtain n_ps samples from posterior \([\boldsymbol{\beta}_m - \hat{\boldsymbol{\beta}}_m] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) (see Wood et al., 2016; Wood 2017, section 6.10), \(\boldsymbol{\beta}_m\) is the set of coefficients in the model of parameter \(m\) of the distribution (see argument par), and \(\mathbf{H}\) is the hessian of the log-likelihood (Wood et al., 2016). To obtain samples for \(\boldsymbol{\beta}\), set deviations to False.

See sample_MVN() for more details.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

new_dat = pd.DataFrame({"x0":np.linspace(0,1,30)})

# Now obtain the estimate for `f(["x0"],nk=10)` and the model matrix corresponding to
# it! Note, that we set `use_terms = [1]` - so all columns in X_f not belonging to
# `f(["x0"],nk=10)` (e.g., the first one, belonging to the offset) are zeroed.
mu_f,X_f,_ = model.predict([1],new_dat,ci=True)

# Now we can sample from the posterior of `f(["x0"],nk=10)` in the model of the mean:
post = model.sample_post(10000,None,deviations=False,seed=0,par=0)

# Since we set deviations to False, post holds coefficient samples and can simply be
# post-multiplied to get samples of `f(["x0"],nk=10)`
post_f = X_f @ post

# Plot the estimated effect and 50 posterior samples
plt.plot(new_dat["x0"],mu_f,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f[:,sidx],alpha=0.2)

plt.show()

# In this case, we are not interested in the offset, so we can omit it during the
# sampling step (i.e., to not sample coefficients for it):

# `use_post` identifies only coefficients related to `f(["x0"],nk=10)`
use_post = X_f.sum(axis=0) != 0
use_post = np.arange(0,X_f.shape[1])[use_post]
print(use_post)

# `use_post` can now be passed to `sample_post`:
post2 = model.sample_post(10000,use_post,deviations=False,seed=0,par=0)

# Importantly, post2 now has a different shape - which we have to take into account
# when multiplying.
post_f2 = X_f[:,use_post] @ post2

plt.plot(new_dat["x0"],mu_f,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f2[:,sidx],alpha=0.2)

plt.show()

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

n_ps (int,optional) – Number of samples to obtain from posterior.
use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. Note: an index of 0 indexes the first coefficient in the model of parameter par, that is indices have to correspond to columns in the parameter-specific model matrix. By default all coefficients are sampled.
deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False
seed (int,optional) – A seed to use for the sampling, defaults to None
par (int, optional) – The index corresponding to the distribution parameter for which to obtain samples (e.g., 0 = mean), defaults to 0

Returns:

An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves.

Return type:

np.ndarray
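
The posterior stated at the top of this method can be sketched directly in NumPy. The Hessian `H`, penalty matrix `S_lambda`, and estimate `beta_hat` below are made up, and forming \(\mathbf{V}\) by explicit inversion is done only for readability; mssm itself relies on sample_MVN() and sparse factorizations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_coef, n_ps = 4, 20000

# Hypothetical negative-definite Hessian of the log-likelihood and penalty matrix
A = rng.normal(size=(n_coef, n_coef))
H = -(A @ A.T + np.eye(n_coef))
S_lambda = 2.0 * np.diag([0.0, 1.0, 1.0, 1.0])  # first coefficient is unpenalized
beta_hat = np.array([1.0, -0.5, 0.25, 2.0])

# Posterior covariance V = [-H + S_lambda]^{-1}
V = np.linalg.inv(-H + S_lambda)

# With V = L L', the product L @ z has covariance V for z ~ N(0, I)
L = np.linalg.cholesky(V)
dev_samples = L @ rng.standard_normal((n_coef, n_ps))  # like deviations=True
coef_samples = beta_hat[:, None] + dev_samples          # like deviations=False

print(dev_samples.shape, coef_samples.shape)  # (4, 20000) (4, 20000)
```

Samples of deviations are centered on zero, while samples of coefficients are centered on the estimate, so only the latter can be multiplied with a model matrix to obtain sample curves directly.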

class mssm.models.GSMM(formulas: list[Formula], family: GSMMFamily)

Bases: object

Class to fit General Smooth/Mixed Models (see Wood, Pya, & Säfken; 2016). Estimation is possible via exact Newton method for coefficients or via L-qEFS update (see Krause et al. (submitted) and example below).

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

class NUMDIFFGENSMOOTHFamily(GSMMFamily):
    # Implementation of the ``GSMMFamily`` class that uses finite differencing to obtain the
    # gradient of the likelihood to estimate a Gaussian GAMLSS via the general smooth code
    # and the L-qEFS update by Krause et al. (submitted).

    # References:
    #  - Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General
    #       Smooth Models.
    #  - Nocedal & Wright (2006). Numerical Optimization. Springer New York.


    def __init__(self, pars: int, links:[Link]) -> None:
        super().__init__(pars, links)

    def llk(self, coef, coef_split_idx, ys, Xs):
        # Likelihood for a Gaussian GAM(LSS) - implemented so
        # that the model can be estimated using the general smooth code.
        y = ys[0]
        split_coef = np.split(coef,coef_split_idx)
        eta_mu = Xs[0]@split_coef[0]
        eta_sd = Xs[1]@split_coef[1]

        mu_mu = self.links[0].fi(eta_mu)
        mu_sd = self.links[1].fi(eta_sd)

        family = GAUMLSS(self.links)
        llk = family.llk(y,mu_mu,mu_sd)
        return llk

# Simulate 500 data points
sim_dat = sim3(500,2,c=1,seed=0,family=Gaussian(),binom_offset = 0, correlate=False)

# We need to model the mean: \mu_i
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                    data=sim_dat)

# And for sd - here constant
formula_sd = Formula(lhs("y"),
                    [i()],
                    data=sim_dat)

# Collect both formulas
formulas = [formula_m,formula_sd]
links = [Identity(),LOGb(-0.001)]

# Now define the general family + model and fit!
gsmm_fam = NUMDIFFGENSMOOTHFamily(2,links)
model = GSMM(formulas=formulas,family=gsmm_fam)

# Fit with SR1
bfgs_opt={"gtol":1e-9,
        "ftol":1e-9,
        "maxcor":30,
        "maxls":200,
        "maxfun":1e7}

model.fit(method='qEFS',bfgs_options=bfgs_opt)

# Extract all coef
coef = model.coef

# Now split them to get separate lists per parameter of the log-likelihood (here mean and
# scale) split_coef[0] then holds the coef associated with the first parameter (here the
# mean) and so on
split_coef = np.split(coef,model.coef_split_idx)

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

formulas ([Formula]) – A list of formulas, one per parameter of the likelihood that is to be modeled as a smooth model
family (GSMMFamily) – A GSMMFamily family.

Variables:

formulas ([Formula]) – The list of formulas passed to the constructor.
lvi (scp.sparse.csc_array | None) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix - or None, in case the L-BFGS-B optimizer was used and form_VH was set to False when calling model.fit(). Initialized with None.
lvi_linop (scp.sparse.linalg.LinearOperator | None) – A scipy.sparse.linalg.LinearOperator representing the conditional model coefficient covariance matrix (not the root) - or None. Only available in case the L-BFGS-B optimizer was used and form_VH was set to False when calling model.fit().
coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.
preds ([[float]]) – The linear predictors for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
mus ([[float]]) – The predicted means for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood (will correspond to hessian - diag*eps if self.info.eps > 0 after fitting). Initialized with None.
edf (float) – The model estimated degrees of freedom as a float. Initialized with None.
edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.
term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.
coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of family. See the examples. Initialized after fitting!
overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.
info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.

fit(init_coef: ndarray | None = None, max_outer: int = 200, max_inner: int = 500, min_inner: int | None = None, conv_tol: float = 1e-07, extend_lambda: bool = False, extension_method_lam: str = 'nesterov2', control_lambda: int | None = None, restart: bool = False, optimizer: str = 'Newton', method: str = 'QR/Chol', check_cond: int = 1, piv_tol: float = np.float64(0.23651441168139897), progress_bar: bool = True, n_cores: int = 10, seed: int = 0, drop_NA: bool = True, init_lambda: list[float] | None = None, form_VH: bool = True, use_grad: bool = False, build_mat: list[bool] | None = None, should_keep_drop: bool = True, gamma: float = 1, qEFSH: str = 'SR1', overwrite_coef: bool = True, max_restarts: int = 0, qEFS_init_converge: bool = False, prefit_grad: bool = True, repara: bool = None, extra_penalties: list[LambdaTerm] | None = None, callback: Callable | None = None, init_bfgs_options: dict | None = None, bfgs_options: dict | None = None)

Fit the specified model.

Note: Keyword arguments are initialized to maximise stability. For faster configurations (necessary for larger models) see examples below.

Parameters:

init_coef (np.ndarray,optional) – An initial estimate for the coefficients. Must be a numpy array of shape (-1,1). Defaults to None.
max_outer (int,optional) – The maximum number of fitting iterations.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.
min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients. By default set to max_inner.
conv_tol (float,optional) – The relative (change in penalized deviance is compared against conv_tol * previous penalized deviance) criterion used to determine convergence.
extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary for models with heavily penalized functions. Disabled by default.
extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov2’ by default.
control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note that the gradient is based on quasi-Newton approximations as well and is thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed. Set to 2 by default if method != 'qEFS' and otherwise to 1.
restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.
optimizer (str,optional) – Deprecated. Defaults to “Newton”
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). “Chol” relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol” or “LU/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the hessian. This only requires first derivative information. Defaults to “QR/Chol”.
check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). Defaults to 1.
piv_tol (float,optional) – Deprecated.
progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0
drop_NA (bool,optional) – Whether to drop rows in the model matrices and observations vectors corresponding to NAs in the observation vectors. Set this to False if you want to handle NAs yourself in the likelihood function. Defaults to True.
init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None
form_VH (bool,optional) – Whether to explicitly form matrix V - the estimated inverse of the negative Hessian of the penalized likelihood - and H - the estimate of the Hessian of the log-likelihood - when using the qEFS method. If set to False, only V is returned - as a scipy.sparse.linalg.LinearOperator - and available in self.lvi_linop (self.lvi is then None, see the Variables listed for this class). Additionally, self.hessian will then be equal to None. Note that this will break default prediction/confidence interval methods - so do not call them. Defaults to True.
use_grad (bool,optional) – Deprecated.
build_mat ([bool], optional) – An (optional) list, containing one bool per mssm.src.python.formula.Formula in self.formulas - indicating whether the corresponding model matrix should be built. Useful if multiple formulas specify the same model matrix, in which case only one needs to be built. Only the matrices actually built are then passed down to the likelihood/gradient/hessian function in Xs, for the remaining ones None is inserted in the list. Defaults to None, which means all model matrices are built.
should_keep_drop (bool,optional) – Only used when method in ["QR/Chol","LU/Chol","Direct/Chol"]. If set to True, any coefficients that are dropped during fitting - are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration - costly! Defaults to True.
gamma (float,optional) – Setting this to a value larger than 1 promotes more complex (less smooth) models. Setting this to a value smaller than 1 (but must be > 0) promotes smoother models! Defaults to 1.
qEFSH (str,optional) – Whether the hessian approximation should use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation or the standard BFGS update (qEFSH='BFGS'). Defaults to ‘SR1’.
overwrite_coef (bool,optional) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS'. Setting this to False will be useful when passing coefficients from a simpler model to initialize a more complex one. Only has an effect when qEFS_init_converge=True. Defaults to True.
max_restarts (int,optional) – How often to shrink the coefficient estimate back to a random vector when convergence is reached and when method='qEFS'. The optimizer might get stuck in local minima so it can be helpful to set this to 1-3. What happens is that if we converge, we shrink the coefficients back to a random vector and then continue optimizing once more. Defaults to 0.
qEFS_init_converge (bool,optional) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS'. Defaults to False.
prefit_grad (bool,optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients. Defaults to True.
repara (bool,optional) – Whether to re-parameterize the model (for every proposed update to the regularization parameters) via the steps outlined in Appendix B of Wood (2011) and suggested by Wood et al., (2016). This greatly increases the stability of the fitting iteration. Defaults to True if method != 'qEFS' else False.
extra_penalties (list[LambdaTerm] | None, optional) – Experimental. An optional list of extra penalties to be placed on the coefficients. Important: mssm does not (currently) support partially overlapping penalties. Thus, if you do provide your own penalties they should either only affect coefficients not penalized already or match (in terms of start_index and dimensions of S_J) an existing penalty matrix. Currently, is not supported together with repara, so if this argument is not None we set repara=False. Defaults to None.
callback (Callable | None ,optional) – An optional callback function to call after every update to the \(\lambda\) parameters. The signature of the provided function needs to match callback(outer:int,pen_llk:float,coef:np.ndarray,lam:[float]) -> None, where outer is the current iteration of the outer algorithm used to update the \(\lambda\) parameters, pen_llk is the current penalized log-likelihood, coef is the current coefficient estimate, and lam holds a list with the current \(\lambda\) parameters. Defaults to None.
init_bfgs_options (dict,optional) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options but passed to the optimizer of the un-penalized problem. If this is None, it will be set to a copy of bfgs_options. Only has an effect when qEFS_init_converge=True. Defaults to None.
bfgs_options (dict,optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'. If none are provided, the gtol argument will be initialized to conv_tol. Note also, that in any case the maxiter argument is automatically set to max_inner. Defaults to None.

Raises:

ValueError – Will throw an error when optimizer is not ‘Newton’.
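
A callback matching the signature documented for the callback argument could look like the following sketch. The function name record_progress and the history list are hypothetical, and the two calls at the end use made-up values purely to show what the callback receives; no model is fitted here.

```python
# Sketch of a callback matching the documented signature
# callback(outer, pen_llk, coef, lam) -> None.
history = []

def record_progress(outer, pen_llk, coef, lam):
    """Store the outer iteration, penalized log-likelihood and a copy of lambda."""
    history.append({"outer": outer, "pen_llk": float(pen_llk), "lam": list(lam)})

# Stand-alone demonstration with made-up values (not output of a real fit).
record_progress(0, -512.3, [0.1, -0.4], [1.0, 10.0])
record_progress(1, -498.7, [0.2, -0.3], [0.5, 25.0])

# With a model, the function would be handed over via:
# model.fit(callback=record_progress)
print(len(history), history[-1]["pen_llk"])
```

Copying lam into a new list avoids holding a reference to an object that the fitting routine may update in place.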

get_llk(penalized: bool = True, drop_NA: bool = True) → float | None

Get the (penalized) log-likelihood of the estimated model (float or None) given the training data.

Will instead return None if called before fitting.

Parameters:

penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True
drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped, defaults to True

Returns:

llk score

Return type:

float or None

get_mmat(use_terms: list[int] | None = None, drop_NA: bool = True, par: int | None = None) → list[csc_array] | csc_array

By default, returns a list containing exactly the model matrices used for fitting as a scipy.sparse.csc_array. Will raise an error when fitting was not completed before calling this function.

Optionally, the model matrix associated with a specific parameter of the log-likelihood can be obtained by setting par to the desired index, instead of None. Additionally, all columns not corresponding to terms for which the indices are provided via use_terms are zeroed in case use_terms is not None.

Parameters:

use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be created. If this argument is provided, columns corresponding to any term not included in this list will be zeroed, defaults to None
drop_NA (bool, optional) – Whether rows in the model matrix corresponding to NAs in the dependent variable vector should be dropped, defaults to True
par (int or None, optional) – The index corresponding to the parameter of the log-likelihood for which to obtain the model matrix. Setting this to None means all matrices are returned in a list, defaults to None.

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

Model matrices \(\mathbf{X}\) used for fitting - one per parameter of self.family or a single model matrix for a specific parameter.

Return type:

[scp.sparse.csc_array] or scp.sparse.csc_array

get_pars() → ndarray

Returns an array containing all coefficients estimated for the model. Use self.coef_split_idx to split the vector into separate subsets per parameter of the log-likelihood.

Will return None if called before fitting was completed.

Returns:

Model coefficients - before splitting!

Return type:

np.ndarray or None

get_reml(drop_NA: bool = True) → float

Gets the Laplace approximate REML (Restricted Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).

References:

Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped when computing the log-likelihood, defaults to True

Raises:

ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.

Returns:

REML score

Return type:

float

get_resid(drop_NA: bool = True, **kwargs) → ndarray

The computation of the residual vector will differ between different GSMM models and is thus implemented as a method by each GSMMFamily family. These should be consulted to get more details. In general, if the model is specified correctly, the returned vector should approximately look like what could be expected from taking independent samples from \(N(0,1)\).

Additional arguments required by the specific GSMMFamily.get_resid() method can be passed along via kwargs.

Note: Families for which no residuals are available can return None.

References:

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped from the model matrices, defaults to True

Raises:

ValueError – An error is raised in case the residuals are requested before the model has been fit.

Returns:

vector of standardized residuals of shape (-1,1). Note, the first axis will not necessarily match the dimension of any of the response vectors (this will depend on the specific Family’s implementation).

Return type:

np.ndarray

predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, csc_array, ndarray | None]

Make a prediction using the fitted model for new data n_dat using only the terms indexed by use_terms and for parameter par of the log-likelihood.

Importantly, predictions and standard errors are always returned on the scale of the linear predictor.

See the GAMMLSS.predict() function for code examples.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:

use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.
n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se]
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [pred - se, pred + se]
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
par (int, optional) – The index corresponding to the parameter of the log-likelihood for which to make the prediction, defaults to 0

Raises:

ValueError – An error is raised in case the standard error is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci == False else the standard error se in the prediction multiplied by the critical value determined by alpha (e.g., ~ 1.96 if alpha = 0.05). If you want the function to return just the standard error, set alpha = 2 * (1 - scp.stats.norm.cdf(1)).

Return type:

(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
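
The relation between alpha and the critical value described above can be checked with the standard library alone. This is a small sketch, not code taken from mssm; the helper name critical_value is hypothetical and statistics.NormalDist is used in place of scp.stats.norm.

```python
from statistics import NormalDist

def critical_value(alpha):
    """Two-sided N(0,1) cut-off: the 1 - alpha/2 quantile."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Default alpha = 0.05 gives the familiar cut-off of roughly 1.96.
crit_95 = critical_value(0.05)

# Choosing alpha = 2 * (1 - cdf(1)) makes the cut-off equal to 1, so the value
# returned as `se` is then the plain standard error.
alpha_se = 2 * (1 - NormalDist().cdf(1))
crit_se = critical_value(alpha_se)

print(round(crit_95, 2), round(crit_se, 6))
```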

predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, ndarray]

Get the difference in the predictions for two datasets and for parameter par of the log-likelihood. Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.

See the GAMMLSS.predict_diff() function for code examples.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Simpson, G. (2016). Simultaneous intervals for smooths revisited.

get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html

Parameters:

dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this and dat1 will be returned.
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
par (int, optional) – The index corresponding to the parameter of the log-likelihood for which to make the prediction, defaults to 0

Raises:

ValueError – An error is raised in case the predicted difference is to be computed for a Multinomial GAMMLSS model, which is currently not supported.

Returns:

A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference multiplied by the critical value determined by alpha (e.g., ~ 1.96 if alpha = 0.05). If you want the function to return just the standard error, set alpha = 2 * (1 - scp.stats.norm.cdf(1)). The difference CI is then [diff - se, diff + se]

Return type:

(np.ndarray,np.ndarray)
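
A common use of the returned tuple is to find where the difference interval [diff - se, diff + se] excludes zero. The sketch below uses made-up numbers in place of real predict_diff output; the variable names are hypothetical.

```python
# Made-up difference estimates and interval half-widths, standing in for the
# (diff, se) tuple returned by predict_diff.
diff = [0.8, 0.3, -0.1, -0.9]
se = [0.5, 0.4, 0.3, 0.6]

lower = [d - s for d, s in zip(diff, se)]
upper = [d + s for d, s in zip(diff, se)]

# The interval excludes zero where both bounds lie on the same side of zero.
excludes_zero = [lo > 0 or up < 0 for lo, up in zip(lower, upper)]
print(excludes_zero)  # [True, False, False, True]
```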

print_parametric_terms()

Prints summary output for linear/parametric terms in the model, separately for each parameter of the family’s distribution.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows approximately a standardized normal distribution. The corresponding z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.

Note that un-penalized coefficients that are part of a smooth function are not covered by this function.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
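
The z-statistic and p-value described above can be reproduced by hand for a single coefficient. The sketch below assumes a two-sided test against the standard normal; the helper name z_test and the input values are made up for illustration and are not part of mssm.

```python
from statistics import NormalDist

def z_test(beta_hat, se):
    """Return the z-statistic and two-sided p-value for H0: beta = 0."""
    z = beta_hat / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Made-up coefficient estimate and standard error.
z, p = z_test(0.42, 0.15)
print(round(z, 2), round(p, 4))
```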

print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)

Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).

References:

Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:

pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2
p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False
edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal, defaults to True
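
The highlighting rule itself is a simple threshold on the per-term estimated degrees of freedom. A sketch with made-up edf values (the term labels and numbers are hypothetical, not output of a fitted model):

```python
pen_cutoff = 0.2

# Made-up estimated degrees of freedom per smooth term.
term_edf = {"f(x0)": 3.71, "f(x1)": 0.04, "f(x2)": 7.95, "f(x3)": 0.18}

# Terms with edf below the cut-off would be marked as effectively removed.
effectively_removed = [name for name, edf in term_edf.items() if edf < pen_cutoff]
print(effectively_removed)  # ['f(x1)', 'f(x3)']
```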

sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) → ndarray

Obtain n_ps samples from posterior \([\boldsymbol{\beta}_m - \hat{\boldsymbol{\beta}}_m] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) (see Wood et al., 2016; Wood 2017, section 6.10), \(\boldsymbol{\beta}_m\) is the set of coefficients in the model of parameter \(m\) of the log-likelihood (see argument par), and \(\mathbf{H}\) is the hessian of the log-likelihood (Wood et al., 2016). To obtain samples for \(\boldsymbol{\beta}_m\), set deviations to False.

See sample_MVN() for more details and the GAMMLSS.sample_post() function for code examples.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

n_ps (int) – Number of samples to obtain from posterior.
use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. Note: an index of 0 indexes the first coefficient in the model of parameter par, that is indices have to correspond to columns in the parameter-specific model matrix. By default all coefficients are sampled.
deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False
seed (int,optional) – A seed to use for the sampling, defaults to None
par (int, optional) – The index corresponding to the parameter of the log-likelihood for which samples are to be obtained for the coefficients, defaults to 0.

Returns:

An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. If use_post is None, the first dimension will match the number of coefficients associated with parameter par of the log-likelihood instead. To generate posterior sample curves, post-multiply (the subset of columns indicated by use_post of) the model matrix \(\mathbf{X}^m\) associated with the parameter \(m\) of the log-likelihood with the returned array.

Return type:

np.ndarray
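
To illustrate the sampling scheme and the shape of the result, the following sketch draws from \(N(\hat{\boldsymbol{\beta}},\mathbf{V})\) with plain NumPy. It is not the mssm implementation: beta_hat, V and X are made-up stand-ins for quantities of a fitted model, and a dense Cholesky factor is used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up quantities standing in for a fitted model with p = 3 coefficients.
beta_hat = np.array([[1.0], [-0.5], [0.25]])       # shape (p, 1)
V = np.array([[0.20, 0.05, 0.00],
              [0.05, 0.10, 0.02],
              [0.00, 0.02, 0.30]])                 # covariance of beta | y, lambda
X = rng.normal(size=(50, 3))                       # model matrix, shape (n, p)

n_ps = 1000
L = np.linalg.cholesky(V)                          # V = L @ L.T
deviations = L @ rng.standard_normal((3, n_ps))    # columns ~ N(0, V)
samples = beta_hat + deviations                    # columns ~ N(beta_hat, V)

# Post-multiplying the model matrix with the samples gives one curve per column.
curves = X @ samples
print(samples.shape, curves.shape)  # (3, 1000) (50, 1000)
```

The deviations array corresponds to what is described for deviations=True, the samples array to deviations=False.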

mssm.src.python.compact_rep module

mssm.src.python.compact_rep.computeH(s: ndarray, y: ndarray, rho: ndarray, H0: csc_array, explicit: bool = True) → ndarray | tuple[ndarray, ndarray, ndarray, ndarray]

Computes (explicitly or implicitly) the quasi-Newton approximation to the negative Hessian of the (penalized) likelihood \(\mathbf{H}\) (\(\mathcal{H}\)) from the L-BFGS-B optimizer info.

Relies on equations 2.16 in Byrd, Nocedal & Schnabel (1994).

References:

Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:

s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1994).
y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1994).
rho (np.ndarray) – flattened numpy.array of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1994).
H0 (scipy.sparse.csc_array) – Initial estimate for the hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).
explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of four update matrices.

Returns:

H, either as np.ndarray (explicit==True) or represented implicitly via four update vectors (also np.ndarrays)

Return type:

np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]
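
For orientation, the sketch below checks numerically that the compact (matrix) form of the BFGS approximation, as it is commonly stated following Byrd, Nocedal & Schnabel (1994), agrees with applying the rank-two updates one at a time. It is an independent NumPy illustration on a made-up quadratic problem and makes no claim about how computeH is implemented internally; all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 5, 3

# Quadratic test problem: y_i = A @ s_i with A symmetric positive definite,
# so the curvature condition s_i @ y_i > 0 holds for every pair.
M = rng.normal(size=(p, p))
A = M @ M.T + p * np.eye(p)
s = rng.normal(size=(m, p))   # rows are update vectors, matching the (m, p) layout
y = s @ A                     # row i equals A @ s_i because A is symmetric

omega = 2.0
B0 = omega * np.eye(p)        # initial estimate: multiple of the identity

# Reference: apply the m BFGS updates one after another.
B_rec = B0.copy()
for s_i, y_i in zip(s, y):
    Bs = B_rec @ s_i
    B_rec = B_rec - np.outer(Bs, Bs) / (s_i @ Bs) + np.outer(y_i, y_i) / (y_i @ s_i)

# Compact form: B = B0 - W @ inv(Mid) @ W.T
S, Y = s.T, y.T               # columns are update vectors
StY = S.T @ Y
Lmat = np.tril(StY, k=-1)     # strictly lower triangle of S.T @ Y
D = np.diag(np.diag(StY))     # diagonal holding s_i @ y_i
W = np.hstack([B0 @ S, Y])
Mid = np.block([[S.T @ B0 @ S, Lmat], [Lmat.T, -D]])
B_compact = B0 - W @ np.linalg.solve(Mid, W.T)

print(np.allclose(B_rec, B_compact))
```

Keeping only B0, W and Mid corresponds to an implicit representation: products with the approximation can be formed without ever building the p-by-p matrix.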

mssm.src.python.compact_rep.computeHSR1(s: ndarray, y: ndarray, rho: ndarray, H0: csc_array, omega: float = 1, make_psd: bool = False, make_pd: bool = False, explicit: bool = True) → ndarray | tuple[ndarray, ndarray, ndarray]

Computes (explicitly or implicitly) the symmetric rank one (SR1) approximation of the negative Hessian of the (penalized) likelihood \(\mathbf{H}\) (\(\mathcal{H}\)).

Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1994). Can ensure positive (semi) definiteness of the approximation via an eigen decomposition as shown by Burdakov et al. (2017). This is enforced via the make_psd and make_pd arguments.

References:

Burdakov, O., Gong, L., Zikrin, S., & Yuan, Y. (2017). On efficiently combining limited-memory and trust-region techniques. Mathematical Programming Computation, 9(1), 101–134. https://doi.org/10.1007/s12532-016-0109-7
Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:

s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1994).
y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1994).
rho (np.ndarray) – flattened numpy.array of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1994).
H0 (scipy.sparse.csc_array) – Initial estimate for the hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).
omega (float, optional) – Multiple of the identity matrix used as initial estimate.
make_psd (bool, optional) – Whether to enforce PSD as mentioned in the description. By default set to False.
make_pd (bool, optional) – Whether to enforce numeric positive definiteness, not just PSD. Ignored if make_psd=False. By default set to False.
explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.

Returns:

H, either as np.ndarray (explicit=True) or represented implicitly via three update vectors (also np.ndarrays)

Return type:

np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]
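
To illustrate the SR1 variant and the optional definiteness correction, here is a small dense sketch under stated assumptions. The helper names (sr1_recursive, sr1_compact, clip_to_psd) are hypothetical and not part of mssm. The eigenvalue clipping forms the full matrix, whereas Burdakov et al. (2017) work with the compact factors, so this shows the idea only, not the efficient algorithm.

```python
import numpy as np


def sr1_recursive(s, y, omega=1.0):
    """Apply the symmetric rank one update once per (s_i, y_i) pair."""
    B = omega * np.eye(s.shape[1])
    for s_i, y_i in zip(s, y):
        r = y_i - B @ s_i
        B = B + np.outer(r, r) / (r @ s_i)
    return B


def sr1_compact(s, y, omega=1.0):
    """Dense compact SR1 representation with B0 = omega * I (hypothetical helper)."""
    m, p = s.shape
    S, Y = s.T, y.T
    SY = S.T @ Y
    L = np.tril(SY, k=-1)
    D = np.diag(np.diag(SY))
    Psi = Y - omega * S                     # (p, m)
    M = D + L + L.T - omega * (S.T @ S)     # (m, m)
    return omega * np.eye(p) + Psi @ np.linalg.solve(M, Psi.T)


def clip_to_psd(H, floor=0.0):
    """Replace eigenvalues below `floor` by `floor` (floor > 0 gives a PD matrix)."""
    H = 0.5 * (H + H.T)
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.maximum(evals, floor)) @ evecs.T


rng = np.random.default_rng(1)
A = np.diag([2.0, 1.0, -1.0, 3.0, 0.5])     # indefinite stand-in Hessian
s = rng.normal(size=(3, 5))
y = s @ A

H_rec = sr1_recursive(s, y, omega=5.0)
H_cmp = sr1_compact(s, y, omega=5.0)
H_psd = clip_to_psd(H_cmp)                  # analogous in spirit to make_psd=True
```

Unlike BFGS, the SR1 update does not guarantee a positive definite result, which is why a correction step of this kind can be useful.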

mssm.src.python.compact_rep.computeV(s: ndarray, y: ndarray, rho: ndarray, V0: csc_array, explicit: bool = True) → ndarray | tuple[ndarray, ndarray, ndarray]

Computes (explicitly or implicitly) the quasi-Newton approximation to the inverse of the negative Hessian of the (penalized) likelihood \(\mathcal{I}\) (\(\mathbf{V}\)) from the L-BFGS-B optimizer info.

Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1994).

References:

Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:

s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1994).
y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1994).
rho (np.ndarray) – flattened np.ndarray of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1994).
V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the Hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).
explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.

Returns:

V, either as np.ndarray (explicit=True) or represented implicitly via three update vectors (also np.ndarrays)

Return type:

np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]
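
For the inverse approximation, the same kind of check can be written down. This is again a dense sketch with hypothetical helper names, assuming V0 = omega * I and update vectors stored as rows. It compares the compact inverse BFGS form of Byrd, Nocedal & Schnabel (1994) with the recursive inverse update, where rho_i = 1/(y_i' s_i) plays the role of the rho argument documented above.

```python
import numpy as np


def inverse_bfgs_recursive(s, y, omega=1.0):
    """Apply the inverse BFGS update once per (s_i, y_i) pair."""
    p = s.shape[1]
    V = omega * np.eye(p)
    for s_i, y_i in zip(s, y):
        rho_i = 1.0 / (y_i @ s_i)
        T = np.eye(p) - rho_i * np.outer(s_i, y_i)
        V = T @ V @ T.T + rho_i * np.outer(s_i, s_i)
    return V


def inverse_bfgs_compact(s, y, omega=1.0):
    """Dense compact inverse representation with V0 = omega * I (hypothetical helper)."""
    m, p = s.shape
    S, Y = s.T, y.T
    SY = S.T @ Y
    R = np.triu(SY)                          # upper triangle incl. diagonal
    D = np.diag(np.diag(SY))
    R_inv = np.linalg.inv(R)
    top_left = R_inv.T @ (D + omega * (Y.T @ Y)) @ R_inv
    M = np.block([[top_left, -R_inv.T], [-R_inv, np.zeros((m, m))]])
    W = np.hstack([S, omega * Y])            # (p, 2m)
    return omega * np.eye(p) + W @ M @ W.T


rng = np.random.default_rng(2)
p, m = 5, 3
Q = rng.normal(size=(p, p))
A = Q @ Q.T + np.eye(p)                      # positive definite stand-in Hessian
s = rng.normal(size=(m, p))
y = s @ A

V_rec = inverse_bfgs_recursive(s, y, omega=0.5)
V_cmp = inverse_bfgs_compact(s, y, omega=0.5)
print(np.allclose(V_rec, V_cmp))
```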

mssm.src.python.compact_rep.computeVSR1(s: ndarray, y: ndarray, rho: ndarray, V0: csc_array, omega: float = 1, make_psd: bool = False, explicit: bool = True) → ndarray | tuple[ndarray, ndarray, ndarray]

Computes (explicitly or implicitly) the symmetric rank one (SR1) approximation of the inverse of the negative Hessian of the (penalized) likelihood \(\mathcal{I}\) (\(\mathbf{V}\)).

Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1994). Can ensure positive (semi) definiteness of the approximation via an eigen decomposition as shown by Burdakov et al. (2017). This is enforced via the make_psd argument.

References:

Burdakov, O., Gong, L., Zikrin, S., & Yuan, Y. (2017). On efficiently combining limited-memory and trust-region techniques. Mathematical Programming Computation, 9(1), 101–134. https://doi.org/10.1007/s12532-016-0109-7
Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:

s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1994).
y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1994).
rho (np.ndarray) – flattened np.ndarray of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1994).
V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the Hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).
omega (float, optional) – Multiple of the identity matrix used as initial estimate.
make_psd (bool, optional) – Whether to enforce PSD as mentioned in the description. By default set to False.
explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.

Returns:

V, either as np.ndarray (explicit=True) or represented implicitly via three update vectors (also np.ndarrays)

Return type:

np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]

mssm.src.python.compare module

mssm.src.python.compare.compare_CDL(model1: GAMM | GAMMLSS | GSMM, model2: GAMM | GAMMLSS | GSMM, correct_V: bool = True, correct_t1: bool | None = None, perform_GLRT: bool = False, nR: int = 250, n_c: int = 1, alpha: int = 0.05, grid: str | None = None, a: float = 1e-07, b: float = 10000000.0, df: int = 40, verbose: bool = False, drop_NA: bool = True, method: str = 'Chol', seed: int | None = None, only_expected_edf: bool | None = None, Vp_fidiff: bool = False, use_importance_weights: bool | None = None, prior: Callable | None = None, recompute_H: bool | None = None, compute_Vcc: bool | None = None, bfgs_options: dict = {}) → dict

Computes the AIC difference and (optionally) performs an approximate GLRT on twice the difference in unpenalized likelihood between models model1 and model2 (see Wood et al., 2016).

For the GLRT to be appropriate, model1 should be set to the model containing more effects and model2 should be a nested, simpler variant of model1. For the degrees of freedom for the test, the expected degrees of freedom (EDF) of each model are used (i.e., this is the conditional test discussed in Wood (2017: 6.12.4)). The difference between the models in EDF serves as DoF for computing the Chi-Square statistic. In addition, correct_t1 should be set to True when computing the GLRT.

To get the AIC for each model, 2*edf is added to twice the negative (conditional) likelihood (see Wood et al., 2016).
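
The arithmetic behind these two quantities can be sketched as follows. This is not mssm's implementation: the helper name conditional_glrt is hypothetical, the log-likelihoods and edf values are made-up numbers, "likelihood" is read as log-likelihood, and the rejection rule p <= alpha is an assumption. The returned keys merely mirror the ones documented for compare_CDL.

```python
from scipy.stats import chi2


def conditional_glrt(llk1, edf1, llk2, edf2, alpha=0.05):
    """AIC per model plus an approximate GLRT for nested models (sketch)."""
    test_stat = 2.0 * (llk1 - llk2)      # twice the log-likelihood difference
    dof = edf1 - edf2                    # difference in (corrected) edf
    p = chi2.sf(test_stat, dof)          # upper tail of the Chi-Square
    return {
        "H1": bool(p <= alpha),
        "p": float(p),
        "test_stat": test_stat,
        "Res. DOF": dof,
        "aic1": -2.0 * llk1 + 2.0 * edf1,
        "aic2": -2.0 * llk2 + 2.0 * edf2,
    }


# Made-up values: model 1 is the more complex model
result = conditional_glrt(llk1=-700.0, edf1=14.2, llk2=-712.0, edf2=10.1)
print(result)
```

Because the edf enters both the AIC penalty and the test's degrees of freedom, any correction applied to the edf changes both results.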

By default (correct_V=True), mssm will attempt to correct the edf for uncertainty in the estimated \(\lambda\) parameters. Which correction is computed depends on the choice for the grid argument. When grid='JJJ1' (the default), mssm computes an approximation to the analytic correction proposed by Wood, Pya, & Säfken (2016), which is exact for strictly Gaussian and some canonical generalized additive models. This is too costly for very large sparse multi-level models and not exact for more generic models. The MC-based alternative available via grid='JJJ2' addresses the first problem (important: set use_importance_weights=False and only_expected_edf=True). The second MC-based alternative, available via grid='JJJ3', is most appropriate for more generic models (the prior argument can be used to specify any prior to be placed on \(\boldsymbol{\rho}\); you will also need to set use_importance_weights=True and only_expected_edf=False). For more details consult the mssm.src.python.utils.correct_VB() function, the examples below, and Krause et al. (submitted).

In case any of those correction strategies is too expensive, it might be better to rely on hypothesis tests for individual smooths, confidence intervals, and penalty-based selection approaches instead (see Marra & Wood, 2011 for details on the latter).

In case correct_t1=True the EDF will be set to the (smoothness uncertainty corrected in case correct_V=True) smoothness bias corrected expected degrees of freedom (t1 in section 6.1.2 of Wood, 2017), for the GLRT (based on the recommendation given in section 6.12.4 in Wood, 2017). The AIC (Wood, 2017) of both models will still be based on the regular (smoothness uncertainty corrected) edf.

The computation here is different to the one performed by the compareML function in the R-package itsadug - which rather performs a version of the marginal GLRT (also discussed in Wood, 2017: 6.12.4) - and more similar to the anova.gam implementation provided by mgcv (particularly if grid='JJJ1'). The returned p-value is approximate - very much so if correct_V=False (this should really never be done). Also, the GLRT should not be used to compare models differing in their random effect structures - the AIC is more appropriate for this (see Wood, 2017: 6.12.4).

Examples:

### Model comparison and smoothness uncertainty correction for strictly additive model
import copy
import numpy as np
from mssm.models import *
from mssm.src.python.compare import compare_CDL
from mssmViz.sim import *

# Simulate some data
sim_fit_dat = sim3(n=500,scale=2,c=0.1,family=Gaussian(),seed=21)

# Now fit nested models
sim_fit_formula = Formula(lhs("y"),
                            [i(),f(["x0"],nk=20,rp=1),
                             f(["x1"],nk=20,rp=1),
                             f(["x2"],nk=20,rp=1),
                             f(["x3"],nk=20,rp=1)],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_model = GAMM(sim_fit_formula,Gaussian())
sim_fit_model.fit()

sim_fit_formula2 = Formula(lhs("y"),
                            [i(),f(["x1"],nk=20,rp=1),
                             f(["x2"],nk=20,rp=1),
                             f(["x3"],nk=20,rp=1)],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_model2 = GAMM(sim_fit_formula2,Gaussian())
sim_fit_model2.fit()


# And perform a smoothness uncertainty corrected comparison
cor_result1 = compare_CDL(sim_fit_model,sim_fit_model2,grid='JJJ1',seed=22)

# To perform a GLRT and correct the edf for smoothness bias as well (e.g., Wood, 2017) run:
cor_result2 = compare_CDL(sim_fit_model,sim_fit_model2,grid='JJJ1',seed=22,
    perform_GLRT=True,correct_t1=True)

# Model comparison and smoothness uncertainty correction for very large
# strictly additive model

# If the models are quite large (many coefficients) the following
# (this is the first MC strategy discussed in
# section 5.2 of Krause et al. (submitted)) can be much faster:
nR = 250 # Number of samples to use for the numeric integration
cor_result3 = compare_CDL(sim_fit_model,sim_fit_model2,nR=nR,n_c=10,
    correct_t1=False,grid='JJJ2',seed=22,only_expected_edf=True,
    use_importance_weights=False)

### Model comparison and smoothness uncertainty correction for more generic smooth model
# (GAMM, GAMMLSS, etc.) We can still rely on grid='JJJ1' (which is why it is the default)
# but this will be approximate.
# See section 5.1 in the manuscript by Krause et al. (submitted) for justification (or
# section 3.4.3 in the book by Wood (2017)). An alternative is the second MC strategy
# discussed in section 5.3 of Krause et al. (submitted).
# The code below shows how to get mssm to rely on this strategy:

# Simulate some data
sim_fit_dat = sim3(n=500,scale=2,c=0.1,family=Gamma(),seed=21)

# Now fit nested models
sim_fit_formula = Formula(lhs("y"),
                            [i(),f(["x0"],nk=20,rp=1),
                             f(["x1"],nk=20,rp=1),
                             f(["x2"],nk=20,rp=1),
                             f(["x3"],nk=20,rp=1)],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_formula_sd = Formula(lhs("y"),
                            [i()],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_model = GAMMLSS([sim_fit_formula,copy.deepcopy(sim_fit_formula_sd)],
    family = GAMMALS([LOG(),LOGb(-0.01)]))
sim_fit_model.fit()

sim_fit_formula2 = Formula(lhs("y"),
                            [i(),f(["x1"],nk=20,rp=1),
                             f(["x2"],nk=20,rp=1),
                             f(["x3"],nk=20,rp=1)],
                            data=sim_fit_dat,
                            print_warn=False)

sim_fit_model2 = GAMMLSS([sim_fit_formula2,copy.deepcopy(sim_fit_formula_sd)],
    family = GAMMALS([LOG(),LOGb(-0.01)]))
sim_fit_model2.fit()

# Set up a uniform prior from log(1e-7) to log(1e12) for each regularization parameter
prior = DummyRhoPrior(b=np.log(1e12))

# Now correct for uncertainty in regularization parameters using the second MC strategy
# discussed by Krause et al. (submitted). You can also set prior to None in which case
# the proposal distribution (by default a T-distribution with 40 degrees of freedom) is
# used as prior:
cor_result_gs_1 = compare_CDL(sim_fit_model,sim_fit_model2,n_c=10,grid='JJJ3',seed=22,
    only_expected_edf=False,use_importance_weights=True,prior=prior,recompute_H=True)

References:

Marra, G., & Wood, S. N. (2011) Practical variable selection for generalized additive models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
compareML function from itsadug R-package: https://rdrr.io/cran/itsadug/man/compareML.html
anova.gam function from mgcv, see: https://www.rdocumentation.org/packages/mgcv/versions/1.9-1/topics/anova.gam

Parameters:

model1 (GAMM | GAMMLSS | GSMM) – GAMM, GAMMLSS, or GSMM 1.
model2 (GAMM | GAMMLSS | GSMM) – GAMM, GAMMLSS, or GSMM 2.
correct_V (bool, optional) – Whether or not to correct for smoothness uncertainty. Defaults to True
correct_t1 (bool | None, optional) – Whether or not to also correct the smoothness bias corrected edf for smoothness uncertainty. Defaults to None - meaning that mssm will select an appropriate value.
perform_GLRT (bool, optional) – Whether to perform both a GLRT and to compute the AIC or to only compute the AIC. Defaults to False.
nR (int, optional) – In case grid!="JJJ1", nR samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250
n_c (int, optional) – Number of cores to use during parallel parts of the correction. Note, if you want to use more than one core for more generic models it will most likely be necessary to install mssm with the extra mp dependency set. This installs the multiprocess package, which is necessary since most general models implement at least one local function that cannot be serialized by the standard multiprocessing library. To install the extra dependency set simply run pip install -U mssm[mp], defaults to 1
alpha (float, optional) – alpha level of the GLRT. Defaults to 0.05
grid (str | None, optional) – How to compute the smoothness uncertainty correction, defaults to None - meaning that mssm will select an appropriate value.
a (float, optional) – Any of the \(\lambda\) estimates obtained from model (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}_{\boldsymbol{\rho}})\) used to sample nR candidates) which are smaller than this are set to this value as well, defaults to 1e-7 the minimum possible estimate
b (float, optional) – Any of the \(\lambda\) estimates obtained from model (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}_{\boldsymbol{\rho}})\) used to sample nR candidates) which are larger than this are set to this value as well, defaults to 1e7 the maximum possible estimate
df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample/propose the next set of candidates. Setting this to np.inf means a multivariate normal is used for sampling, defaults to 40
verbose (bool, optional) – Whether to print progress information or not, defaults to False
drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky decomposition is then computed using the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the Hessian. This only requires first derivative information. Defaults to “Chol”.
seed (int,optional) – Seed to use for random parts of the correction. Defaults to None
only_expected_edf (bool|None, optional) – Whether to compute the edf by explicitly forming the covariance matrix (only_expected_edf=False) or not (only_expected_edf=True). The latter is much more efficient for sparse models at the cost of access to the covariance matrix and the ability to compute an upper bound on the smoothness uncertainty corrected edf. Only makes sense when grid!='JJJ1'. Defaults to None - meaning that mssm will select an appropriate value.
Vp_fidiff (bool,optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}_{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)
use_importance_weights (bool | None,optional) – Whether to rely on importance weights to compute the numerical integration when grid!='JJJ1' or on the log-densities of \(\mathbf{V}_{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to None - meaning that mssm will select an appropriate value.
prior (any, optional) – An (optional) instance of an arbitrary class that has a .logpdf() method to compute the prior log density of a sampled candidate. If this is set to None, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when use_importance_weights=False. Defaults to None
recompute_H (bool | None, optional) – Whether or not to re-compute the Hessian of the log-likelihood at an estimate of the mean of the Bayesian posterior \(\boldsymbol{\beta}|y\) before computing the (uncertainty/bias corrected) edf. Defaults to None - meaning that mssm will select an appropriate value.
compute_Vcc (bool | None, optional) – Whether to compute the second correction term when grid='JJJ1' (or when computing the lower-bound for the remaining grids) or only the first one. In contrast to the second one, the first correction term is substantially cheaper to compute - so setting this to False for larger models will speed up the correction considerably. Defaults to None - meaning that mssm will select an appropriate value.
bfgs_options (dict,optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'. If none are provided, the gtol argument will be initialized to conv_tol. Note also, that in any case the maxiter argument is automatically set to max_inner. Defaults to an empty dictionary.
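
The interplay of a, b, nR, df, and prior can be illustrated with a short sketch. Everything here is an assumption made for illustration: the class UniformLogLambdaPrior is hypothetical, the \(\lambda\) estimates and the covariance V_rho are made up, and whether mssm passes single candidates or whole batches to .logpdf() is not specified above. The sketch clips the estimates to [a, b], proposes nR candidates on the log scale from a multivariate t distribution, and evaluates a prior that exposes a .logpdf() method.

```python
import numpy as np
from scipy.stats import multivariate_t


class UniformLogLambdaPrior:
    """Hypothetical prior: independent uniforms on [log(a), log(b)]."""

    def __init__(self, a=1e-7, b=1e7):
        self.lo, self.hi = np.log(a), np.log(b)

    def logpdf(self, rho):
        rho = np.atleast_2d(rho)
        inside = np.all((rho >= self.lo) & (rho <= self.hi), axis=1)
        log_dens = -rho.shape[1] * np.log(self.hi - self.lo)
        return np.where(inside, log_dens, -np.inf)


a, b, nR, df = 1e-7, 1e7, 250, 40
lam_hat = np.array([3.2e-9, 0.5, 4.0e9])    # made-up estimates
lam_clipped = np.clip(lam_hat, a, b)        # values outside [a, b] are set to the bounds
V_rho = 0.25 * np.eye(lam_hat.size)         # made-up covariance on the log scale

proposal = multivariate_t(loc=np.log(lam_clipped), shape=V_rho, df=df)
rho_samples = proposal.rvs(size=nR, random_state=22)

log_prior = UniformLogLambdaPrior(a, b).logpdf(rho_samples)
log_proposal = proposal.logpdf(rho_samples)
```

With importance weighting, terms such as log_prior and log_proposal would enter the weights together with the (not shown) REML score of each candidate.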

Raises:

ValueError – If both models are from different families.
ValueError – If perform_GLRT=True and model1 has fewer coef than model2 - i.e., model1 has to be the notationally more complex one.

Returns:

A dictionary with outcomes of all tests. Key H1 will be a bool indicating whether Null hypothesis was rejected or not, p will be the p-value, test_stat will be the test statistic used, Res. DOF will be the degrees of freedom used by the test, aic1 and aic2 will be the aic scores for both models.

Return type:

dict

mssm.src.python.custom_types module

class mssm.src.python.custom_types.ConstType(*values)

Bases: Enum

Custom Constraint data type used by internal functions.

DIFF = 3

DROP = 1

QR = 2

class mssm.src.python.custom_types.Constraint(Z: ndarray | int | None = None, type: ConstType | None = None)

Bases: object

Constraint storage. Z either holds the QR-based correction matrix that needs to be multiplied with \(\mathbf{X}\), \(\mathbf{S}\), and \(\mathbf{D}\) (where \(\mathbf{D}\mathbf{D}^T = \mathbf{S}\)) to make terms subject to the conventional sum-to-zero constraints applied also in mgcv (Wood, 2017), the column/row that should be dropped from those (in which case \(\mathbf{X}\) can also no longer take on a constant), or None, indicating that the model should be “difference re-coded” to enable sparse sum-to-zero constraints. The latter two are available in mgcv’s smoothCon function by setting the sparse.cons argument to 1 or 2 respectively.

The QR-based approach is described in detail by Wood (2017) and is similar to just mean centering every basis function involved in the smooth and then dropping one column from the corresponding centered model matrix. The column-dropping approach is self-explanatory. The difference re-coding re-codes basis functions to correspond to differences of basis functions. The resulting basis remains sparser than the alternatives, but this is not a true centering constraint: \(f(x)\) will not necessarily be orthogonal to the intercept, i.e., \(\mathbf{1}^T \mathbf{f(x)}\) will not necessarily be 0. Hence, confidence intervals will usually be wider when using ConstType.DIFF (also when using ConstType.DROP, for the same reason) instead of ConstType.QR (see Wood; 2017,2020)!

A final note regards the use of tensor smooths when te==False. Since the value of any constant estimated for a smooth depends on the type of constraint used, the marginal functions estimated for the “main effects” (\(f(x)\), \(f(z)\)) and “interaction effect” (\(f(x,z)\)) in a model: \(y = a + f(x) + f(z) + f(x,z)\) will differ depending on the type of constraint used. The “Anova-like” decomposition described in detail in Wood (2017) is achievable only when using ConstType.QR.

Thus, ConstType.QR is the default used by all mssm functions, and the other two options should be considered experimental.
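
The QR-based constraint can be sketched with plain NumPy. This is an illustration of the construction described by Wood (2017), not mssm's code: the model matrix X and the penalty S below are random or generic stand-ins, and all variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 6
X = rng.uniform(size=(n, k))                # stand-in for a smooth's model matrix
Ddiff = np.diff(np.eye(k), n=2, axis=0)     # second-order difference operator
S = Ddiff.T @ Ddiff                         # stand-in difference penalty

C = X.sum(axis=0, keepdims=True)            # (1, k): sum-to-zero constraint C @ beta = 0
Q, _ = np.linalg.qr(C.T, mode="complete")   # Q is (k, k)
Z = Q[:, 1:]                                # columns span the null space of C

X_c = X @ Z                                 # constrained model matrix, (n, k-1)
S_c = Z.T @ S @ Z                           # constrained penalty, (k-1, k-1)

# Every column of the constrained model matrix sums to (numerically) zero
print(np.abs(X_c.sum(axis=0)).max())
```

Note that Z is dense, so X_c generally loses the sparsity of X. This is the trade-off that motivates the two sparse alternatives discussed above.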

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N. (2020). Inference and computation with generalized additive models and their extensions. TEST, 29(2), 307–339. https://doi.org/10.1007/s11749-020-00711-5
Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655

Z: ndarray | int | None = None

type: ConstType | None = None

class mssm.src.python.custom_types.Fit_info(lambda_updates: int = 0, iter: int = 0, code: int = 1, eps: float | None = None, K2: float | None = None, dropped: ndarray[tuple[Any, ...], dtype[int64]] | None = None)

Bases: object

Holds information related to convergence (speed) for GAMMs, GAMMLSS, and GSMMs.

Variables:

lambda_updates (int) – The total number of lambda updates computed during estimation. Initialized with 0.
iter (int) – The number of outer iterations (a single outer iteration can involve multiple lambda updates) completed during estimation. Initialized with 0.
code (int) – Convergence status. Anything above 0 indicates that the model did not converge and estimates should be considered carefully. Initialized with 1.
eps (float) – The fraction added to the last estimate of the negative Hessian of the penalized likelihood during GAMMLSS or GSMM estimation. If this is not 0 - the model should not be considered as converged, irrespective of what code indicates. This most likely implies that the model is not identifiable. Initialized with None and ignored for GAMM estimation.
K2 (float) – An estimate for the condition number of matrix A, where A.T@A=H and H is the final estimate of the negative Hessian of the penalized likelihood. Only available if check_cond>0 when model.fit() is called for any model (i.e., GAMM, GAMMLSS, GSMM). Initialized with None.
dropped (np.typing.NDArray[np.int]) – The final set of coefficients dropped during GAMMLSS/GSMM estimation when using method in ["QR/Chol","LU/Chol","Direct/Chol"] or None in which case no coefficients were dropped. Initialized with None.
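
A possible way to act on these fields is sketched below. To stay self-contained, the example uses a hypothetical stand-in dataclass (FitInfoStandIn) that mirrors a subset of the documented fields instead of importing mssm, and the helper looks_converged is likewise made up. It only encodes the two rules stated above: code must not be above 0, and eps, when set, must be 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FitInfoStandIn:
    """Hypothetical stand-in mirroring some of the documented Fit_info fields."""
    lambda_updates: int = 0
    iter: int = 0
    code: int = 1
    eps: Optional[float] = None
    K2: Optional[float] = None


def looks_converged(info) -> bool:
    """Apply the documented rules for `code` and `eps` (sketch)."""
    if info.code > 0:
        return False
    if info.eps is not None and info.eps != 0:
        return False
    return True


fresh = FitInfoStandIn()                              # initial state: code=1
gamm_like = FitInfoStandIn(iter=12, code=0)           # eps stays None for a GAMM
perturbed = FitInfoStandIn(iter=30, code=0, eps=1e-4) # Hessian had to be perturbed

for info in (fresh, gamm_like, perturbed):
    print(looks_converged(info))
```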

K2: float | None = None

code: int = 1

dropped: ndarray[tuple[Any, ...], dtype[int64]] | None = None

eps: float | None = None

iter: int = 0

lambda_updates: int = 0

Bases: object

\(\lambda\) storage term.

Usually model.overall_penalties holds a list of these.

Variables:

S_J (scp.sparse.csc_array) – The penalty matrix associated with this lambda term. Note, in case multiple penalty matrices share the same lambda value, the rep_sj argument determines how many diagonal blocks we need to fill with this penalty matrix to get S_J_emb. Initialized with None.
S_J_emb (scp.sparse.csc_array) – A zero-embedded version of the penalty matrix associated with this lambda term. Note, this matrix contains rep_sj diagonal sub-blocks each filled with S_J. Initialized with None.
D_J_emb (scp.sparse.csc_array) – Root of S_J_emb, so that D_J_emb@D_J_emb.T=S_J_emb. Initialized with None.
rep_sj (int) – How many sequential sub-blocks of S_J_emb need to be filled with S_J. Useful if all levels of a categorical variable for which a separate smooth is to be estimated are assumed to share the same lambda value. Initialized with 1.
lam (float) – The current estimate for \(\lambda\). Initialized with 1.1.
start_index (int) – The first row and column in the overall penalty matrix taken up by S_J. Initialized with None.
type (PenType) – The type of this penalty term. Initialized with None.
rank (int) – The rank of S_J. Initialized with None.
term (list[int] | int) – The index (indices) of the term(s) in a mssm.src.python.formula.Formula with which this penalty is associated. Initialized with None.
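
The relationship between S_J, S_J_emb, D_J_emb, rep_sj, and start_index can be illustrated as follows. This is a sketch with a hypothetical helper (embed_penalty) and a generic second-order difference penalty. In particular, padding the root to a square matrix is an assumption made here for simplicity and need not match the shape mssm uses.

```python
import numpy as np
from scipy import sparse


def embed_penalty(S_J, D_J, rep_sj, start_index, n_coef):
    """Place rep_sj copies of S_J (and its root D_J) on the diagonal of zero matrices."""
    k, r = D_J.shape
    S_emb = np.zeros((n_coef, n_coef))
    D_emb = np.zeros((n_coef, n_coef))
    for rep in range(rep_sj):
        lo = start_index + rep * k
        S_emb[lo:lo + k, lo:lo + k] = S_J
        D_emb[lo:lo + k, lo:lo + r] = D_J
    return sparse.csc_array(S_emb), sparse.csc_array(D_emb)


k = 5
Ddiff = np.diff(np.eye(k), n=2, axis=0)   # (k-2, k) difference operator
S_J = Ddiff.T @ Ddiff                     # penalty matrix, rank k-2
D_J = Ddiff.T                             # root: D_J @ D_J.T equals S_J

# Two factor levels share one lambda; the first coefficient (intercept) is unpenalized
S_J_emb, D_J_emb = embed_penalty(S_J, D_J, rep_sj=2, start_index=1, n_coef=12)
print(S_J_emb.shape, np.linalg.matrix_rank(S_J))
```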

D_J_emb: csc_array | None = None

D_J_embs: list[csc_array] | None = None

S_J: csc_array | None = None

S_J_emb: csc_array | None = None

S_J_embs: list[csc_array] | None = None

S_Js: list[csc_array] | None = None

clust_series: list[int] | None = None

clust_weights: list[list[float]] | None = None

dist_param: list[int] | int = 0

frozen: bool = False

id: int | None = None

lam: float = 1.1

rank: int | None = None

ranks: list[int] | None = None

rep_sj: int = 1

rep_sjs: list[int] | None = None

rp_idx: int | None = None

start_index: int | None = None

start_indices: list[csc_array] | None = None

term: list[int] | int | None = None

type: PenType | None = None

types: list[PenType] | None = None

class mssm.src.python.custom_types.PenType(*values)

Bases: Enum

Custom Penalty data type used by internal functions.

COEFFICIENTS = 7

CUSTOM = 8

DERIVATIVE = 6

DIFFERENCE = 2

DISTANCE = 3

IDENTITY = 1

NULL = 5

REPARAM1 = 4

REPARAM2 = 10

SHARED = 9

Bases: object

Holds information necessary to re-parameterize a smooth term.

Variables:

Srp (scp.sparse.csc_array) – The transformed penalty matrix
Drp (scp.sparse.csc_array) – The root of the transformed penalty matrix
C (scp.sparse.csc_array) – Transformation matrix for model matrix and/or penalty.

C: csc_array | None = None

Drp: csc_array | None = None

IRrp: csc_array | None = None

Srp: csc_array | None = None

rank: int | None = None

rms1: float | None = None

rms2: float | None = None

scale: float | None = None

class mssm.src.python.custom_types.TermType(*values)

Bases: Enum

Custom Term data type used by internal functions.

IRSMOOTH = 1

LINEAR = 3

RANDINT = 4

RANDSLOPE = 5

SMOOTH = 2

class mssm.src.python.custom_types.VarType(*values)

Bases: Enum

Custom variable data type used by internal functions.

FACTOR = 2

NUMERIC = 1

mssm.src.python.exp_fam module

class mssm.src.python.exp_fam.Binomial(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.Logit object>, n: int | list[int] = 1)

Bases: Family

Binomial family. For this implementation we assume that we have collected proportions of success, i.e., the dependent variable specified in the model Formula needs to hold observed proportions and not counts! If we assume that each observation \(y_i\) reflects a single independent draw from a binomial (with \(n=1\), and \(p_i\) being the probability that the result is 1), then the dependent variable should hold either 1 or 0. If we have multiple independent draws from the binomial per observation (i.e., row in our data-frame), then \(n\) will usually differ between observations/rows in our data-frame (i.e., we observe \(k_i\) counts of success out of \(n_i\) draws - so that \(y_i=k_i/n_i\)). In that case, the Binomial() family accepts a vector for argument \(\mathbf{n}\) (which is simply set to 1 by default, assuming binary data), containing \(n_i\) for every observation \(y_i\).

In this implementation, the scale parameter is kept fixed/known at 1.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

link (Link) – The link function to be used by the model of the mean of this family. By default set to the canonical logit link.
n (int or [int], optional) – Number of independent draws from a Binomial per observation/row of data-frame. For binary data this can simply be set to 1, which is the default.

D(y: ndarray, mu: ndarray) → ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

an N-dimensional vector of shape (-1,1) containing the contribution of each observation to the model deviance

Return type:

np.ndarray

V(mu: ndarray) → ndarray

The variance function (of the mean; see Wood, 2017, 3.1.2) for the Binomial model. Variance is minimal for \(\mu=1\) and \(\mu=0\), maximal for \(\mu=0.5\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted probability for the response distribution corresponding to each observation.

Returns:

an N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean

Return type:

np.ndarray

dVy1(mu: ndarray) → ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

an N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean

Return type:

np.ndarray
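
For binary data, the textbook variance function and its derivative are easy to write down. The sketch below assumes \(n=1\) and uses hypothetical function names. It is meant to illustrate the shape of the two functions described above and may differ from what V() and dVy1() compute when n is not 1.

```python
import numpy as np


def binomial_variance(mu):
    """Textbook variance function for binary data: V(mu) = mu * (1 - mu)."""
    mu = np.asarray(mu, dtype=float)
    return mu * (1.0 - mu)


def binomial_variance_d1(mu):
    """First derivative of V with respect to mu: 1 - 2 * mu."""
    mu = np.asarray(mu, dtype=float)
    return 1.0 - 2.0 * mu


mu = np.array([0.0, 0.25, 0.5, 0.75, 1.0]).reshape(-1, 1)
print(binomial_variance(mu).ravel())      # zero at the boundaries, largest at 0.5
print(binomial_variance_d1(mu).ravel())   # changes sign at mu = 0.5
```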

deviance(y: ndarray, mu: ndarray) → float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

Deviance of the model

Return type:

float
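As an illustration of how the deviance relates to D() and the log-likelihood, the sketch below works through the binary case with the scale fixed at 1. It is a stand-alone numpy example, not the library's code. For binary observations the saturated model (\(\mu_i = y_i\)) has log-likelihood 0, so each contribution reduces to minus twice the log-probability.

```python
import numpy as np

def bernoulli_lp(y, mu):
    # Log-probability of each binary observation under mean mu
    return y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu)

def bernoulli_D(y, mu):
    # Saturated log-likelihood is 0 for binary y, so D_i = 2 * (0 - lp_i)
    return -2.0 * bernoulli_lp(y, mu)

y = np.array([1.0, 0.0, 1.0, 1.0]).reshape(-1, 1)
mu = np.array([0.8, 0.3, 0.6, 0.9]).reshape(-1, 1)

D = bernoulli_D(y, mu)       # per-observation contributions, shape (4, 1)
deviance = float(D.sum())    # total deviance is the sum of contributions
```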

init_mu(y: ndarray) → ndarray

Function providing initial \(\boldsymbol{\mu}\) vector for GAMM.

Estimation assumes proportions as the dependent variable. According to https://stackoverflow.com/questions/60526586/, the glm() function in R always initializes \(\mu\) = 0.75 for observed proportions (i.e., elements in \(\mathbf{y}\)) of 1 and \(\mu\) = 0.25 for proportions of zero. This can be achieved by adding 0.5 to the observed proportion of success (and adding one observation).

Parameters: y (np.ndarray) – A numpy array containing each observation.
Returns: an N-dimensional vector of shape (-1,1) containing an initial estimate of the probability of success per observation
Return type: np.ndarray
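The initialization rule described above can be sketched in a few lines. The function name is hypothetical, and the sketch assumes each proportion is based on a single observation, so that "adding one observation" means dividing by 2.

```python
import numpy as np

def init_mu_sketch(y):
    # Add half a success to the numerator and one observation to the denominator
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    return (y + 0.5) / 2.0

mu0 = init_mu_sketch([0.0, 1.0, 0.5])
```

Under these assumptions a proportion of 0 maps to 0.25 and a proportion of 1 maps to 0.75, so the initial means stay strictly inside (0, 1) and the logit link remains finite.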

llk(y: ndarray, mu: ndarray) → float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

log-likelihood of the model

Return type:

float

lp(y: ndarray, mu: ndarray) → ndarray

Log-probability of observing every proportion in \(\mathbf{y}\) under their respective binomial with mean = \(\boldsymbol{\mu}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed proportion.
mu (np.ndarray) – A numpy array containing the predicted probability for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.ExtendedFamily(link: Link, theta: None | ndarray = None)

Bases: Family

Base class to be implemented by any “extended family” member. This family, defined by Wood et al. (2016), essentially includes any model that can be estimated via iteratively re-weighted least squares. The likelihood can have additional parameters beyond the scale and mean, which can be estimated alongside the model coefficients (see the theta parameter).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

link (Link) – The link function to be used by the model of the mean of this family.
theta (float or np.ndarray, optional) – Any additional parameters of the likelihood (including any required scale parameter). Array needs to be of shape (-1,1). Setting this to None means the parameters have to be estimated.

Variables:

theta (None | np.ndarray) – The (estimated) extra parameters of the log-likelihood. Each implementation of this class should initialize these if not provided, and calls to GAMM.fit() will overwrite this attribute if the initial value for theta passed to the constructor was None. Defaults to None.

D(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016).

Take a look at the ScaledT implementation as an example.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a N-dimensional vector of shape (-1,1) containing the contribution of each observation to the overall deviance.

Return type:

np.ndarray

Ed2Ddmu(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes expected second derivative of the deviance or twice the negative log-likelihood with respect to mu. This function is used during fitting, but by default simply falls back to calling the d2Ddmu function to get the observed second derivatives.

Take a look at the ScaledT implementation as an example.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a N-dimensional vector of shape (len(mu),1) containing the expected second derivatives of the deviance with respect to mu per observation.

Return type:

np.ndarray
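The default relationship between Ed2Ddmu and d2Ddmu described above can be sketched as follows. Both classes are hypothetical stand-ins written for this example (they are not the mssm base class or any of its families), and the squared-error deviance is chosen only because its second derivative is trivial.

```python
import numpy as np

class SketchExtendedFamily:
    """Hypothetical stand-in for an extended-family base class."""

    def __init__(self, theta=None):
        self.theta = theta

    def d2Ddmu(self, y, mu, theta=None):
        raise NotImplementedError

    def Ed2Ddmu(self, y, mu, theta=None):
        # Default: fall back to the observed second derivatives
        return self.d2Ddmu(y, mu, theta)

class SquaredErrorFamily(SketchExtendedFamily):
    """Toy family with D = (y - mu)^2, so d2D/dmu2 = 2 everywhere."""

    def d2Ddmu(self, y, mu, theta=None):
        return np.full_like(mu, 2.0)

fam = SquaredErrorFamily()
y = np.array([0.1, 0.9, 1.7]).reshape(-1, 1)
mu = np.array([0.0, 1.0, 2.0]).reshape(-1, 1)
expected = fam.Ed2Ddmu(y, mu)   # resolves to d2Ddmu unless overridden
```

A family whose expected second derivative has a simpler or better-conditioned form would override Ed2Ddmu and keep d2Ddmu for the observed values.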

V(mu: ndarray, theta: None | ndarray = None) → ndarray

The variance function (of the mean; see Wood, 2017, 3.1.2) for an extended family.

Take a look at the ScaledT implementation as an example.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean

Return type:

np.ndarray

d2Ddmu(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes second derivative of the deviance or twice the negative log-likelihood with respect to mu. This function is by default used directly during fitting, but can be overwritten by implementing the Ed2Ddmu method. In that case, this function is only called after estimation has been completed to get the observed hessian at the final coefficient estimate.

Take a look at the ScaledT implementation as an example.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a N-dimensional vector of shape (len(mu),1) containing the second derivative of the deviance with respect to mu for each observation.

Return type:

np.ndarray

dDdmu(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes derivative of the deviance or twice the negative log-likelihood with respect to mu.

Take a look at the ScaledT implementation as an example.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a N-dimensional vector of shape (len(mu),1) containing the derivatives of the deviance with respect to mu for each observation.

Return type:

np.ndarray
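When implementing dDdmu, d2Ddmu, and Ed2Ddmu for a new family, a finite-difference check is a cheap way to catch sign and scaling mistakes. The sketch below uses the Poisson deviance purely as a stand-in with well-known closed forms (it has no extra theta parameters) and also shows how the expected second derivative differs from the observed one.

```python
import numpy as np

def D(y, mu):
    # Poisson deviance contribution (valid for y > 0)
    return 2.0 * (y * np.log(y / mu) - (y - mu))

def dDdmu(y, mu):
    return 2.0 * (1.0 - y / mu)

def d2Ddmu(y, mu):
    # Observed second derivative: depends on y
    return 2.0 * y / mu**2

def Ed2Ddmu(y, mu):
    # Expected second derivative: replace y by E[y] = mu
    return 2.0 / mu

y = np.array([1.0, 3.0, 7.0]).reshape(-1, 1)
mu = np.array([1.5, 2.5, 6.0]).reshape(-1, 1)

# Central differences as an independent check of the analytic derivatives
h = 1e-5
fd1 = (D(y, mu + h) - D(y, mu - h)) / (2 * h)
fd2 = (dDdmu(y, mu + h) - dDdmu(y, mu - h)) / (2 * h)
err1 = np.max(np.abs(fd1 - dDdmu(y, mu)))
err2 = np.max(np.abs(fd2 - d2Ddmu(y, mu)))
```

The observed and expected versions coincide only where \(y_i = \mu_i\); elsewhere the expected version is free of the noise in \(\mathbf{y}\).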

dVy1(mu: ndarray, theta: None | ndarray = None) → ndarray | None

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean. Optional function that might simply return None.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean or None

Return type:

np.ndarray | None

deviance(y: ndarray, mu: ndarray, theta: None | ndarray = None) → float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

Take a look at the ScaledT implementation as an example.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

Deviance of the model under this family

Return type:

float

gradientLTheta(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes gradient of the log-likelihood with respect to self.theta, given mu.

Take a look at the ScaledT implementation as an example.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a vector of shape (len(self.theta),1) containing the gradient of the log-likelihood with respect to theta.

Return type:

np.ndarray

hessianLTheta(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes (expected) hessian of the log-likelihood with respect to self.theta, given mu.

Take a look at the ScaledT implementation as an example.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a matrix of shape (len(self.theta),len(self.theta)) containing the hessian of the log-likelihood with respect to theta.

Return type:

np.ndarray
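To make the expected shapes of gradientLTheta and hessianLTheta concrete, here is a sketch for a toy likelihood with a single extra parameter. It assumes a normal log-likelihood with \(\theta = \log\sigma\); this parameterization is chosen for the example and is not taken from any mssm family. The Hessian shown is the observed one.

```python
import numpy as np

def llk(y, mu, theta):
    # Normal log-likelihood with theta[0, 0] = log(sigma)
    sigma2 = np.exp(2.0 * theta[0, 0])
    r2 = (y - mu) ** 2
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * sigma2) - r2 / (2.0 * sigma2)))

def gradientLTheta(y, mu, theta):
    # d llk / d theta = sum(r^2) / sigma^2 - n, returned with shape (1, 1)
    sigma2 = np.exp(2.0 * theta[0, 0])
    return np.array([[np.sum((y - mu) ** 2) / sigma2 - len(y)]])

def hessianLTheta(y, mu, theta):
    # d^2 llk / d theta^2 = -2 * sum(r^2) / sigma^2, returned with shape (1, 1)
    sigma2 = np.exp(2.0 * theta[0, 0])
    return np.array([[-2.0 * np.sum((y - mu) ** 2) / sigma2]])

y = np.array([1.2, -0.4, 0.7, 2.1, 0.3]).reshape(-1, 1)
mu = np.array([1.0, 0.0, 0.5, 1.5, 0.5]).reshape(-1, 1)
theta = np.array([[0.3]])

# Finite-difference checks of both derivatives
h = 1e-5
fd_grad = (llk(y, mu, theta + h) - llk(y, mu, theta - h)) / (2 * h)
fd_hess = (gradientLTheta(y, mu, theta + h)[0, 0]
           - gradientLTheta(y, mu, theta - h)[0, 0]) / (2 * h)
```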

llk(y: ndarray, mu: ndarray, theta: None | ndarray = None) → float

Log-probability of \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\). Essentially the sum over all elements in the vector returned by the lp() method.

Take a look at the ScaledT implementation as an example.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

log-likelihood of the model under this family

Return type:

float

lp(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Log-probability of observing every value in \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\).

Take a look at the ScaledT implementation as an example.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Any additional parameters of the likelihood. Array needs to be of shape (-1,1). When this is set to None, self.theta should be used.

Returns:

a N-dimensional vector of shape (-1,1) containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.Family(link: Link, twopar: bool, scale: float = None)

Bases: object

Base class to be implemented by members of the exponential family.

Parameters:

link (Link) – The link function to be used by the model of the mean of this family.
twopar (bool) – Whether the family has two parameters (mean,scale) to be estimated (i.e., whether the likelihood is a function of two parameters), or only a single one (usually the mean).
scale (float or None, optional) – Known/fixed scale parameter for this family. Setting this to None means the parameter has to be estimated. Must be set to 1 if the family has no scale parameter (i.e., when twopar = False).

D(y: ndarray, mu: ndarray) → ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector of shape (-1,1) containing the contribution of each observation to the overall deviance.

Return type:

np.ndarray

V(mu: ndarray) → ndarray

The variance function (of the mean; see Wood, 2017, 3.1.2). Different exponential families allow for different relationships between the variance in our random response variable and the mean of it. For the normal model this is assumed to be constant.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters: mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
Returns: an N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean
Return type: np.ndarray
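For orientation, the textbook variance functions of a few common exponential-family members are collected below. These are the standard forms and are written as plain lambdas for illustration; they are not copied from the mssm implementations.

```python
import numpy as np

# Standard variance functions V(mu) for common exponential families
VARIANCE_FUNCTIONS = {
    "gaussian": lambda mu: np.ones_like(mu),   # constant in the mean
    "poisson": lambda mu: mu,                  # grows linearly
    "gamma": lambda mu: mu ** 2,               # grows quadratically
    "binomial": lambda mu: mu * (1.0 - mu),    # peaks at mu = 0.5
}

mu = np.array([0.2, 0.5, 0.8]).reshape(-1, 1)
evaluated = {name: V(mu) for name, V in VARIANCE_FUNCTIONS.items()}
```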

dVy1(mu: ndarray) → ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters: mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
Returns: an N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
Return type: np.ndarray

deviance(y: ndarray, mu: ndarray) → float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

Deviance of the model under this family

Return type:

float

init_mu(y: ndarray) → ndarray | None

Convenience function to compute an initial \(\boldsymbol{\mu}\) estimate passed to the GAMM/PIRLS estimation routine.

Parameters: y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
Returns: an N-dimensional vector of shape (-1,1) containing an initial estimate of the mean
Return type: np.ndarray

llk(y: ndarray, mu: ndarray, **kwargs) → float

Log-probability of \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\). Essentially the sum over all elements in the vector returned by the lp() method.

Families with more than one parameter that needs to be estimated in order to evaluate the model’s log-likelihood (i.e., twopar=True) must accept a scale parameter as a keyword argument with a default value, e.g.:

def llk(self, y, mu, scale=1):
   ...

You can check the implementation of the Gaussian Family for an example.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

log-likelihood of the model under this family

Return type:

float

lp(y: ndarray, mu: ndarray, **kwargs) → ndarray

Log-probability of observing every value in \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\).

Families with more than one parameter that needs to be estimated in order to evaluate the model’s log-likelihood (i.e., twopar=True) must accept a scale parameter as a keyword argument with a default value, e.g.:

def lp(self, y, mu, scale=1):
   ...

You can check the implementation of the Gaussian Family for an example.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector of shape (-1,1) containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray
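The keyword-argument convention for two-parameter families can be sketched with a small Gaussian-like class. The class below is hypothetical and does not subclass the mssm Family; it also assumes that scale denotes the variance, which should be checked against the actual Gaussian implementation.

```python
import numpy as np

class GaussianSketch:
    """Hypothetical two-parameter family used only to show the signatures."""

    def lp(self, y, mu, scale=1):
        # Per-observation log-density; scale is treated as the variance (assumption)
        return -0.5 * np.log(2.0 * np.pi * scale) - (y - mu) ** 2 / (2.0 * scale)

    def llk(self, y, mu, scale=1):
        # Sum of the per-observation log-probabilities
        return float(np.sum(self.lp(y, mu, scale=scale)))

fam = GaussianSketch()
y = np.array([0.3, 1.1, -0.5]).reshape(-1, 1)
mu = np.array([0.0, 1.0, 0.0]).reshape(-1, 1)

llk_default = fam.llk(y, mu)             # uses the default scale=1
llk_scaled = fam.llk(y, mu, scale=4.0)   # caller supplies an estimated scale
```

The default value lets single-parameter code paths call llk(y, mu) unchanged, while the fitting routine can pass the current scale estimate by name.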

class mssm.src.python.exp_fam.GAMLSSFamily(pars: int, links: list[Link])

Bases: object

Base-class to be implemented by families of Generalized Additive Mixed Models of Location, Scale, and Shape (GAMMLSS; Rigby & Stasinopoulos, 2005).

Apart from the required methods, three mandatory attributes need to be defined by the __init__() constructor of implementations of this class. These are required to evaluate the first and second (pure & mixed) derivatives of the log-likelihood with respect to any of the log-likelihood’s parameters (or, alternatively, the linear predictors of the parameters; see the description of the d_eta instance variable). See the variables below.

Optionally, a mean_init_fam attribute can be defined - specifying a Family member that is fitted to the data to get an initial estimate of the mean parameter of the assumed distribution.

References:

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

pars (int) – Number of parameters of the distribution belonging to the random variables assumed to have generated the observations, e.g., 2 for the Normal: mean and standard deviation.
links ([Link]) – Link functions for each of the parameters of the distribution.

Variables:

d_eta (bool) – A boolean indicating whether partial derivatives of llk are provided with respect to the linear predictor instead of parameters (i.e., the mean), defaults to False (derivatives are provided with respect to parameters)
d1 ([Callable]) – A list holding n_par functions to evaluate the first partial derivatives of llk with respect to each parameter of the llk. Needs to be initialized when calling __init__().
d2 ([Callable]) – A list holding n_par functions to evaluate the second (pure) partial derivatives of llk with respect to each parameter of the llk. Needs to be initialized when calling __init__().
d2m ([Callable]) – A list holding n_par*(n_par-1)/2 functions to evaluate the second mixed partial derivatives of llk with respect to each pair of parameters of the llk, in order: d2m[0] = \(\partial^2 l/\partial \mu_1 \partial \mu_2\), d2m[1] = \(\partial^2 l/\partial \mu_1 \partial \mu_3\), …, d2m[n_par-2] = \(\partial^2 l/\partial \mu_1 \partial \mu_{n_{par}}\), d2m[n_par-1] = \(\partial^2 l/\partial \mu_2 \partial \mu_3\), d2m[n_par] = \(\partial^2 l/\partial \mu_2 \partial \mu_4\), … . Needs to be initialized when calling __init__().
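The ordering of the mixed derivatives in d2m corresponds to enumerating all parameter pairs \((j,k)\) with \(j<k\), first by \(j\) and then by \(k\). The helper below is hypothetical and only lists the 1-based parameter indices in that order, which can be handy when filling the list for a family with more than two parameters.

```python
from itertools import combinations

def mixed_derivative_order(n_par):
    # 1-based parameter pairs (j, k) with j < k, ordered by j and then by k
    return list(combinations(range(1, n_par + 1), 2))

order = mixed_derivative_order(4)
# Position i in this list is the parameter pair that d2m[i] differentiates with respect to
```

For a two-parameter family the list has a single entry, the pair (1, 2).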

get_resid(y: ndarray, *mus: list[ndarray], **kwargs) → ndarray | None

Get standardized residuals for a GAMMLSS model (Rigby & Stasinopoulos, 2005).

Any implementation of this function should return a vector that looks like what could be expected from taking len(y) independent draws from \(N(0,1)\). Any additional arguments required by a specific implementation can be passed along via kwargs.

Note: Families for which no residuals are available can return None.

References:

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observed value.
mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.

Returns:

a vector of shape (-1,1) containing standardized residuals under the current model or None in case residuals are not readily available.

Return type:

np.ndarray | None

init_coef(models: list[Callable]) → ndarray

(Optional) Function to initialize the coefficients of the model.

Can return None, in which case random initialization will be used.

Parameters: models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s - each based on one of the formulas provided to a model.
Returns: A numpy array of shape (-1,1), holding initial values for all model coefficients.
Return type: np.ndarray

init_lambda(penalties: list[Callable]) → list[float]

(Optional) Function to initialize the smoothing parameters of the model.

Can return None, in which case random initialization will be used.

Parameters: penalties ([mssm.src.python.penalties.LambdaTerm]) – A list of all penalties to be estimated by the model.
Returns: A list, holding - for each \(\lambda\) parameter to be estimated - an initial value.
Return type: [float]

lcp(y: ndarray, *mus: list[ndarray]) → ndarray | None

Log of the cumulative probability of observing a value as extreme or less extreme for every element in \(\mathbf{y}\) under their respective distribution parameterized by mus.

Important: Families for which this function is not implemented can return None.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observed value.
mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.

Returns:

a N-dimensional vector of shape (-1,1) containing the log cumulative probability of observing a value as extreme or less extreme for every data-point under the current model or None if this function is not implemented by the specific family.

Return type:

np.ndarray | None
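As a concrete illustration of what an lcp implementation returns, the sketch below computes the log of \(P(Y_i \le y_i)\) for normally distributed observations with observation-specific mean and standard deviation. The function name is hypothetical and the Normal is just an example distribution; only numpy and the standard library are used.

```python
import math
import numpy as np

def normal_lcp(y, mu, sigma):
    # Standardize, then use P(Y <= y) = 0.5 * erfc(-z / sqrt(2))
    z = (y - mu) / sigma
    cdf = 0.5 * np.vectorize(math.erfc)(-z / math.sqrt(2.0))
    return np.log(cdf)

y = np.array([-1.0, 0.0, 2.0]).reshape(-1, 1)
mu = np.zeros((3, 1))
sigma = np.array([1.0, 2.0, 0.5]).reshape(-1, 1)

lcp = normal_lcp(y, mu, sigma)   # shape (3, 1), all entries <= 0
```

For observations far in the lower tail the CDF underflows to zero, so a production implementation would need a numerically stable log-CDF rather than taking the log of the CDF directly.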

llk(y: ndarray, *mus: list[ndarray]) → float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, *mus: list[ndarray]) → ndarray

Log-probability of observing every element in \(\mathbf{y}\) under their respective distribution parameterized by mus.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observed value.
mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.

Returns:

a N-dimensional vector of shape (-1,1) containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.GAMMALS(links: list[Link])

Bases: GAMLSSFamily

Family for a Gamma GAMMLSS model (Rigby & Stasinopoulos, 2005).

This Family follows the Gamma family, in that we assume: \(Y_i \sim \Gamma(\mu_i,\phi_i)\). The difference to the Gamma family is that we now also model \(\phi\) as an additive combination of smooth variables and other parametric terms. The Gamma distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and rate parameter - called \(\alpha\) and \(\beta\) respectively. Wood (2017) provides \(\alpha = 1/\phi\). With this we can obtain \(\beta = 1/(\phi\mu)\) (see the source-code for lp() method of the Gamma family for details).

References:

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters: links ([Link]) – Link functions for the mean and scale parameter. Standard would be links=[LOG(),LOG()].
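The conversion from mean and scale to shape and rate can be sketched as follows. The helper names are hypothetical and the log-density is written out from the standard shape/rate form of the Gamma density, so this is an illustration of the parameterization rather than the library's lp() code.

```python
import math
import numpy as np

def gamma_shape_rate(mu, phi):
    # alpha = 1 / phi (shape), beta = 1 / (phi * mu) (rate)
    alpha = 1.0 / phi
    beta = 1.0 / (phi * mu)
    return alpha, beta

def gamma_lp(y, mu, phi):
    # Log-density in shape/rate form:
    # alpha*log(beta) - lgamma(alpha) + (alpha - 1)*log(y) - beta*y
    alpha, beta = gamma_shape_rate(mu, phi)
    lgamma = np.vectorize(math.lgamma)
    return alpha * np.log(beta) - lgamma(alpha) + (alpha - 1.0) * np.log(y) - beta * y

y = np.array([0.5, 2.0, 4.0]).reshape(-1, 1)
mu = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)
phi = np.array([1.0, 0.5, 0.25]).reshape(-1, 1)

alpha, beta = gamma_shape_rate(mu, phi)
lp = gamma_lp(y, mu, phi)
```

With this parameterization the mean is \(\alpha/\beta = \mu\) and the variance is \(\alpha/\beta^2 = \phi\mu^2\), which is why \(\phi\) acts as a scale on the quadratic variance function.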

get_resid(y: ndarray, mu: ndarray, scale: ndarray) → ndarray

Get standardized residuals for a Gamma GAMMLSS model (Rigby & Stasinopoulos, 2005).

Essentially, to get a standardized residual vector we first have to account for the mean-variance relationship of our RVs (which we also have to do for the Gamma family) - for this we can simply compute deviance residuals again (see Wood, 2017). These should be \(\sim N(0,\phi_i)\) (where \(\phi_i\) is the element in scale for a specific observation) - so if we divide each of those by the observation-specific scale we can expect the resulting standardized residuals to be \(\sim N(0,1)\) if the model is correct.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.

Returns:

A list of standardized residuals that should be ~ N(0,1) if the model is correct.

Return type:

np.ndarray

init_coef(models: list[Callable]) → ndarray

Function to initialize the coefficients of the model.

Fits a GAMM for the mean and initializes all coef. for the scale parameter to 1.

Parameters: models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s - each based on one of the formulas provided to a model.
Returns: A numpy array of shape (-1,1), holding initial values for all model coefficients.
Return type: np.ndarray

lcp(y: ndarray, mu: ndarray, scale: ndarray) → ndarray

Log of the cumulative probability of observing every value in y under their respective Gamma with mean = \(\boldsymbol{\mu}\) and scale = \(\boldsymbol{\phi}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector containing the log of the cumulative probability of observing each data-point under the current model.

Return type:

np.ndarray

llk(y: ndarray, mu: ndarray, scale: ndarray) → float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, scale: ndarray) → ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective Gamma with mean = \(\boldsymbol{\mu}\) and scale = \(\boldsymbol{\phi}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.

Returns:

a N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.GAUMLSS(links: list[Link])

Bases: GAMLSSFamily

Family for a Normal GAMMLSS model (Rigby & Stasinopoulos, 2005).

This Family follows the Gaussian family, in that we assume: \(Y_i \sim N(\mu_i,\sigma_i)\), i.e., each of the \(N\) observations is still believed to have been generated from an independent normally distributed RV with observation-specific mean.

The important difference is that the scale parameter, \(\sigma\), is now also observation-specific and modeled as an additive combination of smooth functions and other parametric terms, just like the mean is in a Normal GAM. Note that this explicitly models heteroscedasticity - the residuals are no longer assumed to be i.i.d. samples from \(N(0,\sigma)\), since \(\sigma\) can now differ between residual realizations.

References:

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:: links ([Link]) – Link functions for the mean and standard deviation. Standard would be links=[Identity(),LOG()].

get_resid(y: ndarray, mu: ndarray, sigma: ndarray) → float

Get standardized residuals for a Normal GAMMLSS model (Rigby & Stasinopoulos, 2005).

Essentially, each residual should reflect a realization of a normal with mean zero and observation-specific standard deviation. After scaling each residual by its observation-specific standard deviation we should end up with standardized residuals that can be expected to be i.i.d. \(\sim N(0,1)\), assuming that our model is correct.

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.

Returns:

A list of standardized residuals that should be ~ N(0,1) if the model is correct.

Return type:

np.ndarray
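The scaling described above can be sketched with plain NumPy. This is an illustration, not mssm's implementation: the helper name standardized_resid and the simulated data are assumptions made for the example.

```python
import numpy as np

def standardized_resid(y, mu, sigma):
    """Scale raw residuals by the observation-specific standard deviation."""
    return (y - mu) / sigma

# Simulate heteroscedastic data: both mean and standard deviation vary per observation.
rng = np.random.default_rng(0)
n = 5000
mu = np.linspace(-1.0, 1.0, n).reshape(-1, 1)
sigma = np.exp(np.linspace(-1.0, 1.0, n)).reshape(-1, 1)
y = mu + sigma * rng.standard_normal((n, 1))

res = standardized_resid(y, mu, sigma)
# If the model is correct, these should be close to 0 and 1 respectively.
print(res.mean(), res.std())
```

Because the true mean and standard deviation are used here, the standardized residuals should look like draws from \(N(0,1)\); with a fitted model the same check applies to the predicted values.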

init_coef(models: list[Callable]) → ndarray

Function to initialize the coefficients of the model.

Fits a GAMM for the mean and initializes all coef. for the standard deviation to 1.

Parameters:: models ([mssm.models.GAMM]) – A list of mssm.models.GAMM instances, each based on one of the formulas provided to a model.
Returns:: A numpy array of shape (-1,1), holding initial values for all model coefficients.
Return type:: np.ndarray

lcp(y: ndarray, mu: ndarray, sigma: ndarray) → ndarray

Log of the cumulative probability of observing every value in y under their respective Normal with observation-specific mean and standard deviation.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the log of the cumulative probability of observing each data-point under the current model.

Return type:

np.ndarray
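A minimal sketch of the log cumulative probability, assuming the Normal CDF is evaluated through the complementary error function from Python's math module. The helper name normal_lcp is hypothetical, and this naive form loses precision (and can return -inf) in the far lower tail; it is not meant to mirror mssm's code.

```python
import math
import numpy as np

# Element-wise complementary error function built from the standard library.
_erfc = np.vectorize(math.erfc, otypes=[float])

def normal_lcp(y, mu, sigma):
    """log P(Y <= y) for Y ~ N(mu, sigma), element-wise."""
    z = (y - mu) / sigma
    # Phi(z) = 0.5 * erfc(-z / sqrt(2))
    return np.log(0.5 * _erfc(-z / math.sqrt(2.0)))

y = np.array([[0.0], [1.0], [-1.0]])
mu = np.zeros((3, 1))
sigma = np.array([[1.0], [2.0], [0.5]])
lcp = normal_lcp(y, mu, sigma)
print(lcp)
```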

llk(y: ndarray, mu: ndarray, sigma: ndarray) → float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, sigma: ndarray) → ndarray

Log-probability of observing every value in y under their respective Normal with observation-specific mean and standard deviation.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray
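The relationship between lp() and llk() for a Normal with observation-specific mean and standard deviation can be sketched as follows. The function names normal_lp and normal_llk are hypothetical stand-ins written with NumPy only; they illustrate the density, not mssm's internals.

```python
import numpy as np

def normal_lp(y, mu, sigma):
    """Element-wise log-density of N(mu_i, sigma_i) evaluated at y_i."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def normal_llk(y, mu, sigma):
    """Total log-likelihood: the sum over the element-wise log-densities."""
    return float(np.sum(normal_lp(y, mu, sigma)))

y = np.array([[0.0], [1.0], [2.0]])
mu = np.array([[0.0], [1.5], [1.0]])
sigma = np.array([[1.0], [0.5], [2.0]])

lp = normal_lp(y, mu, sigma)     # one value per observation
llk = normal_llk(y, mu, sigma)   # a single float
print(lp.ravel(), llk)
```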

class mssm.src.python.exp_fam.GSMMFamily(pars: int, links: list[Link], *llkargs)

Bases: object

Base-class for General Smooth “families” as discussed by Wood, Pya, & Säfken (2016). For estimation of mssm.models.GSMM models via L-qEFS (Krause et al., submitted) it is sufficient to implement llk(). gradient() and hessian() can then simply return None. For exact estimation via Newton’s method, the latter two functions need to be implemented and have to return the gradient and hessian at the current coefficient estimate respectively.

Additional parameters needed for likelihood, gradient, or hessian evaluation can be passed along via the llkargs. They are then made available in self.llkargs.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

pars (int) – Number of parameters of the likelihood.
links ([Link]) – List of Link functions for each parameter of the likelihood, e.g., links=[Identity(),LOG()].

Variables:

extra_coef (int, optional) – Number of extra coefficients required by specific family or None. By default set to None and changed to int by specific families requiring this.
llkargs (list[any]) – A list holding any extra arguments passed to the constructor via llkargs.

get_resid(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray | None], Xs: list[csc_array | None], **kwargs) → ndarray | None

Get standardized residuals for a GSMM model.

Any implementation of this function should return a vector that looks like what could be expected from taking independent draws from \(N(0,1)\). Any additional arguments required by a specific implementation can be passed along via kwargs.

Note: Families for which no residuals are available can return None.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys (list[np.ndarray | None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs (list[scp.sparse.csc_array | None]) – A list of sparse model matrices per likelihood parameter. Might contain None at indices for matrices which were flagged as “do not build” via the build_mat argument of the mssm.models.GSMM.fit() method.

Returns:

A vector of shape (-1,1) containing standardized residuals under the current model, or None in case residuals are not readily available. Note that the first axis will not necessarily match the dimension of any of the response vectors; this depends on the specific Family’s implementation.

Return type:

np.ndarray | None

gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray | None], Xs: list[csc_array | None]) → ndarray

Function to evaluate the gradient of the llk at current coefficient estimate coef.

By default relies on numerical differentiation as implemented in scipy to approximate the Gradient from the implemented log-likelihood function. See the link in the references for more details.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
scipy.optimize.approx_fprime: at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.approx_fprime.html
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys (list[np.ndarray | None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs (list[scp.sparse.csc_array | None]) – A list of sparse model matrices per likelihood parameter. Might contain None at indices for matrices which were flagged as “do not build” via the build_mat argument of the mssm.models.GSMM.fit() method.

Returns:

The Gradient of the log-likelihood evaluated at coef as numpy array of shape (-1,1).

Return type:

np.ndarray
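To illustrate what approximating the gradient from the log-likelihood alone involves, here is a hand-rolled forward-difference sketch. It is a stand-in for the scipy routine referenced above, not the code mssm runs; the names fd_gradient and toy_llk and the step size eps are assumptions for the example.

```python
import numpy as np

def fd_gradient(llk, coef, eps=1e-6):
    """Forward-difference approximation of the gradient of llk at coef (shape (-1, 1))."""
    grad = np.zeros_like(coef, dtype=float)
    f0 = llk(coef)
    for j in range(coef.shape[0]):
        step = np.zeros_like(coef, dtype=float)
        step[j, 0] = eps  # perturb one coefficient at a time
        grad[j, 0] = (llk(coef + step) - f0) / eps
    return grad

# Toy concave log-likelihood with known gradient: -(coef - target).
target = np.array([[1.0], [-2.0], [0.5]])

def toy_llk(coef):
    return float(-0.5 * np.sum((coef - target) ** 2))

coef = np.zeros((3, 1))
grad = fd_gradient(toy_llk, coef)
print(grad.ravel())  # should be close to target, since the exact gradient at 0 is target
```

Each gradient evaluation costs one extra log-likelihood call per coefficient, which is why an analytic gradient() is worth implementing for models with many coefficients.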

hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray | None], Xs: list[csc_array | None]) → csc_array | None

Function to evaluate the hessian of the llk at current coefficient estimate coef.

Only has to be implemented if full Newton is to be used to estimate coefficients. If the L-qEFS update by Krause et al. (submitted) is to be used instead, this method does not have to be implemented.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
scipy.optimize.approx_fprime: at https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.approx_fprime.html
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys (list[np.ndarray | None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs (list[scp.sparse.csc_array | None]) – A list of sparse model matrices per likelihood parameter. Might contain None at indices for matrices which were flagged as “do not build” via the build_mat argument of the mssm.models.GSMM.fit() method.

Returns:

The Hessian of the log-likelihood evaluated at coef.

Return type:

scp.sparse.csc_array

init_coef(models: list[Callable]) → ndarray

(Optional) Function to initialize the coefficients of the model.

Can return None, in which case random initialization will be used.

Parameters:: models ([mssm.models.GAMM]) – A list of mssm.models.GAMM instances, each based on one of the formulas provided to a model.
Returns:: A numpy array of shape (-1,1), holding initial values for all model coefficients.
Return type:: np.ndarray

init_lambda(penalties: list[Callable]) → list[float]

(Optional) Function to initialize the smoothing parameters of the model.

Can return None, in which case random initialization will be used.

Parameters:: penalties ([mssm.src.python.penalties.LambdaTerm]) – A list of all penalties to be estimated by the model.
Returns:: A list, holding - for each \(\lambda\) parameter to be estimated - an initial value.
Return type:: np.ndarray

llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray | None], Xs: list[csc_array | None]) → float

Log-probability of the data under the given model.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys (list[np.ndarray | None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs (list[scp.sparse.csc_array | None]) – A list of sparse model matrices per likelihood parameter. Might contain None at indices for matrices which were flagged as “do not build” via the build_mat argument of the mssm.models.GSMM.fit() method.

Returns:

The log-likelihood evaluated at coef.

Return type:

float
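The argument conventions above (stacked coef of shape (-1,1), coef_split_idx, ys, Xs) can be illustrated with a standalone function that follows the documented llk() signature. This is a sketch under assumptions: gaussian_ls_llk is a hypothetical free function rather than a GSMMFamily subclass, it assumes a Normal likelihood with identity link for the mean and log link for the standard deviation, and it uses dense NumPy matrices where mssm would pass sparse csc_array objects.

```python
import numpy as np

def gaussian_ls_llk(coef, coef_split_idx, ys, Xs):
    """Log-likelihood of a Normal location-scale model (hypothetical example)."""
    y = ys[0]  # by convention the observed data is stored in ys[0]
    # Split the stacked (-1, 1) coefficient vector into one block per parameter.
    coef_mu, coef_sd = np.split(coef, coef_split_idx)
    mu = Xs[0] @ coef_mu             # identity link for the mean
    sigma = np.exp(Xs[1] @ coef_sd)  # inverse of the log link for the standard deviation
    lp = -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2
    return float(np.sum(lp))

n = 4
y = np.array([[0.5], [-1.0], [2.0], [0.0]])
X_mu = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # intercept + slope
X_sd = np.ones((n, 1))                                           # intercept only
coef = np.zeros((3, 1))  # two coefficients for the mean, one for the standard deviation
value = gaussian_ls_llk(coef, [2], [y, None], [X_mu, X_sd])
print(value)
```

With all coefficients at zero the model reduces to a standard Normal for every observation, which gives an easy value to check by hand.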

class mssm.src.python.exp_fam.Gamma(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>, scale: float = None)

Bases: Family

Gamma Family.

We assume: \(Y_i \sim \Gamma(\mu_i,\phi)\). The Gamma distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and rate parameter - called \(\alpha\) and \(\beta\) respectively. Wood (2017) provides \(\alpha = 1/\phi\). With this we can obtain \(\beta = 1/\phi/\mu\) (see the source-code for lp() method for details).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.
scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.
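The reparameterization described above, \(\alpha = 1/\phi\) and \(\beta = 1/\phi/\mu\), can be written out as a log-density in shape/rate form. This is a sketch using NumPy and math.lgamma; the name gamma_lp is hypothetical and the code is not taken from the lp() source.

```python
import math
import numpy as np

def gamma_lp(y, mu, scale=1.0):
    """Element-wise Gamma log-density parameterized by mean mu and scale phi."""
    alpha = 1.0 / scale   # shape
    beta = alpha / mu     # rate, i.e. 1/scale/mu, so that alpha / beta == mu
    return (alpha * np.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * np.log(y) - beta * y)

y = np.array([[1.0], [2.0]])
mu = np.array([[2.0], [4.0]])
lp = gamma_lp(y, mu, scale=1.0)
print(lp.ravel())
```

With scale = 1 the Gamma reduces to an exponential distribution with mean \(\mu\), so each log-density equals \(-\log(\mu) - y/\mu\), which gives a quick sanity check.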

D(y: ndarray, mu: ndarray) → ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the contribution of each data-point to the overall model deviance.

Return type:

np.ndarray
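As an illustration, the standard Gamma unit deviance is \(2[(y-\mu)/\mu - \log(y/\mu)]\). The sketch below implements that textbook formula; the name gamma_unit_deviance is hypothetical, and that D() matches this expression term for term is an assumption.

```python
import numpy as np

def gamma_unit_deviance(y, mu):
    """Per-observation contribution to the Gamma deviance (textbook form)."""
    return 2.0 * ((y - mu) / mu - np.log(y / mu))

y = np.array([[0.5], [1.0], [3.0]])
mu = np.array([[1.0], [1.0], [2.0]])
D = gamma_unit_deviance(y, mu)
deviance = float(D.sum())  # total deviance is the sum of the contributions
print(D.ravel(), deviance)
```

Each contribution is non-negative and is exactly zero where the prediction equals the observation.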

V(mu: ndarray) → ndarray

Variance function for the Gamma family.

The variance of random variable \(Y\) is proportional to its mean raised to the second power.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – An N-dimensional vector of the model prediction/the predicted mean
Returns:: mu raised to the power of 2
Return type:: np.ndarray

dVy1(mu: ndarray) → ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
Returns:: An N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
Return type:: np.ndarray

deviance(y: ndarray, mu: ndarray) → float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The model deviance.

Return type:

float

llk(y: ndarray, mu: ndarray, scale: float = 1) → float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, scale: float = 1) → ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective Gamma with mean = \(\boldsymbol{\mu}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.Gaussian(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.Identity object>, scale: float = None)

Bases: Family

Normal/Gaussian Family.

We assume: \(Y_i \sim N(\mu_i,\sigma)\) - i.e., each of the \(N\) observations is generated from a normally distributed RV with observation-specific mean and shared scale parameter \(\sigma\). Equivalent to the assumption that the observed residual vector - the difference between the model prediction and the observed data - should look like what could be expected from drawing \(N\) independent samples from a Normal with mean zero and standard deviation equal to \(\sigma\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

link (Link) – The link function to be used by the model of the mean of this family. By default set to the canonical identity link.
scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.

D(y: ndarray, mu: ndarray) → ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the contribution of each data-point to the overall model deviance.

Return type:

np.ndarray

V(mu: ndarray) → ndarray

Variance function for the Normal family.

Not really a function since the link between variance and mean of the RVs is assumed constant for this model.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – An N-dimensional vector of the model prediction/the predicted mean
Returns:: An N-dimensional vector of 1s with shape (-1,1), i.e., the variance function evaluated for each mean
Return type:: np.ndarray

dVy1(mu: ndarray) → ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
Returns:: An N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
Return type:: np.ndarray

deviance(y: ndarray, mu: ndarray) → float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The model deviance.

Return type:

float
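The identity 2 * (llk_max - llk_c) * scale can be checked numerically for the Normal case, where it reduces to the residual sum of squares. This is a sketch under assumptions: gaussian_lp is a hypothetical helper, and it treats scale as the variance, following the description of the scale argument of llk() and lp().

```python
import numpy as np

def gaussian_lp(y, mu, scale=1.0):
    """Element-wise Normal log-density; scale is treated as the variance here."""
    return -0.5 * np.log(2.0 * np.pi * scale) - (y - mu) ** 2 / (2.0 * scale)

y = np.array([[0.2], [1.4], [-0.7], [2.1]])
mu = np.array([[0.0], [1.0], [0.0], [2.0]])
scale = 1.7

llk_c = float(np.sum(gaussian_lp(y, mu, scale)))   # current model
llk_max = float(np.sum(gaussian_lp(y, y, scale)))  # saturated model: mu equals y
deviance = 2.0 * (llk_max - llk_c) * scale

D = (y - mu) ** 2  # per-observation contributions in the Normal case
print(deviance, float(D.sum()))
```

The scale cancels, so the deviance does not depend on the value chosen for it.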

llk(y: ndarray, mu: ndarray, scale: float = 1) → float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) sigma (variance) parameter, defaults to 1

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, scale: float = 1) → ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective Normal with mean = \(\boldsymbol{\mu}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) sigma (variance) parameter, defaults to 1

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.Identity

Bases: Link

Identity Link function. \(\boldsymbol{\mu}=\boldsymbol{\eta}\) and so this link is trivial.

dy1(mu: ndarray) → ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) → ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) → ndarray

Canonical link for normal distribution with \(\boldsymbol{\eta} = \boldsymbol{\mu}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) → ndarray

For the identity link, \(\boldsymbol{\eta} = \boldsymbol{\mu}\), so the inverse is also just the identity. See Faraway (2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.

class mssm.src.python.exp_fam.InvGauss(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>, scale: float | None = None)

Bases: Family

Inverse Gaussian Family.

We assume: \(Y_i \sim IG(\mu_i,\phi)\). The Inverse Gaussian distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and scale parameter - called \(\nu\) and \(\lambda\) respectively (see the scipy implementation). We can simply set \(\nu=\mu\) (compare scipy density to the one in table 3.1 of Wood, 2017). Wood (2017) shows that \(\phi=1/\lambda\), so this provides \(\lambda=1/\phi\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.invgauss.html

Parameters:

link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.
scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.
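With \(\lambda = 1/\phi\), the inverse Gaussian log-density in terms of the mean can be written out directly. The sketch below uses the standard density \(\sqrt{\lambda/(2\pi y^3)}\exp\{-\lambda(y-\mu)^2/(2\mu^2 y)\}\) with NumPy only; the name invgauss_lp is hypothetical and the code does not go through the scipy parameterization mentioned above.

```python
import numpy as np

def invgauss_lp(y, mu, scale=1.0):
    """Element-wise inverse Gaussian log-density with mean mu and scale phi."""
    lam = 1.0 / scale  # lambda = 1 / phi
    return (0.5 * np.log(lam / (2.0 * np.pi * y ** 3))
            - lam * (y - mu) ** 2 / (2.0 * mu ** 2 * y))

# Sanity check: the density should integrate to roughly one (simple Riemann sum).
dy = 1e-3
grid = np.arange(1e-4, 60.0, dy)
total = float(np.sum(np.exp(invgauss_lp(grid, mu=1.0, scale=1.0))) * dy)
print(total)

# At y == mu the exponent vanishes and only the normalizing term remains.
lp_at_mean = float(invgauss_lp(np.array([1.0]), mu=1.0, scale=1.0)[0])
print(lp_at_mean)
```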

D(y: ndarray, mu: ndarray) → ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the contribution of each data-point to the overall model deviance.

Return type:

np.ndarray

V(mu: ndarray) → ndarray

Variance function for the Inverse Gaussian family.

The variance of random variable \(Y\) is proportional to its mean raised to the third power.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – An N-dimensional vector of the model prediction/the predicted mean
Returns:: mu raised to the power of 3
Return type:: np.ndarray

dVy1(mu: ndarray) → ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
Returns:: An N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
Return type:: np.ndarray

deviance(y: ndarray, mu: ndarray) → float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The model deviance.

Return type:

float

llk(y: ndarray, mu: ndarray, scale: float = 1) → float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray, scale: float = 1) → ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective inverse Gaussian with mean = \(\boldsymbol{\mu}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.LOG

Bases: Link

Log Link function. \(log(\boldsymbol{\mu}) = \boldsymbol{\eta}\).

dy1(mu: ndarray) → ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) → ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) → ndarray

Non-canonical link for Gamma distribution with \(log(\boldsymbol{\mu}) = \boldsymbol{\eta}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) → ndarray

For the log link, \(\boldsymbol{\eta} = log(\boldsymbol{\mu})\), so \(exp(\boldsymbol{\eta})=\boldsymbol{\mu}\). See Faraway (2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.

class mssm.src.python.exp_fam.LOGb(b: float)

Bases: Link

Log + b Link function. \(log(\boldsymbol{\mu} + b) = \boldsymbol{\eta}\).

Parameters:: b (float) – The constant to add to \(\mu\) before taking the log.

dy1(mu: ndarray) → ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) → ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) → ndarray

\(log(\boldsymbol{\mu} + b) = \boldsymbol{\eta}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) → ndarray

For the logb link, \(\boldsymbol{\eta} = log(\boldsymbol{\mu} + b)\), so \(exp(\boldsymbol{\eta})-b =\boldsymbol{\mu}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
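
The relations between the four LOGb methods can be checked numerically. The following is a minimal standalone sketch, not the mssm implementation: it works on scalar floats instead of numpy arrays, the function names and the offset value are hypothetical, and the derivative expressions are simply what differentiating \(log(\mu + b)\) yields.

```python
import math

B = 0.5  # hypothetical offset, plays the role of the constructor argument b

def logb_f(mu, b=B):
    """Link: eta = log(mu + b)."""
    return math.log(mu + b)

def logb_fi(eta, b=B):
    """Inverse link: mu = exp(eta) - b."""
    return math.exp(eta) - b

def logb_dy1(mu, b=B):
    """First derivative of log(mu + b) with respect to mu."""
    return 1.0 / (mu + b)

def logb_dy2(mu, b=B):
    """Second derivative of log(mu + b) with respect to mu."""
    return -1.0 / (mu + b) ** 2

mu = 2.0
eta = logb_f(mu)
roundtrip = logb_fi(eta)  # should recover mu

# Central finite differences as an independent check of the derivatives
h = 1e-5
fd1 = (logb_f(mu + h) - logb_f(mu - h)) / (2 * h)
fd2 = (logb_dy1(mu + h) - logb_dy1(mu - h)) / (2 * h)
```

Setting b to 0 in this sketch reduces it to the plain log link and its inverse.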

class mssm.src.python.exp_fam.Link

Bases: object

Link function base class. To be implemented by any link function used for GAMMs and GAMMLSS models. Only links used by GAMMLSS models require implementing the dy2 function. Note that care must be taken that every method returns only valid values. Specifically, no returned element may be numpy.nan or numpy.inf.

dy1(mu: ndarray) → ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) → ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) → ndarray

Link function \(f()\) mapping mean \(\boldsymbol{\mu}\) of an exponential family to the model prediction \(\boldsymbol{\eta}\), so that \(f(\boldsymbol{\mu}) = \boldsymbol{\eta}\). See Wood (2017, 3.1.2) and Faraway (2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) → ndarray

Inverse of the link function mapping \(\boldsymbol{\eta} = f(\boldsymbol{\mu})\) to the mean \(fi(\boldsymbol{\eta}) = fi(f(\boldsymbol{\mu})) = \boldsymbol{\mu}\). See Faraway (2016) and the Link.f function.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
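
The base class requires that no method returns numpy.nan or numpy.inf. Below is a minimal sketch of one way to honour this, assuming a hypothetical square-root link. The class is a standalone stand-in: it does not inherit from mssm's Link, it works on lists of floats rather than numpy arrays, and the clipping constant eps is an assumption chosen for illustration.

```python
import math

class SqrtLinkSketch:
    """Standalone stand-in exposing the four methods named by the Link base class.

    Hypothetical: not derived from mssm's Link, operates on lists of floats.
    """

    def __init__(self, eps=1e-10):
        # Lower bound applied to mu so the derivatives never divide by zero
        self.eps = eps

    def _clip(self, mu):
        return [max(m, self.eps) for m in mu]

    def f(self, mu):
        # eta = sqrt(mu)
        return [math.sqrt(m) for m in self._clip(mu)]

    def fi(self, eta):
        # mu = eta^2
        return [e * e for e in eta]

    def dy1(self, mu):
        # d sqrt(mu) / d mu = 1 / (2 * sqrt(mu))
        return [0.5 / math.sqrt(m) for m in self._clip(mu)]

    def dy2(self, mu):
        # d^2 sqrt(mu) / d mu^2 = -1 / (4 * mu^(3/2))
        return [-0.25 * m ** -1.5 for m in self._clip(mu)]

link = SqrtLinkSketch()
mu = [0.0, 0.25, 4.0]  # includes a boundary value that would break 1/sqrt(mu)
eta = link.f(mu)
values = eta + link.fi(eta) + link.dy1(mu) + link.dy2(mu)
all_finite = all(math.isfinite(v) for v in values)
```

Clipping is only one possible guard; whether it is appropriate depends on the family the link is paired with.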

class mssm.src.python.exp_fam.Logit

Bases: Link

Logit Link function, which is canonical for the binomial model. \(\boldsymbol{\eta}\) = log-odds of success.

dy1(mu: ndarray) → ndarray

First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017):

\[ \begin{align}\begin{aligned}f(\mu) = log(\mu / (1 - \mu))\\f(\mu) = log(\mu) - log(1 - \mu)\\\partial f(\mu)/ \partial \mu = 1/\mu + 1/(1 - \mu)\end{aligned}\end{align} \]

Faraway (2016) simplifies this to: \(\partial f(\mu)/ \partial \mu = 1 / (\mu - \mu^2) = 1/ ((1-\mu)\mu)\)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

dy2(mu: ndarray) → ndarray

Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

f(mu: ndarray) → ndarray

Canonical link for binomial distribution with \(\boldsymbol{\mu}\) holding the probabilities of success, so that the model prediction \(\boldsymbol{\eta}\) is equal to the log-odds.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

fi(eta: ndarray) → ndarray

For the logit link and the binomial model, \(\boldsymbol{\eta}\) = log-odds, so the inverse to go from \(\boldsymbol{\eta}\) to \(\boldsymbol{\mu}\) is \(\boldsymbol{\mu} = exp(\boldsymbol{\eta}) / (1 + exp(\boldsymbol{\eta}))\). See Faraway (2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
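
The logit link, its inverse, and both derivatives can be verified with a short numerical check. This is a standalone sketch on scalar floats with hypothetical function names, not the mssm implementation; the second derivative is obtained by differentiating \(1 / ((1-\mu)\mu)\) and is an addition not stated in the text above.

```python
import math

def logit_f(mu):
    """Link: eta = log(mu / (1 - mu)), the log-odds."""
    return math.log(mu / (1.0 - mu))

def logit_fi(eta):
    """Inverse link: mu = exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def logit_dy1(mu):
    """First derivative: 1 / (mu * (1 - mu))."""
    return 1.0 / (mu * (1.0 - mu))

def logit_dy2(mu):
    """Second derivative, from differentiating 1 / (mu * (1 - mu))."""
    return -1.0 / mu ** 2 + 1.0 / (1.0 - mu) ** 2

mu = 0.2
eta = logit_f(mu)          # log-odds of a success probability of 0.2
roundtrip = logit_fi(eta)  # should recover mu

# Central finite differences as an independent check of the derivatives
h = 1e-6
fd1 = (logit_f(mu + h) - logit_f(mu - h)) / (2 * h)
fd2 = (logit_dy1(mu + h) - logit_dy1(mu - h)) / (2 * h)
```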

class mssm.src.python.exp_fam.MULNOMLSS(pars: int)

Bases: GAMLSSFamily

Family for a Multinomial GAMMLSS model (Rigby & Stasinopoulos, 2005).

This Family assumes that each observation \(y_i\) corresponds to one of \(K\) classes (labeled as 0, …, \(K-1\)) and reflects a realization of an independent RV \(Y_i\) with an observation-specific probability mass function defined over the \(K\) classes. These \(K\) probabilities - that \(Y_i\) takes on class 0, …, \(K-1\) - are modeled as additive combinations of smooth functions of covariates and other parametric terms.

As an example, consider a visual search experiment where \(K-1\) distractors are presented on a computer screen together with a single target and subjects are instructed to find the target and fixate it. With a Multinomial model we can estimate how the probability of looking at each of the \(K\) stimuli on the screen changes (smoothly) over time and as a function of other predictor variables of interest (e.g., contrast of stimuli, depending on whether participants are instructed to be fast or accurate).

References:

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:: pars (int) – K-1, i.e., the number of classes minus 1, which is also the number of linear predictors.

get_resid(y: ndarray, *mus: list[ndarray]) → None

Placeholder function for residuals of a Multinomial model - yet to be implemented.

Parameters:

y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.
mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.

Returns:

Currently None - since no residuals are implemented

llk(y: ndarray, *mus: list[ndarray])

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.
mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, *mus: list[ndarray]) → ndarray

Log-probability of observing class k under current model.

Our DV consists of K classes but we essentially enforce a sum-to-zero constraint on the DV so that we end up modeling only K-1 (non-normalized) probabilities of observing class k (for all k except k==0) as an additive combination of smooth functions of our covariates and other parametric terms. The probability of observing class 0 as well as the normalized probabilities of observing each other class can readily be computed from these K-1 non-normalized probabilities. This is explained quite well on Wikipedia (see References).

Specifically, the probability of the outcome being class k is simply:

\(p(Y_i == k) = \mu_k / (1 + \sum_j^{K-1} \mu_j)\) where \(\mu_k\) is the aforementioned non-normalized probability of observing class \(k\) - which is simply set to 1 for class \(k==0\) (this follows from the sum-to-zero constraint; see Wikipedia).

So, the log-prob of the outcome being class k is:

\(log(p(Y_i == k)) = log(\mu_k) - log(1 + \sum_j^{K-1} \mu_j)\)

References:

Wikipedia. https://en.wikipedia.org/wiki/Multinomial_logistic_regression

gamlss.dist on Github (see Rigby & Stasinopoulos, 2005). https://github.com/gamlss-dev/gamlss.dist/blob/main/R/MN4.R

Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.

Parameters:

y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.
mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray
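
The normalization described for lp() can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's code: the helper names are hypothetical, and for readability the non-normalized probabilities are stored per observation (a list of K-1 values for each data point) instead of as K-1 arrays over all observations.

```python
import math

def multinomial_log_probs(mus):
    """Log-probabilities of classes 0..K-1 for a single observation.

    mus holds the K-1 non-normalized probabilities of classes 1..K-1;
    class 0 is the reference class with non-normalized probability 1.
    """
    log_norm = math.log(1.0 + sum(mus))
    return [math.log(m) - log_norm for m in [1.0] + list(mus)]

def lp_sketch(y, mus_per_obs):
    """Log-probability of each observed class y[i] given its K-1 values."""
    return [multinomial_log_probs(mus)[k] for k, mus in zip(y, mus_per_obs)]

# K = 3 classes, so two non-normalized probabilities per observation
mus_per_obs = [[2.0, 1.0], [0.5, 0.5]]
y = [1, 0]  # observed classes

lp = lp_sketch(y, mus_per_obs)

# Normalized probabilities of all three classes for the first observation
probs_first = [math.exp(v) for v in multinomial_log_probs(mus_per_obs[0])]
```

Summing the entries of lp gives the quantity that llk() is described as returning.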

class mssm.src.python.exp_fam.MultiGauss(pars: int, links: list[Link])

Bases: GSMMFamily

Family for multivariate additive models - a type of General Smooth model as discussed by Wood, Pya, & Säfken (2016).

Implementation based on Supplementary materials H in Wood, Pya, & Säfken (2016). Currently, these models can only be estimated via the L-qEFS update in mssm.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import warnings

# Simulate data
sim_dat = sim16(500,seed=1134,correlate=True)

# We need formulas for each mean!
formulas = [
    Formula(lhs("y0"), [i(), f(["x0"])], data=sim_dat),
    Formula(lhs("y1"), [i(), f(["x1"]), f(["x2"])], data=sim_dat),
    Formula(lhs("y2"), [i(), f(["x3"])], data=sim_dat)
]

# Now define the model...
model = GSMM(formulas, MultiGauss(3,[Identity() for _ in range(3)]))

# ... and fit!
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    model.fit(method='qEFS')

# Get overview:
model.print_parametric_terms()
model.print_smooth_terms(p_values=True)

# And plot smooth function estimates for mean at index 1
plot(model,dist_par=1)

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Parameters:

pars (int) – Number of means (i.e., dimension of the multivariate Gaussian)
links (list[Link]) – List of link functions for the models of the means. For example [Identity() for _ in range(pars)].

getR(theta: ndarray) → tuple[ndarray, float]

Returns transpose of Cholesky of precision matrix of multivariate Gaussian.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import numpy as np
import warnings

# Simulate data
sim_dat = sim16(500,seed=1134,correlate=True)

# We need formulas for each mean!
formulas = [
    Formula(lhs("y0"), [i(), f(["x0"])], data=sim_dat),
    Formula(lhs("y1"), [i(), f(["x1"]), f(["x2"])], data=sim_dat),
    Formula(lhs("y2"), [i(), f(["x3"])], data=sim_dat)
]

# Now define the model...
model = GSMM(formulas, MultiGauss(3,[Identity() for _ in range(3)]))

# ... and fit!
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    model.fit(method='qEFS')

# Extract R
split_coef = np.split(model.coef,model.coef_split_idx)
theta = split_coef[-1].flatten()
R,log_det = model.family.getR(theta)

# R is the transpose of the Cholesky of the precision matrix. So to get the
# Covariance matrix of the multivariate Gaussian we need to compute:
Sigma = np.linalg.inv(R.T@R)

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:: theta (np.ndarray) – Flattened array holding inverses of log(variance) and co-variance parameters
Returns:: Transpose of Cholesky as a numpy array and log-determinant of Cholesky
Return type:: tuple[np.ndarray,float]
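
As a worked illustration of the relation between R, the precision matrix, and the covariance matrix, the sketch below uses a made-up 2 x 2 upper-triangular R (not output of getR) and plain Python lists. The helper functions are hypothetical and only cover the 2 x 2 case.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2x2(A):
    """Inverse of a 2 x 2 matrix via the closed-form adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Made-up transpose of a Cholesky factor, for illustration only
R = [[2.0, 1.0],
     [0.0, 1.0]]

precision = matmul(transpose(R), R)  # R.T @ R
Sigma = inv2x2(precision)            # covariance matrix

# Sigma times the precision matrix should give the identity
identity_check = matmul(Sigma, precision)
```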

get_resid(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array | None], mean: int | None = None) → ndarray

Computes Deviance residuals of a multivariate normal model given coefficients in the additive models of the mean and the log(variance) and co-variance parameters.

If the model is correct, each column in the returned matrix should look like an i.i.d. sample of size N from \(N(0,1)\).

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import warnings

# Simulate data
sim_dat = sim16(500,seed=1134,correlate=True)

# We need formulas for each mean!
formulas = [
    Formula(lhs("y0"), [i(), f(["x0"])], data=sim_dat),
    Formula(lhs("y1"), [i(), f(["x1"]), f(["x2"])], data=sim_dat),
    Formula(lhs("y2"), [i(), f(["x3"])], data=sim_dat)
]

# Now define the model...
model = GSMM(formulas, MultiGauss(3,[Identity() for _ in range(3)]))

# ... and fit!
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    model.fit(method='qEFS')

# Can now extract the residual matrix
res = model.get_resid()

# The ``get_resid`` method supports a ``mean`` key-word to extract univariate residuals
# for an individual mean. We can use mssmViz's plot function to visualize these
# for example for mean at index 2:
plot_val(model,gsmm_kwargs={"mean":2})

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!). Note the last int(pars * (pars + 1) / 2) elements contain the log(variance) and co-variance parameters.
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each mean and a final sub-set containing all log(variance) and co-variance parameters.
ys ([np.ndarray]) – List containing the vectors of observations.
Xs ([scp.sparse.csc_array | None]) – A list containing the sparse model matrices associated with the models of the means. Note, this implementation allows to make use of the build_mat argument of the GSMM.fit() method. Specifically, for means that have the same predictor structure as the first mean we can set build_mat[idx] = False. The code then automatically assigns X[idx] = X[0]. See the GSMM.fit() documentation for more details.
mean (int | None, optional) – Optionally, the index of a specific mean for which to extract the residuals. This allows to extract univariate residuals for a specific mean. Setting this to None means the (N * self.n_par) residual matrix is returned where N is the number of observations. Defaults to None

Returns:

Residual matrix. Will be a residual vector if mean is not set to None.

Return type:

np.ndarray

gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array | None]) → ndarray

Computes gradient of a multivariate normal containing partial derivatives with respect to coefficients in the additive models of the mean and the log(variance) and co-variance parameters.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!). Note the last int(pars * (pars + 1) / 2) elements contain the log(variance) and co-variance parameters.
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each mean and a final sub-set containing all log(variance) and co-variance parameters.
ys ([np.ndarray]) – List containing the vectors of observations.
Xs ([scp.sparse.csc_array | None]) – A list containing the sparse model matrices associated with the models of the means. Note, this implementation allows to make use of the build_mat argument of the GSMM.fit() method. Specifically, for means that have the same predictor structure as the first mean we can set build_mat[idx] = False. The code then automatically assigns X[idx] = X[0]. See the GSMM.fit() documentation for more details.

Returns:

The gradient at the current parameter estimates in a numpy array of shape (-1,1)

Return type:

np.ndarray

llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array | None]) → float

Computes the log-likelihood under a multivariate normal given coefficients in the additive models of the mean and the log(variance) and co-variance parameters.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!). Note the last int(pars * (pars + 1) / 2) elements contain the log(variance) and co-variance parameters.
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each mean and a final sub-set containing all log(variance) and co-variance parameters.
ys ([np.ndarray]) – List containing the vectors of observations.
Xs ([scp.sparse.csc_array | None]) – A list containing the sparse model matrices associated with the models of the means. Note, this implementation allows to make use of the build_mat argument of the GSMM.fit() method. Specifically, for means that have the same predictor structure as the first mean we can set build_mat[idx] = False. The code then automatically assigns X[idx] = X[0]. See the GSMM.fit() documentation for more details.

class mssm.src.python.exp_fam.Poisson(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>)

Bases: Family

Poisson Family.

We assume: \(Y_i \sim P(\lambda)\). We can simply set \(\lambda=\mu\) (compare scipy density to the one in table 3.1 of Wood, 2017) and treat the scale parameter of a GAMM (\(\phi\)) as fixed/known at 1.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html

Parameters:: link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.

D(y: ndarray, mu: ndarray) → ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the contribution of each data-point to the overall model deviance.

Return type:

np.ndarray

V(mu: ndarray) → ndarray

Variance function for the Poisson family.

The variance of random variable \(Y\) is proportional to its mean.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – An N-dimensional vector of the model prediction/the predicted mean
Returns:: mu
Return type:: np.ndarray

dVy1(mu: ndarray) → ndarray

The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:: mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
Returns:: An N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
Return type:: np.ndarray

deviance(y: ndarray, mu: ndarray) → float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

The model deviance.

Return type:

float
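
The identity stated for deviance() can be checked against the standard Poisson formulas. The sketch below is a standalone illustration with hypothetical function names and plain Python lists, not the mssm code: it computes the textbook per-observation deviance contribution and confirms that the total equals 2 * (llk_max - llk_c) with the scale fixed at 1.

```python
import math

def poisson_lp(y, mu):
    """Log-probability of each count y[i] under a Poisson with mean mu[i]."""
    return [yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu)]

def poisson_D(y, mu):
    """Per-observation deviance contribution 2 * (y * log(y / mu) - (y - mu)).

    The term y * log(y / mu) is taken as 0 when y == 0 (its limit).
    """
    out = []
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        out.append(2.0 * (log_term - (yi - mi)))
    return out

y = [0, 3, 7]          # hypothetical observed counts
mu = [0.5, 2.0, 7.0]   # hypothetical predicted means

deviance = sum(poisson_D(y, mu))

# Saturated model: mu = y. For y == 0 the Poisson log-probability is 0.
llk_sat = sum(poisson_lp([yi], [yi])[0] if yi > 0 else 0.0 for yi in y)
llk_cur = sum(poisson_lp(y, mu))
deviance_from_llk = 2.0 * (llk_sat - llk_cur)  # scale is fixed at 1
```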

init_mu(y: ndarray) → ndarray

Function providing initial \(\boldsymbol{\mu}\) vector for Poisson GAMM.

We shrink extreme observed counts towards mean.

Parameters:: y (np.ndarray) – A numpy array containing each observation.
Returns:: An N-dimensional vector of shape (-1,1) containing an initial estimate of the mean of the response variables
Return type:: np.ndarray

llk(y: ndarray, mu: ndarray) → float

Log-probability of the data under the given model. Essentially the sum over all elements in the vector returned by the lp() method.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1

Returns:

The log-probability of observing all data under the current model.

Return type:

float

lp(y: ndarray, mu: ndarray) → ndarray

Log-probability of observing every value in \(\mathbf{y}\) under their respective Poisson with mean = \(\boldsymbol{\mu}\).

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.

Returns:

An N-dimensional vector containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

class mssm.src.python.exp_fam.PropHaz(ut: ndarray, r: ndarray)

Bases: GSMMFamily

Family for proportional Hazard model - a type of General Smooth model as discussed by Wood, Pya, & Säfken (2016).

Based on Supplementary materials G in Wood, Pya, & Säfken (2016). The dependent variable passed to the mssm.src.python.formula.Formula needs to hold delta indicating whether the event was observed or not (i.e., only values in {0,1}).

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import numpy as np
import copy

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,correlate=False)

# Prep everything for prophaz model
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"),
                        [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                        data=sim_dat)

PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Can plot the estimated effects on the scale of the
# linear predictor (i.e., log hazard) via mssmViz
plot(model)

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Parameters:

ut (np.ndarray) – Unique event time vector (each time represented as int) as described by Wood, Pya, & Säfken (2016), holding unique event times in decreasing order.
r (np.ndarray) – Index vector as described by Wood, Pya, & Säfken (2016), holding for each data-point (i.e., for each row in Xs[0]) the index to its corresponding event time in ut.
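
The relationship between ut and r can be illustrated without numpy. The following is a minimal sketch assuming the data has already been sorted by decreasing event time, as in the example above; the function name and the event times are hypothetical.

```python
def event_time_index(times_sorted_desc):
    """Return (ut, r) for event times already sorted in decreasing order.

    ut: unique event times in decreasing order.
    r:  for each row, the index of its event time in ut.
    """
    ut = sorted(set(times_sorted_desc), reverse=True)
    position = {t: idx for idx, t in enumerate(ut)}
    r = [position[t] for t in times_sorted_desc]
    return ut, r

# Hypothetical integer event times, sorted in decreasing order with ties
times = [9, 7, 7, 4, 2, 2, 2]
ut, r = event_time_index(times)
```

Indexing ut with r recovers the original time of every row, which is the property the np.unique, np.flip, and np.abs steps in the example are meant to produce.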

get_baseline_hazard(coef: ndarray, delta: ndarray, Xs: list[csc_array]) → ndarray

Get the cumulative baseline hazard function as defined by Wood, Pya, & Säfken (2016).

The function is evaluated for all k unique event times that were available in the data.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import numpy as np
import copy

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,
  correlate=False)

# Prep everything for prophaz model
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"),
                        [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                        data=sim_dat)

PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Now get cumulative baseline hazard estimate
H = PropHaz_fam.get_baseline_hazard(model.coef,
  sim_formula_m.y_flat[sim_formula_m.NOT_NA_flat],model.get_mmat())

# And plot it
plt.plot(ut,H)
plt.xlabel("Time")
plt.ylabel("Cumulative Baseline Hazard")

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – Coefficient vector as numpy array of shape (-1,1).
delta (np.ndarray) – Dependent variable passed to mssm.src.python.formula.Formula(), holds (for each row in Xs[0]) a value in {0,1}, indicating whether for that observation the event was observed or not.
Xs ([scp.sparse.csc_array]) – The list of model matrices (here holding a single model matrix) obtained from mssm.models.GAMMLSS.get_mmat().

Returns:

numpy array, holding k baseline hazard function estimates

Return type:

np.ndarray

get_resid(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array], resid_type: str = 'Martingale', reorder: ndarray | None = None) → ndarray

Get Martingale or Deviance residuals for a proportional Hazard model.

See the PropHaz.get_survival() function for examples.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.
ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.
Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.
resid_type (str, optional) – The type of residual to compute, supported are “Martingale” and “Deviance”.
reorder (np.ndarray) – A flattened np.ndarray containing for each data point the original index in the data-set before sorting. Used to re-order the residual vector into the original order. If this is set to None, the residual vector is not re-ordered and instead returned in the order of the sorted data-frame passed to the model formula.

Returns:

The residual vector of shape (-1,1)

Return type:

np.ndarray
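
How an argsort over the stored original indices restores the pre-sorting order can be sketched as follows. This mirrors the res_idx = np.argsort(sim_dat["index"].values) step from the PropHaz.get_survival() example using plain Python lists; the data values and helper name are hypothetical and the residuals are not produced by a model.

```python
def argsort(values):
    """Indices that would sort values in increasing order."""
    return sorted(range(len(values)), key=values.__getitem__)

# Original row order (hypothetical event times and residuals)
times_original = [4, 9, 2, 7]
resid_original = [0.1, -0.3, 0.8, -0.6]

# Sort rows by decreasing time and remember each row's original index
order = sorted(range(len(times_original)), key=lambda i: -times_original[i])
index_column = order  # plays the role of the "index" column after sorting
resid_sorted = [resid_original[i] for i in order]

# Positions in the sorted data of original rows 0, 1, 2, ...
res_idx = argsort(index_column)
resid_restored = [resid_sorted[i] for i in res_idx]
```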

get_survival(coef: ndarray, Xs: list[csc_array], delta: ndarray, t: int, x: ndarray | csc_array, V: csc_array, compute_var: bool = True) → tuple[ndarray, ndarray | None]

Compute survival function + variance at time-point t, given k optional covariate vector(s) x as defined by Wood, Pya, & Säfken (2016).

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import copy

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,
  correlate=False)

# Prep everything for prophaz model

# Create index variable for residual ordering
sim_dat["index"] = np.arange(sim_dat.shape[0])

# Now sort
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))
res_idx = np.argsort(sim_dat["index"].values)

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"),
                        [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                        data=sim_dat)

PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Now get estimate of survival function and see how it changes with x0
new_dat = pd.DataFrame({"x0":np.linspace(0,1,5),
                        "x1":np.linspace(0,1,5),
                        "x2":np.linspace(0,1,5),
                        "x3":np.linspace(0,1,5)})

# Get model matrix using only f0
_,Xt,_ = model.predict(use_terms=[0],n_dat=new_dat)

# Now iterate over all time-points and obtain the predicted survival
# function + standard error estimate
# for all 5 values of x0:
S = np.zeros((len(ut),Xt.shape[0]))
VS = np.zeros((len(ut),Xt.shape[0]))
for idx,ti in enumerate(ut):

   # Su and VSu are of shape (5,1) here but will generally be of shape (Xt.shape[0],1)
   Su,VSu = PropHaz_fam.get_survival(model.coef,model.get_mmat(),
      sim_formula_m.y_flat[sim_formula_m.NOT_NA_flat],
      ti,Xt,model.lvi.T@model.lvi)

   S[idx,:] = Su.flatten()
   VS[idx,:] = VSu.flatten()

# Now we can plot the estimated survival functions + approximate cis:
for xi in range(Xt.shape[0]):

   plt.fill([*ut,*np.flip(ut)],
            [*(S[:,xi] + 1.96*VS[:,xi]),*np.flip(S[:,xi] - 1.96*VS[:,xi])],alpha=0.5)
   plt.plot(ut,S[:,xi],label=f"x0 = {new_dat['x0'][xi]}")
plt.legend()
plt.xlabel("Time")
plt.ylabel("Survival")
plt.show()

# Note how the main effect of x0 is reflected in the plot above:
plot(model,which=[0])

# Residual plots can be created via `plot_val` from `mssmViz` - by default Martingale
# residuals are returned (see Wood, 2017)
fig = plt.figure(figsize=(10,3),layout='constrained')
axs = fig.subplots(1,3,gridspec_kw={"wspace":0.2})
# Note the use of `gsmm_kwargs_pred={}` to ensure that the re-ordering is not applied
# to the plot against predicted values
plot_val(model,gsmm_kwargs={"reorder":res_idx},gsmm_kwargs_pred={},ar_lag=25,axs=axs)

# Can also get Deviance residuals:
fig = plt.figure(figsize=(10,3),layout='constrained')
axs = fig.subplots(1,3,gridspec_kw={"wspace":0.2})

plot_val(model,
  gsmm_kwargs={"reorder":res_idx,"resid_type":"Deviance"},
  gsmm_kwargs_pred={"resid_type":"Deviance"},ar_lag=25,axs=axs)

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – Coefficient vector as numpy array of shape (-1,1).
Xs ([scp.sparse.csc_array]) – The list of model matrices (here holding a single model matrix) obtained from mssm.models.GAMMLSS.get_mmat().
delta (np.ndarray) – Dependent variable passed to mssm.src.python.formula.Formula(), holds (for each row in Xs[0]) a value in {0,1}, indicating whether for that observation the event was observed or not.
t (int) – Time-point at which to evaluate the survival function.
x (np.ndarray or scp.sparse.csc_array) – Optional vector (or matrix - can also be sparse) of covariate values. Needs to be of shape (k,len(coef)).
V (scp.sparse.csc_array) – Estimated Co-variance matrix of posterior for coef
compute_var (bool, optional) – Whether to compute the variance estimate of the survival as well. Otherwise None will be returned as the second argument.

Returns:

Two arrays, the first holds k survival function estimates, the latter holds k variance estimates for each of the survival function estimates. The second argument will be None instead if compute_var = False.

Return type:

tuple[np.ndarray, np.ndarray | None]

gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) → ndarray

Gradient as defined by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) coef into the subsets associated with each parameter of the log-likelihood. Not required by this family, which has a single parameter.
ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.
Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.

Returns:

The Gradient of the log-likelihood evaluated at coef as numpy array of shape (-1,1).

Return type:

np.ndarray
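An analytic gradient can be checked against the corresponding log-likelihood with finite differences. The sketch below is only an illustration: `toy_llk` and `toy_gradient` are hypothetical stand-ins (a unit-variance Gaussian), not the log-likelihood or gradient of this family, and `fd_gradient` is a helper introduced here. It also shows the (-1,1) shape convention for coef.

```python
import numpy as np

def toy_llk(coef, X, y):
    """Stand-in log-likelihood (unit-variance Gaussian, constants dropped)."""
    res = y - X @ coef
    return float(-0.5 * (res.T @ res)[0, 0])

def toy_gradient(coef, X, y):
    """Analytic gradient of toy_llk, returned with shape (-1,1)."""
    return X.T @ (y - X @ coef)

def fd_gradient(f, coef, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at coef."""
    grad = np.zeros_like(coef, dtype=float)
    for j in range(coef.shape[0]):
        step = np.zeros_like(coef, dtype=float)
        step[j, 0] = eps
        grad[j, 0] = (f(coef + step) - f(coef - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.normal(size=(50, 1))
coef = np.zeros((3, 1))  # shape (-1,1), i.e. not flattened

analytic = toy_gradient(coef, X, y)
numeric = fd_gradient(lambda c: toy_llk(c, X, y), coef)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
```

A small `max_abs_diff` suggests that the gradient and the log-likelihood agree. The same check could be applied to any llk/gradient pair that follows the shape convention above.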

hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) → csc_array

Hessian as defined by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) coef into the subsets associated with each parameter of the log-likelihood. Not required by this family, which has a single parameter.
ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.
Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.

Returns:

The Hessian of the log-likelihood evaluated at coef.

Return type:

scp.sparse.csc_array

init_coef(models: list[Callable]) → ndarray

Function to initialize the coefficients of the model.

Parameters:

models ([mssm.models.GAMM]) – A list of GAMMs, each based on one of the formulas provided to a model.

Returns:

A numpy array of shape (-1,1), holding initial values for all model coefficients.

Return type:

np.ndarray

llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) → float

Log-likelihood function as defined by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) coef into the subsets associated with each parameter of the log-likelihood. Not required by this family, which has a single parameter.
ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.
Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.

Returns:

The log-likelihood evaluated at coef.

Return type:

float

class mssm.src.python.exp_fam.ScaledT(link, theta=None, min_df=3)

Bases: ExtendedFamily

This class implements the scaled T family, based on the implementation in mgcv by Natalya Pya.

Specifically, we assume that \((y_i-\mu_i)/\phi \sim t_{\nu}\), so that \(\phi\) takes on the role of the scale parameter and \(\nu\) are the degrees of freedom of the T-distribution. Note that as \(\nu \to \infty\), this family will behave like a Normal distribution with standard deviation \(\phi\).
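The limiting behaviour described above can be illustrated with a few lines of plain Python. This is a minimal sketch using the textbook density of a location-scale t-distribution; the helper names `scaled_t_logpdf` and `normal_logpdf` are hypothetical and this is not the code used by the ScaledT class.

```python
import math

def scaled_t_logpdf(y, mu, phi, nu):
    """Log-density of y when (y - mu) / phi follows a t-distribution with nu df."""
    z = (y - mu) / phi
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(phi)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def normal_logpdf(y, mu, sd):
    """Log-density of a Normal distribution with mean mu and standard deviation sd."""
    z = (y - mu) / sd
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * z * z

y, mu, phi = 1.3, 0.5, 2.0

# Absolute difference to the Normal log-density for few and for many degrees of freedom
gap_small_nu = abs(scaled_t_logpdf(y, mu, phi, 3) - normal_logpdf(y, mu, phi))
gap_large_nu = abs(scaled_t_logpdf(y, mu, phi, 1e6) - normal_logpdf(y, mu, phi))
```

For large \(\nu\) the gap to the Normal log-density with standard deviation \(\phi\) becomes negligible, whereas for small \(\nu\) the heavier tails make the two clearly different.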

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *

# Simulate some data
sim_fit_dat = sim3(n=500, scale=2, c=0.0, family=Gaussian(), seed=1)

# Specify formula
sim_fit_formula = Formula(
    lhs("y"),
    [i(), f(["x0"]), f(["x1"]), f(["x2"]), f(["x3"])],
    data=sim_fit_dat,
)

# Now fit a model assuming a scaled T family
model = GAMM(sim_fit_formula, ScaledT(link=Identity()))
model.fit()

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

link (Link) – The link function to be used by the model of the mean of this family.
theta (None | np.ndarray, optional) – An optional array containing an estimate of the log of the scale parameter and an estimate of the log of \(\nu\). Setting this to None means both parameters have to be estimated.

Variables:

theta (None | np.ndarray) – The latest estimate of theta. Calls to GAMM.fit() will overwrite this attribute if the initial value for theta passed to the constructor was None. Defaults to np.array([0.5, 10]).reshape(-1, 1) or whatever is passed to the constructor for theta.

D(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016).

Deviance contributions are computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

An N-dimensional vector of shape (-1,1) containing the contribution of each observation to the overall deviance.

Return type:

np.ndarray

Ed2Ddmu(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes expected second derivative of the deviance with respect to mu. This function is used during fitting, i.e., estimation is based on Fisher weights.

Derivatives are computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

An N-dimensional vector of shape (len(mu),1) containing the expected second derivatives of the deviance with respect to mu per observation.

Return type:

np.ndarray

V(mu: ndarray, theta: None | ndarray = None) → ndarray

The variance function (of the mean; see Wood, 2017, 3.1.2) for the scaled T family.

The variance function is computed as in the scat implementation available in mgcv by Natalya Pya. Specifically, function returns \(\phi^2 * \nu / (\nu - 2)\) for each element in mu.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

An N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean.

Return type:

np.ndarray
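The expression \(\phi^2 * \nu / (\nu - 2)\) returned by V() is the variance of a scaled t-distributed variable, which is finite only for \(\nu > 2\). A minimal sketch of that formula, with a hypothetical helper name and scalar inputs instead of the theta array used by the class:

```python
def scaled_t_variance(phi, nu):
    """Variance phi^2 * nu / (nu - 2) of a scaled t variable; requires nu > 2."""
    if nu <= 2:
        raise ValueError("The variance is only finite for nu > 2.")
    return phi ** 2 * nu / (nu - 2)

var_heavy_tails = scaled_t_variance(phi=2.0, nu=4.0)  # 4 * 4 / 2
var_near_normal = scaled_t_variance(phi=2.0, nu=1e6)  # approaches phi^2 = 4
```

With few degrees of freedom the variance exceeds \(\phi^2\) noticeably; as \(\nu\) grows it approaches \(\phi^2\), in line with the Normal limit of the family.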

d2Ddmu(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes second derivative of the deviance with respect to mu. This function is only called after estimation has been completed to get the observed hessian at the final coefficient estimate.

Derivatives are computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

An N-dimensional vector of shape (len(mu),1) containing the second derivative of the deviance with respect to mu for each observation.

Return type:

np.ndarray

dDdmu(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes derivative of the deviance with respect to mu.

Derivatives are computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

An N-dimensional vector of shape (len(mu),1) containing the derivatives of the deviance with respect to mu for each observation.

Return type:

np.ndarray

deviance(y: ndarray, mu: ndarray, theta: None | ndarray = None) → float

Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).

Deviance is computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

Deviance of the model under this family

Return type:

float
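The definition 2 * (llk_max - llk_c) * scale is easiest to verify for a Gaussian model, where it reduces to the residual sum of squares. The sketch below uses that simpler case purely as an illustration; it is not the scaled T deviance, and `gaussian_llk` is a hypothetical helper.

```python
import numpy as np

def gaussian_llk(y, mu, sigma2):
    """Gaussian log-likelihood of y given means mu and variance sigma2."""
    n = y.shape[0]
    rss = float(np.sum((y - mu) ** 2))
    return float(-0.5 * n * np.log(2 * np.pi * sigma2) - rss / (2 * sigma2))

y = np.array([[1.0], [2.0], [4.0]])
mu = np.array([[1.5], [2.5], [3.0]])
sigma2 = 2.0

llk_c = gaussian_llk(y, mu, sigma2)   # log-likelihood of the current model
llk_max = gaussian_llk(y, y, sigma2)  # saturated model: mu equals y
deviance = 2 * (llk_max - llk_c) * sigma2
rss = float(np.sum((y - mu) ** 2))
```

Here `deviance` and `rss` coincide, because the scale cancels the variance in the Gaussian log-likelihood.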

gradientLTheta(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes gradient of the log-likelihood with respect to self.theta, given mu.

Gradient is based on the derivatives of the deviance and saturated log-likelihood with respect to theta. The latter are computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

A vector of shape (len(self.theta),1) containing the gradient of the log-likelihood with respect to theta.

Return type:

np.ndarray

hessianLTheta(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Computes hessian of the log-likelihood with respect to self.theta, given mu.

Hessian is based on the derivatives of the deviance and saturated log-likelihood with respect to theta. The latter are computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

A matrix of shape (len(self.theta),len(self.theta)) containing the hessian of the log-likelihood with respect to theta.

Return type:

np.ndarray

llk(y: ndarray, mu: ndarray, theta: None | ndarray = None) → float

Log-probability of \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\). Essentially the sum over all elements in the vector returned by the lp() method.

Log-likelihood is computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

log-likelihood of the model under this family

Return type:

float

lp(y: ndarray, mu: ndarray, theta: None | ndarray = None) → ndarray

Log-probability of observing every value in \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\).

Log-likelihood contributions are computed as in the scat implementation available in mgcv by Natalya Pya.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scat Family implemented in mgcv by Natalya Pya, see: https://github.com/cran/mgcv/blob/master/R/efam.r#L2195

Parameters:

y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
theta (None | np.ndarray, optional) – Optionally, the latest estimate of theta, containing an estimate of the log of the scale parameter and the log of the degrees of freedom parameter. Array needs to be of shape (-1,1). When this is set to None, self.theta is used.

Returns:

An N-dimensional vector of shape (-1,1) containing the log-probability of observing each data-point under the current model.

Return type:

np.ndarray

mssm.src.python.exp_fam.est_scale(res: ndarray, rows_X: int, total_edf: float) → float

Scale estimate from Wood & Fasiolo (2017).

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models.

Parameters:

res (np.ndarray) – A numpy array containing the difference between the model prediction and the (pseudo) data.
rows_X (int) – The number of observations collected.
total_edf (float) – The expected degrees of freedom for the model.
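A residual-based scale estimate of this kind is commonly computed as the residual sum of squares divided by the residual degrees of freedom. The sketch below assumes exactly that form, \(\sum_i res_i^2 / (rows\_X - total\_edf)\); the formula and the helper name `est_scale_sketch` are assumptions made for illustration and are not taken from the mssm source.

```python
import numpy as np

def est_scale_sketch(res, rows_X, total_edf):
    """Assumed form: residual sum of squares over residual degrees of freedom."""
    res = np.asarray(res, dtype=float).reshape(-1, 1)
    return float((res.T @ res)[0, 0] / (rows_X - total_edf))

# Four residuals and a model using two degrees of freedom: 10 / (4 - 2)
scale_hat = est_scale_sketch(np.array([1.0, -1.0, 2.0, -2.0]), rows_X=4, total_edf=2.0)
```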

mssm.src.python.file_loading module

mssm.src.python.file_loading.clear_cache(cache_dir: str, should_cache: bool) → None

Clear up cache for row-subsets of model matrix.

Parameters:

cache_dir (str) – path to cache directory
should_cache (bool) – whether or not the directory should actually be created

mssm.src.python.file_loading.read_cor_cov_single(y: str, x: str, file: str, file_loading_kwargs: dict) → ndarray

Read values of covariate x from file correcting for NaNs in y.

Parameters:

y (str) – name of covariate potentially having NaNs
x (str) – covariate name
file (str) – file name
file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding values in x for which y is not NaN

Return type:

np.ndarray
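The "correcting for NaNs in y" behaviour described above can be pictured as follows. This is only a sketch of the idea using pandas; the helper `read_x_where_y_not_nan` is hypothetical and an in-memory buffer stands in for a file on disk.

```python
import io
import pandas as pd

def read_x_where_y_not_nan(y, x, file, file_loading_kwargs):
    """Read column x from file, keeping only rows in which column y is not NaN."""
    dat = pd.read_csv(file, **file_loading_kwargs)
    keep = ~dat[y].isna()
    return dat.loc[keep, x].to_numpy()

# The second row has a missing value for y and is therefore dropped
csv_text = "y,x\n1.0,10\n,20\n3.0,30\n"
x_vals = read_x_where_y_not_nan("y", "x", io.StringIO(csv_text),
                                {"header": 0, "index_col": False})
```

The key-word arguments used here match the default for file_loading_kwargs documented for Formula.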

mssm.src.python.file_loading.read_cov(y: str, x: str, files: list[str], nc: int, file_loading_kwargs: dict) → ndarray

Read values of covariate x from files correcting for NaNs in y.

Parameters:

y (str) – name of covariate potentially having NaNs
x (str) – covariate name
files (list[str]) – list of file names
nc (int) – Number of cores to use to read in parallel
file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding values in x for which y is not NaN

Return type:

np.ndarray

mssm.src.python.file_loading.read_cov_no_cor(x: str, files: list[str], nc: int, file_loading_kwargs: dict) → ndarray

Read values of covariate x from files.

Parameters:

x (str) – covariate name
files (list[str]) – list of file names
nc (int) – Number of cores to use to read in parallel
file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding values in x

Return type:

np.ndarray

mssm.src.python.file_loading.read_dtype(column: str, file: str, file_loading_kwargs: dict) → dtype

Read datatype of variable column in file.

Parameters:

column (str) – Name of covariate
file (str) – file name
file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

Datatype (numpy) of column

Return type:

np.dtype

mssm.src.python.file_loading.read_no_cor_cov_single(x: str, file: str, file_loading_kwargs: dict) → ndarray

Read values of covariate x from file.

Parameters:

x (str) – covariate name
file (str) – file name
file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding values in x

Return type:

np.ndarray

mssm.src.python.file_loading.read_unique(x: str, files: list[str], nc: int, file_loading_kwargs: dict) → ndarray

Read unique values of covariate x from files.

Parameters:

x (str) – covariate name
files (list[str]) – list of file names
nc (int) – Number of cores to use to read in parallel
file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding unique values

Return type:

np.ndarray

mssm.src.python.file_loading.read_unique_single(x: str, file: str, file_loading_kwargs: dict) → ndarray

Read unique values of covariate x from file.

Parameters:

x (str) – covariate name
file (str) – file name
file_loading_kwargs (dict) – Any optional file loading key-word arguments.

Returns:

numpy array holding unique values

Return type:

np.ndarray

mssm.src.python.file_loading.setup_cache(cache_dir: str, should_cache: bool) → None

Set up cache for row-subsets of model matrix.

Parameters:

cache_dir (str) – path to cache directory
should_cache (bool) – whether or not the directory should actually be created

Raises:

ValueError – if the directory already exists
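The documented contract (create the directory only if caching is requested, raise a ValueError if it already exists) could look roughly like the sketch below. Both functions are hypothetical re-implementations written for illustration and may differ from what mssm actually does.

```python
import os
import shutil
import tempfile

def setup_cache_sketch(cache_dir, should_cache):
    """Create cache_dir if caching is requested; refuse to reuse an existing one."""
    if not should_cache:
        return
    if os.path.isdir(cache_dir):
        raise ValueError(f"Cache directory '{cache_dir}' already exists.")
    os.makedirs(cache_dir)

def clear_cache_sketch(cache_dir, should_cache):
    """Remove cache_dir again if caching was requested and the directory exists."""
    if should_cache and os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)

with tempfile.TemporaryDirectory() as tmp:
    cache_dir = os.path.join(tmp, "cache_demo")
    setup_cache_sketch(cache_dir, True)
    created = os.path.isdir(cache_dir)
    try:
        setup_cache_sketch(cache_dir, True)  # second call: directory exists
        raised = False
    except ValueError:
        raised = True
    clear_cache_sketch(cache_dir, True)
    removed = not os.path.isdir(cache_dir)
```

Raising on an existing directory avoids silently mixing cached row-subsets from different models.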

mssm.src.python.formula module

class mssm.src.python.formula.Formula(lhs: lhs, terms: list[GammTerm], data: DataFrame, series_id: str | None = None, codebook: dict | None = None, print_warn: bool = True, keep_cov: bool = False, find_nested: bool = True, file_paths: list[str] = [], file_loading_nc: int = 1, file_loading_kwargs: dict = {'header': 0, 'index_col': False})

Bases: object

The formula of a regression equation.

Note: The class implements multiple get_* functions to access attributes stored in instance variables. The get functions always return a copy of the instance variable and the results are thus safe to manipulate.
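The practical consequence of returning copies is that changes to a returned object never leak back into the formula. The sketch below mimics this with a hypothetical `FormulaLike` class; whether mssm uses `copy.deepcopy` internally is an assumption.

```python
import copy

class FormulaLike:
    """Hypothetical stand-in for a class whose getters return copies."""

    def __init__(self, terms):
        self._terms = terms

    def get_terms(self):
        # Return a copy so that callers cannot modify the internal list
        return copy.deepcopy(self._terms)

formula_like = FormulaLike(["i()", "f(x0)"])
terms = formula_like.get_terms()
terms.append("f(x1)")  # only changes the copy

n_terms_internal = len(formula_like.get_terms())
```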

Examples:

import pandas as pd

from mssm.models import *
from mssmViz.sim import *

from mssm.src.python.formula import build_penalties,build_model_matrix

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# Now with a tensor smooth
formula = Formula(lhs("y"),[i(),f(["x0","x1"],te=True),f(["x2"]),f(["x3"])],data=Binomdat)

# Now with a tensor smooth anova style
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x0","x1"]),f(["x2"]),f(["x3"])],
  data=Binomdat)


######## Stream data from file and set up custom codebook #########

file_paths = [
  f'https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat_cond_{cond}.csv'
  for cond in ["a","b"]
  ]

# Set up specific coding for factor 'cond'
codebook = {'cond':{'a': 0, 'b': 1}}

formula = Formula(lhs=lhs("y"), # The dependent variable - here y!
                  terms=[i(), # The intercept, a
                           l(["cond"]), # For cond='b'
                           # two-way interaction between time and cond;
                           # one smooth over time per cond level
                           f(["time"],by="cond"),
                           # two-way interaction between x and cond;
                           # one smooth over x per cond level
                           f(["x"],by="cond"),
                           # three-way interaction
                           f(["time","x"],by="cond"),
                           # Random non-linear effect of time - one
                           # smooth per level of factor sub
                           fs(["time"],rf="sub")],
                  data=None, # No data frame!
                  file_paths=file_paths, # Just a list with paths to files.
                  print_warn=False,
                  codebook=codebook)

# Alternative:
formula = Formula(lhs=lhs("y"),
                        terms=[i(),
                              l(["cond"]),
                              f(["time"],by="cond"),
                              f(["x"],by="cond"),
                              f(["time","x"],by="cond"),
                              fs(["time"],rf="sub")],
                        data=None,
                        file_paths=file_paths,
                        print_warn=False,
                        keep_cov=True, # Keep encoded data structure in memory
                        codebook=codebook)

# preparing for ar1 model (with resets per time-series) and data type requirements

dat = pd.read_csv(
  'https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat.csv'
  )

# mssm requires that the data-type for variables used as factors is 'O'=object
dat = dat.astype({'series': 'O',
                  'cond':'O',
                  'sub':'O'})

formula = Formula(lhs=lhs("y"),
                  terms=[i(),
                           l(["cond"]),
                           f(["time"],by="cond"),
                           f(["x"],by="cond"),
                           f(["time","x"],by="cond")],
                  data=dat,
                  print_warn=False,
                  series_id='series') # 'series' variable identifies individual time-series

Parameters:

lhs – The lhs object defining the dependent variable.
terms ([GammTerm]) – A list of the terms which should be added to the model. See mssm.src.python.terms for info on which terms can be added.
data (pd.DataFrame or None) – A pandas dataframe (with header!) of the data which should be used to estimate the model. The variable specified for lhs as well as all variables included for a term in terms need to be present in the data, otherwise the call to Formula will throw an error.
series_id (str, optional) – A string identifying the individual experimental units. Usually a unique trial identifier. Only necessary if approximate derivative computations are to be utilized for random smooth terms or if you need to estimate an ‘ar1’ model for multiple time-series data.
codebook (dict or None) – Codebook - keys should correspond to factor variable names specified in terms. Values should again be a dict, with keys for each of the K levels of the factor and values corresponding to integers in {0,...,K-1}.
print_warn (bool,optional) – Whether warnings should be printed. Useful when fitting models from terminal. Defaults to True.
keep_cov (bool,optional) – Whether or not the internal encoding structure of all predictor variables should be created when forming \(\mathbf{X}^T\mathbf{X}\) iteratively instead of forming \(\mathbf{X}\) directly. Can speed up estimation but increases memory footprint. Defaults to False.
find_nested (bool,optional) – Whether or not to check for nested smooth terms. This only has an effect if you include at least one smooth term with more than two variables. Additionally, this check is often not necessary if you correctly use the te key-word of smooth terms and ensure that the marginals used to construct ti smooth terms have far fewer basis functions than the “main effect” univariate smooths. Thus, if you know what you’re doing and you’re working with large models, you might want to disable this (i.e., set to False) because this check can get quite expensive for larger models. Defaults to True.
file_paths ([str],optional) – A list of paths to .csv files from which \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Setting this to a non-empty list will prevent fitting X as a whole. data should then be set to None. Defaults to an empty list.
file_loading_nc (int,optional) – How many cores to use to a) accumulate \(\mathbf{X}\) in parallel (if data is not None and file_paths is an empty list) or b) accumulate \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) (and \(\mathbf{\eta}\) during estimation) (if data is None and file_paths is a non-empty list). For case b, this should really be set to the maximum number of cores available. For case a, this only really speeds up accumulating \(\mathbf{X}\) if \(\mathbf{X}\) has very many columns and/or rows. Defaults to 1.
file_loading_kwargs (dict,optional) – Any key-word arguments to pass to pandas.read_csv when \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively (if data is None and file_paths is a non-empty list). Defaults to {"header":0,"index_col":False}.
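A codebook as described for the codebook parameter is just a nested dictionary. The sketch below shows how such a mapping translates factor levels into integer codes, and checks that the codes of a factor with K levels run from 0 to K-1, as in the example codebook above. The check `codes_are_valid` is a hypothetical helper and the 0 to K-1 convention is an assumption based on that example.

```python
# Same structure as the example codebook: factor name -> {level: integer code}
codebook = {"cond": {"a": 0, "b": 1}}

def codes_are_valid(coding):
    """Check that the K levels of a factor are coded as 0, 1, ..., K-1."""
    return sorted(coding.values()) == list(range(len(coding)))

raw_cond = ["b", "a", "a", "b"]  # levels as they would appear in the data
encoded_cond = [codebook["cond"][level] for level in raw_cond]
valid = codes_are_valid(codebook["cond"])
```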

Variables:

lhs (lhs) – The left-hand side object of the regression formula passed to the constructor. Initialized at construction.
terms ([GammTerm]) – The list of terms passed to the constructor. Initialized at construction.
data (pd.DataFrame) – The dataframe passed to the constructor. Initialized at construction.
coef_per_term ([int]) – A list containing the number of coefficients corresponding to each term included in terms. Initialized at construction.
coef_names ([str]) – A list containing a named identifier (e.g., “Intercept”) for each coefficient estimated by the model. Initialized at construction.
n_coef (int) – The number of coefficients estimated by the model in total. Initialized at construction.
unpenalized_coef (int) – The number of un-penalized coefficients estimated by the model. Initialized at construction.
y_flat (np.ndarray or None) – An array, containing all values on the dependent variable (i.e., specified by lhs.variable) in order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths is an empty list, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.
cov_flat (np.ndarray or None) – An array, containing all (encoded, in case of categorical predictors) values on each predictor variable (each column of cov_flat corresponds to a different predictor) included in any of the terms in order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths is an empty list, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.
NOT_NA_flat (np.ndarray or None) – An array, containing an indication (as bool) for each value on the dependent variable (i.e., specified by lhs.variable) whether the corresponding value is not a number (“NA”) or not. In order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths is an empty list, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.
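Creating \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) iteratively works because both products are sums over rows, so they can be accumulated over row-blocks without ever holding all of \(\mathbf{X}\) in memory. A minimal numpy sketch of that identity, with in-memory chunks standing in for the separate files (this is not the mssm implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = rng.normal(size=(100, 1))

# Accumulate the cross-products over five row-blocks
XtX = np.zeros((4, 4))
Xty = np.zeros((4, 1))
for X_chunk, y_chunk in zip(np.array_split(X, 5), np.array_split(y, 5)):
    XtX += X_chunk.T @ X_chunk
    Xty += X_chunk.T @ y_chunk

# Both agree with the products formed from the full matrix
same_XtX = bool(np.allclose(XtX, X.T @ X))
same_Xty = bool(np.allclose(Xty, X.T @ y))
```

Since the blocks are independent, the per-block products can also be computed in parallel, which is where a setting such as file_loading_nc becomes relevant.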

Encodes data, which needs to be a pd.DataFrame and by default (if prediction==False) builds an index of which rows in data are NA in the column of the dependent variable described by self.lhs.

Parameters:

data (pd.DataFrame) – The data to encode.
prediction (bool, optional) – Whether or not a NA index and a column for the dependent variable should be generated.

Returns:

A tuple with 7 (optional) entries: the dependent variable described by self.lhs, the encoded predictor variables as a (N,k) array (number of rows matches the number of rows of the first entry returned, the number of columns matches the number of k variables present in the formula), an indication for each row whether the dependent variable described by self.lhs is NA, like the first entry but split into a list of lists by self.series_id, like the second entry but split into a list of lists by self.series_id, like the third entry but split into a list of lists by self.series_id, start and end points for the splits used to split the previous three elements (identifying the start and end point of every level of self.series_id).

Return type:

get_coding_factors() → dict: Get a copy of the factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

get_data() → DataFrame: Get a copy of the data specified for this formula.

get_depvar() → ndarray: Get a copy of the encoded dependent variable (defined via self.lhs).

get_factor_codings() → dict: Get a copy of the factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the levels (str) of the factor and the values to their encoded levels (int).
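
As described, get_coding_factors() and get_factor_codings() return inverse views of the same coding. The sketch below illustrates that relationship with plain dictionaries; the factor name "cond" and its levels are made up for illustration and no Formula object is involved.

```python
# Hypothetical dictionary in the shape described for get_factor_codings();
# the factor "cond" and its levels are invented for this illustration.
factor_codings = {"cond": {"a": 0, "b": 1, "c": 2}}  # level (str) -> code (int)

# Inverting every inner dictionary gives the shape described for
# get_coding_factors(): code (int) -> level (str).
coding_factors = {
    var: {code: level for level, code in codings.items()}
    for var, codings in factor_codings.items()
}

# Decode a column of encoded values back to the original labels.
encoded = [2, 0, 1, 1]
decoded = [coding_factors["cond"][code] for code in encoded]
print(decoded)
```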

get_factor_levels() → dict: Get a copy of the factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

get_has_intercept() → bool: Does this formula include an intercept or not.

get_ir_smooth_term_idx() → list[int]: Get a copy of the list of indices that identify impulse response terms in self.terms.

get_lhs() → lhs: Get a copy of the lhs specified for this formula.

get_linear_term_idx() → list[int]: Get a copy of the list of indices that identify linear terms in self.terms.

get_n_coef() → int: Get the number of coefficients that are implied by the formula.

get_notNA() → ndarray: Get a copy of the encoded ‘not a NA’ vector for the dependent variable (defined via self.lhs).

get_random_term_idx() → list[int]: Get a copy of the list of indices that identify random terms in self.terms.

get_smooth_term_idx() → list[int]: Get a copy of the list of indices that identify smooth terms in self.terms.

get_subgroup_variables() → list: Returns a copy of sub-group variables for factor smooths.

get_term_names() → list[str]: Returns a copy of the list with the names of the terms specified for this formula.

get_terms() → list[GammTerm]: Get a copy of the terms specified for this formula.

get_var_map() → dict: Get a copy of the var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix returned by self.encode_data.

get_var_maxs() → dict: Get a copy of the var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on in self.data for continuous variables or None for categorical variables.

get_var_mins() → dict: Get a copy of the var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on in self.data for continuous variables or None for categorical variables.

get_var_mins_maxs() → tuple[dict, dict]: Get a tuple containing copies of both the mins and maxs dictionaries. See self.get_var_mins and self.get_var_maxs.

get_var_types() → dict: Get a copy of the var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.

has_ir_terms() → bool: Does this formula include impulse response terms or not.

mssm.src.python.formula.build_model_matrix(formula: Formula, pool: Pool | None = None, use_only: list[int] | None = None, tol: float = 0) → csc_array

Function to build the model matrix implied by formula.

Important: A small selection of smooth terms requires that the penalty matrices are built at least once before the model matrix can be built. For this reason, you generally must call build_penalties(formula) before calling build_model_matrix(formula) (internally, mssm checks whether formula.built_penalties==True). See the example below.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

from mssm.src.python.formula import build_penalties,build_model_matrix

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# First extract the penalties
penalties = build_penalties(formula)

# Then the model matrix:
X = build_model_matrix(formula)

Parameters:

formula (Formula) – A Formula
pool (mp.pool.Pool | None, optional) – An instance of a multiprocessing pool, defaults to None
use_only (list[int] | None, optional) – A list of indices corresponding to which terms should actually be built. If None, then all terms are built. Terms not built are set to zero columns, defaults to None
tol (float, optional) – Optional tolerance. Absolute values in the model matrix smaller than this are set to actual zeroes, defaults to 0

Raises:

ValueError – If formula.built_penalties == False - i.e., it is required that build_penalties(formula) was called before calling build_model_matrix(formula).
NotImplementedError – If the formula was set up to read data from file, rather than from a pd.DataFrame.

Returns:

The model matrix implied by a Formula and cov_flat.

Return type:

scp.sparse.csc_array
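
The effect of the tol parameter can be sketched on a small dense array. This is only an illustration of the thresholding rule stated above (absolute values below tol become exact zeros), not the library's implementation; the matrix and the tolerance value are made up, and the real function returns a sparse csc_array.

```python
import numpy as np

# Dense stand-in for a model matrix with a few numerically tiny entries.
X = np.array([[1.0, 1e-12, 0.5],
              [1.0, -3e-9, 0.0],
              [1.0, 0.25, -1e-15]])

tol = 1e-8  # hypothetical tolerance, chosen only for this illustration

# Entries whose absolute value is below tol are replaced by exact zeros.
X_thresholded = np.where(np.abs(X) < tol, 0.0, X)

n_before = np.count_nonzero(X)
n_after = np.count_nonzero(X_thresholded)
print(n_before, n_after)
```

Fewer stored non-zeros is the practical benefit for a sparse matrix format.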

mssm.src.python.formula.build_penalties(formula) → list[LambdaTerm]

Function to build all penalty matrices required by a Formula.

The function is called whenever it is needed, but the example below shows you how to use it in case you want to extract the penalties directly.

Examples:

from mssm.models import *
from mssmViz.sim import *
from mssm.src.python.formula import build_penalties

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# Now extract the penalties
penalties = build_penalties(formula)

print(penalties)

Parameters:

formula (Formula) – A Formula

Raises:

KeyError – If an un-penalized irf term is included in the formula after penalized terms.
KeyError – If an un-penalized smooth term is included in the formula after penalized terms.
ValueError – If no start index has been defined by the formula. For testing only.

Returns:

A list of all penalties (encoded as LambdaTerm) required by the formula

Return type:

list[LambdaTerm]

mssm.src.python.formula.build_sparse_matrix_from_formula(terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat: ndarray, cov: ndarray | None, pool: Pool | None = None, use_only: list[int] | None = None, tol: float = 0) → csc_array

Build model matrix from formula properties.

This function is used internally to construct model matrices from Formula objects. For greater convenience see the build_model_matrix() function.

Important: make sure to only ever call this when formula.built_penalties==True - see the build_model_matrix() function description.

Parameters:

terms (list[GammTerm]) – List of terms of a Formula
has_intercept (bool) – Indicator of whether the Formula has an intercept or not
ltx (list[int]) – Linear term indices
irstx (list[int]) – Impulse response function term indices
stx (list[int]) – Smooth term indices
rtx (list[int]) – Random term indices
var_types (dict) – Dictionary holding variable types
var_map (dict) – Dictionary mapping variable names to column indices in the encoded data
var_mins (dict) – Dictionary with variable minimums
var_maxs (dict) – Dictionary with variable maximums
factor_levels (dict) – Dictionary with levels associated with each factor
cov_flat (np.ndarray) – Encoded data
cov (np.ndarray | None, optional) – Encoded data split by levels of the factor in Formula.series_id
pool (mp.pool.Pool | None, optional) – An instance of a multiprocessing pool, defaults to None
use_only (list[int] | None, optional) – A list of indices corresponding to which terms should actually be built. If None, then all terms are built. Terms not built are set to zero columns, defaults to None
tol (float, optional) – Optional tolerance. Absolute values in the model matrix smaller than this are set to actual zeroes, defaults to 0

Returns:

The model matrix implied by a Formula and cov_flat.

Return type:

scp.sparse.csc_array

class mssm.src.python.formula.lhs(variable: str, f: Callable = None)

Bases: object

The Left-hand side of a regression equation.

See the Formula class for examples.

Parameters:

variable (str) – The name of the dependent/response variable in the dataframe passed to a Formula. Can point to continuous and categorical variables. For mssm.models.GSMM models, the variable can also be set to any placeholder variable in the data, since not every Formula will be associated with a particular response variable.
f (Callable, optional) – A function that will be applied to the variable before fitting. For example: np.log(). By default no function is applied to the variable.

mssm.src.python.gamm_solvers module

mssm.src.python.gamm_solvers.PIRLS_newton_weights(y: ndarray, mu: ndarray, eta: ndarray, family: Family) → tuple[ndarray, ndarray, ndarray]

Internal function. Compute pseudo-data and Newton weights for the Penalized Iteratively Reweighted Least Squares (PIRLS) iteration (Wood, 2017, 6.1.1 and 3.1.2)

Calculation reflects full Newton scoring!

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – vector of observations
mu (np.ndarray) – vector of mean estimates
eta (np.ndarray) – vector of linear predictors
family (Family) – Family of model

Raises:

ValueError – If not a single observation provided information for newton weights.

Returns:

the pseudo-data, weights, and a boolean array indicating invalid weights/pseudo-observations

Return type:

tuple[np.ndarray,np.ndarray,np.ndarray]
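
For orientation, the Newton pseudo-data and weights from Wood (2017, 3.1.2) can be written down in a few lines. The sketch below is an assumption-laden illustration, not the mssm implementation: the helper newton_pseudo_data and its callable arguments (link and variance function derivatives) are hypothetical, and the handling of invalid observations is omitted.

```python
import numpy as np

def newton_pseudo_data(y, mu, eta, dlink, d2link, var, dvar):
    """Pseudo-data z and Newton weights w following Wood (2017, 3.1.2).

    dlink/d2link evaluate g'(mu) and g''(mu); var/dvar evaluate V(mu) and
    V'(mu). All four callables are hypothetical helpers for this sketch.
    """
    g1 = dlink(mu)
    # alpha(mu) = 1 + (y - mu) * (V'/V + g''/g'); equals 1 for canonical links
    alpha = 1.0 + (y - mu) * (dvar(mu) / var(mu) + d2link(mu) / g1)
    z = eta + g1 * (y - mu) / alpha
    w = alpha / (g1**2 * var(mu))
    return z, w, alpha

y = np.array([1.0, 3.0, 2.0])
mu = np.array([1.5, 2.0, 2.5])
eta = np.log(mu)

# Gamma variance function V(mu) = mu^2 combined with the non-canonical log link.
z, w, alpha = newton_pseudo_data(
    y, mu, eta,
    dlink=lambda m: 1.0 / m, d2link=lambda m: -1.0 / m**2,
    var=lambda m: m**2, dvar=lambda m: 2.0 * m,
)
print(alpha, w)
```

For this Gamma/log combination alpha simplifies to y/mu, so the Newton weights depend on the observations, unlike Fisher weights.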

mssm.src.python.gamm_solvers.PIRLS_pdat_weights(y: ndarray, mu: ndarray, eta: ndarray, family: Family) → tuple[ndarray, ndarray, ndarray]

Internal function. Compute pseudo-data and weights for the Penalized Iteratively Reweighted Least Squares (PIRLS) iteration (Wood, 2017, 6.1.1)

Calculation is based on a(mu) = 1, so reflects Fisher scoring!

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – vector of observations
mu (np.ndarray) – vector of mean estimates
eta (np.ndarray) – vector of linear predictors
family (Family) – Family of model

Raises:

ValueError – If not a single observation provided information for Fisher weights.

Returns:

the pseudo-data, weights, and a boolean array indicating invalid weights/pseudo-observations

Return type:

tuple[np.ndarray,np.ndarray,np.ndarray]
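
Setting a(mu) = 1 gives the familiar Fisher scoring quantities. A minimal sketch for binary data with a logit link follows; the helper fisher_pseudo_data and its arguments are hypothetical, and the detection of invalid weights performed by the library is left out.

```python
import numpy as np

def fisher_pseudo_data(y, mu, eta, dlink, var):
    """Pseudo-data z and Fisher weights w with a(mu) = 1 (Wood, 2017).

    dlink evaluates g'(mu), var evaluates the variance function V(mu).
    Both callables are hypothetical helpers for this sketch.
    """
    g1 = dlink(mu)
    z = eta + g1 * (y - mu)
    w = 1.0 / (g1**2 * var(mu))
    return z, w

# Binary observations with a logit link: g'(mu) = 1 / (mu * (1 - mu)).
y = np.array([0.0, 1.0, 1.0])
eta = np.array([-0.5, 0.2, 1.0])
mu = 1.0 / (1.0 + np.exp(-eta))

z, w = fisher_pseudo_data(
    y, mu, eta,
    dlink=lambda m: 1.0 / (m * (1.0 - m)),
    var=lambda m: m * (1.0 - m),
)
print(z, w)
```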

mssm.src.python.gamm_solvers.apply_eigen_perm(Pr: list[int], InvCholXXSP: csc_array) → csc_array

Internal function. Unpivots columns of InvCholXXSP (usually the inverse of a Cholesky factor) and returns the unpivoted version.

Parameters:

Pr (list[int]) – List of column indices
InvCholXXSP (scp.sparse.csc_array) – Pivoted matrix

Returns:

Unpivoted matrix

Return type:

scp.sparse.csc_array

mssm.src.python.gamm_solvers.back_track_alpha(coef: ndarray, step: ndarray, llk_fun: Callable, grad_fun: Callable, *llk_args, alpha_max: float = 1, c1: float = 0.0001, max_iter: int = 100) → float | None

Simple step-size backtracking function that enforces the Armijo condition (Nocedal & Wright, 2006)

References:

Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Parameters:

coef (np.ndarray) – coefficient estimate
step (np.ndarray) – step to take to update coefficients
llk_fun (Callable) – llk function
grad_fun (Callable) – function to evaluate gradient of llk
alpha_max (float, optional) – Parameter by Nocedal & Wright, defaults to 1
c1 (float, optional) – 2nd Parameter by Nocedal & Wright, defaults to 1e-4
max_iter (int, optional) – Number of maximum iterations, defaults to 100

Returns:

The step-length meeting the Armijo condition or None in case none such was found

Return type:

float | None
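
The Armijo rule itself is short enough to sketch. The version below is a generic illustration for maximizing a log-likelihood, not the code used by mssm: the function name armijo_backtrack is hypothetical, extra llk arguments are omitted, and halving the step length on failure is an assumed shrinkage rule.

```python
import numpy as np

def armijo_backtrack(coef, step, llk_fun, grad_fun, alpha_max=1.0, c1=1e-4, max_iter=100):
    """Shrink alpha until llk(coef + alpha*step) shows sufficient increase.

    Returns the accepted step length, or None if none was found within
    max_iter trials. Halving alpha is an assumption of this sketch.
    """
    llk0 = llk_fun(coef)
    slope = float(grad_fun(coef) @ step)  # directional derivative along step
    alpha = alpha_max
    for _ in range(max_iter):
        if llk_fun(coef + alpha * step) >= llk0 + c1 * alpha * slope:
            return alpha
        alpha /= 2.0
    return None

# Concave toy objective with its maximum at 3.
llk = lambda b: -float(np.sum((b - 3.0) ** 2))
grad = lambda b: -2.0 * (b - 3.0)

coef = np.zeros(1)
step = grad(coef)  # ascent direction that overshoots at alpha = 1
alpha = armijo_backtrack(coef, step, llk, grad)
print(alpha)
```

Passing a direction that decreases the objective makes every trial fail, in which case the sketch returns None, mirroring the documented return type.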

mssm.src.python.gamm_solvers.calculate_edf(LP: csc_array | None, Pr: list[int] | None, InvCholXXS: csc_array | LinearOperator | None, merged_penalties: list[LambdaTerm], lgdetDs: list[float] | None, colsX: int, n_c: int, drop: ndarray[tuple[Any, ...], dtype[int64]] | None, S_emb: csc_array | None) → tuple[float, list[float], list[float]]

Internal function. Follows steps outlined by Wood & Fasiolo (2017) to compute the total estimated degrees of freedom of the model.

Generates the B matrix also required for the derivative of the log-determinant of X.T@X+S_lambda. This is either done exactly - as described by Wood & Fasiolo (2017) - or approximately. The latter is much faster.

Also implements the L-qEFS trace computations described by Krause et al. (submitted) based on a quasi-newton approximation to the negative hessian of the log-likelihood.

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

LP (scp.sparse.csc_array | None) – Pivoted Cholesky of negative penalized hessian or None
Pr (list[int] | None) – Permutation list of LP
InvCholXXS (scp.sparse.csc_array | scp.sparse.linalg.LinearOperator | None) – Unpivoted Inverse of LP, or a quasi-newton approximation of it (for the L-qEFS update), or None
merged_penalties (list[LambdaTerm]) – list of penalties
lgdetDs (list[float] | None) – list of derivatives of \(log(|\mathbf{H} + S_\lambda|)\) (\(\mathbf{H}\) is negative hessian of penalized llk) with respect to lambda.
colsX (int) – Number of columns of model matrix
n_c (int) – Number of cores to use for computations
drop (np.typing.NDArray[np.int_]) – List of dropped coefficients - can be None
S_emb (scp.sparse.csc_array | None) – Total penalty matrix

Returns:

A tuple containing the total estimated degrees of freedom, a list holding the number of parameters penalized away by each individual penalty, and a list holding the sum of the squared elements of each of the aforementioned B matrices.

Return type:

tuple[float,list[float],list[float]]
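
The relationship between the total estimated degrees of freedom and the parameters penalized away can be checked on a dense toy problem. The sketch below assumes a Gaussian additive model with unit weights and a single made-up difference penalty; total_edf is a hypothetical helper and none of the sparse or approximate computations of the library are reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))

# Second-order difference penalty as a hypothetical single penalty matrix.
D = np.diff(np.eye(p), n=2, axis=0)
S1 = D.T @ D

def total_edf(X, penalties, lams):
    """Total edf as p minus the parameters penalized away by each penalty."""
    XX = X.T @ X
    S_lam = sum(lam * S for lam, S in zip(lams, penalties))
    V = np.linalg.inv(XX + S_lam)
    # Parameters penalized away by penalty j: lam_j * tr(V @ S_j)
    penalized = [lam * np.trace(V @ S) for lam, S in zip(lams, penalties)]
    return X.shape[1] - sum(penalized), penalized

edf, penalized = total_edf(X, [S1], [10.0])

# Equivalent definition: trace of (X'X + S_lam)^{-1} X'X
edf_direct = np.trace(np.linalg.inv(X.T @ X + 10.0 * S1) @ (X.T @ X))
print(edf, edf_direct)
```

With lambda equal to zero the edf equals the number of coefficients, and for very large lambda it approaches the dimension of the penalty null space (2 here).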

mssm.src.python.gamm_solvers.calculate_term_edf(merged_penalties: list[LambdaTerm], param_penalized: list[float]) → list[float]

Internal function. Computes the smooth-term (and random term) specific estimated degrees of freedom.

See Wood (2017) for a definition and Wood & Fasiolo (2017) for the computations.

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

merged_penalties (list[LambdaTerm]) – List of penalties
param_penalized (list[float]) – List holding the amount of parameters penalized away by individual penalties - obtained from calculate_edf().

Returns:

A list holding the estimated degrees of freedom per smooth/random term in the model

Return type:

list[float]

mssm.src.python.gamm_solvers.check_drop_valid_gammlss(y: ndarray, coef: ndarray, coef_split_idx: list[int], Xs: list[csc_array], S_emb: csc_array, keep: ndarray[tuple[Any, ...], dtype[int64]], family: GAMLSSFamily) → tuple[bool, float | None]

Checks whether an identified set of coefficients to be dropped from the model results in a valid log-likelihood.

Parameters:

y (np.ndarray) – Vector of response variable
coef (np.ndarray) – Vector of coefficients
coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution
Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
S_emb (scp.sparse.csc_array) – Total penalty matrix
keep (np.typing.NDArray[np.int_]) – Array of coefficients to retain
family (GAMLSSFamily) – Model family

Returns:

tuple holding bool indicating if likelihood is valid and penalized log-likelihood under dropped set (or None if invalid).

Return type:

tuple[bool,float | None]

mssm.src.python.gamm_solvers.check_drop_valid_gensmooth(ys: list[ndarray | None], coef: ndarray, Xs: list[csc_array | None], S_emb: csc_array, keep: ndarray[tuple[Any, ...], dtype[int64]], family: GSMMFamily) → tuple[bool, float | None]

Checks whether an identified set of coefficients to be dropped from the model results in a valid log-likelihood.

Parameters:

ys (list[np.ndarray | None]) – List holding vectors of observations
coef (np.ndarray) – Vector of coefficients
Xs (list[scp.sparse.csc_array | None]) – List of model matrices - one per parameter
S_emb (scp.sparse.csc_array) – Total Penalty matrix
keep (np.typing.NDArray[np.int_]) – Array of coefficients to retain
family (GSMMFamily) – Model family

Returns:

tuple holding bool indicating if likelihood is valid and penalized log-likelihood under dropped set.

Return type:

tuple[bool,float|None]

mssm.src.python.gamm_solvers.compute_S_emb_pinv_det(col_S: int, merged_penalties: list[LambdaTerm], pinv: str, root: bool = False) → tuple[csc_array, csc_array, csc_array | None, list[bool]]

Internal function. Computes the total embedded penalty matrix, a generalized inverse of the former, and optionally a root of the total penalty matrix, and determines for which EFS updates the rank rather than the generalized inverse should be used.

Parameters:

col_S (int) – Number of columns of total penalty matrix
merged_penalties (list[LambdaTerm]) – List of penalties - potentially including shared lambda terms
pinv (str) – Strategy to use to compute the generalized inverse. Set this to ‘svd’.
root (bool, optional) – Whether to compute a root of the generalized inverse, defaults to False

Returns:

A tuple holding total embedded penalty matrix, a generalized inverse of the former, optionally a root of the total penalty matrix, and a list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used

Return type:

tuple[scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array|None, list[bool]]

mssm.src.python.gamm_solvers.compute_eigen_perm(Pr: list[int] | ndarray) → csc_array

Internal function. Computes column permutation matrix obtained from Eigen.

Parameters:

Pr (list[int] | np.ndarray) – List of column indices

Returns:

Permutation matrix as sparse array

Return type:

scp.sparse.csc_array
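
How a list of column indices relates to a permutation matrix, and how pivoting is undone, can be sketched with dense arrays. The convention used here (A @ P equals A[:, Pr]) is an assumption made for illustration and may differ from the one Eigen and mssm use; permutation_matrix is a hypothetical helper, and the library returns a sparse csc_array instead.

```python
import numpy as np

def permutation_matrix(Pr):
    """Dense column-permutation matrix P with A @ P == A[:, Pr].

    The orientation of P is an assumption of this sketch.
    """
    Pr = np.asarray(Pr)
    P = np.zeros((Pr.size, Pr.size))
    P[Pr, np.arange(Pr.size)] = 1.0
    return P

Pr = [2, 0, 3, 1]
A = np.arange(16, dtype=float).reshape(4, 4)
P = permutation_matrix(Pr)

pivoted = A @ P            # columns reordered according to Pr
unpivoted = pivoted @ P.T  # P is orthogonal, so its transpose undoes the pivoting
print(unpivoted)
```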

mssm.src.python.gamm_solvers.compute_lgdetD_bsb(rank: int | None, cLam: float, gInv: csc_array, emb_SJ: csc_array, cCoef: ndarray) → tuple[float, float]

Internal function. Computes derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.

See Wood, Li, Shaddick, & Augustin (2017), Wood & Fasiolo (2017), Wood (2017), and Wood (2011).

References:

Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

rank (int | None) – Known rank of penalty matrix or None (should only be set to int for single penalty terms)
cLam (float) – Current lambda value
gInv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix
emb_SJ (scp.sparse.csc_array) – Embedded penalty matrix
cCoef (np.ndarray) – coefficient vector

Returns:

Tuple, first element is aforementioned derivative, second is cCoef.T@emb_SJ@cCoef

Return type:

tuple[float,float]
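
Both returned quantities can be illustrated on a dense toy penalty. The sketch assumes a single made-up difference penalty, so that the derivative tr(S_lambda^- S_j) reduces to rank/lambda; the variable names mirror the parameters above, but the code is an illustration rather than the library's implementation.

```python
import numpy as np

p = 6
D = np.diff(np.eye(p), n=2, axis=0)
emb_SJ = D.T @ D          # hypothetical embedded penalty of rank p - 2
cLam = 2.5                # current lambda value
S_lam = cLam * emb_SJ     # total penalty when only one term is present

# Derivative of log|S_lam|_+ with respect to lambda: tr(S_lam^- @ S_j)
gInv = np.linalg.pinv(S_lam)
lgdetD = np.trace(gInv @ emb_SJ)

# With a single penalty of known rank the same value is rank / lambda.
rank = np.linalg.matrix_rank(emb_SJ)

# Second returned quantity: the quadratic form cCoef.T @ emb_SJ @ cCoef
cCoef = np.arange(p, dtype=float) ** 2
bsb = cCoef @ emb_SJ @ cCoef
print(lgdetD, rank / cLam, bsb)
```

The rank shortcut explains why the rank argument should only be set for single penalty terms: with several penalties acting on the same coefficients the trace no longer simplifies.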

mssm.src.python.gamm_solvers.computetrVS3(t1: ndarray | None, t2: ndarray | None, t3: ndarray | None, lTerm: LambdaTerm, V0: csc_array) → float

Internal function. Compute tr(V@lTerm.S_j) from linear operator of V obtained from L-BFGS-B optimizer.

Relies on equation 3.13 in Byrd, Nocedal & Schnabel (1994). Adapted to ensure the positive semi-definiteness required by the EFS update.

References:

Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:

t1 (np.ndarray or None) – nCoef*2m matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then V is treated like an identity matrix.
t2 (np.ndarray or None) – 2m*2m matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then V is treated like an identity matrix.
t3 (np.ndarray or None) – 2m*nCoef matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then t1 is treated like an identity matrix.
lTerm (LambdaTerm) – Current lambda term for which to compute the trace.
V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the hessian of the negative penalized likelihood.

Returns:

trace

Return type:

float
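
The reason a compact quasi-Newton representation makes this trace cheap is the cyclic property of the trace. The sketch below assumes V = V0 + t1 @ t2 @ t3 with a diagonal V0 and random stand-in factors; the sign convention and the adaptation for positive semi-definiteness used by mssm are not reproduced, so this only illustrates the linear algebra.

```python
import numpy as np

rng = np.random.default_rng(1)
n_coef, m = 8, 2

# Random stand-ins for the compact representation (assumed form V0 + t1 t2 t3).
V0 = np.diag(rng.uniform(0.5, 1.5, size=n_coef))
t1 = rng.normal(size=(n_coef, 2 * m))
t2 = rng.normal(size=(2 * m, 2 * m))
t3 = rng.normal(size=(2 * m, n_coef))

# Hypothetical penalty matrix S_j.
D = np.diff(np.eye(n_coef), n=2, axis=0)
S_j = D.T @ D

# tr((V0 + t1 t2 t3) S_j) = tr(V0 S_j) + tr(t2 (t3 S_j t1)).
# The second trace only involves a 2m x 2m matrix; V is never formed.
tr_fast = np.sum(np.diag(V0) * np.diag(S_j)) + np.trace(t2 @ (t3 @ S_j @ t1))

# Reference value from the explicitly formed dense matrix.
tr_dense = np.trace((V0 + t1 @ t2 @ t3) @ S_j)
print(tr_fast, tr_dense)
```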

mssm.src.python.gamm_solvers.correct_coef_step(coef: ndarray, n_coef: ndarray, dev: float, pen_dev: float, c_dev_prev: float, family: Family, eta: ndarray, mu: ndarray, y: ndarray, X: csc_array | None, n_pen: float, S_emb: csc_array, formula: Formula | None, n_c: int, offset: float | ndarray) → tuple[float, float, ndarray, ndarray, ndarray]

Internal function. Performs step-length control on the coefficient vector.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

coef (np.ndarray) – Current coefficient estimate
n_coef (np.ndarray) – New coefficient estimate
dev (float) – new deviance
pen_dev (float) – new penalized deviance
c_dev_prev (float) – previous penalized deviance
family (Family) – Family of model
eta (np.ndarray) – vector of linear predictors - under new coefficient estimate
mu (np.ndarray) – vector of mean estimates - under new coefficient estimate
y (np.ndarray) – vector of observations
X (scp.sparse.csc_array | None) – Model matrix
n_pen (float) – total penalty under new coefficient estimate
S_emb (scp.sparse.csc_array) – Total penalty matrix
formula (Formula | None) – (optionally) Formula of model
n_c (int) – Number of cores
offset (float | np.ndarray) – Offset (fixed effect) to add to eta

Returns:

Updated versions of dev,pen_dev,mu,eta,coef

Return type:

tuple[float,float,np.ndarray,np.ndarray,np.ndarray]

mssm.src.python.gamm_solvers.correct_coef_step_gammlss(family: GAMLSSFamily, y: ndarray, Xs: list[csc_array], coef: ndarray, next_coef: ndarray, coef_split_idx: list[int], c_llk: float, S_emb: csc_array, a: float) → tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], float, float, float]

Apply step size correction to Newton update for GAMLSS models, as discussed by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

family (GAMLSSFamily) – Family of model
y (np.ndarray) – Vector of observations
Xs (list[scp.sparse.csc_array]) – List of model matrices
coef (np.ndarray) – Current coefficient estimate
next_coef (np.ndarray) – Updated coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
c_llk (float) – Current log likelihood
S_emb (scp.sparse.csc_array) – Total penalty matrix
a (float) – Step length for gradient descent update

Returns:

A tuple containing the corrected coefficient estimate next_coef, next_coef split via coef_split_idx, next mus, next etas, next llk, next penalized llk, and the updated step length for the next gradient update

Return type:

tuple[np.ndarray,list[np.ndarray],list[np.ndarray],list[np.ndarray],float,float,float]

mssm.src.python.gamm_solvers.correct_coef_step_gen_smooth(family: GSMMFamily, ys: list[ndarray | None], Xs: list[csc_array | None], coef: ndarray, next_coef: ndarray, coef_split_idx: list[int], c_llk: float, S_emb: csc_array, a: float) → tuple[ndarray, float, float, float]

Apply step size correction to Newton update for general smooth models, as discussed by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

family (GSMMFamily) – Model family
ys (list[np.ndarray | None]) – List of vectors of observations
Xs (list[scp.sparse.csc_array | None]) – List of model matrices
coef (np.ndarray) – Coefficient estimate
next_coef (np.ndarray) – Proposed next coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
c_llk (float) – Current log likelihood
S_emb (scp.sparse.csc_array) – Total penalty matrix
a (float) – Step length for gradient descent update

Returns:

A tuple containing the corrected coefficient estimate next_coef,next llk, next penalized llk, updated step length for next gradient update

Return type:

tuple[np.ndarray,float,float,float]

mssm.src.python.gamm_solvers.correct_lambda_step(y: ndarray, yb: ndarray, z: ndarray | None, Wr: csc_array | None, rowsX: int, colsX: int, X: csc_array | None, Xb: csc_array, coef: ndarray, Lrhoi: csc_array | None, family: Family, col_S: int, S_emb: csc_array, penalties: list[LambdaTerm], was_extended: list[bool], pinv: str, lam_delta: ndarray, extend_by: dict, o_iter: int, dev_check: float, n_c: int, control_lambda: int, extend_lambda: bool, exclude_lambda: bool, extension_method_lam: str, formula: Formula | None, form_Linv: bool, method: str, offset: float | ndarray, max_inner: int) → tuple[ndarray, csc_array, ndarray, csc_array, ndarray, ndarray, ndarray, csc_array, csc_array | None, float, list[float], float, ndarray, ndarray, dict, list[LambdaTerm], list[bool], csc_array, int, ndarray[tuple[Any, ...], dtype[int64]] | None, ndarray[tuple[Any, ...], dtype[int64]] | None]

Performs step-length control for lambda.

Lambda update is based on EFS update by Wood & Fasiolo (2017), step-length control is partially based on Wood et al. (2017) - Krause et al. (submitted) has the specific implementation.

References:

Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – vector of observations
yb (np.ndarray) – vector of observations of the working model
z (np.ndarray | None) – pseudo-data (can have NaNs for invalid observations)
Wr (scp.sparse.csc_array | None) – diagonal sparse matrix holding the root of the Fisher weights
rowsX (int) – Rows of model matrix
colsX (int) – Cols of model matrix
X (scp.sparse.csc_array | None) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
coef (np.ndarray) – Current coefficient estimate
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model
family (Family) – Model family
col_S (int) – Columns of total penalty matrix
S_emb (scp.sparse.csc_array) – Total penalty matrix
penalties (list[LambdaTerm]) – List of penalties
was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not
pinv (str) – Method to use to compute generalized inverse of total penalty, set to ‘svd’!
lam_delta (np.ndarray) – Proposed update to lambda parameters
extend_by (dict) – Extension info dictionary
o_iter (int) – Outer iteration index
dev_check (float) – Multiple of previous deviance used for convergence check
n_c (int) – Number of cores to use
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.
exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
extension_method_lam (str) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov’ by default.
formula (Formula | None) – (Optionally) Formula of model
form_Linv (bool) – Whether to form the inverse of the Cholesky of the negative penalized hessian or not
method (str) – Which method to use to solve for the coefficients (“Chol” or “Qr”)
offset (float | np.ndarray) – Offset (fixed effect) to add to eta
max_inner (int) – Maximum number of iterations to use to update the coefficient estimate

Returns:

Tuple containing updated values for yb, Xb, z, Wr, eta, mu, n_coef, the Cholesky of the penalized hessian CholXXS, the inverse of the former InvCholXXS, total edf, term-wise edfs, updated scale, working residuals, accepted update to lambda, extend_by, penalties, was_extended, updated S_emb, number of lambda updates, an optional array of the coefficients to keep, an optional array of the estimated coefficients to drop

Return type:

tuple[np.ndarray, scp.sparse.csc_array, np.ndarray, scp.sparse.csc_array, np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array|None, float, list[float], float, np.ndarray, np.ndarray, dict, list[LambdaTerm], list[bool], scp.sparse.csc_array, int, np.typing.NDArray[np.int_]|None, np.typing.NDArray[np.int_]|None]

mssm.src.python.gamm_solvers.correct_lambda_step_gamlss(family: GAMLSSFamily, mus: list[ndarray], y: ndarray, Xs: list[csc_array], S_norm: csc_array, n_coef: int, form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], gamlss_pen: list[LambdaTerm], lam_delta: ndarray, extend_by: dict, was_extended: list[bool], c_llk: float, fit_info: Fit_info, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: tuple[ndarray[tuple[Any, ...], dtype[int64]], ndarray[tuple[Any, ...], dtype[int64]]] | None, extend_lambda: bool, extension_method_lam: str, control_lambda: int, repara: bool, n_c: int) → tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], csc_array, csc_array, csc_array, float, float, float, ndarray[tuple[Any, ...], dtype[int64]] | None, ndarray[tuple[Any, ...], dtype[int64]] | None, csc_array, list[LambdaTerm], float, list[float], ndarray]

Updates and performs step-length control for the vector of lambda parameters of a GAMMLSS model. Essentially completes the steps described in section 3.3 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

family (GAMLSSFamily) – Family of model
mus (list[np.ndarray]) – List of estimated means
y (np.ndarray) – Vector of observations
Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).
n_coef (int) – Number of coefficients
form_n_coef (list[int]) – List of number of coefficients per formula
form_up_coef (list[int]) – List of un-penalized number of coefficients per formula
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
gamlss_pen (list[LambdaTerm]) – List of penalties
lam_delta (np.ndarray) – Update to vector of lambda parameters
extend_by (dict) – Extension info dictionary
was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not
c_llk (float) – Current llk
fit_info (Fit_info) – A Fit_info object
outer (int) – Index of outer iteration
max_inner (int) – Maximum number of inner iterations
min_inner (int) – Minimum number of inner iterations
conv_tol (float) – Convergence tolerance
method (str) – Method to use to estimate coefficients
piv_tol (float) – Deprecated
keep_drop (tuple[np.typing.NDArray[np.int_],np.typing.NDArray[np.int_]] | None) – Set of previously kept and dropped coefficients or None
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary
extension_method_lam (str) – Which method to use to extend lambda proposals.
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML.
repara (bool) – Whether to apply a stabilizing re-parameterization to the model
n_c (int) – Number of cores to use

Returns:

coef estimate under corrected lambda, split version of next coef estimate, next mus, next etas, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former, new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional array of the coefficients to keep, an optional array of the estimated coefficients to drop, the new total penalty matrix, the new list of penalties, total edf, term-wise edfs, the update to the lambda vector

Return type:

tuple[np.ndarray, list[np.ndarray], list[np.ndarray], list[np.ndarray], scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array, float, float, float, np.typing.NDArray[np.int_] | None, np.typing.NDArray[np.int_] | None, scp.sparse.csc_array, list[LambdaTerm], float, list[float], np.ndarray]
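
The step-halving behaviour described for control_lambda=2 can be illustrated with a minimal sketch. This is not the mssm implementation: the helper name halve_until_improved, the max_halvings argument, and the toy objective are assumptions made for illustration, and the objective is evaluated directly here rather than checked approximately.

```python
import numpy as np

def halve_until_improved(lam, lam_delta, objective, max_halvings=30):
    """Hypothetical helper: halve a proposed lambda step until the objective no longer decreases."""
    current = objective(lam)
    step = lam_delta.copy()
    for _ in range(max_halvings):
        if objective(lam + step) >= current:
            # Proposal does not decrease the objective: accept it
            return lam + step, step
        step = step / 2
    # No acceptable step found: keep the current lambda values
    return lam, np.zeros_like(step)

# Toy stand-in for the REML criterion, maximized at lambda = 3
def toy_objective(lam):
    return -np.sum((lam - 3.0) ** 2)

new_lam, accepted_step = halve_until_improved(np.array([1.0]), np.array([10.0]), toy_objective)
```

In this toy case the proposed step of 10 overshoots the maximum, so it is halved twice before being accepted.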

mssm.src.python.gamm_solvers.correct_lambda_step_gen_smooth(family: GSMMFamily, ys: list[ndarray | None], Xs: list[csc_array | None], S_norm: csc_array, n_coef: int, form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], smooth_pen: list[LambdaTerm], lam_delta: ndarray, extend_by: dict, was_extended: list[bool], c_llk: float, fit_info: Fit_info, outer: int, max_inner: int, min_inner: int, conv_tol: float, gamma: float, method: str, qEFSH: str, overwrite_coef: bool, qEFS_init_converge: bool, optimizer: str, __old_opt: LinearOperator | None, use_grad: bool, __neg_pen_llk: Callable, __neg_pen_grad: Callable, piv_tol: float, keep_drop: tuple[ndarray[tuple[Any, ...], dtype[int64]], ndarray[tuple[Any, ...], dtype[int64]]] | None, extend_lambda: bool, extension_method_lam: str, control_lambda: int, repara: bool, n_c: int, init_bfgs_options: dict, bfgs_options: dict) → tuple[ndarray, csc_array | None, csc_array | None, csc_array | LinearOperator, float, float, LinearOperator | None, ndarray[tuple[Any, ...], dtype[int64]] | None, ndarray[tuple[Any, ...], dtype[int64]] | None, csc_array, list[LambdaTerm], float, list[float], ndarray]

Updates and performs step-length control for the vector of lambda parameters of a GSMM model. Essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

family (GSMMFamily) – Model family
ys (list[np.ndarray | None]) – List of observation vectors
Xs (list[scp.sparse.csc_array | None]) – List of model matrices
S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).
n_coef (int) – Number of coefficients
form_n_coef (list[int]) – List of number of coefficients per formula
form_up_coef (list[int]) – List of un-penalized number of coefficients per formula
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
smooth_pen (list[LambdaTerm]) – List of penalties
lam_delta (np.ndarray) – Update to vector of lambda parameters
extend_by (dict) – Extension info dictionary
was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not
c_llk (float) – Current llk
fit_info (Fit_info) – A Fit_info object
outer (int) – Index of outer iteration
max_inner (int) – Maximum number of inner iterations
min_inner (int) – Minimum number of inner iterations
conv_tol (float) – Convergence tolerance
gamma (float) – Weight factor determining whether we should look for smoother or less smooth models
method (str) – Method to use to estimate coefficients (and lambda parameter)
qEFSH (str) – Should the hessian approximation use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation or the standard bfgs update (qEFSH='BFGS')
overwrite_coef (bool) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS'. Setting this to False will be useful when passing coefficients from a simpler model to initialize a more complex one. Only has an effect when qEFS_init_converge=True.
qEFS_init_converge (bool) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS'.
optimizer (str) – Deprecated
__old_opt (scp.sparse.linalg.LinearOperator | None) – If the L-qEFS update is used to estimate coefficients/lambda parameters, then this is the previous state of the quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood
use_grad (bool) – Deprecated
__neg_pen_llk (Callable) – Function to evaluate negative penalized log-likelihood
__neg_pen_grad (Callable) – Function to evaluate gradient of negative penalized log-likelihood
piv_tol (float) – Deprecated
keep_drop (tuple[np.typing.NDArray[np.int_],np.typing.NDArray[np.int_]] | None) – Set of previously kept and dropped coefficients or None
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary
extension_method_lam (str) – Which method to use to extend lambda proposals.
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note that the gradient is based on quasi-Newton approximations as well and is thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed.
repara (bool) – Whether to apply a stabilizing re-parameterization to the model
n_c (int) – Number of cores to use
init_bfgs_options (dict) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options but passed to the optimizer of the un-penalized problem. Only has an effect when qEFS_init_converge=True.
bfgs_options (dict) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'.

Returns:

coef estimate under corrected lambda, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former (or another instance of scp.sparse.linalg.LinearOperator representing the new quasi-newton approximation), next llk, next penalized llk, if the L-qEFS update is used to estimate coefficients/lambda parameters a scp.sparse.linalg.LinearOperator holding the previous quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood, an optional array of the coefficients to keep, an optional array of the estimated coefficients to drop, new total penalty matrix, new list of penalties, total edf, term-wise edfs, the update to the lambda vector

Return type:

tuple[np.ndarray, scp.sparse.csc_array|None, scp.sparse.csc_array|None, scp.sparse.csc_array|scp.sparse.linalg.LinearOperator, float, float, scp.sparse.linalg.LinearOperator|None, np.typing.NDArray[np.int_]|None, np.typing.NDArray[np.int_]|None, scp.sparse.csc_array, list[LambdaTerm], float, list[float], np.ndarray]

mssm.src.python.gamm_solvers.deriv_transform_eta_beta(d1eta: list[ndarray], d2eta: list[ndarray], d2meta: list[ndarray], Xs, only_grad=False)

Further transforms derivatives of llk with respect to eta to get derivatives of llk with respect to coefficients. Based on section 3.2 and Appendix A in Wood, Pya, & Säfken (2016)

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
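
For a single linear predictor eta = X@coef, the transformation amounts to an application of the chain rule. The sketch below is an illustration only: the helper name grad_hess_from_eta is hypothetical, and the mixed derivatives (d2meta) needed for models with several linear predictors are ignored.

```python
import numpy as np
import scipy.sparse as sp

def grad_hess_from_eta(d1eta, d2eta, X):
    """Hypothetical helper: chain rule for a single linear predictor eta = X @ coef."""
    W = sp.csc_array(sp.diags(d2eta))   # diagonal matrix of second derivatives w.r.t. eta
    grad = X.T @ d1eta                  # gradient of llk w.r.t. coef
    hess = (X.T @ W @ X).tocsc()        # hessian of llk w.r.t. coef
    return grad, hess

# Toy Gaussian llk with unit variance: llk = -0.5 * sum((y - eta)**2)
X = sp.csc_array(np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]))
y = np.array([0.5, 1.0, 2.5])
coef = np.array([0.1, 0.9])
eta = X @ coef
d1eta = y - eta              # first derivative of llk w.r.t. eta
d2eta = -np.ones_like(y)     # second derivative of llk w.r.t. eta
grad, hess = grad_hess_from_eta(d1eta, d2eta, X)
```

For this Gaussian toy case the hessian reduces to -X.T@X, which is a quick sanity check on the transformation.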

mssm.src.python.gamm_solvers.deriv_transform_mu_eta(y: ndarray, means: list[ndarray], family: GAMLSSFamily) → tuple[list[ndarray], list[ndarray], list[ndarray]]

Compute derivatives (first and second order) of llk with respect to each linear predictor based on their respective mean for all observations following steps outlined by Wood, Pya, & Säfken (2016)

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

y (np.ndarray) – Vector of observations
means (list[np.ndarray]) – List holding vectors of mean estimates
family (GAMLSSFamily) – Family of the model

Returns:

A tuple containing a list containing the first order partial derivatives with respect to each parameter, the same for pure second derivatives, and a list containing mixed derivatives

Return type:

tuple[list[np.ndarray],list[np.ndarray],list[np.ndarray]]
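
The change of variables from the mean to the linear predictor can be sketched for one parameter as follows. This is a minimal illustration assuming a Poisson log-likelihood with a log link; the helper name transform_mu_to_eta is hypothetical and mixed derivatives between different linear predictors are not covered.

```python
import numpy as np

def transform_mu_to_eta(d1mu, d2mu, dmu_deta, d2mu_deta2):
    """Hypothetical helper: turn derivatives of llk w.r.t. mu into derivatives w.r.t. eta."""
    d1eta = d1mu * dmu_deta
    d2eta = d2mu * dmu_deta**2 + d1mu * d2mu_deta2
    return d1eta, d2eta

# Poisson llk (up to a constant): y*log(mu) - mu, with log link so mu = exp(eta)
y = np.array([0.0, 2.0, 5.0])
eta = np.array([-0.5, 0.3, 1.2])
mu = np.exp(eta)
d1mu = y / mu - 1.0      # first derivative of llk w.r.t. mu
d2mu = -y / mu**2        # second derivative of llk w.r.t. mu
# For the log link both derivatives of mu w.r.t. eta equal mu
d1eta, d2eta = transform_mu_to_eta(d1mu, d2mu, mu, mu)
```

In this example the results simplify to y - mu and -mu, the familiar score and (negative) weights of a Poisson model with log link.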

mssm.src.python.gamm_solvers.drop_terms_X(Xs: list[csc_array], keep: ndarray[tuple[Any, ...], dtype[int64]]) → tuple[list[csc_array], list[int]]

Drops cols of model matrices corresponding to dropped terms.

Parameters:

Xs (list[scp.sparse.csc_array]) – List of model matrices included in the model formula.
keep (np.typing.NDArray[np.int_]) – Array of columns to keep.

Returns:

Tuple containing a list of updated model matrices - a copy is made - and a new list containing the indices by which to split the coefficient vector.

Return type:

tuple[list[scp.sparse.csc_array],list[int]]
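
A minimal sketch of the column-dropping logic is shown below. It assumes that keep indexes columns of the model matrices as if they were stacked side by side, which is an assumption made here for illustration; the helper name drop_columns is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def drop_columns(Xs, keep):
    """Hypothetical helper: keep only the columns listed in `keep` (indices into the stacked matrices)."""
    keep = np.asarray(keep)
    new_Xs, split_idx = [], []
    offset = 0   # index of the first column of the current matrix in the stacked layout
    n_kept = 0   # running count of retained columns
    for X in Xs:
        n_col = X.shape[1]
        local = keep[(keep >= offset) & (keep < offset + n_col)] - offset
        new_Xs.append(X[:, local])   # fancy indexing returns a new matrix
        n_kept += len(local)
        split_idx.append(n_kept)
        offset += n_col
    # The last cumulative count is the total length and not a split point
    return new_Xs, split_idx[:-1]

X1 = sp.csc_array(np.arange(9, dtype=float).reshape(3, 3))
X2 = sp.csc_array(np.ones((3, 2)))
new_Xs, split_idx = drop_columns([X1, X2], keep=[0, 2, 3, 4])   # drops column 1 of X1
```

The returned split indices can be passed to np.split to divide the reduced coefficient vector into one block per model matrix.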

mssm.src.python.gamm_solvers.extend_lambda_step(lti: int, lam: float, dLam: float, extend_by: dict, was_extended: list[bool], method: str) → tuple[float, dict, list[bool]]

Internal function. Performs an update to the lambda parameter, ideally extending the step taken without overshooting the objective.

Parameters:

lti (int) – Penalty index
lam (float) – Current lambda value
dLam (float) – The lambda update
extend_by (dict) – Extension info dictionary
was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not
method (str) – Extension method to use.

Raises:

ValueError – If requested method is not implemented

Returns:

Updated values for dLam, extend_by, was_extended

Return type:

tuple[float,dict,list[bool]]

mssm.src.python.gamm_solvers.form_cross_prod_mp(should_cache: bool, cache_dir: str, file: str, fi: int, y_flat: ndarray, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) → tuple[csc_array, ndarray]

Computes X.T@X and X.T@y based on the data in file.

Parameters:

should_cache (bool) – whether or not the directory should actually be created
cache_dir (str) – path to cache directory
file (str) – File name
fi (int) – File index in all files
y_flat (np.ndarray) – Observation vector
terms (list[GammTerm]) – List of terms in model formula
has_intercept (bool) – Whether the formula has an intercept or not
ltx (list[int]) – Linear term indices
irstx (list[int]) – Impulse response function term indices
stx (list[int]) – Smooth term indices
rtx (list[int]) – Random term indices
var_types (dict) – Dictionary holding variable types
var_map (dict) – Dictionary mapping variable names to column indices in the encoded data
var_mins (dict) – Dictionary with variable minimums
var_maxs (dict) – Dictionary with variable maximums
factor_levels (dict) – Dictionary with levels associated with each factor
cov_flat_file (np.ndarray) – Encoded data based on file
cov (list[np.ndarray]) – Essentially [cov_flat_file]

Returns:

X.T@X, X.T@y

Return type:

tuple[scp.sparse.csc_array,np.ndarray]

mssm.src.python.gamm_solvers.form_eta_mp(should_cache: bool, cache_dir: str, file: str, fi: int, coef: ndarray, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) → ndarray

Computes X@coef, where X is the model matrix for file.

Parameters:

should_cache (bool) – whether or not the directory should actually be created
cache_dir (str) – path to cache directory
file (str) – File name
fi (int) – File index in all files
coef (np.ndarray) – Current coefficient estimate
terms (list[GammTerm]) – List of terms in model formula
has_intercept (bool) – Whether the formula has an intercept or not
ltx (list[int]) – Linear term indices
irstx (list[int]) – Impulse response function term indices
stx (list[int]) – Smooth term indices
rtx (list[int]) – Random term indices
var_types (dict) – Dictionary holding variable types
var_map (dict) – Dictionary mapping variable names to column indices in the encoded data
var_mins (dict) – Dictionary with variable minimums
var_maxs (dict) – Dictionary with variable maximums
factor_levels (dict) – Dictionary with levels associated with each factor
cov_flat_file (np.ndarray) – Encoded data based on file
cov (list[np.ndarray]) – Essentially [cov_flat_file]

Returns:

X@coef for this file

Return type:

np.ndarray

mssm.src.python.gamm_solvers.gd_coef_smooth(coef: ndarray, grad: ndarray, S_emb: csc_array, a: float) → ndarray

Follows sections 3.1.2 and 3.14 in Wood, Pya, & Säfken (2016) to update the coefficients of a GAMLSS/GSMM model via a gradient descent (ascent actually) step.

Computes gradient of the penalized likelihood (grad - S_emb@coef)
Uses this to compute update
Step size control - happens outside

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

coef (np.ndarray) – Current coefficient estimate
grad (np.ndarray) – gradient of llk with respect to coef
S_emb (scp.sparse.csc_array) – Total penalty matrix
a (float) – Step length for gradient descent update

Returns:

An updated estimate of the coefficients

Return type:

np.ndarray
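
The update described above can be sketched in a few lines. This is an illustration rather than the mssm source: the helper name gradient_ascent_step is hypothetical and coefficients are treated as one-dimensional arrays.

```python
import numpy as np
import scipy.sparse as sp

def gradient_ascent_step(coef, grad, S_emb, a):
    """Hypothetical helper: one ascent step on the penalized log-likelihood."""
    pen_grad = grad - S_emb @ coef   # gradient of the penalized llk
    return coef + a * pen_grad       # move uphill with step length a

coef = np.array([1.0, 2.0])
grad = np.array([0.5, -1.0])
S_emb = sp.csc_array(np.diag([0.0, 2.0]))   # only the second coefficient is penalized
new_coef = gradient_ascent_step(coef, grad, S_emb, a=0.1)
```

Whether the step length a is acceptable still has to be checked by the caller, in line with the note that step size control happens outside.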

mssm.src.python.gamm_solvers.grad_lambda(lgdet_deriv: float, ldet_deriv: float, bSb: float, scale: float) → float

Internal function. Computes gradient of REML criterion with respect to all lambda parameters.

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

lgdet_deriv (float) – Derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.
ldet_deriv (float) – Derivative of \(log(|\mathbf{H} + \mathbf{S}_\lambda|)\) (\(\mathbf{H} + \mathbf{S}_\lambda\) is the negative hessian of the penalized llk) with respect to lambda.
bSb (float) – cCoef.T@emb_SJ@cCoef where cCoef is current coefficient estimate
scale (float) – Optional scale parameter (or 1)

Returns:

The derivative of the REML criterion with respect to a specific smoothing penalty

Return type:

np.ndarray
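
How the three ingredients might combine can be sketched as follows. The expression assumes the form of the REML derivative given by Wood & Fasiolo (2017) and has not been checked against the mssm source; the function name reml_grad_lambda and the numbers are hypothetical.

```python
def reml_grad_lambda(lgdet_deriv, ldet_deriv, bSb, scale=1.0):
    """Hypothetical helper: derivative of the REML criterion w.r.t. one lambda parameter.

    Assumes the form in Wood & Fasiolo (2017):
    0.5 * d log|S_lambda|_+ - 0.5 * d log|H + S_lambda| - bSb / (2 * scale)
    """
    return lgdet_deriv / 2 - ldet_deriv / 2 - bSb / (2 * scale)

# Arbitrary illustrative values
reml_deriv = reml_grad_lambda(lgdet_deriv=4.0, ldet_deriv=1.0, bSb=3.0, scale=2.0)
```

Under this form a positive value suggests that increasing the corresponding lambda parameter would increase the REML criterion.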

mssm.src.python.gamm_solvers.handle_drop_gammlss(family: GAMLSSFamily, y: ndarray, coef: ndarray, keep: ndarray[tuple[Any, ...], dtype[int64]], Xs: list[csc_array], S_emb: csc_array) → tuple[ndarray, list[ndarray], list[int], list[csc_array], csc_array, list[ndarray], list[ndarray], float, float]

Drop coefficients and make sure this is reflected in the model matrices, total penalty, llk, and penalized llk.

Parameters:

family (GAMLSSFamily) – Model family
y (np.ndarray) – Vector of observations
coef (np.ndarray) – Vector of coefficients
keep (np.typing.NDArray[np.int_]) – Array of parameter indices to keep.
Xs (list[scp.sparse.csc_array]) – List of model matrices
S_emb (scp.sparse.csc_array) – Total penalty matrix.

Returns:

A tuple holding: reduced coef vector, split version of the reduced coef vector, a new list of indices determining where to split the reduced coef vector, list with reduced model matrices, reduced total penalty matrix, updated etas, mus, llk, and penalized llk

Return type:

tuple[np.ndarray, list[np.ndarray], list[int], list[scp.sparse.csc_array], scp.sparse.csc_array, list[np.ndarray], list[np.ndarray], float, float]

mssm.src.python.gamm_solvers.handle_drop_gsmm(family: GSMMFamily, ys: list[ndarray | None], coef: ndarray, keep: ndarray[tuple[Any, ...], dtype[int64]], Xs: list[csc_array | None], S_emb: csc_array) → tuple[ndarray, list[int], list[csc_array], csc_array, float, float]

Drop coefficients and make sure this is reflected in the model matrices, total penalty, llk, and penalized llk.

Parameters:

family (GSMMFamily) – Model family
ys (list[np.ndarray | None]) – List with vector of observations
coef (np.ndarray) – Vector of coefficients
keep (np.typing.NDArray[np.int_]) – Array of parameter indices to keep.
Xs (list[scp.sparse.csc_array | None]) – List of model matrices
S_emb (scp.sparse.csc_array) – Total penalty matrix.

Returns:

A tuple holding: reduced coef vector, a new list of indices determining where to split the reduced coef vector, list with reduced model matrices, reduced total penalty matrix, updated llk, and penalized llk

Return type:

tuple[np.ndarray, list[int], list[scp.sparse.csc_array], scp.sparse.csc_array, float, float]

mssm.src.python.gamm_solvers.identify_drop(H: csc_array, S_scaled: csc_array, method: str = 'QR') → tuple[ndarray[tuple[Any, ...], dtype[int64]], ndarray[tuple[Any, ...], dtype[int64]]]

Routine to (approximately) identify the rank of the scaled negative hessian of the penalized likelihood based on a rank revealing QR decomposition or the methods by Foster (1986) and Gotsman & Toledo (2008).

If method=="QR", a rank revealing QR decomposition is performed for the scaled penalized Hessian. The latter has to be transformed to a dense matrix for this. This is essentially the approach by Wood et al. (2016) and is the most accurate. Alternatively, we can rely on a variant of Foster’s method. This is done when method=="LU" or method=="Direct". method=="LU" requires p LU decompositions - where p is approximately the Kernel size of the matrix. The routine essentially continues to find vectors forming a basis of the Kernel of the balanced penalized Hessian from the upper matrix of the LU decomposition and successively drops columns corresponding to the maximum absolute value of the Kernel vectors (see Foster, 1986). This is repeated until we can form a cholesky of the scaled penalized hessian which has an acceptable condition number. If method=="Direct", the same procedure is completed, but Kernel vectors are found directly based on the balanced penalized Hessian, which can be less precise.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Foster (1986). Rank and null space calculations using matrix decomposition without column interchanges.
Gotsman & Toledo (2008). On the Computation of Null Spaces of Sparse Rectangular Matrices.
mgcv source code, in particular: https://github.com/cran/mgcv/blob/master/R/gam.fit4.r

Parameters:

H (scp.sparse.csc_array) – Estimate of the hessian of the log-likelihood.
S_scaled (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).
method (str, optional) – Which method to use to check for rank deficiency, defaults to ‘QR’

Returns:

A tuple containing arrays of the coefficients to keep and to drop. The latter will be empty if no coefficients need to be dropped.

Return type:

tuple[np.typing.NDArray[np.int_],np.typing.NDArray[np.int_]]
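
The idea behind method=="QR" can be illustrated with a column-pivoted QR decomposition from scipy. This is a sketch on a small dense matrix and not the mssm routine: the helper name rank_deficient_columns and the relative tolerance tol are assumptions, and no scaling or balancing of the penalized Hessian is performed.

```python
import numpy as np
import scipy.linalg

def rank_deficient_columns(A, tol=1e-10):
    """Hypothetical helper: split column indices of A into those to keep and those to drop."""
    _, R, piv = scipy.linalg.qr(A, pivoting=True)
    diag = np.abs(np.diag(R))
    # Diagonal of R is non-increasing in magnitude under column pivoting
    rank = int(np.sum(diag > tol * diag[0]))
    keep = np.sort(piv[:rank])
    drop = np.sort(piv[rank:])
    return keep, drop

# Third column of X is the sum of the first two, so X.T @ X has rank 2
X = np.array([[1.0, 0.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [1.0, 3.0, 4.0]])
A = X.T @ X
keep, drop = rank_deficient_columns(A)
```

Which of the linearly dependent columns ends up in drop depends on the pivoting order, so only the number of dropped columns should be relied upon in this sketch.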

mssm.src.python.gamm_solvers.init_step_gam(y: ndarray, yb: ndarray, mu: ndarray, eta: ndarray, rowsX: int, colsX: int, X: csc_array | None, Xb: csc_array, family: Family, col_S: int, penalties: list[LambdaTerm], pinv: str, n_c: int, formula: Formula | None, form_Linv: bool, method: str, offset: float | ndarray, Lrhoi: csc_array | None) → tuple[float, float, ndarray, ndarray, ndarray, csc_array, csc_array, float, list[float], float, ndarray, ndarray, csc_array]

Internal function. Gets initial estimates for a GAM model for coefficients and proposes first lambda update.

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

y (np.ndarray) – vector of observations
yb (np.ndarray) – vector of observations of the working model
mu (np.ndarray) – vector of mean estimates
eta (np.ndarray) – vector of linear predictors
rowsX (int) – Rows of model matrix
colsX (int) – Cols of model matrix
X (scp.sparse.csc_array | None) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
family (Family) – Family of model
col_S (int) – Cols of penalty matrix
penalties (list[LambdaTerm]) – List of penalties
pinv (str) – Method to use to compute generalized inverse of total penalty, set to ‘svd’!
n_c (int) – Number of cores to use
formula (Formula | None) – (Optionally) Formula of the model
form_Linv (bool) – Whether to form the inverse of the cholesky of the negative penalized hessian or not
method (str) – Which method to use to solve for the coefficients (“Chol” or “Qr”)
offset (float | np.ndarray) – Offset (fixed effect) to add to eta
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model

Returns:

A tuple containing the deviance dev, penalized deviance pen_dev, eta, mu, coef, CholXXS, InvCholXXS, total_edf, term_edfs, scale, wres, lam_delta, S_emb

Return type:

tuple[float, float, np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, list[float], float, np.ndarray, np.ndarray, scp.sparse.csc_array]

mssm.src.python.gamm_solvers.initialize_extension(method: str, penalties: list[LambdaTerm]) → dict

Internal function. Initializes a dictionary holding all the necessary information to compute the lambda extensions at every iteration of the fitting routine.

Parameters:

method (str) – Which extension method to use
penalties (list[LambdaTerm]) – List of penalties

Returns:

extension info dictionary

Return type:

dict

mssm.src.python.gamm_solvers.keep_XTX(cov_flat: ndarray, y_flat: ndarray, formula: Formula, nc: int, progress_bar: bool) → tuple[csc_array, ndarray]

Computes X.T@X and X.T@y in blocks.

Parameters:

cov_flat (np.ndarray) – Encoded data as np.array
y_flat (np.ndarray) – vector of observations
formula (Formula) – Formula of model
nc (int) – Number of cores to use
progress_bar (bool) – Whether to print progress or not

Returns:

X.T@X, X.T@y

Return type:

tuple[scp.sparse.csc_array,np.ndarray]
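
Accumulating the cross-products over blocks of rows can be sketched as follows. The sketch is serial and works on a model matrix that already exists, whereas the function above builds blocks from the encoded data and distributes them over cores; the helper name blockwise_cross_products and the n_blocks argument are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def blockwise_cross_products(X, y, n_blocks):
    """Hypothetical helper: accumulate X.T@X and X.T@y over blocks of rows."""
    n_col = X.shape[1]
    XtX = sp.csc_array((n_col, n_col))
    Xty = np.zeros(n_col)
    for rows in np.array_split(np.arange(X.shape[0]), n_blocks):
        Xb = X[rows, :]              # rows of the model matrix in this block
        XtX = XtX + (Xb.T @ Xb)      # block contribution to X.T@X
        Xty += Xb.T @ y[rows]        # block contribution to X.T@y
    return XtX.tocsc(), Xty

rng = np.random.default_rng(0)
dense = rng.normal(size=(10, 3))
dense[np.abs(dense) < 0.5] = 0.0     # introduce some sparsity
X = sp.csr_array(dense)
y = rng.normal(size=10)
XtX, Xty = blockwise_cross_products(X, y, n_blocks=3)
```

Because only one block of rows is needed at a time, the full model matrix never has to be held in memory when blocks are built on demand.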

mssm.src.python.gamm_solvers.keep_eta(formula: Formula, coef: ndarray, nc: int) → list[float]

Computes X@coef in parallel, where X is the overall model matrix and coef is current coefficient estimate.

Parameters:

formula (Formula) – Formula of model
coef (np.ndarray) – Current coefficient estimate
nc (int) – Number of cores to use

Returns:

X@coef

Return type:

list[float]

mssm.src.python.gamm_solvers.newton_coef_smooth(coef: ndarray, grad: ndarray, H: csc_array, S_emb: csc_array) → tuple[ndarray, csc_array, csc_array, float]

Follows sections 3.1.2 and 3.14 in Wood, Pya, & Säfken (2016) to update the coefficients of a GAMLSS/GSMM model via a Newton step.

Computes gradient of the penalized likelihood (grad - S_emb@coef)
Computes negative Hessian of the penalized likelihood (-1*H + S_emb) and its inverse.
Uses these two to compute the Newton step.
Step size control - happens outside

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
mgcv source code, in particular: https://github.com/cran/mgcv/blob/master/R/gam.fit4.r

Parameters:

coef (np.ndarray) – Current coefficient estimate
grad (np.ndarray) – gradient of llk with respect to coef
H (scp.sparse.csc_array) – hessian of the llk
S_emb (scp.sparse.csc_array) – Total penalty matrix

Returns:

A tuple containing an estimate of the coefficients, the un-pivoted cholesky of the penalized negative hessian, the inverse of the former, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible

Return type:

tuple[np.ndarray,scp.sparse.csc_array,scp.sparse.csc_array,float]
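
The steps listed above can be sketched as follows. Note that this illustration swaps in a sparse direct solve (scipy.sparse.linalg.spsolve) for the Cholesky factorization returned by the function, and it omits the diagonal adjustment applied when the negative penalized hessian is not invertible; the helper name newton_step is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def newton_step(coef, grad, H, S_emb):
    """Hypothetical helper: one Newton update for the penalized log-likelihood."""
    pen_grad = grad - S_emb @ coef               # gradient of the penalized llk
    neg_pen_hess = (-1 * H + S_emb).tocsc()      # negative hessian of the penalized llk
    return coef + spsolve(neg_pen_hess, pen_grad)

# Toy Gaussian llk with unit variance: llk = -0.5 * sum((y - X@coef)**2)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.5, 1.0, 2.5])
S_emb = sp.csc_array(np.diag([0.0, 1.0]))
coef = np.zeros(2)
grad = X.T @ (y - X @ coef)                      # gradient of llk w.r.t. coef
H = sp.csc_array(-X.T @ X)                       # hessian of llk w.r.t. coef
new_coef = newton_step(coef, grad, H, S_emb)
```

For a quadratic objective such as this one, a single full Newton step lands on the penalized least squares solution, which makes the example easy to verify.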

mssm.src.python.gamm_solvers.read_XTX(file: str, formula: Formula, nc: int) → tuple[csc_array, ndarray, int]

Computes X.T@X and X.T@y for this file in parallel, reading data from file.

Parameters:

file (str) – File name
formula (Formula) – Formula of model
nc (int) – Number of cores to use

Returns:

X.T@X, X.T@y

Return type:

tuple[scp.sparse.csc_array,np.ndarray,int]

mssm.src.python.gamm_solvers.read_eta(file, formula: Formula, coef: ndarray, nc: int) → list[float]

Computes X@coef in parallel, where X is the model matrix based on this file and coef is the current coefficient estimate.

Parameters:

file (str) – File name
formula (Formula) – Formula of model
coef (np.ndarray) – Current coefficient estimate
nc (int) – Number of cores to use

Returns:

X@coef

Return type:

list[float]

mssm.src.python.gamm_solvers.read_mmat(should_cache: bool, cache_dir: str, file: str, fi: int, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) → csc_array

Creates model matrix for that dataset. The model-matrix is either cached or not. If the former is the case, the matrix is read in on subsequent calls to this function.

Parameters:

should_cache (bool) – whether or not the directory should actually be created
cache_dir (str) – path to cache directory
file (str) – File name
fi (int) – File index in all files
terms (list[GammTerm]) – List of terms in model formula
has_intercept (bool) – Whether the formula has an intercept or not
ltx (list[int]) – Linear term indices
irstx (list[int]) – Impulse response function term indices
stx (list[int]) – Smooth term indices
rtx (list[int]) – Random term indices
var_types (dict) – Dictionary holding variable types
var_map (dict) – Dictionary mapping variable names to column indices in the encoded data
var_mins (dict) – Dictionary with variable minimums
var_maxs (dict) – Dictionary with variable maximums
factor_levels (dict) – Dictionary with levels associated with each factor
cov_flat_file (np.ndarray) – Encoded data based on file
cov (list[np.ndarray]) – Essentially [cov_flat_file]

Returns:

model matrix associated with this file

Return type:

scp.sparse.csc_array

mssm.src.python.gamm_solvers.restart_coef(coef: ndarray, c_llk: float | None, c_pen_llk: float | None, n_coef: int, coef_split_idx: list[int], ys: list[ndarray | None], Xs: list[csc_array | None], S_emb: csc_array, family: GSMMFamily, outer: int, restart_counter: int) → tuple[ndarray, float, float]

Shrink coef towards random vector to restart algorithm if it gets stuck.

Parameters:

coef (np.ndarray) – Coefficient estimate
c_llk (float) – Current llk
c_pen_llk (float) – Current penalized llk
n_coef (int) – Number of coefficients
coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution
ys (list[np.ndarray | None]) – List of observation vectors
Xs (list[scp.sparse.csc_array | None]) – List of model matrices
S_emb (scp.sparse.csc_array) – Total penalty matrix
family (GSMMFamily) – Model family
outer (int) – Outer iteration index
restart_counter (int) – Number of restarts already handled previously

Returns:

Updates for coef, c_llk, c_pen_llk

Return type:

tuple[np.ndarray, float, float]

mssm.src.python.gamm_solvers.restart_coef_gammlss(coef: ndarray, split_coef: list[ndarray], c_llk: float, c_pen_llk: float, etas: list[ndarray], mus: list[ndarray], n_coef: int, coef_split_idx: list[int], y: ndarray, Xs: list[csc_array], S_emb: csc_array, family: GAMLSSFamily, outer: int, restart_counter: int) → tuple[ndarray, list[ndarray], float, float, list[ndarray], list[ndarray]]

Shrink coef towards a random vector to restart the algorithm if it gets stuck.

Parameters:

coef (np.ndarray) – Coefficient estimate
split_coef (list[np.ndarray]) – Split of coefficient estimate
c_llk (float) – Current llk
c_pen_llk (float) – Current penalized llk
etas (list[np.ndarray]) – List of linear predictors
mus (list[np.ndarray]) – List of estimated means
n_coef (int) – Number of coefficients
coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution
y (np.ndarray) – Vector of observations
Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
S_emb (scp.sparse.csc_array) – Total penalty matrix
family (GAMLSSFamily) – Model family
outer (int) – Outer iteration index
restart_counter (int) – Number of restarts already handled previously

Returns:

Updates for coef, split_coef, c_llk, c_pen_llk, etas, mus

Return type:

tuple[np.ndarray, list[np.ndarray], float, float, list[np.ndarray], list[np.ndarray]]
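
How coef, coef_split_idx, split_coef, and etas relate can be sketched as follows. The sketch assumes that the split indices are used in the way np.split expects them and that each linear predictor is the product of a model matrix with its block of coefficients. The sizes (4 coefficients for the first parameter, 3 for the second) are made up for illustration.

```python
import numpy as np
import scipy.sparse as sp

# Stacked coefficient vector for two distribution parameters (hypothetical sizes).
coef = np.arange(7, dtype=float).reshape(-1, 1)
coef_split_idx = [4]  # assumption: split points as understood by np.split

# One coefficient block per parameter of the response distribution.
split_coef = np.split(coef, coef_split_idx)

# One model matrix per parameter, each matching the width of its block.
Xs = [sp.csc_array(np.ones((5, 4))), sp.csc_array(np.ones((5, 3)))]

# One linear predictor per parameter.
etas = [X @ c for X, c in zip(Xs, split_coef)]
```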

mssm.src.python.gamm_solvers.solve_gamm_sparse(mu_init: ndarray, y: ndarray, X: csc_array, penalties: list[LambdaTerm], col_S: int, family: Family, maxiter: int = 10, max_inner: int = 100, pinv: str = 'svd', conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 1, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', form_Linv: bool = True, method: str = 'Chol', check_cond: int = 2, progress_bar: bool = False, n_c: int = 10, offset: int = 0, Lrhoi: csc_array | None = None) → tuple[ndarray, ndarray, ndarray, csc_array, csc_array, float, csc_array, float, list[float], float, Fit_info]

Estimates a Generalized Additive Mixed model. Implements the algorithms discussed in section 3.2 of the paper by Krause et al. (submitted).

Relies on methods proposed by Wood et al. (2017), Wood & Fasiolo (2017), Wood (2011), and Wood (2017).

References:

Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

mu_init (np.ndarray) – Initial values for means
y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Model matrix
penalties (list[LambdaTerm]) – List of penalties
col_S (int) – Columns of total penalty matrix
family (Family) – Family of model
maxiter (int, optional) – Maximum number of iterations for outer algorithm updating lambda, defaults to 10
max_inner (int, optional) – Maximum number of iterations for inner algorithm updating coefficients, defaults to 100
pinv (str, optional) – Method to use to compute the generalized inverse of the total penalty, defaults to “svd”
conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 1 by default.
exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov”
form_Linv (bool, optional) – Whether to form the inverse of the cholesky of the negative penalized hessian or not, defaults to True
method (str, optional) – Which method to use to solve for the coefficients (“Chol” or “Qr”), defaults to “Chol”
check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). When check_cond=2, an estimate of the condition number will be computed for each new system (at each iteration of the algorithm) and an error will be raised if the condition number is estimated as too high given the chosen method, defaults to 2
progress_bar (bool, optional) – Whether to print progress or not, defaults to False
n_c (int, optional) – Number of cores to use, defaults to 10
offset (int, optional) – Offset (fixed effect) to add to eta, defaults to 0
Lrhoi (scp.sparse.csc_array | None, optional) – Optional covariance matrix of an ar1 model, defaults to None

Raises:

ArithmeticError – _description_
ArithmeticError – _description_
ArithmeticError – _description_
ArithmeticError – _description_
warnings.warn – _description_

Returns:

An estimate of the coefficients coef, the linear predictor eta, the working residuals wres, the root of the Fisher weights as matrix Wr, the matrix with Newton weights at convergence WN, an estimate of the scale parameter, an inverse of the cholesky of the penalized negative hessian InvCholXXS, total edf, term-wise edf, total penalty, a Fit_info object

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, scp.sparse.csc_array, float, list[float], float, Fit_info]

mssm.src.python.gamm_solvers.solve_gamm_sparse2(formula: Formula, penalties: list[LambdaTerm], col_S: int, family: Family, maxiter: int = 10, pinv: str = 'svd', conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 1, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', form_Linv: bool = True, progress_bar: bool = False, n_c: int = 10) → tuple[ndarray, ndarray, ndarray, csc_array, float, csc_array | None, float, list[float], float, Fit_info]

Estimates an Additive Mixed model. Implements the algorithms discussed in section 3.1 of the paper by Krause et al. (submitted).

Relies on methods proposed by Wood et al. (2017), Wood & Fasiolo (2017), Wood (2011), and Wood (2017). In addition, this function builds the products involving the model matrix only once (iteratively) as described by Wood et al. (2015).

References:

Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N., Goude, Y., & Shaw, S. (2015). Generalized additive models for large data sets. Journal of the Royal Statistical Society: Series C (Applied Statistics), 64(1), 139–155. https://doi.org/10.1111/rssc.12068
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

formula (Formula) – Formula of the model
penalties (list[LambdaTerm]) – List of penalties
col_S (int) – Columns of total penalty matrix
family (Family) – Family of model
maxiter (int, optional) – Maximum number of iterations for outer algorithm updating lambda, defaults to 10
pinv (str, optional) – Method to use to compute the generalized inverse of the total penalty, defaults to “svd”
conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 1 by default.
exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov”
form_Linv (bool, optional) – Whether to form the inverse of the cholesky of the negative penalized hessian or not, defaults to True
progress_bar (bool, optional) – Whether to print progress or not, defaults to False
n_c (int, optional) – Number of cores to use, defaults to 10

Returns:

An estimate of the coefficients coef, the linear predictor eta, the working residuals wres, the negative hessian, the estimated scale, an inverse of the cholesky of the negative penalized hessian, total edf, term-wise edfs, total penalty, a Fit_info object

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, float, scp.sparse.csc_array|None, float, list[float], float, Fit_info]
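
The idea of building the products involving the model matrix iteratively, as described by Wood et al. (2015), can be sketched by accumulating X'X and X'y over row blocks so that the full model matrix never has to be held in memory at once. The function name accumulate_cross_products and the way blocks are supplied are assumptions made for this illustration and do not mirror the internals of solve_gamm_sparse2.

```python
import numpy as np
import scipy.sparse as sp

def accumulate_cross_products(blocks, n_coef):
    """Hypothetical helper: sum X_block' X_block and X_block' y_block over row blocks."""
    XtX = sp.csc_array((n_coef, n_coef))
    Xty = np.zeros((n_coef, 1))
    for X_block, y_block in blocks:
        XtX = XtX + X_block.T @ X_block
        Xty = Xty + X_block.T @ y_block
    return XtX.tocsc(), Xty

# Small dense example, split into two row blocks.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
y = rng.standard_normal((10, 1))
blocks = [(sp.csc_array(X[:5]), y[:5]), (sp.csc_array(X[5:]), y[5:])]
XtX, Xty = accumulate_cross_products(blocks, n_coef=3)
```

Because the accumulated products do not change between iterations of a strictly additive model, they only need to be formed once.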

mssm.src.python.gamm_solvers.solve_gammlss_sparse(family: GAMLSSFamily, y: ndarray, Xs: list[csc_array], form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], gamlss_pen: list[LambdaTerm], max_outer: int = 50, max_inner: int = 30, min_inner: int = 1, conv_tol: float = 1e-07, extend_lambda: bool = True, extension_method_lam: str = 'nesterov2', control_lambda: int = 1, method: str = 'Chol', check_cond: int = 1, piv_tol: float = 0.175, repara: bool = True, should_keep_drop: bool = True, prefit_grad: bool = False, progress_bar: bool = True, n_c: int = 10) → tuple[ndarray, list[ndarray], list[ndarray], ndarray, csc_array, csc_array, float, list[float], float, list[LambdaTerm], Fit_info]

Fits a GAMLSS model - essentially completes the steps discussed in section 3.3 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

family (GAMLSSFamily) – Model family
y (np.ndarray) – Vector of observations
Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
form_n_coef (list[int]) – List of number of coefficients per formula
form_up_coef (list[int]) – List of un-penalized number of coefficients per formula
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
gamlss_pen (list[LambdaTerm]) – List of penalties
max_outer (int, optional) – Maximum number of outer iterations, defaults to 50
max_inner (int, optional) – Maximum number of inner iterations, defaults to 30
min_inner (int, optional) – Minimum number of inner iterations, defaults to 1
conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7
extend_lambda (bool, optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary, defaults to True
extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov2”
control_lambda (int, optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML, defaults to 1
method (str, optional) – Method to use to estimate coefficients, defaults to “Chol”
check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()), defaults to 1
piv_tol (float, optional) – Deprecated, defaults to 0.175
repara (bool, optional) – Whether to apply a stabilizing re-parameterization to the model, defaults to True
should_keep_drop (bool, optional) – If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations, defaults to True
prefit_grad (bool, optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients, defaults to False
progress_bar (bool, optional) – Whether progress should be displayed, defaults to True
n_c (int, optional) – Number of cores to use, defaults to 10

Returns:

coef estimate, etas, mus, working residuals, the negative hessian of the log-likelihood, inverse of cholesky of negative hessian of the penalized log-likelihood, total edf, term-wise edfs, total penalty, final list of penalties, a Fit_info object

Return type:

tuple[np.ndarray, list[np.ndarray], list[np.ndarray], np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, list[float], float, list[LambdaTerm], Fit_info]

mssm.src.python.gamm_solvers.solve_generalSmooth_sparse(family: GSMMFamily, ys: list[ndarray | None], Xs: list[csc_array | None], form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], smooth_pen: list[LambdaTerm], max_outer: int = 50, max_inner: int = 50, min_inner: int = 50, conv_tol: float = 1e-07, extend_lambda: bool = True, extension_method_lam: str = 'nesterov2', control_lambda: int = 1, optimizer: str = 'Newton', method: str = 'Chol', check_cond: int = 1, piv_tol: float = 0.175, repara: bool = True, should_keep_drop: bool = True, form_VH: bool = True, use_grad: bool = False, gamma: float = 1, qEFSH: str = 'SR1', overwrite_coef: bool = True, max_restarts: int = 0, qEFS_init_converge: bool = True, prefit_grad: bool = False, progress_bar: bool = True, n_c: int = 10, callback: Callable | None = None, init_bfgs_options: dict = {'ftol': 1e-09, 'gtol': 1e-09, 'maxcor': 30, 'maxfun': 10000000.0, 'maxls': 100}, bfgs_options: dict = {'ftol': 1e-09, 'gtol': 1e-09, 'maxcor': 30, 'maxfun': 10000000.0, 'maxls': 100}) → tuple[ndarray, csc_array | None, csc_array | LinearOperator, LinearOperator | None, float, list[float], float, list[LambdaTerm], Fit_info]

Fits a general smooth model. Essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016). An even more general version of solve_gammlss_sparse() that can use the L-qEFS update by Krause et al. (submitted) to estimate the coefficients and lambda parameters. The update requires only a function to compute the log-likelihood and a function to compute the gradient of said likelihood with respect to the coefficients. Alternatively, full Newton can be used, which requires a function to compute the hessian as well.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

family (GSMMFamily) – Model family
ys (list[np.ndarray | None]) – List of observation vectors
Xs (list[scp.sparse.csc_array | None]) – List of model matrices
form_n_coef (list[int]) – List of number of coefficients per formula
form_up_coef (list[int]) – List of un-penalized number of coefficients per formula
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
smooth_pen (list[LambdaTerm]) – List of penalties
max_outer (int, optional) – Maximum number of outer iterations, defaults to 50
max_inner (int, optional) – Maximum number of inner iterations, defaults to 50
min_inner (int, optional) – Minimum number of inner iterations, defaults to 50
conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7
extend_lambda (bool, optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary, defaults to True
extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov2”
control_lambda (int, optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note that the gradient is based on quasi-newton approximations as well and thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed, defaults to 1
optimizer (str, optional) – Deprecated, defaults to “Newton”
method (str, optional) – Which method to use to estimate the coefficients (and lambda parameters), defaults to “Chol”
check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()), defaults to 1
piv_tol (float, optional) – Deprecated, defaults to 0.175
repara (bool, optional) – Whether to apply a stabilizing re-parameterization to the model, defaults to True
should_keep_drop (bool, optional) – If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations, defaults to True
form_VH (bool, optional) – Whether to explicitly form matrix V - the estimated inverse of the negative Hessian of the penalized likelihood - and H - the estimate of the Hessian of the log-likelihood - when using the qEFS method, defaults to True
use_grad (bool, optional) – Deprecated, defaults to False
gamma (float, optional) – Setting this to a value larger than 1 promotes more complex (less smooth) models. Setting this to a value smaller than 1 (but must be > 0) promotes smoother models, defaults to 1
qEFSH (str, optional) – Should the hessian approximation use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation or the standard bfgs update (qEFSH='BFGS'), defaults to ‘SR1’
overwrite_coef (bool, optional) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS', defaults to True
max_restarts (int, optional) – How often to shrink the coefficient estimate back to a random vector when convergence is reached and when method='qEFS'. The optimizer might get stuck in local minima so it can be helpful to set this to 1-3. What happens is that if we converge, we shrink the coefficients back to a random vector and then continue optimizing once more, defaults to 0
qEFS_init_converge (bool, optional) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS', defaults to True
prefit_grad (bool, optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients, defaults to False
progress_bar (bool, optional) – Whether progress should be printed or not, defaults to True
n_c (int, optional) – Number of cores to use, defaults to 10
callback (Callable | None, optional) – An optional callback function to call after every update to the \(\lambda\) parameters. The signature of the provided function needs to match callback(outer:int,llk:float,coef:np.ndarray,lam:[float]) -> None, where outer is the current iteration of the outer algorithm used to update the \(\lambda\) parameters, llk is the current log-likelihood, coef is the current coefficient estimate, and lam holds a list with the current \(\lambda\) parameters. Defaults to None.
init_bfgs_options (dict, optional) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options but passed to the optimizer of the un-penalized problem, defaults to {"gtol":1e-9,"ftol":1e-9,"maxcor":30,"maxls":100,"maxfun":1e7}
bfgs_options (dict, optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS', defaults to {"gtol":1e-9,"ftol":1e-9,"maxcor":30,"maxls":100,"maxfun":1e7}

Returns:

coef estimate, the negative hessian of the log-likelihood, inverse of cholesky of negative hessian of the penalized log-likelihood, if method=='qEFS' an instance of scp.sparse.linalg.LinearOperator representing the new quasi-newton approximation, total edf, term-wise edfs, total penalty, final list of penalties, a Fit_info object

Return type:

tuple[np.ndarray, scp.sparse.csc_array|None, scp.sparse.csc_array|scp.sparse.linalg.LinearOperator, scp.sparse.linalg.LinearOperator|None, float, list[float], float, list[LambdaTerm], Fit_info]
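
A callback matching the documented signature could look like the sketch below. The function name record_progress and the history list are hypothetical. The final call only simulates what the solver would do after an update to the \(\lambda\) parameters, so that the snippet runs on its own. The option dictionary repeats the documented defaults.

```python
import numpy as np

history = []

def record_progress(outer: int, llk: float, coef: np.ndarray, lam: list) -> None:
    """Hypothetical callback: store a snapshot after every lambda update."""
    history.append({
        "outer": outer,
        "llk": llk,
        "n_coef": coef.shape[0],
        "lam": list(lam),
    })

# Options forwarded to scipy.optimize.minimize() when method=='qEFS' (documented defaults).
bfgs_options = {"gtol": 1e-9, "ftol": 1e-9, "maxcor": 30, "maxls": 100, "maxfun": 1e7}

# Simulated invocation with made-up values, standing in for the solver.
record_progress(0, -123.4, np.zeros((5, 1)), [1.1, 10.0])
```

In an actual fit the function would be handed over via callback=record_progress and the dictionary via bfgs_options=bfgs_options.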

mssm.src.python.gamm_solvers.step_fellner_schall_sparse(lgdet_deriv: float, ldet_deriv: float, bSb: float, cLam: float, scale: float) → float

Internal function. Compute a generalized Fellner Schall update step for a lambda term. This update rule is discussed in Wood & Fasiolo (2017).

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

lgdet_deriv (float) – Derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.
ldet_deriv (float) – Derivative of \(log(|\mathbf{H} + \mathbf{S}_\lambda|)\) (\(\mathbf{H} + \mathbf{S}_\lambda\) is the negative hessian of the penalized llk) with respect to lambda.
bSb (float) – cCoef.T@emb_SJ@cCoef where cCoef is current coefficient estimate
cLam (float) – Current lambda value
scale (float) – Optional scale parameter (or 1)

Returns:

The additive update to cLam

Return type:

float
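
The generalized Fellner-Schall update of Wood & Fasiolo (2017) multiplies the current lambda by the scale times the ratio of (lgdet_deriv - ldet_deriv) to bSb. A minimal sketch of an additive step built from that rule is shown below. The function name efs_additive_step, the clipping bounds lam_min and lam_max, and the guards against non-positive numerators and denominators are assumptions and may differ from what step_fellner_schall_sparse does internally.

```python
def efs_additive_step(lgdet_deriv, ldet_deriv, bSb, cLam, scale=1.0,
                      lam_min=1e-7, lam_max=1e7):
    """Hypothetical sketch of an additive generalized Fellner-Schall step."""
    # Guard against a negative numerator and a vanishing denominator (assumed safeguards).
    numerator = max(0.0, lgdet_deriv - ldet_deriv)
    denominator = max(bSb, 1e-12)

    # Multiplicative EFS rule for the new lambda value.
    new_lam = scale * (numerator / denominator) * cLam

    # Keep lambda within assumed bounds, then return the additive change.
    new_lam = min(max(new_lam, lam_min), lam_max)
    return new_lam - cLam

step = efs_additive_step(lgdet_deriv=5.0, ldet_deriv=3.0, bSb=4.0, cLam=2.0)
```

With these inputs the proposed lambda is 1 * (2 / 4) * 2 = 1, so the additive step relative to cLam=2 is -1.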

mssm.src.python.gamm_solvers.test_SR1(sk: ndarray, yk: ndarray, rho: ndarray, sks: ndarray, yks: ndarray, rhos: ndarray) → bool

Test whether SR1 update is well-defined for both V and H.

Relies on steps discussed by Byrd, Nocedal & Schnabel (1994).

References:

Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063

Parameters:

sk (np.ndarray) – New update vector sk
yk (np.ndarray) – New update vector yk
rho (np.ndarray) – New rho
sks (np.ndarray) – Previous update vectors sk
yks (np.ndarray) – Previous update vectors yk
rhos (np.ndarray) – Previous rhos

Returns:

Whether the SR1 update is well-defined for both V and H.

Return type:

bool
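
For intuition, the textbook safeguard for a symmetric rank 1 update can be written down for a dense approximation B: the update is skipped when the curvature term s'(y - Bs) is too small relative to the norms of s and y - Bs. The sketch below uses a dense matrix and a hypothetical tolerance r. It does not use the limited-memory representation that test_SR1 works with and it only checks one matrix rather than both V and H.

```python
import numpy as np

def sr1_update_is_well_defined(B, sk, yk, r=1e-8):
    """Hypothetical dense check of the standard SR1 skipping rule."""
    residual = yk - B @ sk
    curvature = abs(float(np.dot(sk, residual)))
    threshold = r * np.linalg.norm(sk) * np.linalg.norm(residual)
    # A zero curvature term would lead to a division by zero in the update.
    return bool(curvature > 0.0 and curvature >= threshold)

B = np.eye(2)
ok = sr1_update_is_well_defined(B, np.array([1.0, 0.0]), np.array([2.0, 0.0]))
skipped = sr1_update_is_well_defined(B, np.array([1.0, 0.0]), np.array([1.0, 1.0]))
```

In the second call the residual y - Bs is orthogonal to s, so the denominator of the SR1 update would be zero and the check fails.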

mssm.src.python.gamm_solvers.undo_extension_lambda_step(lti: int, lam: float, dLam: float, extend_by: dict, was_extended: list[bool], method: str, family: Family | None) → tuple[float, float]

Internal function. Deals with resetting any extension terms.

Parameters:

lti (int) – Penalty index
lam (float) – Current lambda value
dLam (float) – The lambda update
extend_by (dict) – Extension info dictionary
was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not
method (str) – Extension method to use.
family (Family | None) – Deprecated. model family

Raises:

ValueError – If requested method is not implemented

Returns:

Updated values for lam and dLam

Return type:

tuple[float,float]

mssm.src.python.gamm_solvers.updateTheta(mu: ndarray, y: ndarray, family: ExtendedFamily) → ndarray

Updates theta for an ExtendedFamily instance given mu. Returns the new estimate for theta.

Relies on Newton’s method and automatically performs step-length control. theta is chosen to maximize the family’s log-likelihood, not the REML criterion. Implementation is based on the estimate.theta function in mgcv which is used by the bam function.

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
estimate.theta function in mgcv: https://github.com/cran/mgcv/blob/master/R/efam.r#L5

Parameters:

mu (np.ndarray) – Vector of mean estimates - one per observation
y (np.ndarray) – Vector of observations
family (ExtendedFamily) – Response family of the model

Returns:

The updated estimate for theta in a np.ndarray of shape (-1,1)

Return type:

np.ndarray
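
Newton's method with step-length control can be illustrated on a one-dimensional toy problem. The sketch below is not the updateTheta implementation: the function newton_maximize, the toy log-likelihood log(theta) - theta/3 (maximized at theta = 3), and the tolerances are all assumptions chosen to keep the example short.

```python
import math

def newton_maximize(llk, grad, hess, theta, max_iter=100, tol=1e-10):
    """Hypothetical scalar Newton maximizer with step halving."""
    for _ in range(max_iter):
        current = llk(theta)
        g, h = grad(theta), hess(theta)
        # Newton step if the curvature is negative, otherwise a plain gradient step.
        step = -g / h if h < 0 else g
        # Step-length control: halve the step until the objective no longer decreases.
        while llk(theta + step) < current and abs(step) > tol:
            step /= 2.0
        if abs(step) <= tol:
            return theta
        theta += step
    return theta

def toy_llk(theta):
    # Return -inf outside the valid domain so that such steps are rejected.
    return math.log(theta) - theta / 3.0 if theta > 0 else -math.inf

theta_hat = newton_maximize(
    toy_llk,
    grad=lambda t: 1.0 / t - 1.0 / 3.0,
    hess=lambda t: -1.0 / t**2,
    theta=7.0,
)
```

Starting from theta = 7 the first full Newton step would leave the valid domain, so it is halved before being accepted. Later iterations take full steps.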

mssm.src.python.gamm_solvers.update_PIRLS(y: ndarray, yb: ndarray, mu: ndarray, eta: ndarray, X: csc_array, Xb: csc_array, family: Family, Lrhoi: csc_array | None) → tuple[ndarray, csc_array, ndarray | None, csc_array | None]

Internal function. Updates the pseudo-weights and observation vector yb and model matrix Xb of the working model.

Note: Dimensions of yb and Xb might not match those of y and X since rows of invalid pseudo-data observations are dropped here.

Parameters:

y (np.ndarray) – vector of observations
yb (np.ndarray) – vector of observations of the working model
mu (np.ndarray) – vector of mean estimates
eta (np.ndarray) – vector of linear predictors
X (scp.sparse.csc_array) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
family (Family) – Family of model
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model

Returns:

Updated observation vector yb and model matrix Xb of the working model, pseudo-weights, and a diagonal sparse matrix holding the root of the Fisher weights. The latter two are None for strictly additive models.

Return type:

tuple[np.ndarray,scp.sparse.csc_array,np.ndarray|None,scp.sparse.csc_array|None]
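
For a binary response with logit link, the standard PIRLS working quantities (see Wood, 2017) are the pseudo-data z = eta + (y - mu) / (mu(1 - mu)) and the Fisher weights w = mu(1 - mu). The sketch below computes both and flags rows whose pseudo-data are not finite, which is one way the dimensions of the working model can end up smaller than those of the original data. The function name and the validity rule are assumptions and are not copied from update_PIRLS.

```python
import numpy as np

def logit_working_quantities(y, eta):
    """Hypothetical sketch: pseudo-data and Fisher weights for a logit-link model."""
    mu = 1.0 / (1.0 + np.exp(-eta))
    var = mu * (1.0 - mu)  # equals both V(mu) and 1/g'(mu) for the logit link
    with np.errstate(divide="ignore", invalid="ignore"):
        z = eta + (y - mu) / var
    w = var
    # Rows with non-finite pseudo-data or zero weight are treated as invalid.
    valid = np.isfinite(z) & (w > 0)
    return z[valid], w[valid], valid

y = np.array([1.0, 0.0, 0.0])
eta = np.array([0.0, 2.0, 800.0])
z, w, valid = logit_working_quantities(y, eta)
```

The third observation has a fitted mean that is numerically equal to 1, so its weight is zero and its pseudo-data value is not finite. It is therefore excluded from z and w.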

mssm.src.python.gamm_solvers.update_coef(yb: ndarray, X: csc_array, Xb: csc_array, family: Family, S_emb: csc_array, S_root: csc_array | None, n_c: int, formula: Formula | None, offset: float | ndarray) → tuple[ndarray, ndarray, ndarray, list[int], csc_array, csc_array, ndarray[tuple[Any, ...], dtype[int64]] | None, ndarray[tuple[Any, ...], dtype[int64]] | None]

Internal function. Estimates the coefficients of the model and updates the linear predictor and mean estimates.

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

yb (np.ndarray) – vector of observations of the working model
X (scp.sparse.csc_array) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
family (Family) – Family of Model
S_emb (scp.sparse.csc_array) – Total penalty matrix
S_root (scp.sparse.csc_array | None) – Root of total penalty matrix or None
n_c (int) – Number of cores
formula (Formula | None) – Formula of model or None
offset (float | np.ndarray) – Offset (fixed effect) to add to eta

Returns:

A tuple containing the linear predictor eta, the estimated means mu, the estimated coefficients, the column permutation indices Pr, the column permutation matrix P, the cholesky of the pivoted penalized negative hessian, an optional array of the coefficients to keep, an optional array of the estimated coefficients to drop

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, list[int], scp.sparse.csc_array, scp.sparse.csc_array, np.typing.NDArray[np.int_]|None, np.typing.NDArray[np.int_]|None]
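
At its core, a coefficient update for a penalized working model solves the linear system (Xb'Xb + S_emb) coef = Xb'yb. The sketch below solves this system with scipy.sparse.linalg.spsolve for brevity. It is an assumption-laden illustration: update_coef is documented as returning a pivoted Cholesky factor, permutation information, and keep/drop indices, none of which are reproduced here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def penalized_least_squares(Xb, yb, S_emb):
    """Hypothetical sketch: solve (Xb'Xb + S_emb) coef = Xb'yb."""
    lhs = (Xb.T @ Xb + S_emb).tocsc()
    rhs = (Xb.T @ yb).ravel()
    coef = spsolve(lhs, rhs)
    return coef.reshape(-1, 1)

rng = np.random.default_rng(0)
X_dense = rng.standard_normal((20, 3))
yb = rng.standard_normal((20, 1))
Xb = sp.csc_array(X_dense)
S_emb = sp.csc_array(np.eye(3))  # simple ridge-type penalty for illustration

coef = penalized_least_squares(Xb, yb, S_emb)
eta = Xb @ coef  # linear predictor of the working model
```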

mssm.src.python.gamm_solvers.update_coef_and_scale(y: ndarray, yb: ndarray, z: ndarray | None, Wr: csc_array | None, rowsX: int, colsX: int, X: csc_array | None, Xb: csc_array, Lrhoi: csc_array | None, family, S_emb: csc_array, S_root: csc_array | None, S_pinv: csc_array | None, FS_use_rank: list[bool] | None, penalties: list[LambdaTerm] | None, n_c: int, formula: Formula | None, form_Linv: bool, offset: float | ndarray) → tuple[ndarray, ndarray, ndarray, csc_array | None, csc_array | None, list[float], list[float], float, list[float], list[float], float, ndarray, ndarray[tuple[Any, ...], dtype[int64]] | None, ndarray[tuple[Any, ...], dtype[int64]] | None]

Internal function to update the coefficients and (optionally) scale parameter of the model.

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – vector of observations
yb (np.ndarray) – vector of observations of the working model
z (np.ndarray | None) – vector of pseudo-data (can contain NaNs for invalid observations)
Wr (scp.sparse.csc_array | None) – diagonal sparse matrix holding the root of the Fisher weights
rowsX (int) – Rows of model matrix
colsX (int) – Cols of model matrix
X (scp.sparse.csc_array | None) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model
family (Family) – Family of model
S_emb (scp.sparse.csc_array) – Total penalty matrix
S_root (scp.sparse.csc_array | None) – Root of total penalty matrix or None
S_pinv (scp.sparse.csc_array | None) – Generalized inverse of total penalty matrix
FS_use_rank (list[bool] | None) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used
penalties (list[LambdaTerm] | None) – List of penalties
n_c (int) – Number of cores
formula (Formula | None) – (Optionally) Formula of the model
form_Linv (bool) – Whether to form the inverse of the Cholesky of the negative penalized hessian or not
offset (float | np.ndarray) – Offset (fixed effect) to add to eta

Returns:

A tuple containing the linear predictor eta, the estimated means mu, the estimated coefficients, the unpivoted Cholesky of the penalized negative hessian, the inverse of the former (optional), the derivative of \(log(|\mathbf{S}_\lambda|_+)\) with respect to the lambdas, cCoef.T@emb_SJ@cCoef for each SJ, the total edf, the term-wise edf, a list holding the sum of the squared elements of each B matrix (see compute_B()), the scale estimate, the working residuals, an optional array of the coefficients to keep, an optional array of the estimated coefficients to drop

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array | None, scp.sparse.csc_array|None, list[float], list[float], float, list[float], list[float], float, np.ndarray, np.typing.NDArray[np.int_]|None, np.typing.NDArray[np.int_]|None]

mssm.src.python.gamm_solvers.update_coef_gammlss(family: GAMLSSFamily, mus: list[ndarray], y: ndarray, Xs, coef: ndarray, coef_split_idx: list[int], S_emb: csc_array, S_norm: csc_array, S_pinv: csc_array | None, FS_use_rank: list[bool] | None, gammlss_penalties: list[LambdaTerm] | None, c_llk: float, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: tuple[ndarray[tuple[Any, ...], dtype[int64]], ndarray[tuple[Any, ...], dtype[int64]]] | None) → tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], csc_array, csc_array, csc_array, float, float, float, ndarray[tuple[Any, ...], dtype[int64]] | None, ndarray[tuple[Any, ...], dtype[int64]] | None]

Repeatedly perform Newton update with step length control to the coefficient vector - essentially implements algorithm 3 from the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016). Checks for rank deficiency when method != "Chol".

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

family (GAMLSSFamily) – Family of model
mus (list[np.ndarray]) – List of estimated means
y (np.ndarray) – Vector of observations
Xs ([scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution
S_emb (scp.sparse.csc_array) – Total penalty matrix
S_norm (scp.sparse.csc_array) – Total penalty matrix - normalized/scaled for rank checks
S_pinv (scp.sparse.csc_array | None) – Generalized inverse of total penalty matrix
FS_use_rank (list[bool] | None) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used
gammlss_penalties (list[LambdaTerm] | None) – List of penalties
c_llk (float) – Current llk
outer (int) – Index of outer iteration
max_inner (int) – Maximum number of inner iterations
min_inner (int) – Minimum number of inner iterations
conv_tol (float) – Convergence tolerance
method (str) – Method to use to estimate coefficients
piv_tol (float) – Deprecated
keep_drop (tuple[np.typing.NDArray[np.int_],np.typing.NDArray[np.int_]] | None) – Set of previously kept and dropped coefficients or None

Returns:

A tuple containing an estimate of all coefficients, a split version of the former, updated values for mus, etas, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former, new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional array of the coefficients to keep, an optional array of the estimated coefficients to drop

Return type:

tuple[np.ndarray, list[np.ndarray], list[np.ndarray], list[np.ndarray], scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array, float, float, float, np.typing.NDArray[np.int_] | None, np.typing.NDArray[np.int_] | None]
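
The idea of a Newton update with step-length control can be illustrated on a toy problem. The following is a minimal sketch and not the mssm implementation: it assumes a penalized Poisson log-likelihood with a log link, dense NumPy arrays, and a hypothetical helper newton_step_halving that halves the step until the penalized log-likelihood no longer decreases.

```python
import numpy as np

def pen_llk(coef, X, y, S):
    """Penalized Poisson log-likelihood (log link), dropping the constant log(y!) term."""
    eta = X @ coef
    return np.sum(y * eta - np.exp(eta)) - 0.5 * coef @ S @ coef

def newton_step_halving(coef, X, y, S, max_inner=50, conv_tol=1e-10):
    """Hypothetical helper: repeated Newton updates with step halving."""
    c_pen_llk = pen_llk(coef, X, y, S)
    for _ in range(max_inner):
        mu = np.exp(X @ coef)
        grad = X.T @ (y - mu) - S @ coef      # gradient of the penalized llk
        nH = X.T @ (mu[:, None] * X) + S      # negative penalized hessian
        L = np.linalg.cholesky(nH)            # nH = L @ L.T
        step = np.linalg.solve(L.T, np.linalg.solve(L, grad))

        # Step-length control: halve the step while the penalized llk decreases
        n_pen_llk = pen_llk(coef + step, X, y, S)
        n_half = 0
        while n_pen_llk < c_pen_llk and n_half < 30:
            step = step / 2
            n_pen_llk = pen_llk(coef + step, X, y, S)
            n_half += 1
        if n_pen_llk < c_pen_llk:
            break                             # no acceptable step found
        coef = coef + step
        converged = abs(n_pen_llk - c_pen_llk) < conv_tol * abs(n_pen_llk)
        c_pen_llk = n_pen_llk
        if converged:
            break
    return coef, c_pen_llk

# Simulated data (assumed for illustration only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
S = np.diag([0.0, 1.0])                       # penalize only the slope
coef_hat, llk_hat = newton_step_halving(np.zeros(2), X, y, S)
```

At the returned estimate the gradient of the penalized log-likelihood should be close to zero. The rank checks and the handling of dropped coefficients described for this function are not part of the sketch.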

mssm.src.python.gamm_solvers.update_coef_gen_smooth(family: GSMMFamily, ys: list[ndarray | None], Xs: list[csc_array | None], coef: ndarray, coef_split_idx: list[int], S_emb: csc_array, S_norm: csc_array, S_pinv: csc_array | None, FS_use_rank: list[bool] | None, smooth_pen: list[LambdaTerm] | None, c_llk: float, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: tuple[ndarray[tuple[Any, ...], dtype[int64]], ndarray[tuple[Any, ...], dtype[int64]]] | None, opt_raw: LinearOperator | None) → tuple[ndarray, csc_array | None, csc_array | None, csc_array | LinearOperator, float, float, float, ndarray[tuple[Any, ...], dtype[int64]] | None, ndarray[tuple[Any, ...], dtype[int64]] | None]

Repeatedly perform Newton/Gradient/L-qEFS update with step length control to the coefficient vector - essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).

Based on steps outlined by Wood, Pya, & Säfken (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132

Parameters:

family (GSMMFamily) – Model family
ys (list[np.ndarray | None]) – List of observation vectors
Xs (list[scp.sparse.csc_array | None]) – List of model matrices
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter of llk.
S_emb (scp.sparse.csc_array) – Total penalty matrix
S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).
S_pinv (scp.sparse.csc_array | None) – Generalized inverse of total penalty matrix
FS_use_rank (list[bool] | None) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used
smooth_pen (list[LambdaTerm] | None) – List of penalties
c_llk (float) – Current llk
outer (int) – Index of outer iteration
max_inner (int) – Maximum number of inner iterations
min_inner (int) – Minimum number of inner iterations
conv_tol (float) – Convergence tolerance
method (str) – Method to use to estimate coefficients
piv_tol (float) – Deprecated
keep_drop (tuple[np.typing.NDArray[np.int_],np.typing.NDArray[np.int_]] | None) – Set of previously kept and dropped coefficients or None
opt_raw (scp.sparse.linalg.LinearOperator | None) – If the L-qEFS update is used to estimate coefficients/lambda parameters, then this is the previous state of the quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood

Returns:

A tuple containing an estimate of all coefficients, the negative hessian of the log-likelihood, Cholesky of negative hessian of the penalized log-likelihood, inverse of the former (or another instance of scp.sparse.linalg.LinearOperator representing the new quasi-Newton approximation), new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional array of the coefficients to keep, an optional array of the estimated coefficients to drop

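Return type:

tuple[np.ndarray, scp.sparse.csc_array | None, scp.sparse.csc_array | None, scp.sparse.csc_array | scp.sparse.linalg.LinearOperator, float, float, float, np.typing.NDArray[np.int_] | None, np.typing.NDArray[np.int_] | None]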

mssm.src.python.gamm_solvers.update_scale_edf(y: ndarray, z: ndarray | None, eta: ndarray, Wr: csc_array | None, rowsX: int, colsX: int, LP: csc_array | None, InvCholXXSP: csc_array | None, Pr: list[int], lgdetDs: list[float], Lrhoi: csc_array | None, family: Family, penalties: list[LambdaTerm], keep: ndarray[tuple[Any, ...], dtype[int64]] | None, drop: ndarray[tuple[Any, ...], dtype[int64]] | None, n_c: int) → tuple[ndarray, csc_array | None, float, list[float], list[float], float]

Internal function. Updates the scale of the model. The edf are computed for this as well and are also returned, because they are needed for the lambda step.

References:

Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

y (np.ndarray) – vector of observations
z (np.ndarray | None) – vector of pseudo-data (can contain NaNs for invalid observations)
eta (np.ndarray) – vector of linear predictors
Wr (scp.sparse.csc_array | None) – diagonal sparse matrix holding the root of the Fisher weights
rowsX (int) – Rows of model matrix
colsX (int) – Cols of model matrix
LP (scp.sparse.csc_array | None) – Pivoted Cholesky of negative penalized hessian or None
InvCholXXSP (scp.sparse.csc_array | None) – Inverse of LP, or None
Pr (list[int]) – Permutation list of LP
lgdetDs (list[float]) – List of derivatives of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambdas.
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model
family (Family) – Family of model
penalties (list[LambdaTerm]) – List of penalties
keep (np.typing.NDArray[np.int_] | None) – Array of coefficients to keep, can be None -> keep all
drop (np.typing.NDArray[np.int_] | None) – Array of coefficients to drop
n_c (int) – Number of cores to use

Returns:

A tuple containing the working residuals, optionally the unpivoted inverse of LP, total edf, term-wise edf, a list holding the sum of the squared elements of each B matrix (see compute_B()), scale estimate

Return type:

tuple[np.ndarray, scp.sparse.csc_array|None, float, list[float], list[float], float]

mssm.src.python.matrix_solvers module

mssm.src.python.matrix_solvers.compute_B(L: csc_array, P: csc_array, lTerm: LambdaTerm, n_c: int = 10, drop: ndarray[tuple[Any, ...], dtype[int64]] | None = None) → float | tuple[float, float]

Solves L @ B = P @ lTerm.D_J_emb for B, then returns B.power(2).sum() or two approximations of this (for very big factor smooth models).

Parameters:

L (scp.sparse.csc_array) – Lower triangular sparse matrix
P (scp.sparse.csc_array) – Permutation matrix
lTerm (LambdaTerm) – Current penalty term
n_c (int, optional) – Number of cores, defaults to 10
drop (np.typing.NDArray[np.int_] | None, optional) – Array of parameters (columns/rows of lTerm.D_J_emb) to drop, defaults to None

Returns:

B.power(2).sum() or, for very big factor smooth models, sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights), with cluster weights obtained from mssm.src.python.formula.__cluster_discretize().

Return type:

float | tuple[float, float]
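
The quantity returned in the simple case can be reproduced with dense arrays. This is only a sketch under assumptions: L, P, and D below are small made-up matrices, with D standing in for lTerm.D_J_emb, and a dense solver replaces the sparse, optionally parallel solve used by mssm.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Lower-triangular Cholesky factor of a symmetric positive definite matrix
A = rng.normal(size=(n, n))
L = np.linalg.cholesky(A.T @ A + n * np.eye(n))

# Permutation matrix built from a made-up pivot order
Pr = [2, 0, 4, 1, 3]
P = np.eye(n)[Pr, :]

# D stands in for lTerm.D_J_emb (embedded root of a penalty matrix)
D = np.zeros((n, 2))
D[1, 0], D[2, 0] = 1.0, -1.0
D[2, 1], D[3, 1] = 1.0, -1.0

# Solve L @ B = P @ D for B, then sum the squared elements of B
B = np.linalg.solve(L, P @ D)
B_sq_sum = np.sum(B ** 2)
```

Because B = inv(L) @ P @ D, the sum of squares equals the trace of D.T @ P.T @ inv(L @ L.T) @ P @ D, which is a convenient check on small problems.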

mssm.src.python.matrix_solvers.compute_Linv(L: csc_array, n_c: int = 10) → csc_array

Solves L @ inv(L) = I for inv(L) optionally parallelizing over column blocks of I.

Parameters:

L (scp.sparse.csc_array) – Lower triangular sparse matrix
n_c (int, optional) – Number of cores to use, defaults to 10

Returns:

inv(L)

Return type:

scp.sparse.csc_array
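
Solving over column blocks of the identity can be sketched as follows. The helper linv_by_blocks is hypothetical and works on dense arrays; the blocks are solved serially here, whereas each block could be handed to a separate worker.

```python
import numpy as np

def linv_by_blocks(L, n_blocks=2):
    """Hypothetical helper: solve L @ inv(L) = I one column block of I at a time."""
    n = L.shape[0]
    I = np.eye(n)
    blocks = np.array_split(np.arange(n), n_blocks)
    # Each block is an independent solve, so the list could be built in parallel
    solved = [np.linalg.solve(L, I[:, idx]) for idx in blocks]
    return np.hstack(solved)

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.5, -1.0, 4.0]])
Linv = linv_by_blocks(L)
```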

mssm.src.python.matrix_solvers.compute_block_B_shared(address_dat: str, address_ptr: str, address_idx: str, shape_dat: tuple, shape_ptr: tuple, rows: int, cols: int, nnz: int, T: csc_array) → float

Solves L @ B = T for B via forward solving and based on shared memory for L, then computes and returns B.power(2).sum().

Parameters:

address_dat (str) – Address to data array of L
address_ptr (str) – Address to pointer array of L
address_idx (str) – Address to indices array of L
shape_dat (tuple) – Shape of data array of L
shape_ptr (tuple) – Shape of pointer array of L
rows (int) – Number of rows of L
cols (int) – Number of cols of L
nnz (int) – Number of non-zero elements in L
T (scp.sparse.csc_array) – Target matrix

Returns:

B.power(2).sum()

Return type:

float

mssm.src.python.matrix_solvers.compute_block_B_shared_cluster(address_dat: str, address_ptr: str, address_idx: str, shape_dat: tuple, shape_ptr: tuple, rows: int, cols: int, nnz: int, T: csc_array, cluster_weights: list[float]) → tuple[float, float]

Solves L @ B = T for B via forward solving and based on shared memory for L, then computes and returns sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights).

Parameters:

address_dat (str) – Address to data array of L
address_ptr (str) – Address to pointer array of L
address_idx (str) – Address to indices array of L
shape_dat (tuple) – Shape of data array of L
shape_ptr (tuple) – Shape of pointer array of L
rows (int) – Number of rows of L
cols (int) – Number of cols of L
nnz (int) – Number of non-zero elements in L
T (scp.sparse.csc_array) – Target matrix
cluster_weights (list[float]) – Cluster weights obtained from mssm.src.python.formula.__cluster_discretize().

Returns:

sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights)

Return type:

tuple[float,float]

mssm.src.python.matrix_solvers.compute_block_linv_shared(address_dat: str, address_ptr: str, address_idx: str, shape_dat: tuple, shape_ptr: tuple, rows: int, cols: int, nnz: int, T: csc_array) → csc_array

Solves L@B = T where L is available in shared memory and T is a column subset of the identity matrix.

Parameters:

address_dat (str) – Address to data array of L
address_ptr (str) – Address to pointer array of L
address_idx (str) – Address to indices array of L
shape_dat (tuple) – Shape of data array of L
shape_ptr (tuple) – Shape of pointer array of L
rows (int) – Number of rows of L
cols (int) – Number of cols of L
nnz (int) – Number of non-zero elements in L
T (scp.sparse.csc_array) – Target matrix

Returns:

B

Return type:

scp.sparse.csc_array

mssm.src.python.matrix_solvers.cpp_backsolve_tr(A: csc_array, C: csc_array) → csc_array

Solves A@B=C, where A is sparse and upper triangular. This can be utilized to obtain B = inv(A), when C is the identity.

Parameters:

A (scp.sparse.csc_array) – Upper triangular sparse matrix
C (scp.sparse.csc_array) – Sparse potentially rectangular matrix

Returns:

B

Return type:

scp.sparse.csc_array

mssm.src.python.matrix_solvers.cpp_chol(A: csc_array) → tuple[csc_array, int]

Computes Cholesky of A.

Parameters:

A (scp.sparse.csc_array) – Some square symmetric matrix

Returns:

Cholesky and code indicating success

Return type:

tuple[scp.sparse.csc_array,int]

mssm.src.python.matrix_solvers.cpp_cholP(A: csc_array) → tuple[csc_array, list[int], int]

Computes pivoted Cholesky of A.

Parameters:

A (scp.sparse.csc_array) – Some square symmetric matrix

Returns:

Pivoted Cholesky, pivoted column order, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],int]

mssm.src.python.matrix_solvers.cpp_dqrr(A: ndarray) → tuple[list[int], int]

Computes pivoted QR decomposition of dense matrix A.

Parameters:

A (np.ndarray) – Some matrix

Returns:

Column pivot order for rank estimation, estimated rank

Return type:

tuple[list[int],int]

mssm.src.python.matrix_solvers.cpp_qr(A: csc_array) → tuple[csc_array, csc_array, list[int], int]

Computes pivoted QR decomposition of A.

Parameters:

A (scp.sparse.csc_array) – Some matrix

Returns:

Matrices Q, R, pivoted column order, and code indicating success

Return type:

tuple[scp.sparse.csc_array,scp.sparse.csc_array,list[int],int]

mssm.src.python.matrix_solvers.cpp_qrr(A: csc_array) → tuple[csc_array, list[int], int, int]

Computes pivoted QR decomposition of A and returns rank estimate.

Parameters:

A (scp.sparse.csc_array) – Some matrix

Returns:

Matrix R, pivoted column order, estimated rank, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],int,int]

mssm.src.python.matrix_solvers.cpp_solve_L(X: csc_array, S: csc_array) → tuple[csc_array, list[int], int]

Solves (X.T@X + S)@B=I for B, where (X.T@X + S) is sparse, symmetric, and full rank and I is an identity matrix of suitable dimension via Cholesky decomposition.

Parameters:

X (scp.sparse.csc_array) – Some rectangular sparse matrix
S (scp.sparse.csc_array) – Sparse square matrix

Returns:

B (inverse of pivoted X.T@X + S), list of pivot indices, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],int]

mssm.src.python.matrix_solvers.cpp_solve_LXX(A: csc_array) → tuple[csc_array, list[int], int]

Solves A@B=I for B, where A is sparse, symmetric, and full rank and I is an identity matrix of suitable dimension via Cholesky decomposition.

Parameters:

A (scp.sparse.csc_array) – Some sparse symmetric matrix

Returns:

B (inverse of pivoted A), list of pivot indices, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],int]

mssm.src.python.matrix_solvers.cpp_solve_am(y: ndarray, X: csc_array, S: csc_array) → tuple[csc_array, list[int], ndarray, int]

Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition and computes inverse of pivoted Cholesky of X.T@X + S.

Parameters:

y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Some rectangular sparse matrix
S (scp.sparse.csc_array) – Sparse square matrix

Returns:

Inverse of pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],np.ndarray,int]

mssm.src.python.matrix_solvers.cpp_solve_coef(y: ndarray, X: csc_array, S: csc_array) → tuple[csc_array, list[int], ndarray, int]

Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition.

Parameters:

y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Some rectangular sparse matrix
S (scp.sparse.csc_array) – Sparse square matrix

Returns:

Pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],np.ndarray,int]
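
The linear algebra behind this solve can be sketched with dense NumPy arrays. The data and penalty below are made up, and the sketch uses an unpivoted dense Cholesky factor, whereas the function documented above works on sparse matrices and returns a pivoted factor.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=50)
S = np.diag([0.0, 1.0, 1.0, 1.0])   # made-up penalty, first coefficient unpenalized

# Form the penalized normal equations and factor them: XXS = L @ L.T
XXS = X.T @ X + S
L = np.linalg.cholesky(XXS)

# Forward solve with L, then back solve with L.T
b = np.linalg.solve(L.T, np.linalg.solve(L, X.T @ y))
```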

mssm.src.python.matrix_solvers.cpp_solve_coefXX(Xy: ndarray, XXS: csc_array) → tuple[csc_array, list[int], ndarray, int]

Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition with (X.T@X + S) and X.T@y pre-computed.

Parameters:

Xy (np.ndarray) – Holds X.T@y
XXS (scp.sparse.csc_array) – Holds (X.T@X + S)

Returns:

Pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],np.ndarray,int]

mssm.src.python.matrix_solvers.cpp_solve_coef_pqr(y: ndarray, X: csc_array, E: csc_array) → tuple[csc_array, list[int], list[int], ndarray, int, int]

Solves (X.T@X + S)@b = X.T@y for b via sparse QR decomposition, where E.T@E=S.

Does not form X.T@X + S for solve. Potentially pivots twice - once for sparsity (always) and then once more whenever algorithm detects a diagonal element that is too small.

Examples:

# Solve
RP,Pr1,Pr2,coef,rank,code = cpp_solve_coef_pqr(yb,Xb,S_root.T.tocsc())

# Need to get overall pivot...
P1 = compute_eigen_perm(Pr1)
P2 = compute_eigen_perm(Pr2)
P = P2.T@P1.T

# Need to insert zeroes in case of rank deficiency - first insert nans so that we
# can then easily find dropped coefs.
if rank < S_emb.shape[1]:
   coef = np.concatenate((coef,[np.nan for _ in range(S_emb.shape[1]-rank)]))

# Can now unpivot coef
coef = coef @ P

# And identify which coef was dropped
idx = np.arange(len(coef))
drop = idx[np.isnan(coef)]
keep = idx[~np.isnan(coef)]

# Now actually set dropped ones to zero
coef[drop] = 0

# Convert R so that rest of code can just continue as with Chol (i.e., L)
LP = RP.T.tocsc()

# Keep only columns of Pr/P that belong to identifiable params. So P.T@LP is Cholesky of
# negative penalized Hessian of model without unidentifiable coef. Important: LP and Pr/P no
# longer match dimensions of embedded penalties after this! So we need to keep track of that
# in the appropriate functions (i.e., `calculate_edf` which calls `compute_B` when called
# with only LP and not Linv).
P = P[:,keep]
_,Pr,_ = translate_sparse(P.tocsc())
P = compute_eigen_perm(Pr)

Parameters:

y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Some rectangular sparse matrix
E (scp.sparse.csc_array) – Sparse root of the penalty matrix, so that E.T@E=S

Returns:

Pivoted Cholesky of X.T@X + S, first column pivot indices in a list, second column pivot indices in a list, b, estimated rank, and code indicating success.

Return type:

tuple[scp.sparse.csc_array,list[int],list[int],np.ndarray,int,int]
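
Why a QR decomposition avoids forming X.T@X + S can be seen from a dense sketch: stacking E below X gives an augmented matrix whose cross-product is X.T@X + E.T@E. The arrays below are made up and no pivoting or rank detection is performed, so this illustrates the identity only and not the routine itself.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
y = X @ np.array([0.5, 1.5, -1.0]) + rng.normal(scale=0.1, size=40)

S = np.diag([0.0, 2.0, 2.0])
E = np.sqrt(S)                      # valid root because S is diagonal: E.T @ E = S

# Augment the model matrix with the penalty root and the response with zeros
Xa = np.vstack([X, E])
ya = np.concatenate([y, np.zeros(3)])

# Reduced QR of the augmented matrix, then a triangular solve
Q, R = np.linalg.qr(Xa)
b_qr = np.linalg.solve(R, Q.T @ ya)

# Reference solution from the penalized normal equations
b_ne = np.linalg.solve(X.T @ X + S, X.T @ y)
```

Since R.T @ R equals X.T@X + S, the transpose of R can take the place of a Cholesky factor in later computations, which is what the conversion LP = RP.T.tocsc() in the example above relies on.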

mssm.src.python.matrix_solvers.cpp_solve_qr(A: csc_array) → tuple[csc_array, int, int]

Solves A@B=I for B, where A is sparse, square, and full rank and I is an identity matrix of suitable dimension via QR decomposition.

Parameters:

A (scp.sparse.csc_array) – Some sparse square matrix

Returns:

B (inverse of A), estimated rank, and code indicating success

Return type:

tuple[scp.sparse.csc_array,int,int]

mssm.src.python.matrix_solvers.cpp_solve_tr(A: csc_array, C: csc_array) → csc_array

Solves A@B=C, where A is sparse and lower triangular. This can be utilized to obtain B = inv(A), when C is the identity.

Parameters:

A (scp.sparse.csc_array) – Lower triangular sparse matrix
C (scp.sparse.csc_array) – Sparse potentially rectangular matrix

Returns:

B

Return type:

scp.sparse.csc_array
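
Forward substitution for a lower triangular system can be sketched in a few lines. The helper forward_solve is hypothetical and dense; it only illustrates the recursion, not the sparse C++ routine wrapped by this function.

```python
import numpy as np

def forward_solve(A, C):
    """Hypothetical helper: solve A @ B = C for B, with A lower triangular."""
    n = A.shape[0]
    B = np.zeros_like(C, dtype=float)
    for i in range(n):
        # Row i of B depends only on the rows of B that were already computed
        B[i, :] = (C[i, :] - A[i, :i] @ B[:i, :]) / A[i, i]
    return B

A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])

# With C set to the identity, B is the inverse of A
A_inv = forward_solve(A, np.eye(3))

# C may also be rectangular
C_rect = np.array([[2.0, 4.0], [7.0, 5.0], [32.0, 19.0]])
B_rect = forward_solve(A, C_rect)
```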

mssm.src.python.matrix_solvers.cpp_symqr(A: csc_array, tol: float) → tuple[csc_array, list[int], list[int], int, int]

Computes pivoted QR decomposition of symmetric matrix A.

Parameters:

A (scp.sparse.csc_array) – Some symmetric matrix
tol (float) – tolerance for rank estimation

Returns:

Matrix R, column pivot order for sparsity, column pivot order for rank estimation, rank estimate, code indicating success

Return type:

tuple[scp.sparse.csc_array,list[int],list[int],int,int]

mssm.src.python.matrix_solvers.est_condition(L: csc_array, Linv: csc_array, seed: int | None = 0, verbose: bool = True) → tuple[float, float, float, int]

Estimate the condition number K - the ratio of the largest to smallest singular values - of matrix A, where A.T@A = L@L.T.

L and Linv can either be obtained by Cholesky decomposition, i.e., A.T@A = L@L.T or by QR decomposition A=Q@R where R=L.T.

If verbose=True (default), separate warnings will be issued in case K>(1/(0.5*sqrt(epsilon))) and K>(1/(0.5*epsilon)). If the former warning is raised, this indicates that computing L via a Cholesky decomposition is likely unstable and should be avoided. If the second warning is raised as well, obtaining L via QR decomposition (of A) is also likely to be unstable (see Golub & Van Loan, 2013).

References:

Cline et al. (1979). An Estimate for the Condition Number of a Matrix.
Golub & Van Loan (2013). Matrix computations, 4th edition.

Parameters:

L (scp.sparse.csc_array) – Cholesky or any other root of A.T@A as a sparse matrix.
Linv (scp.sparse.csc_array) – Inverse of Cholesky (or any other root) of A.T@A.
seed (int or None or numpy.random.Generator) – The seed to use for the random parts of the singular value decomposition. Defaults to 0.
verbose (bool) – Whether or not warnings should be printed. Defaults to True.

Returns:

A tuple, containing the estimate of condition number K, an estimate of the largest singular value of A, an estimate of the smallest singular value of A, and a code. The latter will be zero in case no warning was raised, 1 in case the first warning described above was raised, and 2 if the second warning was raised as well.

Return type:

tuple[float,float,float,int]
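
The relation between K and L, together with the two warning thresholds, can be checked on a small dense example. This sketch computes the singular values exactly with an SVD instead of estimating them, and the matrix A is made up; the variable code mirrors the return code described above.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(30, 4))
L = np.linalg.cholesky(A.T @ A)     # A.T @ A = L @ L.T

# A and L share their singular values because A.T @ A = L @ L.T
sv_L = np.linalg.svd(L, compute_uv=False)
K = sv_L.max() / sv_L.min()

# Thresholds from the description above
eps = np.finfo(float).eps
if K > 1 / (0.5 * eps):
    code = 2                        # QR of A is also likely unstable
elif K > 1 / (0.5 * np.sqrt(eps)):
    code = 1                        # Cholesky is likely unstable
else:
    code = 0
```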

mssm.src.python.matrix_solvers.map_csc_to_eigen(X: csc_array) → tuple[int, int, int, ndarray, ndarray, ndarray]

Pybind11 comes with copy overhead for sparse matrices, so instead of passing the sparse matrix to C++, I pass the data, indices, and indptr arrays as buffers to C++. see: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html.

An Eigen mapping can then be used to refer to these, without requiring an extra copy. see: https://eigen.tuxfamily.org/dox/classEigen_1_1Map_3_01SparseMatrixType_01_4.html

The mapping needs to assume compressed storage, since then we can use the indices, indptr, and data arrays directly for the valuepointer, innerPointer, and outerPointer fields of the sparse array map constructor. see: https://eigen.tuxfamily.org/dox/group__TutorialSparse.html (section sparse matrix format).

I got this idea from the NumpyEigen project, which also uses such a map! see: https://github.com/fwilliams/numpyeigen/blob/master/src/npe_sparse_array.h#L74

Parameters:

X (scp.sparse.csc_array) – Some sparse matrix

Returns:

Number of rows in X, Number of cols in X, Number of non-zero elements in X, X.data, X.indptr.astype(np.int64), X.indices.astype(np.int64)

Return type:

tuple[int,int,int,np.ndarray,np.ndarray,np.ndarray]

mssm.src.python.matrix_solvers.map_csr_to_eigen(X: csr_array) → tuple[int, int, int, ndarray, ndarray, ndarray]

see: map_csc_to_eigen()

Parameters:

X (scp.sparse.csr_array) – Some sparse matrix

Returns:

Number of rows in X, Number of cols in X, Number of non-zero elements in X, X.data, X.indptr.astype(np.int64), X.indices.astype(np.int64)

Return type:

tuple[int,int,int,np.ndarray,np.ndarray,np.ndarray]

mssm.src.python.matrix_solvers.translate_sparse(mat: csc_array) → tuple[ndarray, ndarray, ndarray]

Translate canonical sparse csc matrix representation into data, row, col representation

See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_array.html#scipy.sparse.csc_array

Parameters:

mat (scp.sparse.csc_array) – sparse matrix

Returns:

data, rows, cols of sparse matrix

Return type:

tuple[np.ndarray,np.ndarray,np.ndarray]
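
The conversion from the compressed column format to a data, row, col representation can be written out with NumPy alone. The three CSC arrays below are typed in by hand for a small made-up matrix, so this is an illustration of the format and not the function's code.

```python
import numpy as np

# CSC representation of the 3x3 matrix
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
data = np.array([1.0, 4.0, 5.0, 2.0, 3.0])
indices = np.array([0, 2, 2, 0, 1])   # row index of each stored element
indptr = np.array([0, 2, 3, 5])       # column j occupies data[indptr[j]:indptr[j+1]]

# Row indices are stored directly; column indices follow from the pointer array
rows = indices
cols = np.repeat(np.arange(len(indptr) - 1), np.diff(indptr))

# Rebuild the dense matrix from the (data, rows, cols) triplets
dense = np.zeros((3, 3))
dense[rows, cols] = data
```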

mssm.src.python.penalties module

class mssm.src.python.penalties.DifferencePenalty

Bases: Penalty

Difference Penalty class. Generates penalty matrices for smooth terms.

Variables:

pen_type (PenType.DIFFERENCE) – Type of the penalty matrix.

constructor(n: int, constraint: Constraint | None, m: int = 2) → tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

Creates difference (order=m) n*n penalty matrix + root of the penalty. Based on code in Eilers & Marx (1996) and Wood (2017).

References:

Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

n (int) – Dimension of square penalty matrix
constraint (Constraint|None) – Any constraint to absorb by the penalty or None if no constraint is required
m (int, optional) – Differencing order to apply to the identity matrix to get the penalty (this will also be the dimension of the penalty’s Kernel), defaults to 2

Returns:

penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty

Return type:

tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]
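
A difference penalty of order m can be sketched by differencing the identity matrix. The helper difference_penalty is hypothetical, ignores constraints, and returns dense arrays with a root D satisfying D.T @ D = S; the orientation and storage of the root returned by the constructor may differ.

```python
import numpy as np

def difference_penalty(n, m=2):
    """Hypothetical helper: n*n difference penalty of order m and a root of it."""
    D = np.diff(np.eye(n), n=m, axis=0)   # (n - m) x n differencing matrix
    S = D.T @ D
    return S, D

S, D = difference_penalty(6, m=2)
rank = np.linalg.matrix_rank(S)
```

For m=2 the penalty leaves constant and linear coefficient sequences unpenalized, so the kernel has dimension 2 and the rank is n - 2.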

class mssm.src.python.penalties.IdentityPenalty(pen_type: PenType)

Bases: Penalty

Identity Penalty class. Generates penalty matrices for smooth terms and random terms.

Parameters:

pen_type (PenType) – Type of the penalty matrix

Variables:

pen_type (PenType) – Type of the penalty matrix passed to init method.

constructor(n: int, constraint: Constraint | None, f: Callable | None = None) → tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

Creates identity matrix penalty + root in case f is None.

Note: This penalty never absorbs marginal constraints. It always returns an identity matrix but just decreases n by 1 if constraint is not None to ensure that the returned penalty matrix is of suitable dimensions.

Parameters:

n (int) – Dimension of square penalty matrix
constraint (Constraint|None) – Any constraint to absorb by the penalty or None if no constraint is required
f (Callable|None, optional) – Any kind of function to apply to the diagonal elements of the penalty, defaults to None

Returns:

penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty

Return type:

tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]

class mssm.src.python.penalties.Penalty(pen_type: PenType)

Bases: object

Penalty base-class. Generates penalty matrices for smooth terms.

Parameters:

pen_type (PenType) – Type of the penalty matrix

Variables:

pen_type (PenType) – Type of the penalty matrix passed to the init method.

constructor(n: int, constraint: Constraint | None, *args, **kwargs) → tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

Creates penalty matrix + root of the penalty and returns both in list form (data, row indices, col indices).

Parameters:

n (int) – Dimension of square penalty matrix
constraint (Constraint | None) – Any constraint to absorb by the penalty or None if no constraint is required

Returns:

penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty

Return type:

tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

mssm.src.python.penalties.TP_pen(S_j: csc_array, D_j: csc_array, j: int, ks: list[int], constraint: Constraint | None, scale_pen: bool) → tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]

Computes a tensor smooth penalty + root as defined in section 5.6 of Wood (2017) based on marginal penalty matrix S_j.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

S_j (scp.sparse.csc_array) – Marginal penalty matrix
D_j (scp.sparse.csc_array) – Root of marginal penalty matrix
j (int) – Index for current marginal
ks (list[int]) – List of number of basis functions of all marginals
constraint (Constraint | None) – Any constraint to absorb by the final penalty or None if no constraint is required

Returns:

penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty

Return type:

tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]
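
The structure of a tensor smooth penalty can be sketched with Kronecker products: the marginal penalty sits in position j and identity matrices fill the remaining positions. The helpers tensor_penalty and diff_pen are hypothetical, dense, and ignore constraints and penalty scaling; the ordering of the Kronecker factors is an assumption that has to match the ordering of the tensor basis.

```python
import numpy as np
from functools import reduce

def diff_pen(k, m):
    """Hypothetical helper: k*k difference penalty of order m."""
    D = np.diff(np.eye(k), n=m, axis=0)
    return D.T @ D

def tensor_penalty(S_j, j, ks):
    """Hypothetical helper: Kronecker product with S_j in position j, identities elsewhere."""
    mats = [S_j if i == j else np.eye(k) for i, k in enumerate(ks)]
    return reduce(np.kron, mats)

ks = [4, 3]                                      # basis functions per marginal
TP_0 = tensor_penalty(diff_pen(4, 2), 0, ks)     # penalty for the first marginal
TP_1 = tensor_penalty(diff_pen(3, 1), 1, ks)     # penalty for the second marginal
```

Each tensor penalty has dimension prod(ks), and its rank is the rank of the marginal penalty multiplied by the number of basis functions of the other marginals.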

mssm.src.python.penalties.adjust_pen_drop(dat: list[float], rows: list[int], cols: list[int], drop: list[int], offset: int = 0) → tuple[list[float], list[int], list[int], int]

Adjusts penalty matrix (represented via dat, rows, and cols) by dropping rows and columns indicated by drop.

Optionally, offset is added to the elements in rows and cols, which is useful when indices in drop do not start at zero.

Parameters:

dat ([float]) – List of elements in penalty matrix.
rows ([int]) – List of row indices of penalty matrix.
cols ([int]) – List of column indices of penalty matrix.
drop ([int]) – Rows and columns to drop from penalty matrix. Might actually contain indices corresponding to rows + offset and cols + offset, which can be corrected for via the offset argument.
offset (int, optional) – An optional offset to add to rows and cols to adjust for the indexing in drop, defaults to 0

Returns:

A tuple with 4 elements: the data, rows, and cols of the adjusted penalty matrix excluding dropped elements and the number of excluded elements.

Return type:

tuple[list[float],list[int],list[int],int]
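The following pure-Python sketch illustrates how entries of a penalty stored as dat, rows, and cols could be filtered against drop. It is only an illustration under stated assumptions, not the implementation of adjust_pen_drop: the name drop_from_penalty is hypothetical, and the sketch leaves the indices of the remaining entries unchanged.

```python
def drop_from_penalty(dat, rows, cols, drop, offset=0):
    """Hypothetical helper: remove all entries whose (row + offset) or
    (col + offset) appears in drop and count how many were removed."""
    drop_set = set(drop)
    kept_dat, kept_rows, kept_cols = [], [], []
    n_dropped = 0
    for d, r, c in zip(dat, rows, cols):
        if (r + offset) in drop_set or (c + offset) in drop_set:
            n_dropped += 1
        else:
            kept_dat.append(d)
            kept_rows.append(r)
            kept_cols.append(c)
    return kept_dat, kept_rows, kept_cols, n_dropped

# 3 x 3 penalty in triplet form; indices in drop are shifted by offset=10,
# so drop=[11] refers to local row/column 1.
dat = [1.0, -1.0, -1.0, 2.0, 3.0]
rows = [0, 0, 1, 1, 2]
cols = [0, 1, 0, 1, 2]
result = drop_from_penalty(dat, rows, cols, drop=[11], offset=10)
```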

mssm.src.python.penalties.combine_shared_penalties(penalties: list[LambdaTerm]) → list[LambdaTerm]

Identifies penalties that should share a lambda parameter and merges them into a single LambdaTerm.

Parameters:: penalties (list[LambdaTerm]) – A list of term-specific penalties, some of which might have an id flag.
Returns:: A list of penalty matrices in which individual penalties with a shared id flag have been combined into a single LambdaTerm.
Return type:: list[LambdaTerm]

mssm.src.python.penalties.create_id_dict(penalties: list[LambdaTerm]) → dict | None

Identifies penalties that should share a lambda parameter and fills a dictionary holding penalty indices for each shared penalty (id).

Parameters:: penalties (list[LambdaTerm]) – A list of term-specific penalties, some of which might have an id flag.
Returns:: A dictionary. Keys are ids shared by two or more penalties. Values are lists holding corresponding penalty indices. If no id exists that is shared by two or more penalties, None is returned instead.
Return type:: dict|None
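The grouping described for create_id_dict can be sketched as follows. This is a minimal illustration that assumes each penalty exposes an id attribute; the Pen class is a hypothetical stand-in for LambdaTerm and shared_id_dict is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Pen:
    """Hypothetical stand-in for LambdaTerm; only the id flag matters here."""
    id: Optional[int] = None

def shared_id_dict(penalties):
    """Collect the indices of penalties per id and keep only ids that are
    shared by two or more penalties. Returns None if no id is shared."""
    groups = {}
    for idx, pen in enumerate(penalties):
        if pen.id is not None:
            groups.setdefault(pen.id, []).append(idx)
    shared = {key: idxs for key, idxs in groups.items() if len(idxs) > 1}
    return shared if shared else None

pens = [Pen(1), Pen(), Pen(1), Pen(2)]
id_dict = shared_id_dict(pens)
```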

mssm.src.python.penalties.embed_in_S_sparse(pen_data: list[float], pen_rows: list[int], pen_cols: list[int], S_emb: csc_array | None, S_col: int, SJ_col: int, cIndex: int) → tuple[csc_array, int]

Embed a term-specific penalty matrix SJ (provided as three lists: pen_data, pen_rows and pen_cols) into the total penalty matrix S_emb (see Wood, 2017).

Parameters:

pen_data (list[float]) – Data of SJ
pen_rows (list[int]) – Row indices of SJ
pen_cols (list[int]) – Column indices of SJ
S_emb (scp.sparse.csc_array | None) – Total penalty matrix or None in case S_emb will be initialized by the function.
S_col (int) – Columns of total penalty matrix
SJ_col (int) – Columns of SJ
cIndex (int) – Current row and column index indicating the top left cell of the (SJ_col * SJ_col) block SJ should take up in S_emb

Returns:

S_emb with SJ embedded, the updated cIndex (i.e., cIndex + SJ_col)

Return type:

tuple[scp.sparse.csc_array,int]
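To make the role of cIndex concrete, the sketch below embeds two term-specific penalties as diagonal blocks of a total penalty matrix. It uses a dense NumPy array for readability, whereas embed_in_S_sparse works with sparse matrices; the name embed_block is hypothetical.

```python
import numpy as np

def embed_block(pen_data, pen_rows, pen_cols, S_emb, S_col, SJ_col, cIndex):
    """Hypothetical dense analogue: add the triplets of SJ to the
    (SJ_col x SJ_col) block of S_emb whose top-left cell is (cIndex, cIndex)."""
    if S_emb is None:
        S_emb = np.zeros((S_col, S_col))
    for d, r, c in zip(pen_data, pen_rows, pen_cols):
        S_emb[r + cIndex, c + cIndex] += d
    return S_emb, cIndex + SJ_col

# Total penalty for 5 coefficients; the first coefficient is unpenalized,
# so the first block starts at index 1.
S, c = embed_block([1.0, 1.0], [0, 1], [0, 1], None, 5, 2, 1)
S, c = embed_block([2.0], [0], [0], S, 5, 1, c)
```

Passing the returned index into the next call places consecutive penalties in adjacent, non-overlapping blocks.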

mssm.src.python.penalties.embed_in_Sj_sparse(pen_data: list[float], pen_rows: list[int], pen_cols: list[int], Sj: csc_array | None, SJ_col: int) → csc_array

Parameterize a term-specific penalty matrix SJ (provided as three lists: pen_data, pen_rows and pen_cols).

Parameters:

pen_data (list[float]) – Data of SJ
pen_rows (list[int]) – Row indices of SJ
pen_cols (list[int]) – Column indices of SJ
Sj (scp.sparse.csc_array | None) – A sparse matrix or None. In the latter case, SJ is simply initialized by the function. If not, then the function returns SJ + Sj. The latter is useful if a term penalty is a sum of individual penalty matrices.
SJ_col (int) – Columns of SJ

Returns:

SJ which might actually be SJ + Sj.

Return type:

scp.sparse.csc_array

mssm.src.python.penalties.embed_shared_penalties(shared_penalties: list[list[LambdaTerm]], formulas: list, extra_coef: int) → list[LambdaTerm]

Embed penalties from individual formulas into overall penalties for GAMMLSS/GSMM models.

Parameters:

shared_penalties (list[list[LambdaTerm]]) – Nested list, with the inner one containing the penalties associated with an individual formula in formulas.
formulas (list) – List of mssm.src.python.formula.Formula objects
extra_coef (int) – Number of extra coefficients required by the model’s family. Will result in the shared penalties being padded by an extra block of extra_coef zeroes.

Returns:

A list of the embedded penalties required by a GAMMLSS or GSMM model.

Return type:

list[LambdaTerm]

mssm.src.python.penalties.sort_penalties(penalties: list[LambdaTerm]) → list[LambdaTerm]

Sorts penalties by start_index in ascending order.

Parameters:: penalties (list[LambdaTerm]) – A list of term-specific penalties.
Returns:: A list of term-specific penalties, sorted by start index.
Return type:: list[LambdaTerm]

mssm.src.python.penalties.split_shared_penalties(merged_penalties: list[LambdaTerm]) → list[LambdaTerm]

Identifies penalties that share a lambda parameter and splits them into individual LambdaTerms - all having the same \(\lambda\) value.

Basically inverts what is achieved by the combine_shared_penalties() function.

Parameters:: merged_penalties (list[LambdaTerm]) – list of penalty matrices in which individual penalties with a shared id flag have been combined into a single LambdaTerm.
Returns:: A list of penalty matrices in which merged penalties have been split into individual penalties again.
Return type:: list[LambdaTerm]

mssm.src.python.repara module

mssm.src.python.repara.reparam(X: csc_array | None, S: list[LambdaTerm], cov: ndarray | None, option: int = 1, n_bins: int = 30, QR: bool = False, identity: bool = False, scale: bool = False, form_inverse: int = 0, form_root: bool = False, n_c: int = 10) → tuple

Options 1-3 implement the natural re-parameterization discussed in Wood (2017; section 5.4.2), with different strategies for the QR computation of \(\mathbf{X}\). Option 4 helps with stabilizing the REML computation and is from Appendix B of Wood (2011) and section 6.2.7 in Wood (2017):

1. Form complete matrix \(\mathbf{X}\) based on entire covariate.

2. Form matrix \(\mathbf{X}\) only based on unique covariate values.

3. Form matrix \(\mathbf{X}\) on a sample of the covariate values. The covariate is split up into n_bins equally wide bins. The number of covariate values per bin is then calculated. Subsequently, the ratio relative to the minimum bin size is computed and each ratio is rounded to the nearest integer. Then ratio samples are obtained from each bin. That way, imbalance in the covariate is approximately preserved when forming the QR.

4. Transform term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) based on Appendix B of Wood (2011) and section 6.2.7 in Wood (2017) so that they are full-rank and their log-determinant can be computed safely. In that case, only S needs to be provided and has to be a list holding the penalties to be transformed. If the transformation is to be applied to model matrices, coefficients, hessian, and covariance matrices, X should be set to something other than None (it does not matter what; it can, for example, be the first model matrix). The mssm.src.python.repara.reparam_model() function can be used to apply the transformation and also returns the required transformation matrices to reverse it.
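The bin-based sampling described for option 3 can be sketched as below. This is an illustration only and may differ from what reparam does internally: the name sample_by_bins is hypothetical, empty bins are ignored when determining the minimum bin size, and samples are drawn without replacement.

```python
import numpy as np

def sample_by_bins(cov, n_bins=30, seed=0):
    """Hypothetical helper: draw a small sample from cov that approximately
    preserves how unevenly cov is spread over n_bins equally wide bins."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(cov.min(), cov.max(), n_bins + 1)
    # Using only the interior edges yields bin indices 0, ..., n_bins - 1
    bin_idx = np.digitize(cov, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)
    min_count = counts[counts > 0].min()      # assumption: skip empty bins
    ratios = np.rint(counts / min_count).astype(int)
    samples = [
        rng.choice(cov[bin_idx == b], size=ratios[b], replace=False)
        for b in range(n_bins)
        if ratios[b] > 0
    ]
    return np.concatenate(samples), ratios

# Unbalanced covariate: 300 values in [0, 1) and 100 values in [1, 2]
cov = np.concatenate([np.linspace(0, 1, 300, endpoint=False),
                      np.linspace(1, 2, 100)])
sub, ratios = sample_by_bins(cov, n_bins=2)
```

With two bins holding 300 and 100 values, the ratios are 3 and 1, so the sample contains four values in total.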

For Options 1-3:

If QR==True then \(\mathbf{X}\) is decomposed into \(\mathbf{Q}\mathbf{R}\) directly via QR decomposition. Alternatively, we first form \(\mathbf{X}^T\mathbf{X}\) and then compute the Cholesky factor \(\mathbf{L}\) of this product - note that \(\mathbf{L}^T = \mathbf{R}\). Overall the latter strategy is much faster (in particular if option==1), but the increased loss of precision in \(\mathbf{L}^T = \mathbf{R}\) might not be acceptable for some applications.
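The relationship \(\mathbf{L}^T = \mathbf{R}\) can be checked numerically with a few lines of NumPy. The example uses a dense random matrix purely for illustration. Note that a QR routine may return rows of \(\mathbf{R}\) with flipped signs, so the rows are normalized to a positive diagonal before comparing.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))

# Route 1: QR decomposition of X
Q, R = np.linalg.qr(X)

# Route 2: Cholesky factor of the cross-product X^T X
L = np.linalg.cholesky(X.T @ X)

# Flip row signs so that R has a positive diagonal, like L^T
signs = np.sign(np.diag(R))
R_pos = signs[:, None] * R
max_diff = np.max(np.abs(R_pos - L.T))
```

Forming \(\mathbf{X}^T\mathbf{X}\) squares the condition number of the problem, which is the source of the precision loss mentioned above.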

After transformation S only contains elements on its diagonal and \(\mathbf{X}\) the transformed functions. As discussed in Wood (2017), the transformed functions are decreasingly flexible - so the elements on the diagonal of \(\mathbf{S}\) become smaller and eventually zero, for elements that are in the kernel of the original \(\mathbf{S}\) (un-penalized == not flexible).

For a similar transformation (based solely on \(\mathbf{S}\)), Wood et al. (2013) show how to further reduce the diagonally transformed \(\mathbf{S}\) to an even simpler identity penalty. As discussed also in Wood (2017) the same behavior of decreasing flexibility if all entries on the diagonal of \(\mathbf{S}\) are 1 can only be maintained if the transformed functions are multiplied by a weight related to their wiggliness. Specifically, more flexible functions need to become smaller in amplitude - so that for the same level of penalization they are removed earlier than less flexible ones. To achieve this, Wood et al. (2013) further post-multiply the transformed matrix \(\mathbf{X}'\) with a matrix that contains on its diagonal the reciprocal of the square root of the diagonal of the transformed penalty matrix (and 1s in the last cells corresponding to the kernel). This is done here if identity=True.
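A dense NumPy sketch of the natural re-parameterization and the subsequent reduction to an identity penalty is given below. It follows the general recipe (QR of \(\mathbf{X}\), then an eigendecomposition of \(\mathbf{R}^{-T}\mathbf{S}\mathbf{R}^{-1}\)) and is meant as an illustration only. Variable names, the random design matrix, and the tolerance used to detect the kernel are assumptions rather than what reparam does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                 # stand-in for a term matrix
D = np.diff(np.eye(5), n=2, axis=0)
S = D.T @ D                                  # difference penalty of rank 3

# Natural parameterization: X = QR, then diagonalize R^{-T} S R^{-1}
Q, R = np.linalg.qr(X)
R_inv = np.linalg.inv(R)
evals, U = np.linalg.eigh(R_inv.T @ S @ R_inv)
order = np.argsort(evals)[::-1]              # most heavily penalized first
evals, U = evals[order], U[:, order]

T = R_inv @ U                                # transformation matrix
X_rp = X @ T                                 # orthonormal columns
S_rp = T.T @ S @ T                           # diagonal, zeros for the kernel

# Identity penalty: shrink each penalized function by 1 / sqrt(eigenvalue)
tol = 1e-10 * evals.max()
penalized = evals > tol
scale = np.where(penalized, 1.0 / np.sqrt(np.where(penalized, evals, 1.0)), 1.0)
X_id = X_rp * scale
S_id = (T * scale).T @ S @ (T * scale)       # diag(1, ..., 1, 0, ..., 0)
```

The functions with the largest eigenvalues receive the smallest weights, which reproduces the ordering of flexibility under a penalty whose non-zero diagonal entries are all 1.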

In mgcv the transformed model matrix and penalty can optionally be scaled by the root mean square value of the transformed model matrix (see the nat.param function in mgcv). This is done here if scale=True.

For Option 4:

Option 4 enforces re-parameterization of term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) based on Appendix B of Wood (2011) and section 6.2.7 in Wood (2017). In mssm multiple penalties can be placed on individual terms (i.e., tensor terms, random smooths, Kernel penalty) but it is not always the case that the term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) - i.e., the sum over all those individual penalties multiplied with their \(\lambda\) parameters - is of full rank. If we need to form the inverse of the term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) this is problematic. It is also problematic, as discussed by Wood (2011), if the different \(\lambda\) are all of different magnitude, in which case forming the term-specific \(\log(|\mathbf{S}_{\boldsymbol{\lambda}}|_+)\) becomes numerically difficult.

The re-parameterization implemented by option 4, based on Appendix B in Wood (2011), solves these issues. After this re-parameterization a term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) has been formed that is full rank. And \(\log(|\mathbf{S}_{\boldsymbol{\lambda}}|)\) - no longer just a generalized determinant - can be computed without running into numerical problems.

The strategy by Wood (2011) could be applied to form an overall - not just term-specific - \(\mathbf{S}_{\boldsymbol{\lambda}}\) with these properties. However, this does not work for general smooth models as defined by Wood et al. (2016). Hence, mssm opts for the blockwise strategy. However, in mssm penalties currently cannot overlap, so this is not necessary at the moment.

References:

Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Scheipl, F., & Faraway, J. J. (2013). Straightforward intermediate rank tensor product smoothing in mixed models.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
mgcv source code (accessed 2024). smooth.R file, nat.param function.

Parameters:

X (scp.sparse.csc_array | None) – Model/Term matrix or None
S (list[LambdaTerm]) – List of penalties
cov (np.ndarray | None) – covariate array associated with a specific term or None
option (int, optional) – Which re-parameterization to compute, defaults to 1
n_bins (int, optional) – Number of bins to use as part of option 3, defaults to 30
QR (bool, optional) – Whether to rely on a QR decomposition or not (then a Cholesky is used) as part of options 1-3, defaults to False
identity (bool, optional) – Whether the penalty matrix should be transformed to identity as part of options 1-3, defaults to False
scale (bool, optional) – Whether the penalty matrix and term matrix should be scaled as part of options 1-3, defaults to False

Returns:

Return object content depends on option but will usually hold the information needed to apply/undo the required re-parameterization as well as already re-parameterized objects.

Return type:

tuple

mssm.src.python.repara.reparam_model(dist_coef: list[int], dist_up_coef: list[int], coef: ndarray, split_coef_idx: list[int], Xs: list[csc_array], penalties: list[LambdaTerm], form_inverse: bool = True, form_root: bool = True, form_balanced: bool = True, n_c: int = 1) → tuple[ndarray, list[csc_array], list[LambdaTerm], csc_array, csc_array | None, csc_array | None, csc_array | None, csc_array, list[csc_array]]

Relies on the transformation strategy from Appendix B of Wood (2011) to re-parameterize the model.

Coefficients, model matrices, and penalties are all transformed. The transformation is applied to each term separately as explained by Wood et al. (2016).

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.

Parameters:

dist_coef ([int]) – List of number of coefficients per formula/linear predictor/distribution parameter of model.
dist_up_coef ([int]) – List of number of unpenalized (i.e., fixed effects, linear predictors/parameters) coefficients per formula/linear predictor/distribution parameter of model.
coef (numpy.array) – Vector of coefficients (numpy.array of dim (-1,1)).
split_coef_idx ([int]) – List with indices to split coef vector into separate versions per linear predictor.
Xs ([scp.sparse.csc_array]) – List of model matrices obtained for example via model.get_mmat().
penalties ([LambdaTerm]) – List of penalties for model.
form_inverse (bool, optional) – Whether or not an inverse of the transformed penalty matrices should be formed. Useful for computing the EFS update, defaults to True
form_root (bool, optional) – Whether or not to form a root of the total penalty, defaults to True
form_balanced (bool, optional) – Whether or not to form the “balanced” penalty as described by Wood et al. (2016) after the re-parameterization, defaults to True
n_c (int, optional) – Number of cores to use to compute the inverse when form_inverse=True, defaults to 1

Raises:

ValueError – Raises a value error if one of the inverse computations fails.

Returns:

A tuple with 9 elements: the re-parameterized coefficient vector, a list with the re-parameterized model matrices, a list of the penalties after re-parameterization, the total re-parameterized penalty matrix, optionally the balanced version of the former, optionally a root of the re-parameterized total penalty matrix, optionally the inverse of the re-parameterized total penalty matrix, the transformation matrix Q so that Q.T@S_emb@Q = S_emb_rp where S_emb and S_emb_rp are the total penalty matrix before and after re-parameterization, a list of transformation matrices QD so that XD@QD=XD_rp where XD and XD_rp are the model matrix of the Dth linear predictor before and after re-parameterization.

Return type:

tuple[np.ndarray, list[scp.sparse.csc_array], list[LambdaTerm], scp.sparse.csc_array, scp.sparse.csc_array | None, scp.sparse.csc_array | None, scp.sparse.csc_array | None, scp.sparse.csc_array, list[scp.sparse.csc_array]]

mssm.src.python.smooths module

mssm.src.python.smooths.B_spline_basis(cov: ndarray, event_onset: int | None, nk: int, min_c: float | None = None, max_c: float | None = None, drop_outer_k: bool = False, convolve: bool = False, deg: int = 3) → ndarray

Computes B-spline basis of degree deg given knots.

Based on code and definitions in “Splines, Knots, and Penalties” by Eilers & Marx (2010) and adapted to allow for convolving B-spline bases.

References:

Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125

Parameters:

cov (np.ndarray) – Flattened covariate array (i.e., of shape (-1,))
event_onset (int | None) – Sample on which to place a dirac delta with which the B-spline bases should be convolved - ignored if convolve==False.
nk (int) – Number of basis functions to create
min_c (float | None, optional) – Minimum covariate value, defaults to None
max_c (float | None, optional) – Maximum covariate value, defaults to None
drop_outer_k (bool, optional) – Deprecated, defaults to False
convolve (bool, optional) – Whether basis functions should be convolved (i.e., time-shifted) with an impulse response function triggered at event_onset, defaults to False
deg (int, optional) – Degree of basis, defaults to 3

Returns:

An array of shape (-1,nk) holding the nk basis functions evaluated over cov and optionally convolved with an impulse response function triggered at event_onset

Return type:

np.ndarray

mssm.src.python.smooths.TP_basis_calc(cTP: ndarray, nB: ndarray) → ndarray

Computes row-wise Kronecker product between cTP and nB. Useful to create a Tensor smooth basis.

See Wood (2017) 5.6.1 and B.4.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

cTP (np.ndarray) – Marginal basis or partially accumulated tensor smooth basis
nB (np.ndarray) – Marginal basis to include in the tensor smooth

Returns:

The row-wise Kronecker product between cTP and nB

Return type:

np.ndarray
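A row-wise Kronecker product can be written compactly with NumPy broadcasting. The sketch below is an illustration of the operation itself; the function name row_kron is hypothetical, and the column ordering (columns of the first argument vary slowest) is an assumption that may differ from TP_basis_calc.

```python
import numpy as np

def row_kron(A, B):
    """Hypothetical helper: row i of the result is np.kron(A[i], B[i])."""
    n = A.shape[0]
    # Outer product per row, then flatten the two basis dimensions
    return (A[:, :, None] * B[:, None, :]).reshape(n, -1)

A = np.arange(6.0).reshape(3, 2)        # marginal basis with 2 functions
B = np.arange(9.0).reshape(3, 3) + 1    # marginal basis with 3 functions
TP = row_kron(A, B)                     # tensor basis with 2 * 3 columns
```

For more than two marginals, the result can be passed back in as the first argument together with the next marginal basis.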

mssm.src.python.smooths.bbase(x: ndarray, knots: ndarray, dx: float, deg: int) → ndarray

Computes B-spline basis of degree deg given knots and interval spacing dx.

Function taken from “Splines, Knots, and Penalties” by Eilers & Marx (2010)

References:

Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125

Parameters:

x (np.ndarray) – Covariate
knots (np.ndarray) – knot location vector
dx (float) – Interval spacing (xr-xl) / ndx where xr and xl are max and min of x and ndx=nk-deg where nk is the number of basis functions.
deg (int) – Degree of basis

Returns:

numpy.array of shape (-1,nk)

Return type:

np.ndarray
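For orientation, the construction from Eilers & Marx (2010) can be sketched in NumPy as follows: B-splines are obtained as scaled differences of truncated power functions on equally spaced knots. This is a re-implementation for illustration under that reading of the paper, not a copy of the mssm source; the way the knots are built in the usage part is an assumption.

```python
import math
import numpy as np

def tpower(x, t, p):
    # Truncated p-th power function: (x - t)^p where x > t, else 0
    return np.power(x - t, p) * (x > t)

def bbase(x, knots, dx, deg):
    # Truncated power functions evaluated at every (x, knot) pair
    P = tpower(x[:, None], knots[None, :], deg)
    n = knots.shape[0]
    # Differences of order deg + 1 turn truncated powers into B-splines
    D = np.diff(np.eye(n), n=deg + 1, axis=0) / (math.gamma(deg + 1) * dx**deg)
    return (-1) ** (deg + 1) * P @ D.T

x = np.linspace(0.0, 1.0, 101)
nk, deg = 9, 3
ndx = nk - deg                               # number of intervals on [xl, xr]
dx = (x.max() - x.min()) / ndx
# Equally spaced knots, extended by deg intervals on both sides
knots = x.min() + dx * np.arange(-deg, ndx + deg + 1)
B = bbase(x, knots, dx, deg)
```

Within the covariate range the basis functions are non-negative and sum to one in every row.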

mssm.src.python.smooths.convolve_event(f: ndarray, pulse_location: int) → ndarray

Convolution of function f with a Dirac delta spike centered around sample pulse_location.

Based on code by Wierda et al. (2012).

References:

Wierda, S. M., van Rijn, H., Taatgen, N. A., & Martens, S. (2012). Pupil dilation deconvolution reveals the dynamics of attention at high temporal resolution. https://doi.org/10.1073/pnas.1201858109

Parameters:

f (np.ndarray) – Function evaluated over some samples
pulse_location (int) – Location of spike (in sample)

Returns:

Convolved function as array

Return type:

np.ndarray
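Convolving a function with a Dirac delta placed at a given sample amounts to shifting the function in time. The sketch below shows this with np.convolve. The name convolve_with_delta is hypothetical, and truncating the result to the length of f is an assumption made for this illustration.

```python
import numpy as np

def convolve_with_delta(f, pulse_location):
    """Hypothetical helper: shift f so that it starts at pulse_location."""
    delta = np.zeros(f.shape[0])
    delta[pulse_location] = 1.0
    # Full convolution has length 2 * len(f) - 1; keep the first len(f) samples
    return np.convolve(f, delta)[: f.shape[0]]

f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
shifted = convolve_with_delta(f, 2)
```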

mssm.src.python.smooths.tpower(x: ndarray, t: ndarray, p: int) → ndarray

Computes truncated p-th power function of x.

Function taken from “Splines, Knots, and Penalties” by Eilers & Marx (2010)

References:

Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125

Parameters:

x (np.ndarray) – Covariate
t (np.ndarray) – knot location vector
p (int) – degree of spline basis

Returns:

np.power(x - t,p) * (x > t)

Return type:

np.ndarray

mssm.src.python.terms module

class mssm.src.python.terms.GammTerm(variables: list[str], type: TermType, is_penalized: bool, penalty: list[Penalty], pen_kwargs: list[dict])

Bases: object

Base-class implemented by the terms passed to mssm.src.python.formula.Formula.

Parameters:

variables ([str]) – List of variables as strings.
type (TermType) – Type of term as enum
is_penalized (bool) – Whether the term is penalized/can be penalized or not
penalty ([Penalty]) – The default penalties associated with a term.
pen_kwargs ([dict]) – A list of dictionaries, each with key-word arguments passed to the construction of the corresponding Penalty in penalty.

build_matrix(*args, **kwargs)

Builds the design/term/model matrix associated with this term and returns it represented as a list of values, a list of row indices, and a list of column indices. Also returns an update to the column index (i.e., how many columns this matrix adds).

This method is implemented by every implementation of the GammTerm class. The returned lists can then be used to create a sparse matrix for this term.

build_penalty(penalties: list[LambdaTerm], cur_pen_idx: int, *args, **kwargs) → tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:

penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in penalties.

Returns:

Updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(*args, **kwargs)

Returns the total number of coefficients associated with this term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

This method is implemented by every implementation of the GammTerm class.

mssm.src.python.terms.build_ir_smooth_series(irsterm: irf, s_cov: ndarray, s_event: int, var_map: dict, var_mins: dict, var_maxs: dict, by_levels: ndarray | None) → ndarray

Function to build the impulse response matrix for a single time-series.

Parameters:

irsterm (irf) – Impulse response smooth term
s_cov (np.ndarray) – covariate array associated with irsterm
s_event (int) – Onset of impulse response function
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or None for categorical variables.
var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or None for categorical variables.
by_levels (np.ndarray | None) – Numpy array holding the levels of the factor associated with the irsterm term (via irsterm.by) or None

Returns:

The term matrix associated with the particular event at s_event

Return type:

np.ndarray

mssm.src.python.terms.build_linear_term(lTerm: l | rs, has_intercept: bool, ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) → tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with a linear/random term and returns it represented as a list of values, a list of row indices, and a list of column indices.

Parameters:

lTerm – Linear or random slope term
has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non NAN rows in the data.
cov_flat (np.ndarray) – An array containing all values (encoded, in case of categorical predictors) of every predictor variable included in any of the terms. Each column of cov_flat corresponds to a different predictor, in order of the data-frame passed to the Formula.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

class mssm.src.python.terms.f(variables: list, by: str = None, by_cont: str = None, binary: tuple[str, str] | None = None, id: int | None = None, nk: int | list[int] = None, te: bool = False, rp: int = 0, constraint: ~mssm.src.python.custom_types.ConstType = ConstType.QR, identifiable: bool = True, scale_te: bool = False, basis: ~collections.abc.Callable = <function B_spline_basis>, basis_kwargs: dict = {}, is_penalized: bool = True, penalize_null: bool = False, penalty: list[~mssm.src.python.penalties.Penalty] | None = None, pen_kwargs: list[dict] | None = None)

Bases: GammTerm

A univariate or tensor interaction smooth term. If variables only contains a single variable \(x\), this term will represent a univariate \(f(x)\) in a model:

\[\mu_i = a + f(x_i)\]

For example, the model below in mgcv:

bam(y ~ s(x,k=10) + s(z,k=20))

would be expressed as follows in mssm:

GAMM(Formula(lhs("y"),[i(),f(["x"],nk=9),f(["z"],nk=19)]),Gaussian())

If variables contains two variables \(x\) and \(z\), then this term will either represent the tensor interaction \(f(x,z)\) in model:

\[\mu_i = a + f(x_i) + f(z_i) + f(x_i,z_i)\]

or in model:

\[\mu_i = a + f(x_i,z_i)\]

The first behavior is achieved by setting te=False. In that case it is necessary to add ‘main effect’ f terms for \(x\) and \(z\). In other words, the behavior then mimics the ti() term available in mgcv (Wood, 2017). If te=True, the term instead behaves like a te() term in mgcv, so no separate smooth effects for the main effects need to be included.

For example, the model below in mgcv:

bam(y ~ te(x,z,k=10))

would be expressed as follows in mssm:

GAMM(Formula(lhs("y"),[i(),f(["x","z"],nk=9,te=True)]),Gaussian())

In addition, the model below in mgcv:

bam(y ~ s(x,k=10) + s(z,k=20) + ti(x,z,k=10))

would be expressed as follows in mssm:

GAMM(Formula(lhs("y"),[i(),f(["x"],nk=9),f(["z"],nk=19),f(["x","z"],nk=9,te=False)]),
  Gaussian())

By default a B-spline basis is used with nk=9 basis functions (after removing identifiability constraints). This is equivalent to mgcv’s default behavior of using 10 basis functions (before removing identifiability constraints). In case variables contains more than one variable, nk can either be set to a single value or to a list containing the number of basis functions that should be used to set up the spline matrix for every variable. The former implies that the same number of coefficients should be used for all variables. Keyword arguments that change the computation of the spline basis can be passed along via a dictionary to the basis_kwargs argument. Importantly, if multiple variables are present and a list is passed to nk, a list of dictionaries with keyword arguments of the same length needs to be passed to basis_kwargs as well.

Multiple penalties can be placed on every term by adding Penalty objects to the penalty argument. In case variables contains multiple variables, a separate tensor penalty (see Wood, 2017) will be created for every penalty included in penalty. Again, key-word arguments that alter the behavior of the penalty creation need to be passed as dictionaries to pen_kwargs for every penalty included in penalty. By default, a univariate term is penalized with a difference penalty of order 2 (Eilers & Marx, 2010).
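A difference penalty of order 2 in the sense of Eilers & Marx (2010) can be written down in a few lines of NumPy. The sketch below is a generic illustration, not the penalty constructor used by mssm. It shows that coefficient vectors that are constant or linear in the coefficient index are not penalized, while a rapidly alternating coefficient vector is.

```python
import numpy as np

nk = 9
D = np.diff(np.eye(nk), n=2, axis=0)    # (nk - 2) x nk second-difference matrix
S = D.T @ D                             # penalty matrix, rank nk - 2

const = np.ones(nk)                     # constant coefficients
linear = np.arange(nk, dtype=float)     # linearly increasing coefficients
wiggly = (-1.0) ** np.arange(nk)        # alternating coefficients

pen_const = const @ S @ const
pen_linear = linear @ S @ linear
pen_wiggly = wiggly @ S @ wiggly
rank = np.linalg.matrix_rank(S)
```

The two-dimensional null space visible here is what a separate null-space penalty (Marra & Wood, 2011) would additionally penalize.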

References:

Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125
Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372–2387. https://doi.org/10.1016/j.csda.2011.02.004
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in data passed to Formula. Need to be continuous.
by (str, optional) – A string corresponding to a factor in data passed to Formula. Separate f(variables) (and smoothness penalties) will be estimated per level of by.
by_cont (str, optional) – A string corresponding to a numerical variable in data passed to Formula. The model matrix for the estimated smooth term f(variables) will be multiplied by the column of this variable. Can be used to estimate ‘varying coefficient’ models but also to set up binary smooths or to only estimate a smooth term for specific levels of a factor (i.e., what is possible for ordered factors in R & mgcv).
binary ([str,str], optional) – A list containing two strings. The first string corresponds to a factor in data passed to Formula. A separate f(variables) will be estimated for the level of this factor corresponding to the second string.
id (int|None, optional) – Different smooth functions with the same id share their \(\lambda\) values. Effect differs when also specifying a by variable: In that case, if id is set to any integer the penalties placed on the separate f(variables) will share a single smoothness penalty and other smooth terms will ignore this term’s particular id.
nk (int or list[int], optional) – Number of basis functions to use. Even if identifiable is true, this number will reflect the final number of basis functions for this term (i.e., mssm acts as if you had asked for 10 basis functions if nk=9 and identifiable=True; the default).
te (bool, optional) – For tensor interaction terms only. If set to false, the term mimics the behavior of ti() in mgcv (Wood, 2017). Otherwise, the term behaves like a te() term in mgcv - i.e., the marginal basis functions are not removed from the interaction.
rp (int, optional) – Whether or not to re-parameterize the term. Currently the Demmler & Reinsch parameterization is supported for univariate smooth terms (rp=1) and the ‘natural’ parameterization discussed by Wood (2006) for tensor smooth terms (rp=2). Important: when relying on the qEFS update to estimate smoothing penalty parameters, performance drops drastically for tensor smooth terms when not relying on the parameterization by Wood (2006). Hence, it is recommended that you set rp=2 for tensor smooths when relying on this update. Defaults to 0, meaning no re-parameterization.
constraint (mssm.src.python.custom_types.ConstType, optional) – What kind of identifiability constraints should be absorbed by the terms (if they are to be identifiable). Either QR-based constraints (default, well-behaved), by means of column-dropping (no infill, not so well-behaved), or by means of difference re-coding (little infill, not so well-behaved either).
identifiable (bool, optional) – Whether or not the constant should be removed from the space of functions this term can fit. Achieved by enforcing that \(\mathbf{1}^T \mathbf{X} = 0\) (\(\mathbf{X}\) here is the spline matrix computed for the observed data; see Wood, 2017 for details). Necessary in most cases to keep the model identifiable.
scale_te (bool, optional) – Whether or not the penalty matrices of marginal smooths should be scaled by their largest eigenvalue. This can improve numerical stability and is thus recommended when relying on the qEFS update to estimate smoothing penalty parameters. Set to False by default.
basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in mssm.src.python.smooths.B_spline_basis().
basis_kwargs (dict, optional) – A dictionary (or, if multiple variables are present and a list is passed to nk, a list of dictionaries) specifying how the basis should be computed. Consult the docstring of the function computing the basis you want. For the default B-spline basis for example see the mssm.src.python.smooths.B_spline_basis() function. The default arguments set by any basis function should work for most cases though.
is_penalized (bool, optional) – Should the term be left unpenalized or not. There are rarely good reasons to set this to False.
penalize_null (bool, optional) – Should a separate Null-space penalty (Marra & Wood, 2011) be placed on the term. By default, the term here will leave a linear f(variables) un-penalized! Thus, there is no option for the penalty to achieve f(variables) = 0 even if that would be supported by the data. Adding a Null-space penalty provides the penalty with that power. This can be used for model selection instead of Hypothesis testing and is the preferred way in mssm (see Marra & Wood, 2011 for details).
penalty (list[Penalty], optional) – A list of penalty types to be placed on the term.
pen_kwargs (list[dict], optional) – A list containing one or multiple dictionaries specifying how the penalty should be created. Consult the docstring of the Penalty.constructor() method of the specific Penalty you want to use for details.
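The sum-to-zero constraint mentioned for the identifiable argument, \(\mathbf{1}^T \mathbf{X} = 0\), can be absorbed into a spline matrix via a QR decomposition of the constraint (see Wood, 2017). The dense sketch below illustrates that idea with a random stand-in for the spline matrix. It is not the mssm implementation, and all variable names are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(size=(100, 6))               # stand-in for a spline matrix

# Constraint matrix C = 1^T X (one row, one column per basis function)
C = np.ones((1, X.shape[0])) @ X

# Full QR of C^T: the last k - 1 columns of Q span the null space of C
Q, _ = np.linalg.qr(C.T, mode="complete")
Z = Q[:, 1:]

X_c = X @ Z                                  # constrained matrix, k - 1 columns
col_sums = X_c.sum(axis=0)                   # approximately zero
```

One basis function is lost in the process, which is why a term that starts with 10 basis functions ends up with 9 coefficients after the constraint has been absorbed.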

absorb_repara(rpidx, X, cov, min_c, max_c)

Computes all terms necessary to absorb a re-parameterization into the term and penalty matrix.

References:

Wood, S. N. (2006). Low‐Rank Scale‐Invariant Tensor Product Smooths for Generalized Additive Mixed Models. Biometrics, 62(4), 1025–1036. https://doi.org/10.1111/j.1541-0420.2006.00574.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

rpidx (int) – Index to specific reparam. object. There must be a 1 to 1 relationship between reparam. objects and the number of marginals required by this smooth (i.e., the number of variables).
X (scipy.sparse.csc_array) – Design matrix associated with this term.
cov (np.ndarray) – The covariate this term is a function of as a flattened numpy array.
min_c (float) – The minimum value of the covariate this term is a function of as a float.
max_c (float) – The maximum value of the covariate this term is a function of as a float.

Raises:

ValueError – If this method is called with rpidx exceeding the number of this term’s RP objects (i.e., when rpidx > (len(self.RP) - 1)) or if self.rp is equal to a value for which no reparameterisation is implemented.

build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: list[int], cov_flat: ndarray, use_only: list[int], tol: int = 0) → tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix for this smooth term.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or None for categorical variables.
var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or None for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non NAN rows in the data.
cov_flat (np.ndarray) – An array containing all (encoded, in case of categorical predictors) values on each predictor variable included in any of the terms (each column of cov_flat corresponds to a different predictor), in order of the data-frame passed to the Formula.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.
tol (int, optional) – A tolerance that can be used to prune the term matrix from values close to zero rather than absolutely zero. Defaults to strictly zero.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]
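
The four return values resemble coordinate-format sparse triplets plus a column count. The sketch below shows how a dense block could be flattened into such lists. The helper to_triplets is hypothetical, and the assumptions that ci offsets the column indices and that tol prunes entries by absolute value are illustrative only, not taken from the mssm source.

```python
import numpy as np

def to_triplets(block, ci=0, tol=0.0):
    """Hypothetical helper: flatten a dense term matrix into triplet lists."""
    # Keep only entries whose magnitude exceeds the tolerance.
    rows, cols = np.nonzero(np.abs(block) > tol)
    data = block[rows, cols]
    # Shift the column indices so that the block starts at column ci.
    return data.tolist(), rows.tolist(), (cols + ci).tolist(), block.shape[1]

block = np.array([[0.0, 0.5, 0.5],
                  [1e-12, 0.0, 1.0]])

data, rows, cols, n_added = to_triplets(block, ci=4, tol=1e-9)

# The triplets can be scattered back into a dense matrix for inspection.
dense = np.zeros((block.shape[0], 4 + n_added))
dense[rows, cols] = data
print(data, rows, cols, n_added)
```

With tol=0 the entry 1e-12 would be kept, whereas the small positive tolerance used here drops it.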

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) → tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this smooth term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:

ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in penalties.
penid (int) – If a term is subjected to multiple penalties, then penid indexes which of those penalties is currently implemented. Otherwise can be set to zero.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
n_coef (int) – Number of coefficients associated with this term.
col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(factor_levels: dict) → tuple[int, int, list[str]]

Returns the total number of coefficients associated with this smooth term, the number of unpenalized coefficients associated with this smooth term, and a list with names for each of the coefficients associated with this smooth term.

Parameters:: factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
Returns:: Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
Return type:: tuple[int,int,list[str]]

class mssm.src.python.terms.fs(variables: list, rf: str = None, nk: int = 9, m: int = 1, rp: int = 1, by_cont: str | None = None, by_subgroup: tuple[str, str] | None = None, approx_deriv: dict | None = None, basis: ~collections.abc.Callable = <function B_spline_basis>, basis_kwargs: dict = {})

Bases: f

Essentially a f term with by=rf, id != None, penalize_null= True, pen_kwargs = [{"m":1}], and rp=1.

This term approximates the “factor-smooth interaction” basis “fs” with m= 1 available in mgcv (Wood, 2017). For example, the term below from mgcv:

s(x,sub,bs="fs")

would approximately correspond to the following term in mssm:

fs(["x"],rf="sub")

They are however not equivalent (mgcv by default uses a different basis for which the m key-word has a different functionality).

Specifically, here m= 1 implies that the only function left unpenalized by the default (difference) penalty is the constant (Eilers & Marx, 2010). Thus, a linear basis is penalized by the same default penalty that also penalizes smoothness (and not by a separate penalty as is the case in mgcv when m=1 for the default basis)! Any constant basis is penalized by the null-space penalty (in both mgcv and mssm; see Marra & Wood, 2011) - the term thus shrinks towards zero (Wood, 2017).
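
The effect of m= 1 and of the null-space penalty can be checked numerically. The sketch below builds a first-order difference penalty with plain NumPy and then constructs a null-space penalty from the eigenvectors of the penalty with zero eigenvalues, in the spirit of Marra & Wood (2011). It is an illustration under the assumption of an evenly spaced B-spline basis (where equal coefficients correspond to a constant function), not the code used by mssm.

```python
import numpy as np

k = 6  # number of coefficients

# First-order difference penalty (m=1): S = D^T D.
D = np.diff(np.eye(k), n=1, axis=0)
S = D.T @ D

const = np.ones(k)                  # coefficients of a constant function
linear = np.arange(k, dtype=float)  # coefficients that change linearly

pen_const = const @ S @ const       # constant is left unpenalized by S
pen_linear = linear @ S @ linear    # linear trend is penalized by S

# Null-space penalty: built from the eigenvectors of S whose eigenvalues
# are (numerically) zero.
evals, evecs = np.linalg.eigh(S)
U0 = evecs[:, evals < 1e-10]
S_null = U0 @ U0.T

pen_null_const = const @ S_null @ const  # constant is penalized by S_null

print(pen_const, pen_linear, pen_null_const)
```

Together, S and S_null penalize every direction of the coefficient space, which is why the term can shrink all the way to zero.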

The factor smooth basis in mgcv allows the penalty to differ between levels of an additional factor (by additionally specifying the by argument for a smooth with basis “fs”). I.e.,

s(Time,Subject,by='condition',bs='fs')

in mgcv would estimate a non-linear random smooth of “time” per level of the “subject” & “condition” interaction - with the same penalty being placed on all random smooth terms within the same “condition” level.

This can be achieved in mssm by adding multiple fs terms to the Formula and utilising the by_subgroup argument. This needs to be set to a list where the first element identifies the additional factor variable (e.g., “condition”) and the second element corresponds to a level of said factor variable. E.g., to approximate the aforementioned mgcv term we have to add:

*[fs(["Time"],rf="subject_cond",by_subgroup=["cond",cl]) for cl in np.unique(dat["cond"])]

to the Formula terms list. Importantly, “subject_cond” is the interaction of “subject” and “condition” - not just the “subject” variable in the data.
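
How the interaction variable might be created is sketched below. The column names and values are hypothetical, and a plain dictionary of NumPy arrays stands in for the data-frame that would normally be passed to the Formula.

```python
import numpy as np

# Hypothetical data: three subjects, each observed in two conditions.
dat = {
    "subject": np.array(["s1", "s1", "s2", "s2", "s3", "s3"]),
    "cond": np.array(["a", "b", "a", "b", "a", "b"]),
}

# Interaction factor: one level per subject-by-condition combination.
dat["subject_cond"] = np.char.add(np.char.add(dat["subject"], "_"), dat["cond"])

# One [factor, level] pair per fs term, mirroring the by_subgroup argument.
subgroups = [["cond", str(cl)] for cl in np.unique(dat["cond"])]

print(dat["subject_cond"])
print(subgroups)
```

Each entry of subgroups would then be passed as by_subgroup to a separate fs term with rf="subject_cond".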

Model estimation can become quite expensive for fs terms when the factor variable for rf has many levels (> 10000). In that case, approximate derivative evaluation can speed things up considerably. To enforce this, the approx_deriv argument needs to be specified with a dict, having the following structure: {"no_disc":[str],"excl":[str],"split_by":[str], "restarts":int,"seed":None or int}. “no_disc” should usually be set to an empty list, and should in general only contain names of continuous variables included in the formula. Any variable mentioned here will not be discretized before clustering - this will make the approximation a bit more accurate but also require more time. Similarly, “excl” specifies any continuous variables that should be excluded from clustering. “split_by” should generally be set to a list containing all categorical variables present in the formula. “restarts” indicates the number of times to repeat the clustering (40 seems to be a good number). “seed” can either be set to None or to an integer - in the latter case, the random cluster initialization will use that seed, ensuring that the clustering outcome (and hence model fit) is replicable.
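
A dictionary following the structure described above might look as follows. The variable name "cond" is hypothetical, and the seed value is arbitrary.

```python
# Sketch of an approx_deriv specification; "cond" is a hypothetical
# categorical variable assumed to be present in the formula.
approx_deriv = {
    "no_disc": [],          # discretize all continuous variables
    "excl": [],             # exclude no continuous variable from clustering
    "split_by": ["cond"],   # all categorical variables in the formula
    "restarts": 40,         # number of clustering repetitions
    "seed": 20,             # fixed seed for a replicable clustering
}

# The dictionary would then be passed to the term, e.g.:
# fs(["Time"], rf="subject_cond", approx_deriv=approx_deriv)
print(sorted(approx_deriv))
```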

References:

Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125
Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372–2387. https://doi.org/10.1016/j.csda.2011.02.004
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.). Chapman and Hall/CRC.

Parameters:

variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in data passed to Formula. Need to be continuous.
rf (str, optional) – A string corresponding to a (random) factor in data passed to Formula. Separate f(variables) (but a shared smoothness penalty!) will be estimated per level of rf.
nk (int or list[int], optional) – Number of basis functions - 1 to use. I.e., if nk=9 (the default), the term will use 10 basis functions. By default f() has identifiability constraints applied and we act as if nk + 1 coefficients were requested. The fs() term needs no identifiability constraints, so if the same number of coefficients used for an f() term is requested (the desired approach), one coefficient is added to compensate for the lack of identifiability constraints. This is the opposite of how this is handled in mgcv: specifying nk=10 for “fixed” univariate smooths results in 9 basis functions being available. However, for a smooth in mgcv with basis=’fs’, 10 basis functions will remain available.
basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in mssm.src.smooths.B_spline_basis().
basis_kwargs (dict, optional) – A list containing one or multiple dictionaries specifying how the basis should be computed. For the B-spline basis the following arguments (with default values) are available: convolve=False, min_c=None, max_c=None, deg=3. See mssm.src.smooths.B_spline_basis() for details.
by_cont (str, optional) – A string corresponding to a numerical variable in data passed to Formula. The model matrix for the estimated smooth term will be multiplied by the column of this variable. Can be used as an alternative to estimate separate random smooth terms per level of another factor (which is also possible with by_subgroup).
by_subgroup ([str,str], optional) – List including a factor variable and specific level of said variable. Allows for separate penalties as described above.
approx_deriv (dict, optional) – Dict holding important info for the clustering algorithm. Structure: {"no_disc":[str],"excl":[str],"split_by":[str],"restarts":int,"seed":None or int}

build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int], tol: int = 0) → tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix for this factor smooth term.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or None for categorical variables.
var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or None for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non NAN rows in the data.
cov_flat (np.ndarray) – An array containing all (encoded, in case of categorical predictors) values on each predictor variable included in any of the terms (each column of cov_flat corresponds to a different predictor), in order of the data-frame passed to the Formula.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.
tol (int, optional) – A tolerance that can be used to prune the term matrix from values close to zero rather than absolutely zero. Defaults to strictly zero.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) → tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this factor smooth term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:

ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in penalties.
penid (int) – If a term is subjected to multiple penalties, then penid indexes which of those penalties is currently implemented. Otherwise can be set to zero.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(factor_levels: dict) → tuple[int, int, list[str]]

Returns the total number of coefficients associated with this factor smooth term, the number of unpenalized coefficients associated with this factor smooth term, and a list with names for each of the coefficients associated with this factor smooth term.

Parameters:: factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
Returns:: Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
Return type:: tuple[int,int,list[str]]

mssm.src.python.terms.get_linear_coef_info(lTerm: l | rs, has_intercept: bool, var_types: dict, factor_levels: dict, coding_factors: dict) → tuple[int, int, list[str]]

Returns the total number of coefficients associated with a linear or random term, the number of unpenalized coefficients associated with a linear or random term, and a list with names for each of the coefficients associated with a linear or random term.

Parameters:

lTerm – Linear or random slope term
has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

class mssm.src.python.terms.i

Bases: GammTerm

An intercept/offset term. In a model

\[\mu_i = a + f(x_i)\]

it reflects \(a\).

build_matrix(ci: int, ti: int, ridx: ndarray, use_only: list[int]) → tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix for an intercept term.

Parameters:

ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
ridx (np.ndarray) – Array of non NAN rows in the data.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

get_coef_info() → tuple[int, int, list[str]]

Returns the total number of coefficients associated with this term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Returns:: Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
Return type:: tuple[int,int,list[str]]

class mssm.src.python.terms.irf(variables: list[str], event_onset: list[int], basis_kwargs: list[dict], by: str = None, id: int = None, nk: int = 10, basis: ~collections.abc.Callable = <function B_spline_basis>, is_penalized: bool = True, penalty: list[~mssm.src.python.penalties.Penalty] | None = None, pen_kwargs: list[dict] | None = None)

Bases: GammTerm

A simple impulse response term, designed to correct for events with overlapping responses in multi-level time-series modeling.

The idea (see Ehinger & Dimigen, 2019 for a detailed introduction to this kind of deconvolution analysis) is that some kind of event happens during each recorded time-series (e.g., stimulus onset, distractor display, mask onset, etc.) which is assumed to affect the recorded signal in the next X ms in some way. The moment of event onset can differ between recorded time-series. In other words, the event is believed to act like an impulse which triggers a delayed response on the signal. This term class can be used to estimate the shape of this impulse response. Multiple irf terms can be included in a Formula if multiple events happen, potentially with overlapping responses.
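
The underlying idea can be sketched with NumPy alone. In the illustration below the response shape, the series length, and the onsets are all assumptions made for the example; the code only shows how impulses at different onsets produce shifted and overlapping responses, not how mssm estimates them.

```python
import numpy as np

n_time = 50  # number of samples in the series

# Assumed response shape: a smooth bump lasting 10 samples.
response = np.hanning(12)[1:-1]

def series_with_event(onset):
    """Place a unit impulse at `onset` and convolve it with the response."""
    impulse = np.zeros(n_time)
    impulse[onset] = 1.0
    return np.convolve(impulse, response)[:n_time]

# Two events in the same series, 5 samples apart, so that their
# responses overlap between samples 15 and 19.
signal = series_with_event(10) + series_with_event(15)

print(np.round(signal[8:27], 2))
```

An irf term addresses the reverse problem: given the summed signal and the known onsets, it estimates the shape of each response.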

Example:

# Simulate time-series based on two events that elicit responses which vary in their overlap.
# The summed responses + a random intercept + noise is then the signal.
overlap_dat,onsets1,onsets2 = sim7(100,1,2,seed=20)

# Model below tries to recover the shape of the two responses in the 200 ms after event
# onset (max_c=200) + the random intercepts:

# For models with irf terms, the column in the data identifying
# unique series need to be specified as well!
overlap_formula = Formula(lhs("y"),[irf(["time"],onsets1,nk=15,
                                      basis_kwargs=[{"max_c":200,"min_c":0,"convolve":True}]),
                                    irf(["time"],onsets2,nk=15,
                                      basis_kwargs=[{"max_c":200,"min_c":0,"convolve":True}]),
                                    ri("factor")],
                                    data=overlap_dat,
                                    series_id="series")

model = GAMM(overlap_formula,Gaussian())
model.fit()

Note that care needs to be taken when predicting for models including irf terms, because the onset of events can differ between time-series. Hence, model predictions and standard errors should first be obtained for the entire data-set that was also used to train the model. Series-specific predictions can then be extracted from the model matrix as follows:

# Get model matrix for entire data-set but only based on the estimated
# shape for first irf term:
_,pred_mat,ci_b = model.predict([0],overlap_dat,ci=True)

# Now extract the prediction + approximate ci boundaries for a single series:
s = 8
s_pred = pred_mat[overlap_dat["series"] == s,:]@model.coef
s_ci = ci_b[overlap_dat["series"] == s]

# Now the estimated response following the onset of the first event can be
# visualized + an approximate CI:
from matplotlib import pyplot as plt
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred,color='blue')
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred+s_ci,color='blue',
  linestyle='dashed')
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred-s_ci,color='blue',
  linestyle='dashed')

References:

Ehinger, B. V., & Dimigen, O. (2019). Unfold: an integrated toolbox for overlap correction, non-linear modeling, and regression-based EEG analysis. https://doi.org/10.7717/peerj.7838

Parameters:

variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in data passed to Formula. Need to be continuous.
event_onset ([int]) – A np.array containing, for each individual time-series, the index corresponding to the sample/time-point at which the event eliciting the response to be estimated by this term happened.
basis_kwargs (dict) – A list containing one or multiple dictionaries specifying how the basis should be computed. For irf terms, the convolve argument has to be set to True! Also, min_c and max_c must be specified. min_c corresponds to the assumed min. delay of the response after event onset and can usually be set to 0. max_c corresponds to the assumed max. delay of the response (in ms) after which the response is believed to have returned to a zero base-line.
by (str, optional) – A string corresponding to a factor in data passed to Formula. Separate irf(variables) (and smoothness penalties) will be estimated per level of by.
id (int, optional) – Different impulse response smooth functions with the same id share their \(\lambda\) values. Effect differs when also specifying a by variable: In that case, if id is set to any integer the penalties placed on the separate irf(variables) will share a single smoothness penalty and other impulse response smooth functions will ignore this term’s id.
nk (int, optional) – Number of basis functions to use. I.e., if nk=10 (the default), the term will use 10 basis functions (Note that these terms are not made identifiable by absorbing any kind of constraint).
basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in mssm.src.smooths.B_spline_basis().
is_penalized (bool, optional) – Should the term be left unpenalized or not. There are rarely good reasons to set this to False.
penalty (list[Penalty], optional) – A list of penalty types to be placed on the term.
pen_kwargs (list[dict], optional) – A list containing one or multiple dictionaries specifying how the penalty should be created. For the default difference penalty (Eilers & Marx, 2010) the only keyword argument (with default value) available is: m=2. This reflects the order of the difference penalty. Note, that while a higher m permits penalizing towards smoother functions it also leads to an increased dimensionality of the penalty Kernel (the set of f[variables] which will not be penalized). In other words, increasingly more complex functions will be left un-penalized for higher m (except if penalize_null is set to True). m=2 is usually a good choice and thus the default but see Eilers & Marx (2010) for details.
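
The relationship between m and the dimensionality of the penalty kernel can be verified numerically. The sketch below constructs difference penalties of increasing order with NumPy; it is an illustration of the general property (Eilers & Marx, 2010) and not the penalty constructor used by mssm.

```python
import numpy as np

k = 10  # number of coefficients, matching the default nk=10

kernel_dims = {}
for m in (1, 2, 3):
    # Difference matrix of order m and the resulting penalty S = D^T D.
    D = np.diff(np.eye(k), n=m, axis=0)
    S = D.T @ D
    # The kernel holds all coefficient vectors b with b^T S b = 0.
    kernel_dims[m] = int(k - np.linalg.matrix_rank(S))

print(kernel_dims)
```

Each increase in m adds one more unpenalized direction: only the constant for m=1, additionally a linear trend for m=2, and additionally a quadratic trend for m=3.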

build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: ndarray, cov: list[ndarray], use_only: list[int], pool, tol: int = 0) → tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with this impulse response smooth term.

Parameters:

ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or None for categorical variables.
var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or None for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non NAN rows in the data.
cov ([np.ndarray]) – A list containing a separate array per time-series included in the data and indicated to the formula. The array contains, for the particular time-series, all (encoded, in case of categorical predictors) values on each predictor variable included in any of the terms (each column of the array corresponds to a different predictor), in order of the data-frame passed to the Formula.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.
pool (Any) – A multiprocessing pool for parallel matrix construction parts
tol (int, optional) – A tolerance that can be used to prune the term matrix from values close to zero but not absolutely zero. Defaults to strictly zero.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) → tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this impulse response smooth term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:

ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in penalties.
penid (int) – If a term is subjected to multiple penalties, then penid indexes which of those penalties is currently implemented. Otherwise can be set to zero.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(ti: int, factor_levels: dict) → tuple[int, int, list[str]]

Returns the total number of coefficients associated with this impulse response smooth term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Parameters:

ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

class mssm.src.python.terms.l(variables: list)

Bases: GammTerm

Adds a parametric (linear) term to the model formula. The model \(\mu_i = a + b*x_i\) can for example be achieved by adding [i(), l(['x'])] to the term argument of a Formula. The coefficient \(b\) estimated for the term will then correspond to the slope of \(x\). This class can also be used to add predictors for categorical variables. If the formula includes an intercept, binary coding will be utilized to add reference-level adjustment coefficients for the remaining k-1 levels of any additional factor variable.

If more than one variable is included in variables the model will only add the len(variables)-interaction to the model! Lower order interactions and main effects will not be included by default (see li() function instead, which automatically includes all lower-order interactions and main effects).

Example: The interaction effect of factor variable “cond”, with two levels “1” and “2”, and a continuous variable “x” on the dependent variable “y” is of interest. To estimate such a model, the following formula can be used:

formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"]),l(["cond","x"])])

This formula will estimate the following model:

\[\mu_i = a + b_1*c_i + b_2*x_i + b_3*c_i*x_i\]

Here, \(c\) is a binary predictor variable created so that it is 1 if “cond”=2 else 0 and \(b_3\) is the coefficient that is added because l(["cond","x"]) is included in the terms (i.e., the interaction effect).
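
The model matrix implied by this formula can be written out by hand. The sketch below uses made-up data and NumPy's least-squares solver to show the four columns and to recover known coefficients; it illustrates the binary coding described above and is not how mssm builds or fits the model.

```python
import numpy as np

# Hypothetical data: factor "cond" with levels "1" and "2", continuous "x".
cond = np.array(["1", "2", "1", "2", "1", "2"])
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# Binary predictor: 1 if cond == "2", else 0 (level "1" is the reference).
c = (cond == "2").astype(float)

# Columns: intercept a, main effect of cond, slope of x, interaction.
X = np.column_stack([np.ones_like(x), c, x, c * x])

# Noise-free response generated from known coefficients a, b_1, b_2, b_3.
coef_true = np.array([1.0, 0.5, 2.0, -1.0])
y = X @ coef_true

# Ordinary least squares recovers the coefficients.
coef_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef_hat, 6))
```

Here b_3 changes the slope of x for observations with cond equal to "2", so the slope is b_2 in the reference level and b_2 + b_3 otherwise.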

To get a model with only main effects for “cond” and “x”, the following formula could be used:

formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"])])

This formula will estimate:

\[\mu_i = a + b_1*c_i + b_2*x_i\]

Parameters:: variables ([str]) – A list of the variables (strings) for which linear predictors should be included

build_matrix(has_intercept: bool, ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) → tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with this linear term.

Parameters:

has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non NAN rows in the data.
cov_flat (np.ndarray) – An array, containing all (encoded, in case of categorical predictors) values on each predictor (each columns of cov_flat corresponds to a different predictor) variable included in any of the terms in order of the data-frame passed to the Formula.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]
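
The returned values describe a sparse matrix in triplet (coordinate) form. The sketch below shows how such triplets map to a dense matrix. It is an illustration only: the helper `triplets_to_dense`, the toy covariate, and the assumptions that zeros are skipped and that column indices are absolute (offset by ci) are not taken from mssm.

```python
def triplets_to_dense(data, rows, cols, n_rows, n_cols):
    """Assemble a dense matrix (list of lists) from coordinate triplets."""
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for value, r, c in zip(data, rows, cols):
        dense[r][c] += value
    return dense


# Hypothetical term contributing a single column for a continuous variable.
x = [0.5, 0.0, -2.0]
ci = 1  # assumption: column 0 is already occupied by the intercept

data, rows, cols = [], [], []
for r, value in enumerate(x):
    if value != 0.0:  # assumption: only non-zero entries are stored
        data.append(value)
        rows.append(r)
        cols.append(ci)

added_cols = 1      # this term added one column
ci += added_cols    # the next term would start at the updated column index

dense = triplets_to_dense(data, rows, cols, n_rows=len(x), n_cols=ci)
```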

get_coef_info(has_intercept: bool, var_types: dict, factor_levels: dict, coding_factors: dict) → tuple[int, int, list[str]]

Returns the total number of coefficients associated with this linear term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Parameters:

has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

mssm.src.python.terms.li(variables: list[str])

Behaves like the l class but automatically includes all lower-order interactions and main effects.

Example: The interaction effect of factor variable “cond”, with two levels “1” and “2”, and a continuous variable “x” on the dependent variable “y” is of interest. To estimate such a model, the following formula can be used:

formula = Formula(lhs("y"),terms=[i(),*li(["cond","x"])])

Note the use of the * operator to unpack the individual terms returned from li!

This formula will still (see l) estimate the following model:

\[\mu_i = a + b_1*c_i + b_2*x_i + b_3*c_i*x_i\]

with: \(c\) corresponding to a binary predictor variable created so that it is 1 if “cond”=2 else 0.

To get a model with only main effects for “cond” and “x”, li() cannot be used and l needs to be used instead:

formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"])])

This formula will estimate:

\[\mu_i = a + b_1*c_i + b_2*x_i\]

Parameters:: variables (list[str]) – A list of the variables (strings) for which linear predictors should be included
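
To make "all lower-order interactions and main effects" concrete, the sketch below enumerates the variable subsets that such an expansion covers. The helper `expand_li` is hypothetical, and the order in which li actually returns its terms is an assumption.

```python
from itertools import combinations


def expand_li(variables):
    """Hypothetical helper: list main effects first, then higher-order interactions."""
    return [
        list(combo)
        for order in range(1, len(variables) + 1)
        for combo in combinations(variables, order)
    ]


two_way = expand_li(["cond", "x"])
three_way = expand_li(["fact", "x", "time"])
```

For two variables this yields three subsets (two main effects and one interaction), matching the three non-intercept coefficients in the model above. For three variables it yields seven.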

class mssm.src.python.terms.ri(variable: str, id: int | None = None)

Bases: GammTerm

Adds a random intercept for the factor variable to the model. The random intercepts \(b_i\) are assumed to be i.i.d. \(b_i \sim N(0,\sigma_b)\), i.e., normally distributed around zero. This is the simplest random effect supported by mssm.

Thus, this term achieves exactly what is achieved in mgcv by adding the term:

s(variable,bs="re")

The variable needs to identify a factor-variable in the data (i.e., the .dtype of the variable has to be equal to ‘O’). If you want to add more complex random effects to the model (e.g., random slopes for continuous variable “x” per level of factor variable) use the rs class.
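
As a rough illustration of what a random intercept term contributes, the sketch below builds an indicator matrix with one column per factor level and evaluates an identity (ridge) penalty on the intercepts, as an s(variable,bs="re") term does in mgcv. The data, the level ordering, and the helper `ridge_penalty` are assumptions made for illustration, not mssm internals.

```python
# Hypothetical factor variable identifying subjects.
sub = ["s1", "s2", "s1", "s3", "s2"]
levels = sorted(set(sub))  # assumption: levels are ordered alphabetically

# One indicator column per level: Z[i][j] = 1 if observation i belongs to level j.
Z = [[1.0 if s == level else 0.0 for level in levels] for s in sub]


def ridge_penalty(b, lam):
    """Evaluate lam * b^T I b, which reduces to lam times the sum of squared intercepts."""
    return lam * sum(bi ** 2 for bi in b)


# Made-up random intercepts and smoothing parameter.
pen = ridge_penalty([1.0, -2.0, 0.5], lam=2.0)
```

A larger \(\lambda\) shrinks the intercepts more strongly towards zero, which corresponds to a smaller random intercept variance.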

Parameters:

variable (str) – The name (string) of a factor variable. For every level of this factor a random intercept will be estimated. The random intercepts are assumed to follow a normal distribution centered around zero.
id (int|None, optional) – Different random intercepts with the same id share their \(\lambda\) values. Defaults to None.

build_matrix(ci: int, ti: int, var_map: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) → tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with this random intercept term.

Parameters:

ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non NAN rows in the data.
cov_flat (np.ndarray) – An array containing the values (encoded, in the case of categorical predictors) of every predictor variable included in any of the terms, in the order of the data-frame passed to the Formula. Each column of cov_flat corresponds to a different predictor.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, factor_levels: dict, col_S: int) → tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this random intercept term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:

ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in penalties.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(factor_levels: dict, coding_factors: dict) → tuple[int, int, list[str]]

Returns the total number of coefficients associated with this random intercept term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Parameters:

factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

class mssm.src.python.terms.rs(variables: list[str], by: str | None = None, id: int | None = None)

Bases: GammTerm

Adds random coefficients for the (interaction) effect of variables.

The term works exactly like s(var1,var2,...varK,bs='re',by=fact) works in mgcv. That is, if by=None the model matrix implied by l(vars) is added to the overall model matrix (without applying binary coding to ensure identifiability), and the corresponding coefficients are subjected to an identity penalty matrix.

As described in more detail in the doc string of the l class, if multiple variables are specified in variables, the added model matrix will reflect the partial interaction (again without applying binary coding to ensure identifiability) of the variables.

If by is not None, separate identity penalties (and random coefficients), will be estimated per level of the factor variable passed to by.

Correlations between random effects cannot be taken into account by means of parameters (this is possible for example in lme4).

Examples:

s(fact,bs='re') in mgcv is rs(["fact"]) in mssm
s(cov,bs='re') in mgcv is rs(["cov"]) in mssm
s(fact,cov,bs='re') in mgcv is rs(["fact","cov"]) in mssm
s(fact,cov,bs='re',by=fact2) in mgcv is rs(["fact","cov"],by="fact2") in mssm

where “fact” and “fact2” refer to categorical variables and “cov” refers to a continuous variable.
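
For the third example, rs(["fact","cov"]), the description above implies one column per level of “fact”, each holding the values of “cov” for the observations in that level (no binary coding, so no level is dropped). The sketch below builds such a matrix for made-up data. The level ordering is an assumption.

```python
# Hypothetical data: categorical "fact" and continuous "cov".
fact = ["a", "b", "a", "b"]
cov = [0.5, 1.0, -2.0, 3.0]
levels = sorted(set(fact))  # assumption: levels are ordered alphabetically

# Column j holds cov for rows belonging to level j and zero otherwise,
# so every level of "fact" gets its own random slope for "cov".
Z = [[cv if f == level else 0.0 for level in levels] for f, cv in zip(fact, cov)]
```

All columns of Z would then share a single identity penalty, unless by is set, in which case each level of the by factor receives its own copy of the coefficients and its own penalty.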

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.). Chapman and Hall/CRC.
Random effects in mgcv: see https://www.rdocumentation.org/packages/mgcv/topics/smooth.construct.re.smooth.spec

Parameters:

variables ([str]) – A list of variables. Can point to continuous and categorical variables.
by (str | None, optional) – Optionally, the name of a factor variable. For each level of this factor, separate random coefficients for variables and penalties will be estimated, defaults to None
id (int | None, optional) – Different random slopes with the same id share their \(\lambda\) values. Defaults to None.

build_matrix(ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) → tuple[list[float], list[int], list[int], int]

Builds the design/term/model matrix associated with this random slope term.

Parameters:

ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non NAN rows in the data.
cov_flat (np.ndarray) – An array containing the values (encoded, in the case of categorical predictors) of every predictor variable included in any of the terms, in the order of the data-frame passed to the Formula. Each column of cov_flat corresponds to a different predictor.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to None so that no terms are excluded.

Returns:

matrix data, matrix row indices, matrix column indices, added columns

Return type:

tuple[list[float],list[int],list[int],int]

build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, factor_levels: dict, col_S: int) → tuple[list[LambdaTerm], int]

Builds a penalty matrix associated with this random slope term and returns an updated penalties list including it.

This method is implemented by most implementations of the GammTerm class. Two arguments need to be returned: the updated penalties list including the new penalty implemented as a LambdaTerm and the updated cur_pen_idx. The latter simply needs to be incremented for every penalty added to penalties.

Parameters:

ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in penalties.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.

Returns:

Updated penalties list including the new penalties implemented as a LambdaTerm and the updated cur_pen_idx

Return type:

tuple[list[LambdaTerm],int]

get_coef_info(var_types: dict, factor_levels: dict, coding_factors: dict) → tuple[int, int, list[str]]

Returns the total number of coefficients associated with this random slope term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.

Parameters:

var_types (dict) – Var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).

Returns:

Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names

Return type:

tuple[int,int,list[str]]

mssm.src.python.utils module

class mssm.src.python.utils.DummyRhoPrior(a=np.float64(-16.11809565095832), b=np.float64(16.11809565095832))

Bases: RhoPrior

Simple uniform prior for rho - the log-smoothing penalty parameters

logpdf(rho: ndarray) → ndarray

Returns an array holding zeroes for all log(lambda) parameters within self.a and self.b, otherwise -np.inf.

Parameters:: rho (np.ndarray) – Array of log(lambda) parameters
Returns:: Log-density array as described above
Return type:: np.ndarray
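
The behaviour described above can be sketched in a few lines. The class `UniformRhoPrior` below is a stand-in written for illustration, not the mssm class. It assumes the bounds are inclusive and works on a flat list instead of an np.ndarray. The default bounds shown in the signature correspond to log(1e-7) and log(1e7).

```python
import math


class UniformRhoPrior:
    """Illustrative stand-in for a uniform prior on log smoothing parameters."""

    def __init__(self, a=math.log(1e-7), b=math.log(1e7)):
        self.a = a
        self.b = b

    def logpdf(self, rho):
        # Zero (unnormalized) log-density inside [a, b], -inf outside.
        return [0.0 if self.a <= r <= self.b else -math.inf for r in rho]


prior = UniformRhoPrior()
log_dens = prior.logpdf([0.0, 20.0, -20.0])
```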

class mssm.src.python.utils.GAMLSSGSMMFamily(pars: int, gammlss_family: GAMLSSFamily)

Bases: GSMMFamily

Implementation of the GSMMFamily class that uses only information about the likelihood to estimate any implemented GAMMLSS model.

Allows estimating any GAMMLSS as a GSMM via the L-qEFS & Newton update. Example:

# Simulate 500 data points
sim_dat = sim3(500,2,c=1,seed=0,family=Gaussian(),binom_offset = 0, correlate=False)

# We need to model the mean: mu_i
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],
                    data=sim_dat)

# And for sd - here constant
formula_sd = Formula(lhs("y"),
                    [i()],
                    data=sim_dat)

# Collect both formulas
formulas = [formula_m,formula_sd]
links = [Identity(),LOG()]

# Now define the general family + model
gsmm_fam = GAMLSSGSMMFamily(2,GAUMLSS(links))
model = GSMM(formulas=formulas,family=gsmm_fam)

# Fit with SR1
bfgs_opt={"gtol":1e-9,
        "ftol":1e-9,
        "maxcor":30,
        "maxls":200,
        "maxfun":1e7}

model.fit(init_coef=None,method='qEFS',extend_lambda=False,
        control_lambda=0,max_outer=200,max_inner=500,min_inner=500,
        seed=0,qEFSH='SR1',max_restarts=5,overwrite_coef=False,
        qEFS_init_converge=False,prefit_grad=True,
        progress_bar=True,**bfgs_opt)

################### Or for a multinomial model: ###################

formulas = [Formula(lhs("y"),
                [i(),f(["x0"])],
                data=sim5(1000,seed=91)) for k in range(4)]

# Create family - again specifying K-1 pars - here 4!
family = MULNOMLSS(4)

# Get the link functions from the family
links = family.links

# Now again define the general family + model
gsmm_fam = GAMLSSGSMMFamily(4,family)
model = GSMM(formulas=formulas,family=gsmm_fam)

# And fit with SR1
bfgs_opt={"gtol":1e-9,
        "ftol":1e-9,
        "maxcor":30,
        "maxls":200,
        "maxfun":1e7}

model.fit(init_coef=None,method='qEFS',extend_lambda=False,
        control_lambda=0,max_outer=200,max_inner=500,min_inner=500,
        seed=0,qEFSH='SR1',max_restarts=0,overwrite_coef=False,
        qEFS_init_converge=False,prefit_grad=True,
        progress_bar=True,**bfgs_opt)

References:

Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Nocedal & Wright (2006). Numerical Optimization. Springer New York.

Parameters:

pars (int) – Number of parameters of the likelihood.
gammlss_family (GAMLSSFamily) – Any implemented member of the GAMLSSFamily class. Available in self.llkargs[0].

gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) → ndarray

Function to evaluate gradient of GAMM(LSS) model when estimated via GSMM.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The gradient of the log-likelihood evaluated at coef, as a numpy array of shape (-1,1).

Return type:

np.ndarray
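
The role of coef_split_idx can be illustrated without numpy. The helper `split_coef` below mimics what np.split(coef, coef_split_idx) does along the first axis for a coefficient vector of shape (-1,1). The coefficient values and split index are made up.

```python
def split_coef(coef, coef_split_idx):
    """Split a column vector (list of one-element rows) at the given row indices."""
    bounds = [0, *coef_split_idx, len(coef)]
    return [coef[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]


# Hypothetical two-parameter model: three coefficients for the mean, one for the scale.
coef = [[0.1], [0.2], [0.3], [1.5]]  # shape (4, 1), not flattened
coef_split_idx = [3]

split = split_coef(coef, coef_split_idx)
```

Each element of split would then be paired with the matching model matrix in Xs to form the linear predictor of one likelihood parameter.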

hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) → csc_array

Function to evaluate Hessian of GAMM(LSS) model when estimated via GSMM.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The Hessian of the log-likelihood evaluated at coef.

Return type:

scp.sparse.csc_array

llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) → float

Function to evaluate log-likelihood of GAMM(LSS) model when estimated via GSMM.

Parameters:

coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.

Returns:

The log-likelihood evaluated at coef.

Return type:

float

mssm.src.python.utils.REML(llk: float, nH: csc_array, coef: ndarray, scale: float, penalties: list[LambdaTerm], keep: ndarray[tuple[Any, ...], dtype[int64]] | None = None) → float | ndarray

Based on Wood (2011). Exact REML for Gaussian GAM, Laplace approximate (Wood, 2016) for everything else. Evaluated after applying stabilizing reparameterization discussed by Wood (2011).

Important: the dimension of the output depends on the shape of coef. If coef is flattened, then the output will be a float. If coef is of shape (-1,1), the output will be [[float]].

References:

Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models

Parameters:

llk (float) – log-likelihood of model
nH (scp.sparse.csc_array) – negative hessian of log-likelihood of model
coef (np.ndarray) – Estimated vector of coefficients of shape (-1,1)
scale (float) – (Estimated) scale parameter - can be set to 1 for GAMLSS or GSMMs.
penalties ([LambdaTerm]) – List of penalties that were part of the model.
keep (np.typing.NDArray[np.int_]|None, optional) – Optional array of indices corresponding to identifiable coefficients. Coefficients not in this list (not identifiable) are dropped from the negative hessian of the penalized log-likelihood. Can also be set to None (default) in which case all coefficients are treated as identifiable.

Returns:

(Approximate) REML score

Return type:

float|np.ndarray

class mssm.src.python.utils.RhoPrior(*args, **kwargs)

Bases: object

Base class to demonstrate the functionality that any prior passed to the correct_VB function has to implement.

logpdf(rho: ndarray)

Compute log density for log smoothing penalty parameters included in rho under this prior.

Parameters:: rho (np.ndarray) – Numpy array of shape (nR,nrho) containing nR proposed candidate vectors for the nrho log-smoothing parameters.

mssm.src.python.utils.adjust_CI(model, n_ps: int, b: ndarray, predi_mat: csc_array, use_terms: list[int] | None, alpha: float, seed: int | None, par: int = 0) → ndarray

Internal function to adjust point-wise CI to behave like whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016):

model.coef +- b gives point-wise interval, and for the interval to cover the whole-function, 1-alpha % of posterior samples should be expected to fall completely within these boundaries.

From section 6.10 in Wood (2017) we have that \(\boldsymbol{\beta} | \mathbf{y}, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}},\mathbf{V})\). \(\mathbf{V}\) is the covariance matrix of this conditional posterior, and can be obtained by evaluating model.lvi.T @ model.lvi * model.scale (model.scale should be set to 1 for mssm.models.GAMMLSS and mssm.models.GSMM).

The implication of this result is that we can also expect the deviations \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\) to follow \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}} | \mathbf{y}, \boldsymbol{\lambda} \sim N(0,\mathbf{V})\). In line with the whole-function interval definition above, 1-alpha % of predi_mat@[*coef - coef] (where [*coef - coef] represents the deviations \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) should fall within [-b,b]. Wood (2017) suggests to find a so that [-a*b,a*b] achieves this.

To do this, we find a for every predi_mat@[*coef - coef] and then select the final one so that 1-alpha % of samples had an equal or lower one. The consequence: 1-alpha % of samples drawn should fall completely within the modified boundaries.
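
The selection of a can be sketched as follows. This is a simplified illustration and not the mssm implementation: the helper `whole_function_multiplier` is hypothetical, and the deviations are drawn as independent standard normals instead of being computed from predi_mat and samples from \(N(0,\mathbf{V})\).

```python
import math
import random


def whole_function_multiplier(deviations, b, alpha):
    """Return the empirical (1 - alpha) quantile of the per-sample multipliers."""
    # For each sample: the smallest a so that |dev_k| <= a * b_k at every point k.
    crit = sorted(max(abs(d) / bk for d, bk in zip(dev, b)) for dev in deviations)
    # Smallest multiplier that at least a (1 - alpha) share of samples do not exceed.
    idx = math.ceil((1 - alpha) * len(crit)) - 1
    return crit[idx]


rng = random.Random(0)
n_ps, n_points, alpha = 2000, 20, 0.05

# Simplifying assumption: independent N(0, 1) deviations at every evaluation point.
deviations = [[rng.gauss(0.0, 1.0) for _ in range(n_points)] for _ in range(n_ps)]
b = [1.96] * n_points  # half-width of a point-wise 95% interval

a = whole_function_multiplier(deviations, b, alpha)
adjusted_b = [a * bk for bk in b]

# Count the samples whose largest scaled deviation does not exceed a,
# i.e. the samples that fall completely within [-a*b, a*b].
n_inside = sum(max(abs(d) / bk for d, bk in zip(dev, b)) <= a for dev in deviations)
```

By construction at least a 1-alpha share of the samples lies entirely within the widened boundaries, and a exceeds 1 because a point-wise interval is too narrow to cover a whole function.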

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.

Parameters:

model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to adjust the point-wise CI.
n_ps (int) – Number of samples to obtain from posterior.
b (np.ndarray) – Ci boundary of point-wise CI.
predi_mat (scp.sparse.csc_array) – Model matrix for a particular smooth term or additive combination of parameters evaluated usually at a representative sample of predictor variables.
use_terms (list[int] | None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.
alpha (float) – The alpha level to use for the whole-function interval adjustment calculation as outlined above.
seed (int | None) – Can be used to provide a seed for the posterior sampling.
par (int, optional) – The index corresponding to the parameter of the log-likelihood for which samples are to be obtained for the coefficients, defaults to 0.

Returns:

The adjusted vector b

Return type:

np.ndarray

mssm.src.python.utils.approx_smooth_p_values(model, par: int = 0, n_sel: int = 100000.0, edf1: bool = True, force_approx: bool = False, seed: int = 0) → tuple[list[float], list[float]]

Function to compute approximate p-values for smooth terms, testing whether \(\mathbf{f}=\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\) based on the algorithm by Wood (2013).

Wood (2013, 2017) generalize the \(\boldsymbol{\beta}_j^T\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\boldsymbol{\beta}_j\) test-statistic for parametric terms (computed by function mssm.models.print_parametric_terms()) to the coefficient vector \(\boldsymbol{\beta}_j\) parameterizing smooth functions. \(\mathbf{V}\) here is the covariance matrix of the posterior distribution for \(\boldsymbol{\beta}\) (see Wood, 2017). The idea is to replace \(\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\) with a rank \(r\) pseudo-inverse (smooth blocks in \(\mathbf{V}\) are usually rank deficient). Wood (2013, 2017) suggest to base \(r\) on the estimated degrees of freedom for the smooth term in question - but that \(r\) is usually not integer.

They provide a generalization that addresses the realness of \(r\), resulting in a test statistic \(T_r\), which follows a weighted Chi-square distribution under the Null. Following the recommendation in Wood (2013) we here approximate the reference distribution under the Null by means of the computations outlined in the paper by Davies (1980). If this fails, we fall back on a Gamma distribution with \(\alpha=r/2\) and \(\phi=2\).

In case of a two-parameter distribution (i.e., estimated scale parameter \(\phi\)), the Chi-square reference distribution needs to be corrected, again resulting in a weighted chi-square distribution which should behave something like an F distribution with DoF1 = \(r\) and DoF2 = \(\epsilon_{DoF}\) (i.e., the residual degrees of freedom), which would be the reference distribution for \(T_r/r\) if \(r\) were integer and \(\mathbf{V}_{\boldsymbol{\beta}_j}\) full rank. We again follow the recommendations by Wood (2013) and rely on the methods by Davies (1980) to compute the p-value under this reference distribution. If this fails, we approximate the reference distribution for \(T_r/r\) with a Beta distribution, with \(\alpha=r/2\) and \(\beta=\epsilon_{DoF}/2\) (see Wikipedia for the specific transformation applied to \(T_r/r\) so that the resulting transformation is approximately beta distributed) - which is similar to the Gamma approximation used for the Chi-square distribution in the no-scale parameter case.
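
For reference, the standard relation between the F and Beta distributions is sketched below. That this is exactly the transformation used in the fallback is an assumption. If \(T_r/r\) followed an F distribution with DoF1 = \(r\) and DoF2 = \(\epsilon_{DoF}\), then

\[x = \frac{r \cdot (T_r/r)}{r \cdot (T_r/r) + \epsilon_{DoF}} = \frac{T_r}{T_r + \epsilon_{DoF}} \sim \text{Beta}(r/2,\epsilon_{DoF}/2)\]

so that an approximate p-value is \(1 - I_x(r/2,\epsilon_{DoF}/2)\), where \(I_x\) denotes the regularized incomplete beta function (the Beta cumulative distribution function).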

Warning: The resulting p-values are approximate. They should only be treated as indicative.

Note: Just like in mgcv, the returned p-value is an average: two p-values are computed because of an ambiguity in forming \(T_r\) and averaged to get the final one. For \(T_r\) we return the max of the two alternatives.

References:

Davies, R. B. (1980). Algorithm AS 155: The Distribution of a Linear Combination of χ2 Random Variables.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N. (2013). On p-values for smooth components of an extended generalized additive model.
testStat function in mgcv, see: https://github.com/cran/mgcv/blob/master/R/mgcv.r#L3780

Parameters:

model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to compute p values.
par (int, optional) – Distribution parameter for which to compute p-values. Ignored when model is a GAMM. Defaults to 0
n_sel (int, optional) – Maximum number of rows of model matrix. For models with more observations a random sample of n_sel rows is obtained. Defaults to 1e5
edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models, for which the difference is likely to be marginal anyway. Defaults to True
force_approx (bool, optional) – Whether or not the p-value should be forced to be approximated based on a Gamma/Beta distribution. Only use for testing - in practice you want to keep this at False. Defaults to False
seed (int, optional) – Random seed determining the random sample computation. Defaults to 0

Returns:

Tuple containing two lists: the first list holds approximate p-values for all smooth terms, the second list holds the corresponding test statistics.

Return type:

tuple[list[float],list[float]]

mssm.src.python.utils.computeAr1Chol(formula: Formula, rho: float) → tuple[csc_array, float]

Computes the inverse of the cholesky of the (scaled) variance matrix of an ar1 model.

Parameters:

formula (Formula) – Formula of the model
rho (float) – ar1 weight.

Returns:

Tuple, containing banded inverse Cholesky as a scipy array and the correction needed to get the likelihood of the ar1 model.

Return type:

tuple[scp.sparse.csc_array,float]
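
The banded structure of such an inverse Cholesky factor can be checked on a small example. The sketch below assumes a single time series whose scaled covariance matrix has entries rho^|i-j|. The helpers are hypothetical, and the scaling, orientation, and handling of multiple series in mssm may differ.

```python
import math


def ar1_cov(n, rho):
    """Scaled AR1 covariance matrix with entries rho^|i-j| (assumed form)."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def ar1_inv_chol(n, rho):
    """Banded matrix R with R @ Sigma @ R.T = I: only diagonal and sub-diagonal are non-zero."""
    d = 1.0 / math.sqrt(1.0 - rho ** 2)
    R = [[0.0] * n for _ in range(n)]
    R[0][0] = 1.0  # the first observation has no predecessor
    for t in range(1, n):
        R[t][t] = d             # scales e_t = (y_t - rho * y_{t-1}) to unit variance
        R[t][t - 1] = -rho * d  # removes the dependence on the previous observation
    return R


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


n, rho = 4, 0.6
R = ar1_inv_chol(n, rho)
whitened = matmul(matmul(R, ar1_cov(n, rho)), transpose(R))

# Log-determinant of R, the kind of quantity a likelihood correction would involve.
log_det_R = sum(math.log(R[t][t]) for t in range(n))
```

Applying R to the observations and the model matrix turns the AR1 model into one with independent errors. Here whitened is the identity matrix up to rounding error.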

mssm.src.python.utils.compute_REML_candidate_GSMM(family: GAMLSSFamily | GSMMFamily, y: ndarray | list[ndarray], Xs: list[csc_array], penalties: list[LambdaTerm], coef: ndarray, n_coef: int, coef_split_idx: list[int], method: str = 'Chol', conv_tol: float = 1e-07, n_c: int = 10, bfgs_options: dict = {}, origNH: csc_array | None = None, keep_drop: tuple[ndarray[tuple[Any, ...], dtype[int64]], ndarray[tuple[Any, ...], dtype[int64]]] | None = None) → tuple[float, csc_array, csc_array, ndarray, float, float]

Allows evaluating the REML criterion (e.g., Wood, 2011; Wood, 2016) efficiently for a set of lambda values for a GSMM or GAMMLSS.

Internal function used for computing the correction applied to the edf for the GLRT - based on Wood (2017) and Wood et al., (2016).

See REML() function for more details.

Parameters:

family (GAMLSSFamily | GSMMFamily) – Model Family
y (np.ndarray | list[np.ndarray]) – Vector of observations or list of vectors (for GSMM)
Xs (list[scp.sparse.csc_array]) – List of model matrices
penalties (list[LambdaTerm]) – List of penalties
coef (np.ndarray) – Final coefficient estimate obtained from estimation - used to initialize
n_coef (int) – Number of coefficients
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
method (str, optional) – Method to use to solve for the coefficients (lambda parameters in case this is set to ‘qEFS’), defaults to “Chol”
conv_tol (float, optional) – Tolerance, defaults to 1e-7
n_c (int, optional) – Number of cores to use, defaults to 10
bfgs_options (dict, optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS', defaults to {}
origNH (scp.sparse.csc_array | None, optional) – Optional external hessian matrix, defaults to None
keep_drop (tuple[np.typing.NDArray[np.int_],np.typing.NDArray[np.int_]] | None) – Set of kept and dropped coefficients during estimation or None

Returns:

REML criterion, conditional covariance matrix of coefficients for this lambda, un-pivoted inverse of the pivoted Cholesky of the negative hessian of the penalized llk, coefficients, total edf, llk

Return type:

tuple[float, scp.sparse.csc_array, scp.sparse.csc_array, np.ndarray, float, float]

mssm.src.python.utils.compute_Vb_corr_WPS(Vbr: csc_array, Vpr, Vr, H: csc_array, S_emb: csc_array, penalties: list[LambdaTerm], coef: ndarray, scale: float = 1, drop: ndarray[tuple[Any, ...], dtype[int64]] | None = None) → tuple[ndarray, ndarray | float]

Computes both correction terms for Vb or \(\mathbf{V}_{\boldsymbol{\beta}}\), which is the co-variance matrix for the conditional posterior of \(\boldsymbol{\beta}\) so that \(\boldsymbol{\beta} | y, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}},\mathbf{V}_{\boldsymbol{\beta}})\), described by Wood, Pya, & Säfken (2016).

References:

Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

Vbr (scp.sparse.csc_array) – Transpose of root for the estimate for the (unscaled) covariance matrix of \(\boldsymbol{\beta} | y, \boldsymbol{\lambda}\) - the coefficients estimated by the model.
Vpr (np.ndarray) – A (regularized) estimate of the covariance matrix of \(\boldsymbol{\rho}\) - the log smoothing penalties.
Vr (np.ndarray) – Transpose of root of un-regularized covariance matrix of \(\boldsymbol{\rho}\) - the log smoothing penalties.
H (scp.sparse.csc_array) – The Hessian of the log-likelihood
S_emb (scp.sparse.csc_array) – The weighted penalty matrix.
penalties ([LambdaTerm]) – A list holding the Lambdaterms estimated for the model.
coef (np.ndarray) – An array holding the estimated regression coefficients. Has to be of shape (-1,1)
scale (float) – Any scale parameter estimated as part of the model. Can be omitted for more generic models beyond GAMMs. Defaults to 1.
drop – Optional array of indices corresponding to unidentifiable coefficients. Coefficients in this list (i.e., not identifiable) are dropped from the negative hessian of the penalized log-likelihood. Can also be set to None (default) in which case all coefficients are treated as identifiable.

Raises:

ArithmeticError – Will throw an error when the negative Hessian of the penalized likelihood is ill-scaled so that a Cholesky decomposition fails.

Returns:

A tuple containing: Vc and Vcc. Vbr.T@Vbr*scale + Vc + Vcc is then approximately the corrected covariance matrix devised by Wood, Pya, & Säfken (2016). Vcc can simply be zero if the negative penalized Hessian is not positive definite when coefficients have been dropped.

Return type:

tuple[np.ndarray, np.ndarray | float]
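
How the returned terms combine into a corrected covariance matrix can be sketched with dense NumPy arrays. The snippet is only an illustration: the toy matrices, the zero-valued placeholders for Vc and Vcc, and all variable names are assumptions, and nothing here calls mssm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy negative Hessian of the penalized llk: X'X + S (dense, illustration only)
X = rng.normal(size=(50, 4))
S = np.diag([0.0, 1.0, 2.0, 3.0])
nH = X.T @ X + S

# With nH = L L^T, the root Vbr = L^{-1} satisfies Vbr.T @ Vbr = nH^{-1}
L = np.linalg.cholesky(nH)
Vbr = np.linalg.solve(L, np.eye(4))

scale = 2.0
Vb = Vbr.T @ Vbr * scale  # conditional covariance of the coefficients

# Placeholders for the two correction terms (zero in this toy example;
# in practice they would be the outputs of compute_Vb_corr_WPS)
Vc = np.zeros((4, 4))
Vcc = 0.0

V_corrected = Vb + Vc + Vcc
```

Because Vcc may be returned as a plain float, the sum relies on NumPy broadcasting a scalar over the matrix.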

mssm.src.python.utils.compute_Vp_WPS(Vbr: csc_array, H: csc_array, S_emb: csc_array, penalties: list[LambdaTerm], coef: ndarray, scale: float = 1) → tuple[ndarray, ndarray, ndarray, ndarray, ndarray, ndarray]

Computes the inverse of what is approximately the negative Hessian of the Laplace approximate REML criterion with respect to the log smoothing penalties.

The derivatives computed are only exact for Gaussian additive models and canonical generalized additive models. For all other models they are inexact in that they assume that the Hessian of the log-likelihood does not depend on \(\lambda\) (or \(\log(\lambda)\)), so they are essentially the PQL derivatives of Wood et al. (2017). The inverse computed here acts as an approximation to the covariance matrix of the log smoothing parameters.

References:

Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data.

Parameters:

Vbr (scp.sparse.csc_array) – Transpose of root for the estimate for the (unscaled) covariance matrix of \(\boldsymbol{\beta} | y, \boldsymbol{\lambda}\) - the coefficients estimated by the model.
H (scp.sparse.csc_array) – The Hessian of the log-likelihood
S_emb (scp.sparse.csc_array) – The weighted penalty matrix.
penalties ([LambdaTerm]) – A list holding the Lambdaterms estimated for the model.
coef (np.ndarray) – An array holding the estimated regression coefficients. Has to be of shape (-1,1)
scale (float) – Any scale parameter estimated as part of the model. Can be omitted for more generic models beyond GAMMs. Defaults to 1.

Returns:

Generalized inverse of negative hessian of approximate REML criterion, regularized version of the former, root of generalized inverse, root of regularized generalized inverse, hessian of approximate REML criterion, np.array of shape ((len(coef),len(penalties))) containing in each row the partial derivative of the coefficients with respect to an individual lambda parameter

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]
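
One way to obtain a generalized inverse of a symmetric negative Hessian together with a matching root is sketched below. The eigendecomposition with a relative eigenvalue threshold, the helper name pinv_and_root, and the toy matrix are assumptions made for illustration; this is not necessarily how compute_Vp_WPS proceeds internally.

```python
import numpy as np

def pinv_and_root(nH, rel_tol=1e-10):
    """Return a generalized inverse G of symmetric nH and a root R with R.T @ R == G."""
    evals, evecs = np.linalg.eigh(nH)
    # Treat eigenvalues below the relative threshold as zero
    keep = evals > rel_tol * evals.max()
    inv_evals = np.zeros_like(evals)
    inv_evals[keep] = 1.0 / evals[keep]
    # R = diag(sqrt(inv_evals)) @ evecs.T, so R.T @ R = evecs diag(inv_evals) evecs.T
    R = np.sqrt(inv_evals)[:, None] * evecs.T
    return R.T @ R, R

# Rank-deficient toy Hessian: the third direction carries no information
nH_toy = np.array([[2.0, 1.0, 0.0],
                   [1.0, 2.0, 0.0],
                   [0.0, 0.0, 0.0]])
G_toy, R_toy = pinv_and_root(nH_toy)
```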

mssm.src.python.utils.compute_bias_corrected_edf(model, overwrite: bool = False) → None

This function computes and assigns smoothing bias corrected (term-wise) estimated degrees of freedom.

For a definition of smoothing bias-corrected estimated degrees of freedom see Wood (2017).

Note: This function modifies model, setting edf1 and term_edf1 attributes.
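
For a penalized Gaussian model, Wood (2017) writes the edf in terms of \(\mathbf{F} = (\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda})^{-1}\mathbf{X}^T\mathbf{X}\), with the usual edf given by the diagonal of \(\mathbf{F}\) and the bias-corrected alternative by the diagonal of \(2\mathbf{F} - \mathbf{F}\mathbf{F}\). The dense sketch below illustrates that relationship on toy matrices; the variable names are hypothetical and the snippet does not call mssm.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
# Toy penalty: the first coefficient (intercept) is left unpenalized
S_lam = 10.0 * np.diag([0.0, 1.0, 1.0, 1.0, 1.0])

XtX = X.T @ X
F = np.linalg.solve(XtX + S_lam, XtX)

edf = np.diag(F)               # coefficient-wise edf
edf1 = np.diag(2 * F - F @ F)  # smoothing bias-corrected counterpart

total_edf = edf.sum()
total_edf1 = edf1.sum()
```

Since the eigenvalues of F lie between 0 and 1, the corrected total is never smaller than the uncorrected one and never exceeds the number of coefficients.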

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to compute the bias corrected edf.
overwrite (bool, optional) – Whether previously computed bias corrected edf should be overwritten. Otherwise this function immediately terminates if model.edf1 is not None, defaults to False

Return type:

None

mssm.src.python.utils.compute_reml_candidate_GAMM(family: Family, y: ndarray, X: csc_array, penalties: list[LambdaTerm], n_c: int = 10, offset: float | ndarray = 0, init_eta: ndarray | None = None, method: str = 'Chol', compute_inv: bool = False, origNH: float | None = None) → tuple[float, csc_array | None, csc_array, list[int], ndarray, float, float, float]

Evaluates the REML criterion (e.g., Wood, 2011; Wood et al., 2016) efficiently for a set of lambda values for a GAMM.

Internal function used for computing the correction applied to the edf for the GLRT - based on Wood (2017) and Wood et al. (2016).

See REML() function for more details.

Parameters:

family (Family) – Family of the model
y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Model matrix
penalties (list[LambdaTerm]) – List of penalties
n_c (int, optional) – Number of cores to use, defaults to 10
offset (float | np.ndarray, optional) – Fixed offset to add to eta, defaults to 0
init_eta (np.ndarray | None, optional) – Initial vector for linear predictor, defaults to None
method (str, optional) – Method to use to solve for coefficients, defaults to ‘Chol’
compute_inv (bool, optional) – Whether to compute the inverse of the pivoted Cholesky of the negative hessian of the penalized llk, defaults to False
origNH (float | None, optional) – Optional external scale parameter, defaults to None

Returns:

reml criterion, un-pivoted inverse of the pivoted Cholesky of the negative hessian of the penalized llk, pivoted Cholesky, pivot column indices, coefficients, estimated scale, total edf, llk

Return type:

tuple[float, scp.sparse.csc_array|None, scp.sparse.csc_array, list[int], np.ndarray, float, float, float]

mssm.src.python.utils.correct_VB(model, nR: int = 250, grid_type: str = 'JJJ1', a: float = 1e-07, b: float = 10000000.0, df: int = 40, n_c: int = 10, form_t1: bool = False, verbose: bool = False, drop_NA: bool = True, method: str = 'Chol', only_expected_edf: bool = False, Vp_fidiff: bool = False, use_importance_weights: bool = True, prior: Callable | None = None, recompute_H: bool = False, seed: int | None = None, compute_Vcc: bool = True, VP_grid_type: str = 'JJJ1', **bfgs_options) → tuple[csc_array | None, csc_array | None, ndarray | None, ndarray | None, ndarray | None, float | None, ndarray | None, float | None, float, ndarray]

Estimate \(\tilde{\mathbf{V}}\), the covariance matrix of the marginal posterior \(\boldsymbol{\beta} | y\) to account for smoothness uncertainty.

Wood et al. (2016) and Wood (2017) show that when basing conditional versions of model selection criteria or hypothesis tests on \(\mathbf{V}\), which is the co-variance matrix for the normal approximation to the conditional posterior of \(\boldsymbol{\beta}\) so that \(\boldsymbol{\beta} | y, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}}, \mathbf{V})\), the tests are severely biased. To correct for this they show that uncertainty in \(\boldsymbol{\lambda}\) needs to be accounted for. Hence they suggest basing these tests on \(\tilde{\mathbf{V}}\), the covariance matrix of the normal approximation to the marginal posterior \(\boldsymbol{\beta} | y\). They show how to obtain an estimate of \(\tilde{\mathbf{V}}\), but this requires \(\mathbf{V}^{\boldsymbol{\rho}}\) - an estimate of the covariance matrix of the normal approximation to the posterior of \(\boldsymbol{\rho}=\log(\boldsymbol{\lambda})\). Computing \(\mathbf{V}^{\boldsymbol{\rho}}\) requires derivatives that are not available when using the efs update.

This function implements multiple strategies to approximately correct for smoothing parameter uncertainty, based on the proposals by Wood et al. (2016) and Greven & Scheipl (2016). The most straightforward strategy (grid_type = 'JJJ1') is to obtain a PQL or finite difference approximation for \(\mathbf{V}^{\boldsymbol{\rho}}\) and to then compute approximately the Wood et al. (2016) correction, assuming that higher-order derivatives of the llk are zero (this will be exact for Gaussian additive or canonical generalized additive models). This is too costly for large sparse multi-level models and not exact for more generic models. The MC-based alternative available via grid_type = 'JJJ2' addresses the first problem (important: set use_importance_weights=False and only_expected_edf=True). The second MC-based alternative available via grid_type = 'JJJ3' is most appropriate for more generic models (the prior argument can be used to specify any prior to be placed on \(\boldsymbol{\rho}\); you will also need to set use_importance_weights=True and only_expected_edf=False). Both strategies use a PQL or finite difference approximation to \(\mathbf{V}^{\boldsymbol{\rho}}\) to obtain nR samples from the (normal approximation to the) posterior of \(\boldsymbol{\rho}\). From these samples mssm then estimates \(\tilde{\mathbf{V}}\) as described in more detail by Krause et al. (in preparation).
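
The MC strategies rest on averaging over sampled \(\boldsymbol{\rho}\) candidates. A minimal sketch of one such average is given below, assuming the law of total covariance (expected conditional covariance plus covariance of the conditional means) and self-normalized weights. The function name marginal_cov, its inputs, and the toy numbers are hypothetical; the estimator mssm actually uses is the one described by Krause et al. and may differ.

```python
import numpy as np

def marginal_cov(means, covs, log_w):
    """Combine per-candidate conditional means/covariances into a marginal estimate."""
    means = np.asarray(means, dtype=float)  # shape (nR, p)
    covs = np.asarray(covs, dtype=float)    # shape (nR, p, p)
    log_w = np.asarray(log_w, dtype=float)  # shape (nR,)

    # Self-normalized weights, computed stably on the log scale
    w = np.exp(log_w - np.max(log_w))
    w /= w.sum()

    mean = w @ means
    dev = means - mean
    expected_cov = np.einsum("r,rij->ij", w, covs)  # E[V | rho]
    cov_of_means = (w[:, None] * dev).T @ dev       # Cov[beta_hat(rho)]
    return mean, expected_cov + cov_of_means

# Two equally weighted candidates with unit conditional variance
mean_toy, V_toy = marginal_cov(
    means=[[0.0], [2.0]],
    covs=[[[1.0]], [[1.0]]],
    log_w=[0.0, 0.0],
)
```

In the toy case the conditional means differ, so the marginal variance (2.0) exceeds the conditional variance (1.0), which is the effect the uncertainty correction is meant to capture.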

Note: If you set only_expected_edf=True, only the last two output arguments will be non-zero.

Example:

# Simulate some data for a Gaussian model
sim_fit_dat = sim3(n=500,scale=2,c=1,family=Gaussian(),seed=21)

# Now fit nested models
sim_fit_formula = Formula(lhs("y"),
                            [i(),
                             f(["x0"],nk=20),
                             f(["x1"],nk=20),
                             f(["x2"],nk=20),
                             f(["x3"],nk=20)],
                            data=sim_fit_dat,
                            print_warn=False)

model = GAMM(sim_fit_formula,Gaussian())
model.fit(exclude_lambda=False,progress_bar=False,max_outer=100)


# Compute correction from Wood et al. (2016) - will be approximate for more generic models
# V will be approximate covariance matrix of marginal posterior of coefficients
# LV is Cholesky of the former
# Vp is approximate covariance matrix of log regularization parameters
# Vpr is regularized version of the former
# edf is vector of estimated degrees of freedom (uncertainty corrected) per coefficient
# total_edf is sum of former (subjected to upper bounds so might not be exactly the same)
# edf2 is optionally smoothness bias corrected version of edf
# total_edf2 is optionally bias corrected version of total_edf (subjected to upper bounds)
# expected_edf is None here but for MC strategies (i.e., ``grid_type!="JJJ1"``) will be an estimate
# of total_edf (**without being subjected to upper bounds**) that does not require forming
# V (only computed when ``only_expected_edf=True``).
# mean_coef is None here but for MC strategies will be an estimate of the mean of the
# marginal posterior of coefficients, only computed when setting ``recompute_H=True``

V,LV,Vp,Vpr,edf,total_edf,edf2,total_edf2,expected_edf,mean_coef = correct_VB(model,
    grid_type="JJJ1",verbose=True,seed=20)

# Compute MC estimate for generic model and given prior
prior = DummyRhoPrior(b=np.log(1e12)) # Set up uniform prior
V_MC,LV_MC,Vp_MC,Vpr_MC,edf_MC,total_edf_MC,edf2_MC,total_edf2_MC,expected_edf_MC,mean_coef_MC = correct_VB(model,
    grid_type="JJJ3", verbose=True, seed=20, df=10, prior=prior, recompute_H=True)

References:

Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GAMM, GAMMLSS, or GSMM model (which has been fitted) for which to estimate \(\mathbf{V}\)
nR (int, optional) – In case grid_type!="JJJ1", nR samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250
grid_type (str, optional) – How to compute the smoothness uncertainty correction - see above for details, defaults to ‘JJJ1’
a (float, optional) – Lower bound for the \(\lambda\) estimates obtained from model, which define the mean of the posterior \(\boldsymbol{\rho}|y \sim N(\log(\hat{\boldsymbol{\lambda}}), \mathbf{V}^{\boldsymbol{\rho}})\) used to sample the nR candidates: any estimate smaller than this is set to this value. Defaults to 1e-7, the minimum possible estimate
b (float, optional) – Upper bound for the \(\lambda\) estimates obtained from model, which define the mean of the posterior \(\boldsymbol{\rho}|y \sim N(\log(\hat{\boldsymbol{\lambda}}), \mathbf{V}^{\boldsymbol{\rho}})\) used to sample the nR candidates: any estimate larger than this is set to this value. Defaults to 1e7, the maximum possible estimate
df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample the next set of candidates. Setting this to np.inf means a multivariate normal is used for sampling, defaults to 40
n_c (int, optional) – Number of cores to use during parallel parts of the correction. Note, if you want to use more than one core for more generic models it will most likely be necessary to install mssm with the extra mp dependency set. This installs the multiprocess package, which is necessary since most general models implement at least one local function that cannot be serialized by the standard multiprocessing library. To install the extra dependency set simply run pip install -U mssm[mp], defaults to 10
form_t1 (bool, optional) – Whether or not the smoothness uncertainty + smoothness bias corrected edf should be computed, defaults to False
verbose (bool, optional) – Whether to print progress information or not, defaults to False
drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky is then computed using the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the Hessian. This only requires first derivative information. Defaults to “Chol”.
only_expected_edf (bool,optional) – Whether to compute edf. by explicitly forming covariance matrix (only_expected_edf=False) or not. The latter is much more efficient for sparse models at the cost of access to the covariance matrix and the ability to compute an upper bound on the smoothness uncertainty corrected edf. Only makes sense when grid_type!='JJJ1'. Defaults to False
Vp_fidiff (bool,optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}^{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)
use_importance_weights (bool,optional) – Whether to rely on importance weights to compute the numerical integration when grid_type != 'JJJ1' or on the log-densities of \(\mathbf{V}^{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to True (importance weights are used)
prior (Callable|None, optional) – An (optional) instance of an arbitrary class that has a .logpdf() method to compute the prior log density of a sampled candidate. If this is set to None, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when use_importance_weights=False. Defaults to None
recompute_H (bool, optional) – Whether or not to re-compute the Hessian of the log-likelihood at an estimate of the mean of the Bayesian posterior \(\boldsymbol{\beta}|y\) before computing the (uncertainty/bias corrected) edf. Defaults to False
compute_Vcc (bool, optional) – Whether to compute the second correction term when grid_type='JJJ1' (or when computing the lower-bound for the remaining strategies) or only the first one. In contrast to the second one, the first correction term is substantially cheaper to compute - so setting this to False for larger models will speed up the correction considerably. Defaults to True
seed (int|None,optional) – Seed to use for random parts of the correction. Defaults to None
VP_grid_type – Experimental. Optional parameter allowing control over the estimation of the covariance matrix of the \(log(\lambda)\) parameters, see the estimateVp() function for details. Defaults to ‘JJJ1’
bfgs_options (key=value,optional) – Any additional keyword arguments that should be passed on to the call of scipy.optimize.minimize(). If none are provided, the gtol argument will be initialized to 1e-3. Note also, that in any case the maxiter argument is automatically set to 100. Defaults to None.
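
How a, b, df, and nR could interact when candidates are drawn is sketched below. This is an illustration under assumptions: the helper name sample_rho_candidates, the toy estimates, and the toy covariance matrix are hypothetical, and the multivariate t draws use the textbook construction from normal and chi-square variates rather than mssm's own code.

```python
import numpy as np

def sample_rho_candidates(lam_hat, Vp, nR=250, a=1e-7, b=1e7, df=40, seed=None):
    """Draw nR candidates for rho = log(lambda) from a multivariate t (or normal)."""
    rng = np.random.default_rng(seed)
    # Clip the lambda estimates to [a, b] before moving to the log scale
    rho_hat = np.log(np.clip(lam_hat, a, b))
    z = rng.multivariate_normal(np.zeros(len(rho_hat)), Vp, size=nR)
    if np.isinf(df):
        # df = inf reduces to a multivariate normal proposal
        return rho_hat + z
    # Multivariate t: scale normal draws by sqrt(chi2_df / df)
    u = rng.chisquare(df, size=nR) / df
    return rho_hat + z / np.sqrt(u)[:, None]

lam_toy = np.array([1e-9, 5.0, 1e9])  # first and last estimate fall outside [a, b]
Vrho_toy = 0.25 * np.eye(3)
rho_samples = sample_rho_candidates(lam_toy, Vrho_toy, nR=250, seed=20)
```

Smaller values of df give heavier tails, so the candidates cover more of the \(\boldsymbol{\rho}\) space at the cost of more draws far from the estimate.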

Returns:

A tuple containing: V - an estimate of the unconditional covariance matrix, LV - the Cholesky of the former, Vp - an estimate of the covariance matrix for \(\boldsymbol{\rho}\), Vpr - a regularized version of the former, edf - smoothness uncertainty corrected coefficient-wise edf, total_edf - smoothness uncertainty corrected total (i.e., model) edf, edf2 - smoothness uncertainty + smoothness bias corrected coefficient-wise edf, total_edf2 - smoothness uncertainty + smoothness bias corrected total (i.e., model) edf, expected_edf - an optional estimate of total_edf that does not require forming V, mean_coef - an optional estimate of the mean of the posterior of the coefficients

Return type:

tuple[scp.sparse.csc_array | None, scp.sparse.csc_array | None, np.ndarray | None, np.ndarray | None, np.ndarray | None, float | None, np.ndarray | None, float | None, float, np.ndarray]

mssm.src.python.utils.estimateVp(model, nR: int = 250, grid_type: str = 'JJJ1', a: float = 1e-07, b: float = 10000000.0, df: int = 40, n_c: int = 10, drop_NA: bool = True, method: str = 'Chol', Vp_fidiff: bool = False, use_importance_weights: bool = True, prior: Callable | None = None, seed: int | None = None, **bfgs_options) → tuple[ndarray, ndarray, ndarray, ndarray, ndarray, ndarray]

Estimate covariance matrix \(\mathbf{V}^{\boldsymbol{\rho}}\) of posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\).

Either \(\mathbf{V}^{\boldsymbol{\rho}}\) is based on finite difference approximation or on a PQL approximation (see grid_type parameter), or it is estimated via numerical integration similar to what is done in the correct_VB() function (this is done when grid_type=='JJJ2'; see the aforementioned function for details).

Example:

# Simulate some data for a Gaussian model
sim_fit_dat = sim3(n=500,scale=2,c=1,family=Gaussian(),seed=21)

# Now fit nested models
sim_fit_formula = Formula(lhs("y"),
                            [i(),
                             f(["x0"],nk=20,rp=0),
                             f(["x1"],nk=20,rp=0),
                             f(["x2"],nk=20,rp=0),
                             f(["x3"],nk=20,rp=0)],
                            data=sim_fit_dat,
                            print_warn=False)

model = GAMM(sim_fit_formula,Gaussian())
model.fit(exclude_lambda=False,progress_bar=False,max_outer=100)

# Compute correction from Wood et al. (2016) - will be approximate for more generic models
# Vp is approximate covariance matrix of log regularization parameters
# Vpr is regularized version of the former
# Ri is a root of covariance matrix of log regularization parameters
# Rir is a root of regularized version of covariance matrix of log regularization parameters
# ep will be an estimate of the mean of the marginal posterior of log regularization
# parameters (for ``grid_type="JJJ1"`` this will simply be the log of the estimated
# regularization parameters)
Vp, Vpr, Ri, Rir, ep, _ = estimateVp(model,grid_type="JJJ1",seed=20)


# Compute MC estimate for generic model and given prior
prior = DummyRhoPrior(b=np.log(1e12)) # Set up uniform prior
Vp_MC, Vpr_MC, Ri_MC, Rir_MC, ep_MC, _ = estimateVp(model,
    grid_type="JJJ2",seed=20,use_importance_weights=True,prior=prior)

References:

https://en.wikipedia.org/wiki/Estimation_of_covariance_matrices
Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models

Parameters:

model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GAMM, GAMMLSS, or GSMM model (which has been fitted) for which to estimate \(\mathbf{V}^{\boldsymbol{\rho}}\)
nR (int, optional) – In case grid_type!="JJJ1", nR samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250
grid_type (str, optional) – How to compute the smoothness uncertainty correction. Setting grid_type="JJJ1" means a PQL or finite difference approximation is obtained. Setting grid_type="JJJ2" means numerical integration is performed - see correct_VB() for details , defaults to ‘JJJ1’
a (float, optional) – Lower bound for the \(\lambda\) estimates obtained from model, which define the mean of the posterior \(\boldsymbol{\rho}|y \sim N(\log(\hat{\boldsymbol{\lambda}}), \mathbf{V}^{\boldsymbol{\rho}})\) used to sample the nR candidates: any estimate smaller than this is set to this value. Defaults to 1e-7, the minimum possible estimate
b (float, optional) – Upper bound for the \(\lambda\) estimates obtained from model, which define the mean of the posterior \(\boldsymbol{\rho}|y \sim N(\log(\hat{\boldsymbol{\lambda}}), \mathbf{V}^{\boldsymbol{\rho}})\) used to sample the nR candidates: any estimate larger than this is set to this value. Defaults to 1e7, the maximum possible estimate
df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample the next set of candidates. Setting this to np.inf means a multivariate normal is used for sampling, defaults to 40
n_c (int, optional) – Number of cores to use during parallel parts of the correction. Note, if you want to use more than one core for more generic models it will most likely be necessary to install mssm with the extra mp dependency set. This installs the multiprocess package, which is necessary since most general models implement at least one local function that cannot be serialized by the standard multiprocessing library. To install the extra dependency set simply run pip install -U mssm[mp], defaults to 10
drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky is then computed using the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the Hessian. This only requires first derivative information. Defaults to “Chol”.
Vp_fidiff (bool,optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}^{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)
use_importance_weights (bool,optional) – Whether to rely on importance weights to compute the numerical integration when grid_type != 'JJJ1' or on the log-densities of \(\mathbf{V}^{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to True (importance weights are used)
prior (Callable|None, optional) – An (optional) instance of an arbitrary class that has a .logpdf() method to compute the prior log density of a sampled candidate. If this is set to None, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when use_importance_weights=False. Defaults to None
recompute_H (bool, optional) – Whether or not to re-compute the Hessian of the log-likelihood at an estimate of the mean of the Bayesian posterior \(\boldsymbol{\beta}|y\) before computing the (uncertainty/bias corrected) edf. Defaults to False
seed (int|None,optional) – Seed to use for random parts of the correction. Defaults to None
bfgs_options (key=value,optional) – Any additional keyword arguments that should be passed on to the call of scipy.optimize.minimize. If none are provided, the gtol argument will be initialized to 1e-3. Note also, that in any case the maxiter argument is automatically set to 100. Defaults to None.
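
Any object exposing a .logpdf() method can be passed as prior. The class below is a hypothetical example of such an object: its name, and the assumption that .logpdf() receives an array of \(\boldsymbol{\rho}\) values and returns element-wise log densities, are not taken from mssm. It places an independent uniform prior on each log smoothing parameter.

```python
import numpy as np

class UniformRhoPrior:
    """Hypothetical uniform prior on each log smoothing parameter over [a, b]."""

    def __init__(self, a=np.log(1e-7), b=np.log(1e7)):
        self.a = a
        self.b = b

    def logpdf(self, rho):
        # Constant log density inside [a, b], -inf outside
        rho = np.asarray(rho, dtype=float)
        inside = (rho >= self.a) & (rho <= self.b)
        return np.where(inside, -np.log(self.b - self.a), -np.inf)

toy_prior = UniformRhoPrior(b=np.log(1e12))
log_dens = toy_prior.logpdf(np.array([0.0, 30.0]))  # 30.0 lies above log(1e12)
```

A prior of this shape only rules out candidates outside the chosen range; within the range all candidates receive the same prior weight.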

Returns:

A tuple with 6 elements: an estimate of the covariance matrix of the posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\), a regularized version of the former, a root of the covariance matrix, a root of the regularized covariance matrix, an estimate of the mean of the posterior, and a np.array of shape ((len(coef),len(penalties))) containing in each row the partial derivative of the coefficients with respect to an individual lambda parameter

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]

mssm.src.python.utils.print_parametric_terms(model, par: int = 0) → None

Prints summary output for linear/parametric terms in the model of a specific parameter, not unlike the one returned in R when using the summary function for mgcv models.

If the model has not been estimated yet, it prints the term names instead.

For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the test statistic follows a t-distribution for models in which an additional scale parameter was estimated (e.g., Gaussian, Gamma) and a standard normal distribution for models in which the scale parameter is known or was fixed (e.g., Binomial). For the former case, the t-statistic, degrees of freedom of the null distribution (DoF.), and the p-value are printed as well. For the latter case, only the z-statistic and the p-value are printed. See Wood (2017) section 6.12 and 1.3.3 for more details.
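
The two cases can be sketched with scipy.stats. The helper coef_p_value, its arguments, and the toy numbers are hypothetical; in particular, how mssm obtains the standard errors and the degrees of freedom is not shown here.

```python
from scipy import stats

def coef_p_value(estimate, std_err, dof=None):
    """Two-sided p-value for H0: beta = 0.

    dof=None: scale known/fixed, standard normal reference (z-statistic).
    dof given: scale estimated, t reference with dof degrees of freedom.
    """
    stat = estimate / std_err
    if dof is None:
        p = 2 * stats.norm.sf(abs(stat))
    else:
        p = 2 * stats.t.sf(abs(stat), dof)
    return stat, p

z_stat, p_z = coef_p_value(0.98, 0.5)          # e.g., Binomial model
t_stat, p_t = coef_p_value(0.98, 0.5, dof=20)  # e.g., Gaussian model
```

For the same statistic, the t reference distribution yields the larger p-value because of its heavier tails.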

Note that un-penalized coefficients that are part of a smooth function are not covered by this function.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Parameters:

model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GSMM, GAMMLSS, or GAMM model
par (int, optional) – Parameter of the likelihood/family for which to print terms, defaults to 0

Raises:

NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.

Return type:

None

mssm.src.python.utils.print_smooth_terms(model, par: int = 0, pen_cutoff: float = 0.2, ps: list[float] | None = None, Trs: list[float] | None = None) → None

Prints the names of the smooth terms included in the model of a given parameter.

After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can be taken as evidence that the smooth has all but disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).
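
The cut-off rule itself is simple and can be sketched as follows. The helper name, the dictionary of term-wise edf, and the term labels are hypothetical and only illustrate the comparison against pen_cutoff.

```python
def flag_penalized_away(term_edf, pen_cutoff=0.2):
    """Return the names of smooth terms whose edf falls below pen_cutoff."""
    return [name for name, edf in term_edf.items() if edf < pen_cutoff]

# Toy term-wise edf values
term_edf_toy = {"f(x0)": 3.7, "f(x1)": 0.05, "f(x2)": 1.0}
flagged = flag_penalized_away(term_edf_toy)
```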

References:

Marra & Wood (2011). Practical variable selection for generalized additive models.

Parameters:

model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GSMM, GAMMLSS, or GAMM model
par (int, optional) – Distribution parameter for which to print smooth terms. Ignored when model is a GAMM. Defaults to 0
pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2
ps ([float], optional) – Optional list of p-values per smooth term if these should be printed, defaults to None
Trs ([float], optional) – Optional list of test statistics (based on which the ps were computed) per smooth term if these should be printed, defaults to None

Return type:

None

Draw n samples from multivariate normal with mean \(\boldsymbol{\mu}\) (mu) and covariance matrix \(\boldsymbol{\Sigma}\).

\(\boldsymbol{\Sigma}\) does not need to be provided. Rather the function expects either L (\(\mathbf{L}\) in what follows) or LI (\(\mathbf{L}^{-1}\) in what follows) and scale (\(\phi\) in what follows). These relate to \(\boldsymbol{\Sigma}\) so that \(\boldsymbol{\Sigma}/\phi = \mathbf{L}^{-T}\mathbf{L}^{-1}\) or \(\mathbf{L}\mathbf{L}^T = [\boldsymbol{\Sigma}/\phi]^{-1}\) so that \(\mathbf{L}*(1/\phi)^{0.5}\) is the Cholesky of the precision matrix of \(\boldsymbol{\Sigma}\).

Notably, for models available in mssm L (and LI) have usually been computed for a permuted matrix, e.g., \(\mathbf{P}[\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}] \mathbf{P}^T\) (see Wood & Fasiolo, 2017). Hence for sampling we often need to correct for permutation matrix \(\mathbf{P}\) (P). If LI is provided, then P can be omitted and is assumed to have been used to un-pivot LI already.

Used, for example, to sample the uncorrected posterior \(\boldsymbol{\beta} | \mathbf{y}, \boldsymbol{\lambda} \sim N(\boldsymbol{\mu} = \hat{\boldsymbol{\beta}},[\mathbf{X}^T \mathbf{X} + \mathbf{S}_{\lambda}]^{-1}\phi)\) for a GAMM (see Wood, 2017). Based on section 7.4 in Gentle (2009), assuming \(\boldsymbol{\Sigma}\) is \(p*p\) and the covariance matrix of the uncorrected posterior, samples \(\boldsymbol{\beta}\) are then obtained by computing:

\[\boldsymbol{\beta} = \hat{\boldsymbol{\beta}} + [\mathbf{P}^T \mathbf{L}^{-T}* \phi^{0.5}]\mathbf{z}\ \text{where}\ z_i \sim N(0,1)\ \forall i = 1,...,p\]

Alternatively, relying on the equivalent fact that:

\[[\mathbf{L}^T*(1/\phi)^{0.5}]\mathbf{P}[\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] = \mathbf{z}\]

we can first solve for \(\mathbf{y}\) in:

\[[\mathbf{L}^T*(1/\phi)^{0.5}] \mathbf{y} = \mathbf{z}\]

followed by computing:

\[ \begin{align}\begin{aligned}\mathbf{y} = \mathbf{P}[\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}]\\\boldsymbol{\beta} = \hat{\boldsymbol{\beta}} + \mathbf{P}^T\mathbf{y}\end{aligned}\end{align} \]

The latter avoids forming \(\mathbf{L}^{-1}\) (which, unlike \(\mathbf{L}\), might not benefit from the sparsity-preserving permutation \(\mathbf{P}\)). If LI is None, L will thus be used for sampling as outlined in these alternative steps.
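The equivalence of the two routes described above can be checked numerically. The following is a minimal sketch using small dense matrices; the matrix `A` (standing in for \(\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}\)), the permutation `perm`, and all variable names are illustrative assumptions and not part of mssm, which works with sparse matrices.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
p, n, phi = 4, 3, 2.0

# Hypothetical dense stand-in for X^T X + S_lambda (symmetric positive definite)
X = rng.normal(size=(50, p))
A = X.T @ X + np.eye(p)

# Hypothetical permutation matrix P (row i of P selects element perm[i])
perm = np.array([2, 0, 3, 1])
P = np.eye(p)[perm]

# Lower-triangular L with L L^T = P A P^T
L = cholesky(P @ A @ P.T, lower=True)

beta_hat = np.zeros(p)
z = rng.standard_normal((p, n))  # z_i ~ N(0, 1), one column per sample

# Route 1: form L^{-1} explicitly, then beta = beta_hat + [P^T L^{-T} phi^0.5] z
LI = np.linalg.inv(L)
beta_1 = beta_hat[:, None] + (P.T @ LI.T * phi**0.5) @ z

# Route 2: solve [L^T (1/phi)^0.5] y = z, then beta = beta_hat + P^T y
y = solve_triangular(L.T * (1 / phi) ** 0.5, z, lower=False)
beta_2 = beta_hat[:, None] + P.T @ y

# Both routes produce the same samples for the same z
assert np.allclose(beta_1, beta_2)
```

Route 2 only requires a triangular solve with \(\mathbf{L}\), so the inverse never has to be stored.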

Often we care only about a handful of elements in mu (e.g., the first ones corresponding to "fixed effects" in a GAMM). In that case we can generate samples only for this subset of interest by using only a sub-block of rows of \(\mathbf{L}\) or \(\mathbf{L}^{-1}\) (all columns remain). The use argument can be an np.array containing the indices of the elements in mu that should be sampled. Because this only works efficiently when LI is available, an error is raised when use is not None and LI is None.
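The subset idea can be illustrated with a small dense example. This sketch assumes no pivoting (\(\mathbf{P} = \mathbf{I}\)) and selects rows of the transformation matrix \(\mathbf{L}^{-T}\phi^{0.5}\) that maps \(\mathbf{z}\) to \(\boldsymbol{\beta} - \boldsymbol{\mu}\); how mssm indexes its sparse LI internally may differ, and all names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, phi = 5, 4, 1.5

# Hypothetical dense stand-in for X^T X + S_lambda
X = rng.normal(size=(40, p))
A = X.T @ X + np.eye(p)

L = np.linalg.cholesky(A)   # lower triangular, L L^T = A (no pivoting)
LI = np.linalg.inv(L)       # explicit inverse, needed for the subset trick

mu = np.arange(p, dtype=float)
use = np.array([0, 1])      # indices of the elements of mu to sample

# Transformation matrix mapping z to (beta - mu); shape (p, p)
T = LI.T * phi**0.5
z = rng.standard_normal((p, n))

# Sampling everything, then sampling only the rows selected by `use`
full = mu[:, None] + T @ z
sub = mu[use][:, None] + T[use, :] @ z

# The subset samples match the corresponding rows of the full samples
assert np.allclose(sub, full[use, :])
```

Only `len(use)` rows of the transformation are multiplied with \(\mathbf{z}\), while all \(p\) columns are kept so that correlations with the remaining coefficients are still accounted for.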

If mu is set to any integer (i.e., not a NumPy array/list) it is automatically treated as 0. For mssm.models.GAMMLSS or mssm.models.GSMM models, scale can be set to 1.

References:

Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).

Gentle, J. (2009). Computational Statistics.

Parameters:

n (int) – Number of samples to generate
mu (int | np.ndarray) – Mean of normal distribution as described above
scale (float) – Scaling parameter of covariance matrix as described above
P (scp.sparse.csc_array | None) – Permutation matrix or None.
L (scp.sparse.csc_array | None) – Cholesky of precision of scaled covariance matrix as described above.
LI (scp.sparse.csc_array | None, optional) – Inverse of Cholesky factor of precision of scaled covariance matrix as described above.
use (list[int] | None, optional) – Indices of parameters in mu for which to generate samples, defaults to None in which case all parameters will be sampled
seed (int | None, optional) – Seed to use for random sample generation, defaults to None

Returns:

Samples from the multivariate normal distribution. If use is not provided, the returned array will be of shape (p,n) where p==LI.shape[1]. Otherwise, the returned array will be of shape (len(use),n).

Return type:

np.ndarray

mssm.src.python.utils.updateVp(ep: ndarray, ws: ndarray, rGrid: ndarray) → ndarray

Update the covariance matrix of the posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\). REML scores are used to approximate the expectation, similar to what was suggested by Greven & Scheipl (2016).
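One plausible reading of this update is a weighted covariance estimate around ep. The sketch below is an assumption for illustration only and not necessarily what mssm computes: the function name `weighted_cov_sketch` is hypothetical, and the weights are normalized to sum to one.

```python
import numpy as np

def weighted_cov_sketch(ep, ws, rGrid):
    """Weighted covariance of the rows of rGrid around ep (illustrative only)."""
    ws = np.asarray(ws, dtype=float)
    ws = ws / ws.sum()                 # assumed normalization of the weights
    d = np.asarray(rGrid) - ep         # deviation of each sample (row) from ep
    return (d * ws[:, None]).T @ d     # sum_i w_i * d_i d_i^T

# Three hypothetical log(lambda) samples for two smoothing parameters
rGrid = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 8.0]])
ws = np.ones(3)
ep = rGrid.mean(axis=0)

Vp = weighted_cov_sketch(ep, ws, rGrid)
```

With equal weights and ep equal to the sample mean, this reduces to the usual (biased) sample covariance of the rows of rGrid.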

References:

https://en.wikipedia.org/wiki/Estimation_of_covariance_matrices
Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models

Parameters:

ep (np.ndarray) – Model estimate log(lambda), i.e., the expectation over rGrid
ws (np.ndarray) – weight associated with each log(lambda) value used for numerical integration
rGrid (np.ndarray) – A 2d array, holding all lambda samples considered so far. Each row is one sample

Returns:

An estimate of the covariance matrix of log(lambda) - 2d array of shape len(ep)*len(ep).

Return type:

np.ndarray