api
mssm.models module
- class mssm.models.GAMM(formula: Formula, family: Family)
Bases: GAMMLSS
Class to fit Generalized Additive Mixed Models.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

#### Binomial model example ####
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# By default, the Binomial family assumes binary data and uses the logit link.
# Count data is also possible though - see the `Binomial` family.
model = GAMM(formula,Binomial())
model.fit()

# Plot estimated effects on scale of the log-odds
plot(model)

#### Gaussian model with tensor smooth and p-values ####
sim_dat = sim3(n=500,scale=2,c=0,seed=20)

formula = Formula(lhs("y"),[i(),f(["x0","x3"],te=True,nk=9),f(["x1"]),f(["x2"])],data=sim_dat)

model = GAMM(formula,Gaussian())
model.fit()
model.print_smooth_terms(p_values=True)

#### Standard linear (mixed) models are also possible ####
# *li() with three variables: three-way interaction
sim_dat,_ = sim1(100,random_seed=100)

# Specify formula with three-way linear interaction and random intercept term
formula = Formula(lhs("y"),[i(),*li(["fact","x","time"]),ri("sub")],data=sim_dat)

# ... and model
model = GAMM(formula,Gaussian())

# then fit
model.fit()

# get estimates for linear terms
model.print_parametric_terms()
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
formula (Formula) – A formula for the model to be fitted.
family (Family) – A distribution family implementing the Family class (e.g., Gaussian, Binomial, Gamma).
- Variables:
formulas ([Formula]) – A list including the formula passed to the constructor.
lvi (scp.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with None.
coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.
preds ([[float]]) – The first index corresponds to the linear predictors for the mean of the family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
mus ([[float]]) – The first index corresponds to the estimated value of the mean of the family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood used during fitting - will be the expected hessian for non-canonical models. Initialized with None.
edf (float) – The model estimated degrees of freedom as a float. Initialized with None.
edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.
term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.
overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.
info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.
res (np.ndarray) – The working residuals of the model (if applicable). Initialized with None.
Wr (scp.sparse.csc_array) – For generalized models a diagonal matrix holding the root of the Fisher weights at convergence. Initialized with None.
WN (scp.sparse.csc_array) – For generalized models a diagonal matrix holding the Newton weights at convergence. Initialized with None.
hessian_obs (scp.sparse.csc_array) – Observed hessian of the log-likelihood at the final coefficient estimate. Not updated for strictly additive models (i.e., Gaussian with identity link). Initialized with None.
rho (float) – Optional auto-correlation at lag 1 parameter used during estimation. Initialized with None.
res_ar (np.ndarray) – The working residuals of the model corrected for any auto-correlation parameter used during estimation. Initialized with None.
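These attributes are populated once fit() has been called. A minimal inspection sketch (assuming a model fitted as in the examples above; the printed shapes depend on the concrete formula):
# After model.fit(), estimated quantities can be inspected directly:
print(model.coef.shape)   # all coefficients, shape (-1,1)
print(model.edf)          # model estimated degrees of freedom
print(model.term_edf)     # estimated degrees of freedom per smooth term
print(model.penalty)      # total penalty applied to the model deviance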
- fit(max_outer: int = 200, max_inner: int | None = None, conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 2, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', restart: bool = False, method: str = 'QR', check_cond: int = 1, progress_bar: bool = True, n_cores: int = 10, offset: float | ndarray | None = None, rho: float | None = None)
Fit the specified model.
Note: Keyword arguments are initialized to maximise stability. For faster configurations (necessary for larger models) see the ‘Big model’ example below.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *

########## Big Model ##########
dat = pd.read_csv('https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat.csv')

# mssm requires that the data-type for variables used as factors is 'O'=object
dat = dat.astype({'series': 'O', 'cond':'O', 'sub':'O'})

formula = Formula(lhs=lhs("y"), # The dependent variable - here y!
                  terms=[i(), # The intercept
                         l(["cond"]), # For cond='b'
                         f(["time"],by="cond",constraint=ConstType.QR), # two-way interaction between time and cond; one smooth over time per cond level
                         f(["x"],by="cond",constraint=ConstType.QR), # two-way interaction between x and cond; one smooth over x per cond level
                         f(["time","x"],by="cond",constraint=ConstType.QR,nk=9), # three-way interaction
                         fs(["time"],rf="sub")], # Random non-linear effect of time - one smooth per level of factor sub
                  data=dat,
                  print_warn=False,find_nested=False)

model = GAMM(formula,Gaussian())

# To speed up estimation, use the following key-word arguments:
model.fit(method="Chol",max_inner=1) # max_inner only matters for Generalized models (i.e., non-Gaussian) - but for those will often be much faster

########## ar1 model (without resets per time-series) ##########
formula = Formula(lhs=lhs("y"),
                  terms=[i(),
                         l(["cond"]),
                         f(["time"],by="cond"),
                         f(["x"],by="cond"),
                         f(["time","x"],by="cond")],
                  data=dat,
                  print_warn=False,
                  series_id=None) # No series identifier passed to formula -> ar1 model does not reset!

model = GAMM(formula,Gaussian())
model.fit(rho=0.99)

# Visualize the un-corrected residuals:
plot_val(model,resid_type="Pearson")
# And the corrected residuals:
plot_val(model,resid_type="ar1")

########## ar1 model (with resets per time-series) ##########
formula = Formula(lhs=lhs("y"),
                  terms=[i(),
                         l(["cond"]),
                         f(["time"],by="cond"),
                         f(["x"],by="cond"),
                         f(["time","x"],by="cond")],
                  data=dat,
                  print_warn=False,
                  series_id='series') # 'series' variable identifies individual time-series -> ar1 model resets per series!

model = GAMM(formula,Gaussian())
model.fit(rho=0.99)

# Visualize the un-corrected residuals:
plot_val(model,resid_type="Pearson")
# And the corrected residuals:
plot_val(model,resid_type="ar1")
- Parameters:
max_outer (int,optional) – The maximum number of fitting iterations. Defaults to 200.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step updating the coefficients for Generalized models. Defaults to 500 for non-ar1 models.
conv_tol (float,optional) – The relative criterion used to determine convergence (the change in penalized deviance is compared against conv_tol * previous penalized deviance).
extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.
control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.
exclude_lambda (bool,optional) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov’ by default.
restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.
method (str,optional) – Which method to use to solve for the coefficients. “Chol” relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but then also pivots for stability in order to get an estimate of rank deficiency. This takes substantially longer. This argument is ignored if len(self.formulas[0].file_paths)>0, that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to “QR”.
check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). When check_cond=2, an estimate of the condition number will be computed for each new system (at each iteration of the algorithm) and an error will be raised if the condition number is estimated as too high given the chosen method. Is ignored if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to 1.
progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
offset (float or np.ndarray,optional) – Mimics the behavior of the offset argument for gam in mgcv in R. If a value is provided here (can either be a float or a numpy.array of shape (-1,1) - if it is an array, then the first dimension has to match the number of observations in the data; NaNs present in the dependent variable will be excluded from the offset vector) then it is consistently added to the linear predictor during estimation. It will not be used by any other function of the GAMM class (e.g., for prediction). This argument is ignored if len(self.formulas[0].file_paths)>0, that is, if \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Defaults to None.
rho (float,optional) – Optional correlation parameter for an “ar1 residual model”. Essentially mimics the behavior of the rho parameter for the bam function in mgcv. Note, if you want to re-start the ar1 process multiple times (for example because you work with time-series data and have multiple time-series) then you must pass the series_id argument to the Formula used for this model. Defaults to None.
- get_llk(penalized: bool = True, ext_scale: float | None = None) → float | None
Get the (penalized) log-likelihood of the estimated model (float or None) given the training data. The log-likelihood can optionally be evaluated for an external scale parameter ext_scale.
Will instead return None if called before fitting.
- Parameters:
penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True
ext_scale (float, optional) – Optionally provide an external scale parameter at which to evaluate the log-likelihood, defaults to None
- Raises:
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
- Returns:
llk score
- Return type:
float or None
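A minimal usage sketch for get_llk() (assuming a model fitted as in the class examples above):
# Penalized log-likelihood at the estimated coefficients:
pen_llk = model.get_llk(penalized=True)
# Un-penalized log-likelihood:
llk = model.get_llk(penalized=False)
print(pen_llk,llk)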
- get_mmat(use_terms: list[int] | None = None) → csc_array
Returns exactly the model matrix used for fitting as a scipy.sparse.csc_array. Will throw an error when called for a model for which the model matrix was never formed completely - i.e., when \(\mathbf{X}^T\mathbf{X}\) was formed iteratively for estimation, by setting the file_paths argument of the Formula to a non-empty list.
Optionally, all columns not corresponding to terms for which the indices are provided via use_terms can be zeroed.
- Parameters:
use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be created. If this argument is provided, columns corresponding to any term not included in this list will be zeroed, defaults to None
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
- Returns:
Model matrix \(\mathbf{X}\) used for fitting.
- Return type:
scp.sparse.csc_array
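A minimal usage sketch for get_mmat() (assuming a fitted model; the term indices passed to use_terms are illustrative):
# Full model matrix used during fitting:
X = model.get_mmat()
print(X.shape)
# Same matrix, but with all columns zeroed that do not belong to the terms with indices 0 and 1:
X_sub = model.get_mmat(use_terms=[0,1])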
- get_pars() → tuple[ndarray | None, float | None]
Returns a tuple. The first entry is a np.ndarray holding all estimated coefficients. The second entry is the estimated scale parameter.
Will instead return (None,None) if called before fitting.
- Returns:
Model coefficients and scale parameter that were estimated
- Return type:
(np.ndarray,float) or (None, None)
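A minimal usage sketch for get_pars() (assuming a fitted Gaussian model):
coef, scale = model.get_pars()
print(coef.shape) # (-1,1)
print(scale)      # estimated scale parameter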
- get_reml() → float
Gets the (Laplace approximate) REML (Restricted Maximum Likelihood) score (as a float) for the estimated lambda values (see Wood, 2011).
References:
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
TypeError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
REML score
- Return type:
float
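A minimal usage sketch for get_reml() (assuming a fitted model):
# Laplace approximate REML score at the estimated smoothing parameters:
reml = model.get_reml()
print(reml)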
- get_resid(type: str = 'Pearson') → ndarray
Get different types of residuals from the estimated model.
By default (type='Pearson') this returns the residuals \(e_i = y_i - \mu_i\) for additive models and the Pearson/working residuals \(w_i^{0.5}(z_i - \eta_i)\) (see Wood, 2017 sections 3.1.5 & 3.1.7) for generalized additive models. Here \(w_i\) are the Fisher scoring weights, \(z_i\) the pseudo-data point for each observation, and \(\eta_i\) is the linear prediction (i.e., \(g(\mu_i)\) - where \(g()\) is the link function) for each observation.
If type="Deviance", the deviance residuals are returned, which are equivalent to \(sign(y_i - \mu_i)D_i^{0.5}\), where \(\sum_{i=1}^{N} D_i\) equals the model deviance (see Wood 2017, section 3.1.7). Additionally, if the model was estimated with rho!=None, type="ar1" returns the standardized working residuals corrected for lag-1 auto-correlation. These are best compared to the standard working residuals.
Throws an error if called before the model was fitted, when requesting an unsupported type, or when requesting ‘ar1’ residuals for a model for which model.rho==None.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
type (str,optional) – The type of residual to return for a Generalized model, “Pearson” by default, but can be set to “Deviance” and (for some models) to “ar1” as well.
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed, when requesting an unsupported type, or when requesting ‘ar1’ residuals for a model for which model.rho==None.
- Returns:
Empirical residual vector in a numpy array
- Return type:
np.ndarray
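A minimal residual-check sketch for get_resid() (assuming a fitted model; mssmViz's plot_val() offers similar visualizations). model.mus[0] holds the fitted means for the training data, as documented in the Variables section above:
import matplotlib.pyplot as plt

res = model.get_resid(type="Pearson")
# Plot residuals against the fitted means:
plt.scatter(model.mus[0],res,alpha=0.2)
plt.xlabel("Fitted mean")
plt.ylabel("Pearson residual")
plt.show()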
- predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, csc_array, ndarray | None]
Make a prediction using the fitted model for new data n_dat, using only the terms indexed by use_terms.
Importantly, predictions and standard errors are always returned on the scale of the linear predictor. When estimating a Generalized Additive Model, the mean predictions and standard errors (often referred to as the ‘response’-scale predictions) can be obtained by applying the link inverse function to the predictions and the CI-bounds on the linear predictor scale (DO NOT transform the standard error first and then add it to the transformed predictions - the standard error is additive only on the scale of the linear predictor). See examples below.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma GAM
Gammadat = sim3(500,2,family=Gamma(),seed=0)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Gammadat)

# By default, the Gamma family assumes that the model predictions match log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now make prediction for `f(["x0"])`
new_dat = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":np.linspace(0,1,30),
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

f0,X_f,ci = model.predict([1],new_dat,ci=True)

# Can also use the plot function from mssmViz
plot(model,which=[1])
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
- Parameters:
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.
n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se].
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [pred - se, pred + se].
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
- Returns:
A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci==False else the standard error se in the prediction.
- Return type:
(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
- predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, ndarray]
Get the difference in the predictions for two datasets.
Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma GAM
Gammadat = sim3(500,2,family=Gamma(),seed=0)

# Include tensor smooth in model of log(mean)
formula = Formula(lhs("y"),[i(),f(["x0","x1"],te=True),f(["x2"]),f(["x3"])],data=Gammadat)

# By default, the Gamma family assumes that the model predictions match log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now we want to know whether the effect of x0 is different for two values of x1:
new_dat1 = pd.DataFrame({"x0":np.linspace(0,1,30),
                         "x1":[0.25 for _ in range(30)],
                         "x2":np.linspace(0,1,30),
                         "x3":np.linspace(0,1,30)})

new_dat2 = pd.DataFrame({"x0":np.linspace(0,1,30),
                         "x1":[0.75 for _ in range(30)],
                         "x2":np.linspace(0,1,30),
                         "x3":np.linspace(0,1,30)})

# Now we can get the predicted difference of the effect of x0 for the two values of x1:
pred_diff,se = model.predict_diff(new_dat1,new_dat2,use_terms=[1],par=0)

# mssmViz also has a convenience function to visualize it:
plot_diff(new_dat1,new_dat2,["x0"],model,use=[1],response_scale=False)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
get_difference function from the itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html
- Parameters:
dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this DataFrame and dat1 will be returned.
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction or None in which case all terms will be used.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
- Returns:
A tuple with 2 entries. The first entry is the predicted difference diff (between the two data sets dat1 & dat2). The second entry is the standard error se of the predicted difference. The difference CI is then [diff - se, diff + se].
- Return type:
(np.ndarray,np.ndarray)
- print_parametric_terms()
Prints summary output for linear/parametric terms in the model, not unlike the one returned in R when using the summary function for mgcv models.
For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows a t-distribution for models in which an additional scale parameter was estimated (e.g., Gaussian, Gamma) and a standardized normal distribution for models in which the scale parameter is known or was fixed (e.g., Binomial). For the former case, the t-statistic, degrees of freedom of the Null distribution (DoF.), and the p-value are printed as well. For the latter case, only the z-statistic and the p-value are printed. See Wood (2017) sections 6.12 and 1.3.3 for more details.
Note that un-penalized coefficients that are part of a smooth function are not covered by this function.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Raises:
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
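A minimal usage sketch for print_parametric_terms() (assuming a model with parametric terms, such as the linear mixed model from the class examples above):
# Prints, per coefficient: identifier, estimate, test statistic, and p-value
model.print_parametric_terms()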
- print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)
Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case, edf. < pen_cutoff can be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).
References:
Marra & Wood (2011). Practical variable selection for generalized additive models.
- Parameters:
pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2
p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False
edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal, defaults to True
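A minimal usage sketch for print_smooth_terms() (assuming a fitted model with smooth terms):
# Print per-term estimated degrees of freedom; also request approximate p-values:
model.print_smooth_terms(p_values=True)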
- sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) → ndarray
Obtain n_ps samples from posterior \([\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}\) is \([\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]^{-1}\phi\) (see Wood, 2017; section 6.10). To obtain samples for \(\boldsymbol{\beta}\), set deviations to False.
See sample_MVN() for more details.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Fit a Gamma GAM
Gammadat = sim3(500,2,family=Gamma(),seed=0)

formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Gammadat)

# By default, the Gamma family assumes that the model predictions match log(\mu_i), i.e., a log-link is used.
model = GAMM(formula,Gamma())
model.fit()

# Now get model matrix for a couple of example covariates
new_dat = pd.DataFrame({"x0":np.linspace(0,1,30),
                        "x1":np.linspace(0,1,30),
                        "x2":np.linspace(0,1,30),
                        "x3":np.linspace(0,1,30)})

f0,X_f,ci = model.predict([1],new_dat,ci=True)

# Get `use_post` to only identify coefficients related to `f(["x0"])` - that way we can efficiently sample the
# posterior only for `f(["x0"])`. If you want to sample all coefficients, simply set `use_post=None`.
use_post = X_f.sum(axis=0) != 0
use_post = np.arange(0,X_f.shape[1])[use_post]
print(use_post)

# `use_post` can now be passed to `sample_post`:
post = model.sample_post(10000,use_post,deviations=False,seed=0,par=0)

# Since we set deviations to False, post holds coefficient samples and can simply be post-multiplied to
# get samples of `f(["x0"])` - importantly, post has a different shape than X_f, so we need to account for that
post_f = X_f[:,use_post] @ post

# Note: samples are also on scale of linear predictor!
plt.plot(new_dat["x0"],f0,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f[:,sidx],alpha=0.2)

plt.show()
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
n_ps (int,optional) – Number of samples to obtain from posterior.
use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. By default all coefficients are sampled.
deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False
seed (int,optional) – A seed to use for the sampling, defaults to None
- Returns:
An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves/predictions.
- Return type:
np.ndarray
- class mssm.models.GAMMLSS(formulas: list[Formula], family: GAMLSSFamily)
Bases: GSMM
Class to fit Generalized Additive Mixed Models of Location Scale and Shape (see Rigby & Stasinopoulos, 2005).
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                     [i(),f(["x0"],nk=10)],
                     data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now define the model and fit!
model = GAMMLSS(formulas,family)
model.fit()

# Get total coef vector & split them
coef = model.coef
split_coef = np.split(coef,model.coef_split_idx)

# Get coef associated with the mean
coef_m = split_coef[0]
# and with the scale parameter
coef_s = split_coef[1]

# Similarly, `preds` holds linear predictions for m & s
pred_m = model.preds[0]
pred_s = model.preds[1]

# While `mus` holds the estimated fitted parameters
# (i.e., `preds` after applying the inverse of the link function of each parameter)
mu_m = model.mus[0]
mu_s = model.mus[1]
- References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
formulas ([Formula]) – A list of formulas for the GAMMLSS model.
family (GAMLSSFamily) – A GAMLSSFamily instance. Currently GAUMLSS, MULNOMLSS, and GAMMALS are supported.
- Variables:
formulas ([Formula]) – The list of formulas passed to the constructor.
lvi (scp.sparse.csc_array) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix. Initialized with None.
coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.
preds ([[float]]) – The linear predictors for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
mus ([[float]]) – The predicted means for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood (will correspond to hessian - diag*eps if self.info.eps > 0 after fitting). Initialized with None.
edf (float) – The model estimated degrees of freedom as a float. Initialized with None.
edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.
term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.
coef_split_idx ([int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter of family. See the examples. Initialized after fitting!
overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.
info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.
res (np.ndarray) – The working residuals of the model (if applicable). Initialized with None.
- fit(max_outer: int = 200, max_inner: int = 500, min_inner: int | None = None, conv_tol: float = 1e-07, extend_lambda: bool = False, extension_method_lam: str = 'nesterov2', control_lambda: int = 2, restart: bool = False, method: str = 'QR/Chol', check_cond: int = 1, piv_tol: float = np.float64(0.23651441168139897), should_keep_drop: bool = True, prefit_grad: bool = True, repara: bool = True, progress_bar: bool = True, n_cores: int = 10, seed: int = 0, init_lambda: list[float] | None = None)
Fit the specified model.
Note: Keyword arguments are initialized to maximise stability. For faster estimation set method='Chol'.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                     [i(),f(["x0"],nk=10)],
                     data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now define the model and fit!
model = GAMMLSS(formulas,family)
model.fit()

# Now fit again via Cholesky
model.fit(method="Chol")
- Parameters:
max_outer (int,optional) – The maximum number of fitting iterations.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.
min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients. By default set to max_inner.
conv_tol (float,optional) – The relative criterion used to determine convergence (the change in penalized deviance is compared against conv_tol * previous penalized deviance).
extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary for models involving heavily penalized functions. Disabled by default.
extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov2’ by default.
control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.
restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). “Chol” relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to “QR/Chol” or “LU/Chol”. In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. Defaults to “QR/Chol”.
check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). Defaults to 1.
piv_tol (float,optional) – Deprecated.
should_keep_drop (bool,optional) – Only used when method in ["QR/Chol","LU/Chol","Direct/Chol"]. If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration - costly! Defaults to True.
prefit_grad (bool,optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients. Defaults to True.
repara (bool,optional) – Whether to re-parameterize the model (for every proposed update to the regularization parameters) via the steps outlined in Appendix B of Wood (2011) and suggested by Wood et al. (2016). This greatly increases the stability of the fitting iteration. Defaults to True.
progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0
init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None
- get_llk(penalized: bool = True) → float | None
Get the (penalized) log-likelihood of the estimated model (float or None) given the training data.
Will instead return None if called before fitting.
- Parameters:
penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True
- Returns:
llk score
- Return type:
float or None
- get_mmat(use_terms: list[int] | None = None, par: int | None = None) → list[csc_array] | csc_array
Returns a list containing exactly the model matrices used for fitting as a scipy.sparse.csc_array. Will raise an error when fitting was not completed before calling this function.
Optionally, the model matrix associated with a specific parameter of the log-likelihood can be obtained by setting par to the desired index, instead of None. Additionally, all columns not corresponding to terms for which the indices are provided via use_terms can optionally be zeroed.
- Parameters:
use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be created. If this argument is provided, columns corresponding to any term not included in this list will be zeroed, defaults to None
par (int or None, optional) – The index corresponding to the parameter of the distribution for which to obtain the model matrix. Setting this to None means all matrices are returned in a list, defaults to None.
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
Model matrices \(\mathbf{X}\) used for fitting - one per parameter of self.family, or a single model matrix for a specific parameter.
- Return type:
[scp.sparse.csc_array] or scp.sparse.csc_array
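A minimal usage sketch for get_mmat() (assuming the fitted Gaussian GAMMLSS from the class example, with formulas for the mean and the standard deviation):
# One model matrix per distribution parameter (here: mean and standard deviation):
Xs = model.get_mmat()
print(len(Xs)) # 2
# Only the model matrix for the standard deviation:
X_sd = model.get_mmat(par=1)
print(X_sd.shape)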
- get_pars() → ndarray
Returns a list containing all coefficients estimated for the model. Use self.coef_split_idx to split the vector into separate subsets per distribution parameter.
Will return None if called before fitting was completed.
- Returns:
Model coefficients - before splitting!
- Return type:
[float] or None
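A minimal usage sketch for get_pars() (assuming the fitted Gaussian GAMMLSS from the class example):
coef = model.get_pars()
# Split the overall vector into one coefficient set per distribution parameter:
split_coef = np.split(coef,model.coef_split_idx)
coef_mean = split_coef[0] # coefficients for the mean
coef_sd = split_coef[1]   # coefficients for the standard deviation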
- get_reml() → float
Gets the Laplace approximate REML (Restricted Maximum Likelihood) score for the estimated lambda values (see Wood, 2011).
References:
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
REML score
- Return type:
float
- get_resid(**kwargs) → ndarray
Returns standardized residuals for GAMMLSS models (Rigby & Stasinopoulos, 2005).
The computation of the residual vector will differ between different GAMMLSS models and is thus implemented as a method by each GAMMLSS family. These should be consulted to get more details. In general, if the model is specified correctly, the returned vector should approximately look like what could be expected from taking \(N\) independent samples from \(N(0,1)\).
Additional arguments required by the specific GAMLSSFamily.get_resid() method can be passed along via kwargs.
Note: Families for which no residuals are available can return None.
- References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Raises:
NotImplementedError – An error is raised in case the residuals are to be computed for a Multinomial GAMMLSS model, which is currently not supported.
ValueError – An error is raised in case the residuals are requested before the model has been fit.
- Returns:
A np.ndarray of standardized residuals that should be \(\sim N(0,1)\) if the model is correct. Shape of the array is (-1,1).
- Return type:
np.ndarray
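A minimal residual-check sketch for get_resid() (assuming the fitted Gaussian GAMMLSS from the class example): since the standardized residuals should approximately look like N(0,1) samples, a normal quantile-quantile plot is a natural check.
import matplotlib.pyplot as plt
import scipy as scp

res = model.get_resid()
# QQ-plot against the standard normal distribution:
scp.stats.probplot(res.flatten(),dist="norm",plot=plt)
plt.show()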
- predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, csc_array, ndarray | None]
Make a prediction using the fitted model for new data n_dat, using only the terms indexed by use_terms and for distribution parameter par.
Importantly, predictions and standard errors are always returned on the scale of the linear predictor. For the Gaussian GAMMLSS model, the predictions for the standard deviation will for example usually (i.e., for the default link choices) reflect the log of the standard deviation. To get the predictions on the standard deviation scale, one can then apply the inverse log-link function to the predictions and the CI-bounds on the scale of the respective linear predictor. See the examples below.
Examples:
from mssm.models import *
from mssmViz.sim import *

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                     [i(),f(["x0"],nk=10)],
                     data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

new_dat = pd.DataFrame({"x0":np.linspace(0,1,30)})

# Mean predictions don't have to be transformed since the Identity link is used for this predictor.
mu_mean,_,b_mean = model.predict(None,new_dat,ci=True)

# These can be used for confidence intervals:
mean_upper_CI = mu_mean + b_mean
mean_lower_CI = mu_mean - b_mean

# Standard deviation predictions do have to be transformed - by default they are on the log-scale.
eta_sd,_,b_sd = model.predict(None,new_dat,ci=True,par=1)
mu_sd = model.family.links[1].fi(eta_sd) # Index to `links` is 1 because the sd is the second parameter!

# These can be used for approximate confidence intervals:
sd_upper_CI = model.family.links[1].fi(eta_sd + b_sd)
sd_lower_CI = model.family.links[1].fi(eta_sd - b_sd)
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
- Parameters:
use_terms (list[int] or None) – The indices corresponding to the terms in the formula of the parameter that should be used to obtain the prediction or None in which case all terms will be used.
n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se].
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False. The CI is then [pred - se, pred + se].
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
par (int, optional) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean), defaults to 0
- Raises:
ValueError – An error is raised in case the standard error is to be computed for a Multinomial GAMMLSS model, which is currently not supported.
- Returns:
A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci==False else the standard error se in the prediction.
- Return type:
(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
- predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) → tuple[ndarray, ndarray]
Get the difference in the predictions for two datasets and for distribution parameter par. Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim9(500,1,seed=20)

# We include a tensor smooth in the model of the mean
formula_m = Formula(lhs("y"),
                    [i(),f(["x0","x1"],te=True)],
                    data=GAUMLSSDat)

# The model of the standard deviation remains the same
formula_sd = Formula(lhs("y"),
                     [i(),f(["x0"])],
                     data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

# Now we want to know whether the effect of x0 is different for two values of x1:
new_dat1 = pd.DataFrame({"x0":np.linspace(0,1,30),
                         "x1":[0.25 for _ in range(30)]})

new_dat2 = pd.DataFrame({"x0":np.linspace(0,1,30),
                         "x1":[0.75 for _ in range(30)]})

# Now we can get the predicted difference of the effect of x0 for the two values of x1:
pred_diff,se = model.predict_diff(new_dat1,new_dat2,use_terms=[1],par=0)

# mssmViz also has a convenience function to visualize it:
plot_diff(new_dat1,new_dat2,["x0"],model,use=[1],response_scale=False)
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
get_difference function from the itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html
- Parameters:
dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this DataFrame and dat1 will be returned.
use_terms (list[int] or None) – The indices corresponding to the terms in the formula of the parameter that should be used to obtain the prediction or None in which case all terms will be used.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017; section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
par (int, optional) – The index corresponding to the parameter for which to make the prediction (e.g., 0 = mean), defaults to 0
- Raises:
ValueError – An error is raised in case the predicted difference is to be computed for a Multinomial GAMMLSS model, which is currently not supported.
- Returns:
A tuple with 2 entries. The first entry is the predicted difference diff (between the two data sets dat1 & dat2). The second entry is the standard error se of the predicted difference. The difference CI is then [diff - se, diff + se].
- Return type:
(np.ndarray,np.ndarray)
- print_parametric_terms()
Prints summary output for linear/parametric terms in the model, separately for each parameter of the family’s distribution.
For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution approximately follows a standardized normal distribution. The corresponding z-statistic and the p-value are printed. See Wood (2017) sections 6.12 and 1.3.3 for more details.
Note that un-penalized coefficients that are part of a smooth function are not covered by this function.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Raises:
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
- print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)
Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case, edf. < pen_cutoff can be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).
References:
Marra & Wood (2011). Practical variable selection for generalized additive models.
- Parameters:
pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as “effectively removed”, defaults to 0.2
p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False
edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal, defaults to True
- sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) → ndarray
Obtain n_ps samples from posterior \([\boldsymbol{\beta}_m - \hat{\boldsymbol{\beta}}_m] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) (see Wood et al., 2016; Wood 2017, section 6.10), \(\boldsymbol{\beta}_m\) is the set of coefficients in the model of parameter \(m\) of the distribution (see argument par), and \(\mathbf{H}\) is the hessian of the log-likelihood (Wood et al., 2016). To obtain samples for \(\boldsymbol{\beta}\), set deviations to False.
See sample_MVN() for more details.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate 500 data points
GAUMLSSDat = sim6(500,seed=20)

# We need to model the mean: \mu_i = \alpha + f(x0)
formula_m = Formula(lhs("y"),
                    [i(),f(["x0"],nk=10)],
                    data=GAUMLSSDat)

# and the standard deviation as well: log(\sigma_i) = \alpha + f(x0)
formula_sd = Formula(lhs("y"),
                     [i(),f(["x0"],nk=10)],
                     data=GAUMLSSDat)

# Collect both formulas
formulas = [formula_m,formula_sd]

# Create Gaussian GAMMLSS family with identity link for mean
# and log link for sigma
family = GAUMLSS([Identity(),LOG()])

# Now fit
model = GAMMLSS(formulas,family)
model.fit()

new_dat = pd.DataFrame({"x0":np.linspace(0,1,30)})

# Now obtain the estimate for `f(["x0"],nk=10)` and the model matrix corresponding to it!
# Note, that we set `use_terms = [1]` - so all columns in X_f not belonging to `f(["x0"],nk=10)`
# (e.g., the first one, belonging to the offset) are zeroed.
mu_f,X_f,_ = model.predict([1],new_dat,ci=True)

# Now we can sample from the posterior of `f(["x0"],nk=10)` in the model of the mean:
post = model.sample_post(10000,None,deviations=False,seed=0,par=0)

# Since we set deviations to False, post holds coefficient samples and can simply be post-multiplied to
# get samples of `f(["x0"],nk=10)`
post_f = X_f @ post

# Plot the estimated effect and 50 posterior samples
plt.plot(new_dat["x0"],mu_f,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f[:,sidx],alpha=0.2)

plt.show()

# In this case, we are not interested in the offset, so we can omit it during the sampling step
# (i.e., to not sample coefficients for it):

# `use_post` identifies only coefficients related to `f(["x0"],nk=10)`
use_post = X_f.sum(axis=0) != 0
use_post = np.arange(0,X_f.shape[1])[use_post]
print(use_post)

# `use_post` can now be passed to `sample_post`:
post2 = model.sample_post(10000,use_post,deviations=False,seed=0,par=0)

# Importantly, post2 now has a different shape - which we have to take into account when multiplying.
post_f2 = X_f[:,use_post] @ post2

plt.plot(new_dat["x0"],mu_f,color="black",linewidth=2)

for sidx in range(50):
    plt.plot(new_dat["x0"],post_f2[:,sidx],alpha=0.2)

plt.show()
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
n_ps (int) – Number of samples to obtain from posterior.
use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. Note: an index of 0 indexes the first coefficient in the model of parameter par, that is, indices have to correspond to columns in the parameter-specific model matrix. By default all coefficients are sampled.
deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False
seed (int,optional) – A seed to use for the sampling, defaults to None
par (int,optional) – The index corresponding to the distribution parameter for which to obtain coefficient samples (e.g., 0 = mean), defaults to 0
- Returns:
An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. Can simply be post-multiplied with model matrix \(\mathbf{X}\) to generate posterior sample curves.
- Return type:
np.ndarray
- class mssm.models.GSMM(formulas: list[Formula], family: GSMMFamily)
Bases:
object
Class to fit General Smooth/Mixed Models (see Wood, Pya, & Säfken; 2016). Estimation is possible via an exact Newton method for the coefficients or via the L-qEFS update (see Krause et al., submitted, and the example below).
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
import numpy as np

class NUMDIFFGENSMOOTHFamily(GSMMFamily):
    # Implementation of the ``GSMMFamily`` class that uses finite differencing to obtain the
    # gradient of the likelihood to estimate a Gaussian GAMLSS via the general smooth code and
    # the L-qEFS update by Krause et al. (in preparation).
    #
    # References:
    #  - Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
    #  - Nocedal & Wright (2006). Numerical Optimization. Springer New York.

    def __init__(self, pars: int, links: list[Link]) -> None:
        super().__init__(pars, links)

    def llk(self, coef, coef_split_idx, ys, Xs):
        # Likelihood for a Gaussian GAM(LSS) - implemented so that the model
        # can be estimated using the general smooth code.
        y = ys[0]
        split_coef = np.split(coef,coef_split_idx)

        eta_mu = Xs[0]@split_coef[0]
        eta_sd = Xs[1]@split_coef[1]

        mu_mu = self.links[0].fi(eta_mu)
        mu_sd = self.links[1].fi(eta_sd)

        family = GAUMLSS(self.links)
        llk = family.llk(y,mu_mu,mu_sd)
        return llk

# Simulate 500 data points
sim_dat = sim3(500,2,c=1,seed=0,family=Gaussian(),binom_offset = 0,correlate=False)

# We need to model the mean: \mu_i
formula_m = Formula(lhs("y"), [i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])], data=sim_dat)

# And for sd - here constant
formula_sd = Formula(lhs("y"), [i()], data=sim_dat)

# Collect both formulas
formulas = [formula_m,formula_sd]
links = [Identity(),LOGb(-0.001)]

# Now define the general family + model and fit!
gsmm_fam = NUMDIFFGENSMOOTHFamily(2,links)
model = GSMM(formulas=formulas,family=gsmm_fam)

# Fit with SR1
bfgs_opt = {"gtol":1e-9, "ftol":1e-9, "maxcor":30, "maxls":200, "maxfun":1e7}
model.fit(method='qEFS',bfgs_options=bfgs_opt)

# Extract all coef
coef = model.coef

# Now split them to get separate lists per parameter of the log-likelihood (here mean and scale).
# split_coef[0] then holds the coef associated with the first parameter (here the mean) and so on.
split_coef = np.split(coef,model.coef_split_idx)
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
formulas ([Formula]) – A list of formulas, one per parameter of the likelihood that is to be modeled as a smooth model
family (GSMMFamily) – A GSMMFamily family.
- Variables:
formulas ([Formula]) – The list of formulas passed to the constructor.
lvi (scp.sparse.csc_array | None) – The inverse of the Cholesky factor of the conditional model coefficient covariance matrix - or None, in case the L-BFGS-B optimizer was used and form_VH was set to False when calling model.fit(). Initialized with None.
lvi_linop (scp.sparse.linalg.LinearOperator) – A scipy.sparse.linalg.LinearOperator of the conditional model coefficient covariance matrix (not the root) - or None. Only available in case the L-BFGS-B optimizer was used and form_VH was set to False when calling model.fit().
coef (np.ndarray) – Contains all coefficients estimated for the model. Shape of the array is (-1,1). Initialized with None.
preds ([[float]]) – The linear predictors for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
mus ([[float]]) – The predicted means for every parameter of family evaluated for each observation in the training data (after removing NaNs). Initialized with None.
hessian (scp.sparse.csc_array) – Estimated hessian of the log-likelihood (will correspond to hessian - diag*eps if self.info.eps > 0 after fitting). Initialized with None.
edf (float) – The model estimated degrees of freedom as a float. Initialized with None.
edf1 (float) – The model estimated degrees of freedom as a float corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
term_edf ([float]) – The estimated degrees of freedom per smooth term. Initialized with None.
term_edf1 ([float]) – The estimated degrees of freedom per smooth term corrected for smoothness bias. Set by the approx_smooth_p_values() function, the first time it is called. Initialized with None.
penalty (float) – The total penalty applied to the model deviance after fitting as a float. Initialized with None.
coef_split_idx ([int]) – The index at which to split the overall coefficient vector into separate lists - one per parameter of family. See the examples. Initialized after fitting!
overall_penalties ([LambdaTerm]) – Contains all penalties estimated for the model. Initialized with None.
info (Fit_info) – A Fit_info instance, with information about convergence (speed) of the model.
- fit(init_coef: ndarray | None = None, max_outer: int = 200, max_inner: int = 500, min_inner: int | None = None, conv_tol: float = 1e-07, extend_lambda: bool = False, extension_method_lam: str = 'nesterov2', control_lambda: int | None = None, restart: bool = False, optimizer: str = 'Newton', method: str = 'QR/Chol', check_cond: int = 1, piv_tol: float = np.float64(0.23651441168139897), progress_bar: bool = True, n_cores: int = 10, seed: int = 0, drop_NA: bool = True, init_lambda: list[float] | None = None, form_VH: bool = True, use_grad: bool = False, build_mat: list[bool] | None = None, should_keep_drop: bool = True, gamma: float = 1, qEFSH: str = 'SR1', overwrite_coef: bool = True, max_restarts: int = 0, qEFS_init_converge: bool = False, prefit_grad: bool = True, repara: bool = None, init_bfgs_options: dict | None = None, bfgs_options: dict | None = None)
Fit the specified model.
Note: Keyword arguments are initialized to maximise stability. For faster configurations (necessary for larger models) see examples below.
- Parameters:
init_coef (np.ndarray,optional) – An initial estimate for the coefficients. Must be a numpy array of shape (-1,1). Defaults to None.
max_outer (int,optional) – The maximum number of fitting iterations.
max_inner (int,optional) – The maximum number of fitting iterations to use by the inner Newton step for coefficients.
min_inner (int,optional) – The minimum number of fitting iterations to use by the inner Newton step for coefficients. By default set to max_inner.
conv_tol (float,optional) – The relative criterion used to determine convergence (the change in penalized deviance is compared against conv_tol * previous penalized deviance).
extend_lambda (bool,optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary for models with heavily penalized functions. Disabled by default.
extension_method_lam (str,optional) – Experimental - do not change! Which method to use to extend lambda proposals. Set to 'nesterov2' by default.
control_lambda (int,optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note that the gradient is based on quasi-Newton approximations as well and is thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed. Set to 2 by default if method != 'qEFS' and otherwise to 1.
restart (bool,optional) – Whether fitting should be resumed. Only possible if the same model has previously completed at least one fitting iteration.
optimizer (str,optional) – Deprecated. Defaults to "Newton".
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). "Chol" relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to "QR/Chol" or "LU/Chol". In that case the coefficients are still obtained via a Cholesky decomposition but a QR/LU decomposition is formed afterwards to check for rank deficiencies and to drop coefficients that cannot be estimated given the current smoothing parameter values. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the hessian. This only requires first derivative information. Defaults to "QR/Chol".
check_cond (int,optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). Defaults to 1.
piv_tol (float,optional) – Deprecated.
progress_bar (bool,optional) – Whether progress should be displayed (convergence info and time estimate). Defaults to True.
n_cores (int,optional) – Number of cores to use during parts of the estimation that can be done in parallel. Defaults to 10.
seed (int,optional) – Seed to use for random parameter initialization. Defaults to 0.
drop_NA (bool,optional) – Whether to drop rows in the model matrices and observation vectors corresponding to NAs in the observation vectors. Set this to False if you want to handle NAs yourself in the likelihood function. Defaults to True.
init_lambda ([float],optional) – A set of initial \(\lambda\) parameters to use by the model. Length of list must match number of parameters to be estimated. Defaults to None.
form_VH (bool,optional) – Whether to explicitly form matrix V - the estimated inverse of the negative Hessian of the penalized likelihood - and H - the estimate of the Hessian of the log-likelihood - when using the qEFS method. If set to False, only V is returned - as a scipy.sparse.linalg.LinearOperator - and available in self.lvi. Additionally, self.hessian will then be equal to None. Note that this will break default prediction/confidence interval methods - so do not call them. Defaults to True.
use_grad (bool,optional) – Deprecated.
build_mat ([bool], optional) – An (optional) list, containing one bool per mssm.src.python.formula.Formula in self.formulas - indicating whether the corresponding model matrix should be built. Useful if multiple formulas specify the same model matrix, in which case only one needs to be built. Only the matrices actually built are then passed down to the likelihood/gradient/hessian function in Xs. Defaults to None, which means all model matrices are built.
should_keep_drop (bool,optional) – Only used when method in ["QR/Chol","LU/Chol","Direct/Chol"]. If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations. If set to False, this is determined anew at every iteration - costly! Defaults to True.
gamma (float,optional) – Setting this to a value larger than 1 promotes more complex (less smooth) models. Setting this to a value smaller than 1 (but larger than 0) promotes smoother models! Defaults to 1.
qEFSH (str,optional) – Should the hessian approximation use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation or the standard BFGS update (qEFSH='BFGS'). Defaults to 'SR1'.
overwrite_coef (bool,optional) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS'. Setting this to False will be useful when passing coefficients from a simpler model to initialize a more complex one. Only has an effect when qEFS_init_converge=True. Defaults to True.
max_restarts (int,optional) – How often to shrink the coefficient estimate back to a random vector when convergence is reached and when method='qEFS'. The optimizer might get stuck in local minima so it can be helpful to set this to 1-3. What happens is that if we converge, we shrink the coefficients back to a random vector and then continue optimizing once more. Defaults to 0.
qEFS_init_converge (bool,optional) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS'. Defaults to False.
prefit_grad (bool,optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients. Defaults to True.
repara (bool,optional) – Whether to re-parameterize the model (for every proposed update to the regularization parameters) via the steps outlined in Appendix B of Wood (2011) and suggested by Wood et al. (2016). This greatly increases the stability of the fitting iteration. Defaults to True if method != 'qEFS' else False.
init_bfgs_options (dict,optional) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options but passed to the optimizer of the un-penalized problem. If this is None, it will be set to a copy of bfgs_options. Only has an effect when qEFS_init_converge=True. Defaults to None.
bfgs_options (dict,optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'. If none are provided, the gtol argument will be initialized to conv_tol. Note also, that in any case the maxiter argument is automatically set to max_inner. Defaults to None.
- Raises:
ValueError – Will throw an error when optimizer is not 'Newton'.
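The defaults above favor stability. As a rough sketch of a faster configuration for a larger model (model is assumed to be a constructed, not yet fitted, GSMM instance; the particular values are illustrative choices, not recommendations):

# Trade some numerical safeguards for speed (sketch):
model.fit(method="Chol",     # skip the extra QR/LU rank check performed by "QR/Chol"
          check_cond=0,      # skip the condition-number estimate
          progress_bar=False,
          n_cores=4)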
- get_llk(penalized: bool = True, drop_NA: bool = True) float | None
Get the (penalized) log-likelihood of the estimated model (float or None) given the training data.
Will instead return None if called before fitting.
- Parameters:
penalized (bool, optional) – Whether the penalized log-likelihood should be returned or the regular log-likelihood, defaults to True
drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped, defaults to True
- Returns:
llk score
- Return type:
float or None
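For example (a sketch; model is assumed to be an already-fitted GSMM instance):

# Penalized vs. regular log-likelihood of the fitted model
pen_llk = model.get_llk(penalized=True)
llk = model.get_llk(penalized=False)
print(pen_llk, llk)  # both are None if called before fitting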
- get_mmat(use_terms: list[int] | None = None, drop_NA: bool = True, par: int | None = None) list[csc_array] | csc_array
By default, returns a list containing exactly the model matrices used for fitting as a scipy.sparse.csc_array. Will raise an error when fitting was not completed before calling this function.
Optionally, the model matrix associated with a specific parameter of the log-likelihood can be obtained by setting par to the desired index, instead of None. Additionally, all columns not corresponding to terms for which the indices are provided via use_terms are zeroed in case use_terms is not None.
- Parameters:
use_terms ([int], optional) – Optionally provide indices of terms in the formula that should be created. If this argument is provided, columns corresponding to any term not included in this list will be zeroed, defaults to None
drop_NA (bool, optional) – Whether rows in the model matrix corresponding to NAs in the dependent variable vector should be dropped, defaults to True
par (int or None, optional) – The index corresponding to the parameter of the log-likelihood for which to obtain the model matrix. Setting this to None means all matrices are returned in a list, defaults to None.
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
Model matrices \(\mathbf{X}\) used for fitting - one per parameter of self.family, or a single model matrix for a specific parameter.
- Return type:
[scp.sparse.csc_array] or scp.sparse.csc_array
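For example (a sketch; model is assumed to be the fitted two-parameter GSMM from the class example above):

# One scipy.sparse.csc_array per parameter of the log-likelihood
Xs = model.get_mmat()

# Only the model matrix for the first parameter (here: the mean)
X_mu = model.get_mmat(par=0)
print(X_mu.shape)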
- get_pars() ndarray
Returns an array containing all coefficients estimated for the model. Use self.coef_split_idx to split the vector into separate subsets per parameter of the log-likelihood.
Will return None if called before fitting was completed.
- Returns:
Model coefficients - before splitting!
- Return type:
[float] or None
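For example (a sketch; model is assumed to be an already-fitted GSMM instance):

import numpy as np

# Coefficient vector before splitting...
coef = model.get_pars()

# ... and split into one subset per parameter of the log-likelihood
split_coef = np.split(coef, model.coef_split_idx)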
- get_reml(drop_NA: bool = True) float
Gets the Laplace approximate REML (restricted maximum likelihood) score for the estimated lambda values (see Wood, 2011).
References:
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped when computing the log-likelihood, defaults to True
- Raises:
ValueError – Will throw an error when called before the model was fitted/before model penalties were formed.
- Returns:
REML score
- Return type:
float
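For example (a sketch; model is assumed to be an already-fitted GSMM instance):

# Laplace approximate REML score at the estimated lambda values
reml = model.get_reml()
print(reml)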
- get_resid(drop_NA: bool = True, **kwargs) ndarray
The computation of the residual vector will differ between different GSMM models and is thus implemented as a method by each GSMMFamily family. These should be consulted to get more details. In general, if the model is specified correctly, the returned vector should approximately look like what could be expected from taking independent samples from \(N(0,1)\).
Additional arguments required by the specific GSMMFamily.get_resid() method can be passed along via kwargs.
Note: Families for which no residuals are available can return None.
- References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
drop_NA (bool, optional) – Whether rows in the model matrices corresponding to NAs in the dependent variable vector should be dropped from the model matrices, defaults to True
- Raises:
ValueError – An error is raised in case the residuals are requested before the model has been fit.
- Returns:
vector of standardized residuals of shape (-1,1). Note, the first axis will not necessarily match the dimension of any of the response vectors (this will depend on the specific Family’s implementation).
- Return type:
np.ndarray
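A quick residual check could look like this (a sketch; model is assumed to be an already-fitted GSMM instance whose family implements get_resid(); the QQ-plot via scipy.stats.probplot is an illustrative choice):

import matplotlib.pyplot as plt
import scipy.stats as scps

# If the model is specified correctly, the standardized residuals
# should approximately look like independent N(0,1) samples.
res = model.get_resid().flatten()
scps.probplot(res, dist="norm", plot=plt)
plt.show()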
- predict(use_terms: list[int] | None, n_dat: DataFrame, alpha: float = 0.05, ci: bool = False, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) tuple[ndarray, csc_array, ndarray | None]
Make a prediction using the fitted model for new data n_dat, using only the terms indexed by use_terms and for parameter par of the log-likelihood.
Importantly, predictions and standard errors are always returned on the scale of the linear predictor.
See the GAMMLSS.predict() function for code examples.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
- Parameters:
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction, or None in which case all terms will be used.
n_dat (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don't include the terms that involve it in the use_terms argument.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
ci (bool, optional) – Whether the standard error se for credible interval (CI; see Wood, 2017) calculation should be returned. The CI is then [pred - se, pred + se].
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017, section 6.10.2 and Simpson, 2016). The CI is then [pred - se, pred + se]. Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
par (int, optional) – The index corresponding to the parameter of the log-likelihood for which to make the prediction, defaults to 0
- Raises:
ValueError – An error is raised in case the standard error is to be computed for a Multinomial GAMMLSS model, which is currently not supported.
- Returns:
A tuple with 3 entries. The first entry is the prediction pred based on the new data n_dat. The second entry is the model matrix built for n_dat that was post-multiplied with the model coefficients to obtain pred. The third entry is None if ci==False, else the standard error se in the prediction.
- Return type:
(np.ndarray,scp.sparse.csc_array,np.ndarray or None)
- predict_diff(dat1: DataFrame, dat2: DataFrame, use_terms: list[int] | None, alpha: float = 0.05, whole_interval: bool = False, n_ps: int = 10000, seed: int | None = None, par: int = 0) tuple[ndarray, ndarray]
Get the difference in the predictions for two datasets and for parameter par of the log-likelihood. Useful to compare a smooth estimated for one level of a factor to the smooth estimated for another level of a factor. In that case, dat1 and dat2 should only differ in the level of said factor. Importantly, predictions and standard errors are again always returned on the scale of the linear predictor - see the predict() method for details.
See the GAMMLSS.predict_diff() function for code examples.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
get_difference function from itsadug R-package: https://rdrr.io/cran/itsadug/man/get_difference.html
- Parameters:
dat1 (pd.DataFrame) – A pandas DataFrame containing new data for which to make the prediction. Importantly, all variables present in the data used to fit the model also need to be present in this DataFrame. Additionally, factor variables must only include levels also present in the data used to fit the model. If you want to exclude a specific factor from the prediction (for example the factor subject) don’t include the terms that involve it in the use_terms argument.
dat2 (pd.DataFrame) – A second pandas DataFrame for which to also make a prediction. The difference in the prediction between this DataFrame and dat1 will be returned.
use_terms (list[int] or None) – The indices corresponding to the terms that should be used to obtain the prediction, or None in which case all terms will be used.
alpha (float, optional) – The alpha level to use for the standard error calculation. Specifically, 1 - (alpha/2) will be used to determine the critical cut-off value according to a N(0,1).
whole_interval (bool, optional) – Whether or not to adjust the point-wise CI to behave like a whole-function interval (based on Wood, 2017, section 6.10.2 and Simpson, 2016). Defaults to False.
n_ps (int, optional) – How many samples to draw from the posterior in case the point-wise CI is adjusted to behave like a whole-function interval CI.
seed (int or None, optional) – Can be used to provide a seed for the posterior sampling step in case the point-wise CI is adjusted to behave like a whole-function interval CI.
par (int, optional) – The index corresponding to the parameter of the log-likelihood for which to make the prediction, defaults to 0
- Raises:
ValueError – An error is raised in case the predicted difference is to be computed for a Multinomial GAMMLSS model, which is currently not supported.
- Returns:
A tuple with 2 entries. The first entry is the predicted difference (between the two data sets dat1 & dat2) diff. The second entry is the standard error se of the predicted difference. The difference CI is then [diff - se, diff + se].
- Return type:
(np.ndarray,np.ndarray)
- print_parametric_terms()
Prints summary output for linear/parametric terms in the model, separately for each parameter of the family’s distribution.
For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows approximately a standardized normal distribution. The corresponding z-statistic and the p-value are printed. See Wood (2017), sections 6.12 and 1.3.3, for more details.
Note that un-penalized coefficients that are part of a smooth function are not covered by this function.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Raises:
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
- print_smooth_terms(pen_cutoff: float = 0.2, p_values: bool = False, edf1: bool = True)
Prints the name of the smooth terms included in the model. After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case edf. < pen_cutoff can then be taken as evidence that the smooth has all but notationally disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).
References:
Marra & Wood (2011). Practical variable selection for generalized additive models.
- Parameters:
pen_cutoff (float, optional) – At which edf. cut-off smooth terms should be marked as "effectively removed", defaults to 0.2
p_values (bool, optional) – Whether approximate p-values should be printed for the smooth terms, defaults to False
edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is likely to be marginal anyway, defaults to True
- sample_post(n_ps: int, use_post: list[int] | None = None, deviations: bool = False, seed: int | None = None, par: int = 0) ndarray
Obtain n_ps samples from posterior \([\boldsymbol{\beta}_m - \hat{\boldsymbol{\beta}}_m] | \mathbf{y},\boldsymbol{\lambda} \sim N(0,\mathbf{V})\), where \(\mathbf{V}=[-\mathbf{H} + \mathbf{S}_{\lambda}]^{-1}\) (see Wood et al., 2016; Wood, 2017, section 6.10), \(\boldsymbol{\beta}_m\) is the set of coefficients in the model of parameter \(m\) of the log-likelihood (see argument par), and \(\mathbf{H}\) is the Hessian of the log-likelihood (Wood et al., 2016). To obtain samples for \(\boldsymbol{\beta}_m\), set deviations to False.
See sample_MVN() for more details and the GAMMLSS.sample_post() function for code examples.
References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
n_ps (int) – Number of samples to obtain from posterior.
use_post ([int],optional) – The indices corresponding to coefficients for which to actually obtain samples. Note: an index of 0 indexes the first coefficient in the model of parameter par, that is, indices have to correspond to columns in the parameter-specific model matrix. By default all coefficients are sampled.
deviations (bool,optional) – Whether to return samples of deviations from the estimated coefficients (i.e., \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) or actual samples of coefficients (i.e., \(\boldsymbol{\beta}\)), defaults to False
seed (int,optional) – A seed to use for the sampling, defaults to None
par (int, optional) – The index corresponding to the parameter of the log-likelihood for which samples are to be obtained for the coefficients, defaults to 0.
- Returns:
An np.ndarray of dimension [len(use_post),n_ps] containing the posterior samples. If use_post is None, len(use_post) will match the number of coefficients associated with parameter par of the log-likelihood instead. Can simply be post-multiplied with (the subset of columns indicated by use_post of) the model matrix \(\mathbf{X}^m\) associated with the parameter \(m\) of the log-likelihood to generate posterior sample curves.
- Return type:
np.ndarray
mssm.src.python.compact_rep module
- mssm.src.python.compact_rep.computeH(s: ndarray, y: ndarray, rho: ndarray, H0: csc_array, explicit: bool = True) ndarray | tuple[ndarray, ndarray, ndarray, ndarray]
Computes (explicitly or implicitly) the quasi-Newton approximation to the negative Hessian of the (penalized) likelihood \(\mathbf{H}\) (\(\mathcal{H}\)) from the L-BFGS-B optimizer info.
Relies on equation 2.16 in Byrd, Nocedal & Schnabel (1992).
- References:
Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063
- Parameters:
s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1992).
y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1992).
rho (np.ndarray) – flattened numpy.array of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1992).
H0 (scipy.sparse.csc_array) – Initial estimate for the hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).
explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of four update matrices.
- Returns:
H, either as np.ndarray (explicit=True) or represented implicitly via four update vectors (also np.ndarrays)
- Return type:
np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]
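As a rough sketch of where such update vectors might come from (the toy quadratic objective, the omega choice, and the extraction via res.hess_inv.sk/.yk from scipy's L-BFGS-B result are illustrative assumptions, not part of mssm):

import numpy as np
import scipy as scp
from scipy.optimize import minimize
from mssm.src.python.compact_rep import computeH

p = 5
A = np.diag(np.arange(1.0, p + 1))  # Hessian of the toy (quadratic) objective

res = minimize(lambda b: 0.5 * b @ A @ b, np.ones(p),
               jac=lambda b: A @ b, method="L-BFGS-B")

# scipy stores the s/y update vector pairs on the implicit inverse-Hessian operator
s = res.hess_inv.sk                  # shape (m,p)
y = res.hess_inv.yk                  # shape (m,p)
rho = 1 / np.sum(y * s, axis=1)      # element-wise 1/y.T@s

# A common initialization: some multiple omega of the identity
omega = np.sum(y[-1] * y[-1]) / np.sum(y[-1] * s[-1])
H0 = scp.sparse.csc_array(omega * np.eye(p))

H = computeH(s, y, rho, H0, explicit=True)  # should approximate A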
- mssm.src.python.compact_rep.computeHSR1(s: ndarray, y: ndarray, rho: ndarray, H0: csc_array, omega: float = 1, make_psd: bool = False, make_pd: bool = False, explicit: bool = True) ndarray | tuple[ndarray, ndarray, ndarray]
Computes (explicitly or implicitly) the symmetric rank one (SR1) approximation of the negative Hessian of the (penalized) likelihood \(\mathbf{H}\) (\(\mathcal{H}\)).
Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1992). Can ensure positive (semi) definiteness of the approximation via an eigen decomposition as shown by Burdakov et al. (2017). This is enforced via the make_psd and make_pd arguments.
- References:
Burdakov, O., Gong, L., Zikrin, S., & Yuan, Y. (2017). On efficiently combining limited-memory and trust-region techniques. Mathematical Programming Computation, 9(1), 101–134. https://doi.org/10.1007/s12532-016-0109-7
Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063
- Parameters:
s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1992).
y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1992).
rho (np.ndarray) – flattened numpy.array of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1992).
H0 (scipy.sparse.csc_array) – Initial estimate for the hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).
omega (float, optional) – Multiple of the identity matrix used as initial estimate.
make_psd (bool, optional) – Whether to enforce PSD as mentioned in the description. By default set to False.
make_pd (bool, optional) – Whether to enforce numeric positive definiteness, not just PSD. Ignored if make_psd=False. By default set to False.
explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.
- Returns:
H, either as np.ndarray (explicit=True) or represented implicitly via three update vectors (also np.ndarrays)
- Return type:
np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]
- mssm.src.python.compact_rep.computeV(s: ndarray, y: ndarray, rho: ndarray, V0: csc_array, explicit: bool = True) ndarray | tuple[ndarray, ndarray, ndarray]
Computes (explicitly or implicitly) the quasi-Newton approximation to the inverse of the negative Hessian of the (penalized) likelihood \(\mathcal{I}\) (\(\mathbf{V}\)) from the L-BFGS-B optimizer info.
Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1992).
- References:
Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063
- Parameters:
s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1992).
y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1992).
rho (np.ndarray) – flattened numpy.array of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1992).
V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).
explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.
- Returns:
V, either as np.ndarray (explicit=True) or represented implicitly via three update vectors (also np.ndarrays)
- Return type:
np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]
- mssm.src.python.compact_rep.computeVSR1(s: ndarray, y: ndarray, rho: ndarray, V0: csc_array, omega: float = 1, make_psd: bool = False, explicit: bool = True) ndarray | tuple[ndarray, ndarray, ndarray]
Computes (explicitly or implicitly) the symmetric rank one (SR1) approximation of the inverse of the negative Hessian of the (penalized) likelihood \(\mathcal{I}\) (\(\mathbf{V}\)).
Relies on equations 2.16 and 3.13 in Byrd, Nocedal & Schnabel (1992). Can ensure positive (semi) definiteness of the approximation via an eigen decomposition as shown by Burdakov et al. (2017). This is enforced via the make_psd argument.
- References:
Burdakov, O., Gong, L., Zikrin, S., & Yuan, Y. (2017). On efficiently combining limited-memory and trust-region techniques. Mathematical Programming Computation, 9(1), 101–134. https://doi.org/10.1007/s12532-016-0109-7
Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063
- Parameters:
s (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the first set of m update vectors from Byrd, Nocedal & Schnabel (1992).
y (np.ndarray) – np.ndarray of shape (m,p), where p is the number of coefficients, holding the second set of m update vectors from Byrd, Nocedal & Schnabel (1992).
rho (np.ndarray) – flattened numpy.array of shape (m,), holding element-wise 1/y.T@s from Byrd, Nocedal & Schnabel (1992).
V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the hessian of the negative (penalized) likelihood. Here some multiple of the identity (multiplied by omega).
omega (float, optional) – Multiple of the identity matrix used as initial estimate.
make_psd (bool, optional) – Whether to enforce PSD as mentioned in the description. By default set to False.
explicit (bool) – Whether or not to return the approximate matrix explicitly or implicitly in form of three update matrices.
- Returns:
V, either as np.ndarray (explicit=True) or represented implicitly via three update vectors (also np.ndarrays)
- Return type:
np.ndarray | tuple[np.ndarray, np.ndarray, np.ndarray]
mssm.src.python.compare module
- mssm.src.python.compare.compare_CDL(model1: GAMM | GAMMLSS | GSMM, model2: GAMM | GAMMLSS | GSMM, correct_V: bool = True, correct_t1: bool | None = None, perform_GLRT: bool = False, nR: int = 250, n_c: int = 1, alpha: int = 0.05, grid: str | None = None, a: float = 1e-07, b: float = 10000000.0, df: int = 40, verbose: bool = False, drop_NA: bool = True, method: str = 'Chol', seed: int | None = None, only_expected_edf: bool | None = None, Vp_fidiff: bool = False, use_importance_weights: bool | None = None, prior: Callable | None = None, recompute_H: bool | None = None, compute_Vcc: bool | None = None, bfgs_options: dict = {}) dict
Computes the AIC difference and (optionally) performs an approximate GLRT on twice the difference in unpenalized likelihood between models model1 and model2 (see Wood et al., 2016).
For the GLRT to be appropriate, model1 should be set to the model containing more effects and model2 should be a nested, simpler, variant of model1. For the degrees of freedom for the test, the expected degrees of freedom (EDF) of each model are used (i.e., this is the conditional test discussed in Wood (2017: 6.12.4)). The difference between the models in EDF serves as DoF for computing the Chi-Square statistic. In addition, correct_t1 should be set to True when computing the GLRT.
To get the AIC for each model, 2*edf is added to twice the negative (conditional) likelihood (see Wood et al., 2016).
By default (correct_V=True), mssm will attempt to correct the edf for uncertainty in the estimated \(\lambda\) parameters. Which correction is computed depends on the choice for the grid argument. Approximately the analytic solution for the correction proposed by Wood, Pya, & Säfken (2016) is computed when grid='JJJ1' (the default) - which is exact for strictly Gaussian and some canonical Generalized additive models. This is too costly for very large sparse multi-level models and not exact for more generic models. The MC based alternative available via grid='JJJ2' addresses the first problem (important: set use_importance_weights=False and only_expected_edf=True). The second MC based alternative, available via grid='JJJ3', is most appropriate for more generic models (the prior argument can be used to specify any prior to be placed on \(\boldsymbol{\rho}\); you will also need to set use_importance_weights=True and only_expected_edf=False). For more details consult the mssm.src.python.utils.correct_VB() function, the examples below, and Krause et al. (submitted).
In case any of those correction strategies is too expensive, it might be better to rely on hypothesis tests for individual smooths, confidence intervals, and penalty-based selection approaches instead (see Marra & Wood, 2011 for details on the latter).
In case correct_t1=True the EDF will be set to the (smoothness uncertainty corrected, in case correct_V=True) smoothness bias corrected expected degrees of freedom (t1 in section 6.1.2 of Wood, 2017) for the GLRT (based on the recommendation given in section 6.12.4 in Wood, 2017). The AIC (Wood, 2017) of both models will still be based on the regular (smoothness uncertainty corrected) edf.
compareML
function in the R-packageitsadug
- which rather performs a version of the marginal GLRT (also discussed in Wood, 2017: 6.12.4) - and more similar to theanova.gam
implementation provided bymgcv
(particularly ifgrid='JJJ1'). The returned p-value is approximate - very **very** much so if ``correct_V=False
(this should really never be done). Also, the GLRT should not be used to compare models differing in their random effect structures - the AIC is more appropriate for this (see Wood, 2017: 6.12.4).Examples:
### Model comparison and smoothness uncertainty correction for strictly additive model

# Imports from the GAMM/GAMMLSS examples above are assumed; additionally:
import copy
import numpy as np

# Simulate some data
sim_fit_dat = sim3(n=500,scale=2,c=0.1,family=Gaussian(),seed=21)

# Now fit nested models
sim_fit_formula = Formula(lhs("y"),
                          [i(),f(["x0"],nk=20,rp=1),f(["x1"],nk=20,rp=1),f(["x2"],nk=20,rp=1),f(["x3"],nk=20,rp=1)],
                          data=sim_fit_dat,
                          print_warn=False)
sim_fit_model = GAMM(sim_fit_formula,Gaussian())
sim_fit_model.fit()

sim_fit_formula2 = Formula(lhs("y"),
                           [i(),f(["x1"],nk=20,rp=1),f(["x2"],nk=20,rp=1),f(["x3"],nk=20,rp=1)],
                           data=sim_fit_dat,
                           print_warn=False)
sim_fit_model2 = GAMM(sim_fit_formula2,Gaussian())
sim_fit_model2.fit()

# And perform a smoothness uncertainty corrected comparison
cor_result1 = compare_CDL(sim_fit_model,sim_fit_model2,grid='JJJ1',seed=22)

# To perform a GLRT and correct the edf for smoothness bias as well (e.g., Wood, 2017) run:
cor_result2 = compare_CDL(sim_fit_model,sim_fit_model2,grid='JJJ1',seed=22,perform_GLRT=True,correct_t1=True)

### Model comparison and smoothness uncertainty correction for very large strictly additive model

# If the models are quite large (many coefficients) the following (this is the first MC strategy
# discussed in section 5.2 of Krause et al. (submitted)) can be much faster:
nR = 250 # Number of samples to use for the numeric integration
cor_result3 = compare_CDL(sim_fit_model,sim_fit_model2,nR=nR,n_c=10,correct_t1=False,grid='JJJ2',
                          seed=22,only_expected_edf=True,use_importance_weights=False)

### Model comparison and smoothness uncertainty correction for more generic smooth model (GAMM, GAMMLSS, etc.)

# We can still rely on grid='JJJ1' (which is why it is the default) but this will be approximate.
# See section 5.1 in the manuscript by Krause et al. (submitted) for justification or section 3.4.3
# in the book by Wood (2017). An alternative is the second MC strategy discussed in section 5.3 of
# Krause et al. (submitted). The code below shows how to get mssm to rely on this strategy:

# Simulate some data
sim_fit_dat = sim3(n=500,scale=2,c=0.1,family=Gamma(),seed=21)

# Now fit nested models
sim_fit_formula = Formula(lhs("y"),
                          [i(),f(["x0"],nk=20,rp=1),f(["x1"],nk=20,rp=1),f(["x2"],nk=20,rp=1),f(["x3"],nk=20,rp=1)],
                          data=sim_fit_dat,
                          print_warn=False)
sim_fit_formula_sd = Formula(lhs("y"),
                             [i()],
                             data=sim_fit_dat,
                             print_warn=False)
sim_fit_model = GAMMLSS([sim_fit_formula,copy.deepcopy(sim_fit_formula_sd)],family = GAMMALS([LOG(),LOGb(-0.01)]))
sim_fit_model.fit()

sim_fit_formula2 = Formula(lhs("y"),
                           [i(),f(["x1"],nk=20,rp=1),f(["x2"],nk=20,rp=1),f(["x3"],nk=20,rp=1)],
                           data=sim_fit_dat,
                           print_warn=False)
sim_fit_model2 = GAMMLSS([sim_fit_formula2,copy.deepcopy(sim_fit_formula_sd)],family = GAMMALS([LOG(),LOGb(-0.01)]))
sim_fit_model2.fit()

# Set up a uniform prior from log(1e-7) to log(1e12) for each regularization parameter
prior = DummyRhoPrior(b=np.log(1e12))

# Now correct for uncertainty in regularization parameters using the second MC strategy discussed
# by Krause et al. (submitted). You can also set prior to ``None`` in which case the proposal
# distribution (by default a T-distribution with 40 degrees of freedom) is used as prior.
cor_result_gs_1 = compare_CDL(sim_fit_model,sim_fit_model2,n_c=10,grid='JJJ3',seed=22,only_expected_edf=False,
                              use_importance_weights=True,prior=prior,recompute_H=True)
- References:
Marra, G., & Wood, S. N. (2011) Practical variable selection for generalized additive models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
compareML function from itsadug R-package: https://rdrr.io/cran/itsadug/man/compareML.html
anova.gam function from mgcv, see: https://www.rdocumentation.org/packages/mgcv/versions/1.9-1/topics/anova.gam
- Parameters:
correct_V (bool, optional) – Whether or not to correct for smoothness uncertainty. Defaults to True
correct_t1 (bool | None, optional) – Whether or not to also correct the smoothness bias corrected edf for smoothness uncertainty. Defaults to None - meaning that mssm will select an appropriate value.
perform_GLRT (bool, optional) – Whether to perform both a GLRT and to compute the AIC or to only compute the AIC. Defaults to False.
nR (int, optional) – In case grid!='JJJ1', nR samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250
n_c (int, optional) – Number of cores to use during parallel parts of the correction, defaults to 1
alpha (float, optional) – alpha level of the GLRT. Defaults to 0.05
grid (str | None, optional) – How to compute the smoothness uncertainty correction, defaults to None - meaning that mssm will select an appropriate value.
a (float, optional) – Any of the \(\lambda\) estimates obtained from model (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}_{\boldsymbol{\rho}})\) used to sample nR candidates) which are smaller than this are set to this value as well, defaults to 1e-7 the minimum possible estimate
b (float, optional) – Any of the \(\lambda\) estimates obtained from model (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}_{\boldsymbol{\rho}})\) used to sample nR candidates) which are larger than this are set to this value as well, defaults to 1e7 the maximum possible estimate
df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample/propose the next set of candidates. Setting this to np.inf means a multivariate normal is used for sampling, defaults to 40
verbose (bool, optional) – Whether to print progress information or not, defaults to False
drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default ("Chol") relies on Cholesky decomposition. This is extremely efficient but in principle less stable, numerically speaking. For a maximum of numerical stability set this to "QR/Chol". In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky is then used with the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to 'qEFS', then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the hessian. This only requires first derivative information. Defaults to "Chol".
seed (int,optional) – Seed to use for random parts of the correction. Defaults to None
only_expected_edf (bool | None, optional) – Whether to compute edf. by explicitly forming the covariance matrix (only_expected_edf=False) or not. The latter is much more efficient for sparse models at the cost of access to the covariance matrix and the ability to compute an upper bound on the smoothness uncertainty corrected edf. Only makes sense when grid!='JJJ1'. Defaults to None - meaning that mssm will select an appropriate value.
Vp_fidiff (bool,optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}_{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)
use_importance_weights (bool | None,optional) – Whether to rely on importance weights to compute the numerical integration when grid != 'JJJ1' or on the log-densities of \(\mathbf{V}_{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to None - meaning that mssm will select an appropriate value.
prior (any, optional) – An (optional) instance of an arbitrary class that has a .logpdf() method to compute the prior log density of a sampled candidate. If this is set to None, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when use_importance_weights=False. Defaults to None
recompute_H (bool | None, optional) – Whether or not to re-compute the Hessian of the log-likelihood at an estimate of the mean of the Bayesian posterior \(\boldsymbol{\beta}|y\) before computing the (uncertainty/bias corrected) edf. Defaults to None - meaning that mssm will select an appropriate value.
compute_Vcc (bool | None, optional) – Whether to compute the second correction term when grid='JJJ1' (or when computing the lower-bound for the remaining grids) or only the first one. In contrast to the second one, the first correction term is substantially cheaper to compute - so setting this to False for larger models will speed up the correction considerably. Defaults to None - meaning that mssm will select an appropriate value.
bfgs_options (dict,optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'. If none are provided, the gtol argument will be initialized to conv_tol. Note also, that in any case the maxiter argument is automatically set to max_inner. Defaults to {}.
- Raises:
ValueError – If both models are from different families.
ValueError – If perform_GLRT=True and model1 has fewer coef than model2 - i.e., model1 has to be the notationally more complex one.
- Returns:
A dictionary with outcomes of all tests. Key H1 will be a bool indicating whether the Null hypothesis was rejected or not, p will be the p-value, test_stat will be the test statistic used, Res. DOF will be the degrees of freedom used by the test, aic1 and aic2 will be the AIC scores for both models.
- Return type:
dict
mssm.src.python.custom_types module
- class mssm.src.python.custom_types.ConstType(*values)
Bases:
Enum
Custom Constraint data type used by internal functions.
- DIFF = 3
- DROP = 1
- QR = 2
- class mssm.src.python.custom_types.Constraint(Z: ndarray | int | None = None, type: ConstType | None = None)
Bases:
object
Constraint storage.
Z, either holds the QR-based correction matrix that needs to be multiplied with \(\mathbf{X}\), \(\mathbf{S}\), and \(\mathbf{D}\) (where \(\mathbf{D}\mathbf{D}^T = \mathbf{S}\)) to make terms subject to the conventional sum-to-zero constraints applied also in mgcv (Wood, 2017), the column/row that should be dropped from those - then \(\mathbf{X}\) can also no longer take on a constant - or None, indicating that the model should be "difference re-coded" to enable sparse sum-to-zero constraints. The latter two are available in mgcv's smoothCon function by setting the sparse.cons argument to 1 or 2 respectively.
The QR-based approach is described in detail by Wood (2017) and is similar to just mean centering every basis function involved in the smooth and then dropping one column from the corresponding centered model matrix. The column-dropping approach is self-explanatory. The difference re-coding re-codes basis functions to correspond to differences of basis functions. The resulting basis remains sparser than the alternatives, but this is not a true centering constraint: \(f(x)\) will not necessarily be orthogonal to the intercept, i.e., \(\mathbf{1}^T \mathbf{f(x)}\) will not necessarily be 0. Hence, confidence intervals will usually be wider when using ConstType.DIFF (also when using ConstType.DROP, for the same reason) instead of ConstType.QR (see Wood, 2017, 2020)!
A final note regards the use of tensor smooths when te==False. Since the value of any constant estimated for a smooth depends on the type of constraint used, the marginal functions estimated for the "main effects" (\(f(x)\), \(f(z)\)) and "interaction effect" (\(f(x,z)\)) in a model \(y = a + f(x) + f(z) + f(x,z)\) will differ depending on the type of constraint used. The "Anova-like" decomposition described in detail in Wood (2017) is achievable only when using ConstType.QR.
Thus, ConstType.QR is the default used by all mssm functions, and the other two options should be considered experimental.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N. (2020). Inference and computation with generalized additive models and their extensions. TEST, 29(2), 307–339. https://doi.org/10.1007/s11749-020-00711-5
Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655
- Z: ndarray | int | None = None
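The QR-based centering idea itself can be illustrated in a few lines of numpy (a sketch of the general technique from Wood (2017), not mssm's internal code; X stands for the model matrix of a single smooth term):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # model matrix of a smooth with 10 basis functions

# Find Z so that 1.T @ X @ Z = 0: the complete QR of C = X.T @ 1 yields an
# orthonormal basis whose last p-1 columns span the null space of C.T.
C = X.sum(axis=0).reshape(-1, 1)
Q, _ = np.linalg.qr(C, mode="complete")
Z = Q[:, 1:]                          # correction matrix, shape (10,9)

Xc = X @ Z                            # constrained model matrix
print(np.abs(Xc.sum(axis=0)).max())   # ~0: columns now sum to zero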
- class mssm.src.python.custom_types.Fit_info(lambda_updates: int = 0, iter: int = 0, code: int = 1, eps: float | None = None, K2: float | None = None, dropped: list[int] | None = None)
Bases:
object
Holds information related to convergence (speed) for GAMMs, GAMMLSS, and GSMMs.
- Variables:
lambda_updates (int) – The total number of lambda updates computed during estimation. Initialized with 0.
iter (int) – The number of outer iterations (a single outer iteration can involve multiple lambda updates) completed during estimation. Initialized with 0.
code (int) – Convergence status. Anything above 0 indicates that the model did not converge and estimates should be considered carefully. Initialized with 1.
eps (float) – The fraction added to the last estimate of the negative Hessian of the penalized likelihood during GAMMLSS or GSMM estimation. If this is not 0, the model should not be considered as converged, irrespective of what code indicates. This most likely implies that the model is not identifiable. Initialized with None and ignored for GAMM estimation.
K2 (float) – An estimate for the condition number of matrix A, where A.T@A=H and H is the final estimate of the negative Hessian of the penalized likelihood. Only available if check_cond>0 when model.fit() is called for any model (i.e., GAMM, GAMMLSS, GSMM). Initialized with None.
dropped ([int]) – The final set of coefficients dropped during GAMMLSS/GSMM estimation when using method in ["QR/Chol","LU/Chol","Direct/Chol"], or None in which case no coefficients were dropped. Initialized with None.
- K2: float | None = None
- code: int = 1
- dropped: list[int] | None = None
- eps: float | None = None
- iter: int = 0
- lambda_updates: int = 0
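A short, hedged example of checking the convergence fields documented above (the Fit_info instance is constructed by hand here; in practice it is produced during estimation):
from mssm.src.python.custom_types import Fit_info

info = Fit_info(lambda_updates=52, iter=8, code=0, eps=0.0)

if info.code > 0:
    print("Model did not converge - consider estimates carefully.")
if info.eps is not None and info.eps > 0:
    print("Hessian was perturbed (eps > 0) - model may not be identifiable.")
if info.K2 is not None and info.K2 > 1e10:
    print("Negative Hessian of the penalized likelihood is ill-conditioned.")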
- class mssm.src.python.custom_types.LambdaTerm(S_J: csc_array | None = None, S_J_emb: csc_array | None = None, D_J_emb: csc_array | None = None, rep_sj: int = 1, lam: float = 1.1, start_index: int | None = None, frozen: bool = False, type: PenType | None = None, rank: int | None = None, term: int | None = None, clust_series: list[int] | None = None, clust_weights: list[list[float]] | None = None, dist_param: int = 0, rp_idx: int | None = None, S_J_lam: csc_array | None = None)
Bases:
object
\(\lambda\) storage term.
Usually model.overall_penalties holds a list of these.
- Variables:
S_J (scp.sparse.csc_array) – The penalty matrix associated with this lambda term. Note, in case multiple penalty matrices share the same lambda value, the rep_sj argument determines how many diagonal blocks we need to fill with this penalty matrix to get S_J_emb. Initialized with None.
S_J_emb (scp.sparse.csc_array) – A zero-embedded version of the penalty matrix associated with this lambda term. Note, this matrix contains rep_sj diagonal sub-blocks, each filled with S_J. Initialized with None.
D_J_emb (scp.sparse.csc_array) – Root of S_J_emb, so that D_J_emb@D_J_emb.T=S_J_emb. Initialized with None.
rep_sj (int) – How many sequential sub-blocks of S_J_emb need to be filled with S_J. Useful if all levels of a categorical variable, for which a separate smooth is to be estimated, are assumed to share the same lambda value. Initialized with 1.
lam (float) – The current estimate for \(\lambda\). Initialized with 1.1.
start_index (int) – The first row and column in the overall penalty matrix taken up by S_J. Initialized with None.
type (PenType) – The type of this penalty term. Initialized with None.
rank (int) – The rank of S_J. Initialized with None.
term (int) – The index of the term in a mssm.src.python.formula.Formula with which this penalty is associated. Initialized with None.
- D_J_emb: csc_array | None = None
- S_J: csc_array | None = None
- S_J_emb: csc_array | None = None
- S_J_lam: csc_array | None = None
- clust_series: list[int] | None = None
- clust_weights: list[list[float]] | None = None
- dist_param: int = 0
- frozen: bool = False
- lam: float = 1.1
- rank: int | None = None
- rep_sj: int = 1
- rp_idx: int | None = None
- start_index: int | None = None
- term: int | None = None
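A minimal sketch (not mssm’s internal code) of the relationship between S_J, rep_sj, S_J_emb, and D_J_emb described above:
import numpy as np
import scipy.sparse as scp

S_J = scp.csc_array(np.array([[1., -1.], [-1., 1.]]))  # a simple difference penalty
rep_sj = 3                                             # three blocks share one lambda

# S_J_emb contains rep_sj diagonal sub-blocks, each filled with S_J:
S_J_emb = scp.block_diag([S_J] * rep_sj, format="csc")

# D_J_emb is a root of S_J_emb, so that D_J_emb @ D_J_emb.T = S_J_emb;
# for a positive semi-definite penalty an eigen-based root works:
eig, U = np.linalg.eigh(S_J_emb.toarray())
D_J_emb = U @ np.diag(np.sqrt(np.clip(eig, 0, None)))
assert np.allclose(D_J_emb @ D_J_emb.T, S_J_emb.toarray())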
- class mssm.src.python.custom_types.PenType(*values)
Bases:
Enum
Custom Penalty data type used by internal functions.
- COEFFICIENTS = 7
- CUSTOM = 8
- DERIVATIVE = 6
- DIFFERENCE = 2
- DISTANCE = 3
- IDENTITY = 1
- NULL = 5
- REPARAM1 = 4
- class mssm.src.python.custom_types.Reparameterization(Srp: csc_array | None = None, Drp: csc_array | None = None, C: csc_array | None = None, scale: float | None = None, IRrp: csc_array | None = None, rms1: float | None = None, rms2: float | None = None, rank: int | None = None)
Bases:
object
Holds information necessary to re-parameterize a smooth term.
- Variables:
Srp (scp.sparse.csc_array) – The transformed penalty matrix
Drp (scp.sparse.csc_array) – The root of the transformed penalty matrix
C (scp.sparse.csc_array) – Transformation matrix for model matrix and/or penalty.
- C: csc_array | None = None
- Drp: csc_array | None = None
- IRrp: csc_array | None = None
- Srp: csc_array | None = None
- rank: int | None = None
- rms1: float | None = None
- rms2: float | None = None
- scale: float | None = None
mssm.src.python.exp_fam module
- class mssm.src.python.exp_fam.Binomial(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.Logit object>, n: int | list[int] = 1)
Bases:
Family
Binomial family. For this implementation we assume that we have collected proportions of success, i.e., the dependent variable specified in the model Formula needs to hold observed proportions and not counts! If we assume that each observation \(y_i\) reflects a single independent draw from a binomial (with \(n=1\), and \(p_i\) being the probability that the result is 1), then the dependent variable should hold either 1 or 0. If we have multiple independent draws from the binomial per observation (i.e., row in our data-frame), then \(n\) will usually differ between observations/rows in our data-frame (i.e., we observe \(k_i\) counts of success out of \(n_i\) draws, so that \(y_i=k_i/n_i\)). In that case, the Binomial() family accepts a vector for the argument \(\mathbf{n}\) (which is simply set to 1 by default, assuming binary data), containing \(n_i\) for every observation \(y_i\).
In this implementation, the scale parameter is kept fixed/known at 1.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
link (Link) – The link function to be used by the model of the mean of this family. By default set to the canonical logit link.
n (int or [int], optional) – Number of independent draws from a Binomial per observation/row of data-frame. For binary data this can simply be set to 1, which is the default.
- D(y: ndarray, mu: ndarray) ndarray
Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the contribution of each observation to the model deviance
- Return type:
np.ndarray
- V(mu: ndarray) ndarray
The variance function (of the mean; see Wood, 2017, 3.1.2) for the Binomial model. Variance is minimal for \(\mu=1\) and \(\mu=0\), maximal for \(\mu=0.5\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted probability for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean
- Return type:
np.ndarray
- dVy1(mu: ndarray) ndarray
The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
- Return type:
np.ndarray
- deviance(y: ndarray, mu: ndarray) float
Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
Deviance of the model
- Return type:
float
- init_mu(y: ndarray) ndarray
Function providing initial \(\boldsymbol{\mu}\) vector for GAMM.
Estimation assumes proportions as the dependent variable. According to https://stackoverflow.com/questions/60526586/, the glm() function in R always initializes \(\mu\) = 0.75 for observed proportions (i.e., elements in \(\mathbf{y}\)) of 1 and \(\mu\) = 0.25 for proportions of zero. This can be achieved by adding 0.5 to the observed proportion of success (and adding one observation).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing an initial estimate of the probability of success per observation
- Return type:
np.ndarray
- llk(y: ndarray, mu: ndarray) float
log-probability of data under given model. Essentially the sum over all elements in the vector returned by the lp() method.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
log-likelihood of the model
- Return type:
float
- lp(y: ndarray, mu: ndarray) ndarray
Log-probability of observing every proportion in \(\mathbf{y}\) under their respective binomial with mean = \(\boldsymbol{\mu}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observed proportion.
mu (np.ndarray) – A numpy array containing the predicted probability for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
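Because the binary vs. count distinction above is easy to get wrong, here is a hedged sketch of the count-data case on simulated data (variable names are illustrative):
import numpy as np
import pandas as pd
from mssm.models import *

# k_i successes out of n_i draws: pass proportions y_i = k_i / n_i as the
# dependent variable and the per-row trial counts via the `n` argument.
rng = np.random.default_rng(20)
n_i = rng.integers(5, 20, size=500)    # number of draws per observation
x0 = rng.uniform(0, 1, size=500)
p = 1 / (1 + np.exp(-4 * (x0 - 0.5)))  # true probability of success
k_i = rng.binomial(n_i, p)             # observed counts of success

dat = pd.DataFrame({"y": k_i / n_i, "x0": x0})
formula = Formula(lhs("y"), [i(), f(["x0"])], data=dat)
model = GAMM(formula, Binomial(n=n_i.tolist()))
model.fit()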
- class mssm.src.python.exp_fam.Family(link: Link, twopar: bool, scale: float = None)
Bases:
object
Base class to be implemented by Exp. family member.
- Parameters:
link (Link) – The link function to be used by the model of the mean of this family.
twopar (bool) – Whether the family has two parameters (mean,scale) to be estimated (i.e., whether the likelihood is a function of two parameters), or only a single one (usually the mean).
scale (float or None, optional) – Known/fixed scale parameter for this family. Setting this to None means the parameter has to be estimated. Must be set to 1 if the family has no scale parameter (i.e., when twopar = False).
- D(y: ndarray, mu: ndarray) ndarray
Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the contribution of each observation to the overall deviance.
- Return type:
np.ndarray
- V(mu: ndarray) ndarray
The variance function (of the mean; see Wood, 2017, 3.1.2). Different exponential families allow for different relationships between the variance in our random response variable and the mean of it. For the normal model this is assumed to be constant.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the variance function evaluated for each mean
- Return type:
np.ndarray
- dVy1(mu: ndarray) ndarray
The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
- Return type:
np.ndarray
- deviance(y: ndarray, mu: ndarray) float
Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
Deviance of the model under this family
- Return type:
float
- init_mu(y: ndarray) ndarray | None
Convenience function to compute an initial \(\boldsymbol{\mu}\) estimate passed to the GAMM/PIRLS estimation routine.
- Parameters:
y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing an initial estimate of the mean
- Return type:
np.ndarray
- llk(y: ndarray, mu: ndarray, **kwargs) float
log-probability of \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\). Essentially the sum over all elements in the vector returned by the lp() method.
Families with more than one parameter that needs to be estimated in order to evaluate the model’s log-likelihood (i.e., twopar=True) must accept a scale keyword argument with a default value, e.g.,:
def llk(self, y, mu, scale=1): ...
You can check the implementation of the Gaussian Family for an example.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
log-likelihood of the model under this family
- Return type:
float
- lp(y: ndarray, mu: ndarray, **kwargs) ndarray
Log-probability of observing every value in \(\mathbf{y}\) under this family with mean = \(\boldsymbol{\mu}\).
Families with more than one parameter that needs to be estimated in order to evaluate the model’s log-likelihood (i.e., twopar=True) must accept a scale keyword argument with a default value, e.g.,:
def lp(self, y, mu, scale=1): ...
You can check the implementation of the Gaussian Family for an example.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mu (np.ndarray) – A numpy array of shape (-1,1) containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
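To make the contract above concrete, here is a hedged sketch of a minimal single-parameter family (a Poisson family, which is not among the families documented here) implementing the methods described above. This is illustrative only - full estimation support in mssm may require more than what is shown:
import numpy as np
import scipy.stats as scpstats
from mssm.src.python.exp_fam import Family, LOG

class Poisson(Family):
    def __init__(self):
        # No scale parameter to estimate, so twopar=False and scale=1.
        super().__init__(LOG(), False, 1)

    def V(self, mu):
        return mu  # for the Poisson, Var(Y) = mu

    def dVy1(self, mu):
        return np.ones_like(mu)  # V(mu) = mu, so V'(mu) = 1

    def lp(self, y, mu):
        return scpstats.poisson.logpmf(y.flatten(), mu.flatten()).reshape(-1, 1)

    def llk(self, y, mu):
        return float(np.sum(self.lp(y, mu)))

    def D(self, y, mu):
        # Deviance contributions: 2*(y*log(y/mu) - (y - mu)),
        # with the convention that y*log(y/mu) = 0 when y = 0.
        ylogy = y * np.log(np.where(y == 0, 1.0, y) / mu)
        return 2 * (ylogy - (y - mu))

    def deviance(self, y, mu):
        return float(np.sum(self.D(y, mu)))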
- class mssm.src.python.exp_fam.GAMLSSFamily(pars: int, links: list[Link])
Bases:
object
Base-class to be implemented by families of Generalized Additive Mixed Models of Location, Scale, and Shape (GAMMLSS; Rigby & Stasinopoulos, 2005).
Apart from the required methods, three mandatory attributes need to be defined by the __init__() constructor of implementations of this class. These are required to evaluate the first and second (pure & mixed) derivatives of the log-likelihood with respect to any of the log-likelihood’s parameters (alternatively, the linear predictors of the parameters - see the description of the d_eta instance variable). See the variables below.
Optionally, a mean_init_fam attribute can be defined, specifying a Family member that is fitted to the data to get an initial estimate of the mean parameter of the assumed distribution.
- References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
pars (int) – Number of parameters of the distribution belonging to the random variables assumed to have generated the observations, e.g., 2 for the Normal: mean and standard deviation.
links ([Link]) – Link functions for each of the parameters of the distribution.
- Variables:
d_eta (bool) – A boolean indicating whether partial derivatives of llk are provided with respect to the linear predictor instead of parameters (i.e., the mean), defaults to False (derivatives are provided with respect to parameters)
d1 ([Callable]) – A list holding n_par functions to evaluate the first partial derivatives of the llk with respect to each parameter of the llk. Needs to be initialized when calling __init__().
d2 ([Callable]) – A list holding n_par functions to evaluate the second (pure) partial derivatives of the llk with respect to each parameter of the llk. Needs to be initialized when calling __init__().
d2m ([Callable]) – A list holding n_par*(n_par-1)/2 functions to evaluate the second mixed partial derivatives of the llk with respect to each parameter of the llk, in order: d2m[0] = \(\partial^2 l/\partial \mu_1 \partial \mu_2\), d2m[1] = \(\partial^2 l/\partial \mu_1 \partial \mu_3\), …, d2m[n_par-2] = \(\partial^2 l/\partial \mu_1 \partial \mu_{n_{par}}\), d2m[n_par-1] = \(\partial^2 l/\partial \mu_2 \partial \mu_3\), d2m[n_par] = \(\partial^2 l/\partial \mu_2 \partial \mu_4\), … (see the ordering sketch below). Needs to be initialized when calling __init__().
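The pair ordering of d2m can be generated mechanically; a small sketch (plain Python, independent of mssm):
# All parameter pairs (j, k) with j < k, j iterated outermost:
n_par = 3
pairs = [(j, k) for j in range(n_par) for k in range(j + 1, n_par)]
print(pairs)  # [(0, 1), (0, 2), (1, 2)] -> d2m[0], d2m[1], d2m[2]
assert len(pairs) == n_par * (n_par - 1) // 2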
- get_resid(y: ndarray, *mus: list[ndarray], **kwargs) ndarray | None
Get standardized residuals for a GAMMLSS model (Rigby & Stasinopoulos, 2005).
Any implementation of this function should return a vector that looks like what could be expected from taking len(y) independent draws from \(N(0,1)\). Any additional arguments required by a specific implementation can be passed along via kwargs.
Note: Families for which no residuals are available can return None.
- References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array of shape (-1,1) containing each observed value.
mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.
- Returns:
a vector of shape (-1,1) containing standardized residuals under the current model or None in case residuals are not readily available.
- Return type:
np.ndarray | None
- init_coef(models: list[Callable]) ndarray
(Optional) Function to initialize the coefficients of the model.
Can return None, in which case random initialization will be used.
- Parameters:
models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s, each based on one of the formulas provided to a model.
- Returns:
A numpy array of shape (-1,1), holding initial values for all model coefficients.
- Return type:
np.ndarray
- init_lambda(penalties: list[Callable]) list[float]
(Optional) Function to initialize the smoothing parameters of the model.
Can return None, in which case random initialization will be used.
- Parameters:
penalties ([mssm.src.python.penalties.LambdaTerm]) – A list of all penalties to be estimated by the model.
- Returns:
A list, holding - for each \(\lambda\) parameter to be estimated - an initial value.
- Return type:
[float]
- llk(y: ndarray, *mus: list[ndarray]) float
log-probability of data under given model. Essentially the sum over all elements in the vector returned by the lp() method.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array of shape (-1,1) containing each observation.
mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.
- Returns:
The log-probability of observing all data under the current model.
- Return type:
float
- lp(y: ndarray, *mus: list[ndarray]) ndarray
Log-probability of observing every element in \(\mathbf{y}\) under their respective distribution parameterized by mus.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array of shape (-1,1) containing each observed value.
mus ([np.ndarray]) – A list including self.n_par lists - one for each parameter of the distribution. Each of those lists contains a numpy array of shape (-1,1) holding the expected value for a particular parameter for each of the N observations.
- Returns:
a N-dimensional vector of shape (-1,1) containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
- class mssm.src.python.exp_fam.GAMMALS(links: list[Link])
Bases:
GAMLSSFamily
Family for a GAMMA GAMMLSS model (Rigby & Stasinopoulos, 2005).
This Family follows the Gamma family, in that we assume: \(Y_i \sim \Gamma(\mu_i,\phi_i)\). The difference to the Gamma family is that we now also model \(\phi\) as an additive combination of smooth variables and other parametric terms. The Gamma distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and rate parameter, called \(\alpha\) and \(\beta\) respectively. Wood (2017) provides \(\alpha = 1/\phi\). With this we can obtain \(\beta = 1/\phi/\mu\) (see the source code of the lp() method of the Gamma family for details).
References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
links ([Link]) – Link functions for the mean and scale. Standard would be links=[LOG(),LOG()].
- get_resid(y: ndarray, mu: ndarray, scale: ndarray) ndarray
Get standardized residuals for a Gamma GAMMLSS model (Rigby & Stasinopoulos, 2005).
Essentially, to get a standardized residual vector we first have to account for the mean-variance relationship of our RVs (which we also have to do for the Gamma family) - for this we can simply compute deviance residuals again (see Wood, 2017). These should be \(\sim N(0,\phi_i)\) (where \(\phi_i\) is the element in scale for a specific observation), so if we divide each of those by the observation-specific scale we can expect the resulting standardized residuals to be \(\sim N(0,1)\) if the model is correct.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.
- Returns:
A list of standardized residuals that should be ~ N(0,1) if the model is correct.
- Return type:
np.ndarray
- init_coef(models: list[Callable]) ndarray
Function to initialize the coefficients of the model.
Fits a GAMM for the mean and initializes all coef. for the scale parameter to 1.
- Parameters:
models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s, each based on one of the formulas provided to a model.
- Returns:
A numpy array of shape (-1,1), holding initial values for all model coefficients.
- Return type:
np.ndarray
- llk(y: ndarray, mu: ndarray, scale: ndarray) float
log-probability of data under given model. Essentially the sum over all elements in the vector returned by the lp() method.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.
- Returns:
The log-probability of observing all data under the current model.
- Return type:
float
- lp(y: ndarray, mu: ndarray, scale: ndarray) ndarray
Log-probability of observing every value in \(\mathbf{y}\) under their respective Gamma with mean = \(\boldsymbol{\mu}\) and scale = \(\boldsymbol{\phi}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (np.ndarray) – A numpy array containing the predicted scale parameter for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
- class mssm.src.python.exp_fam.GAUMLSS(links: list[Link])
Bases:
GAMLSSFamily
Family for a Normal GAMMLSS model (Rigby & Stasinopoulos, 2005).
This Family follows the Gaussian family, in that we assume: \(Y_i \sim N(\mu_i,\sigma_i)\). I.e., each of the \(N\) observations is still believed to have been generated from an independent normally distributed RV with observation-specific mean.
The important difference is that the scale parameter, \(\sigma\), is now also observation-specific and modeled as an additive combination of smooth functions and other parametric terms, just like the mean is in a Normal GAM. Note that this explicitly models heteroscedasticity - the residuals are no longer assumed to be i.i.d. samples from \(N(0,\sigma)\), since \(\sigma\) can now differ between residual realizations.
References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
links ([Link]) – Link functions for the mean and standard deviation. Standard would be links=[Identity(),LOG()].
- get_resid(y: ndarray, mu: ndarray, sigma: ndarray) ndarray
Get standardized residuals for a Normal GAMMLSS model (Rigby & Stasinopoulos, 2005).
Essentially, each residual should reflect a realization of a normal with mean zero and observation-specific standard deviation. After scaling each residual by their observation-specific standard deviation we should end up with standardized residuals that can be expected to be i.i.d \(\sim N(0,1)\) - assuming that our model is correct.
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.
- Returns:
A list of standardized residuals that should be ~ N(0,1) if the model is correct.
- Return type:
np.ndarray
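A tiny numpy sketch of the standardization described above (simulated values, independent of mssm):
import numpy as np

# With observation-specific mu_i and sigma_i, (y_i - mu_i) / sigma_i should
# look like independent draws from N(0,1) if the model is correct.
rng = np.random.default_rng(1)
mu = rng.uniform(-1, 1, size=(1000, 1))
sigma = rng.uniform(0.5, 2.0, size=(1000, 1))
y = rng.normal(mu, sigma)

res = (y - mu) / sigma
print(res.mean(), res.std())  # approximately 0 and 1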
- init_coef(models: list[Callable]) ndarray
Function to initialize the coefficients of the model.
Fits a GAMM for the mean and initializes all coef. for the standard deviation to 1.
- Parameters:
models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s, each based on one of the formulas provided to a model.
- Returns:
A numpy array of shape (-1,1), holding initial values for all model coefficients.
- Return type:
np.ndarray
- llk(y: ndarray, mu: ndarray, sigma: ndarray) float
log-probability of data under given model. Essentially the sum over all elements in the vector returned by the lp() method.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.
- Returns:
The log-probability of observing all data under the current model.
- Return type:
float
- lp(y: ndarray, mu: ndarray, sigma: ndarray) ndarray
Log-probability of observing every value in \(\mathbf{y}\) under their respective Normal with observation-specific mean and standard deviation.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (np.ndarray) – A numpy array containing the predicted standard deviation for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
- class mssm.src.python.exp_fam.GSMMFamily(pars: int, links: list[Link], *llkargs)
Bases:
object
Base-class for General Smooth “families” as discussed by Wood, Pya, & Säfken (2016). For estimation of mssm.models.GSMM models via L-qEFS (Krause et al., submitted) it is sufficient to implement llk(). gradient() and hessian() can then simply return None. For exact estimation via Newton’s method, the latter two functions need to be implemented and have to return the gradient and Hessian at the current coefficient estimate, respectively.
Additional parameters needed for likelihood, gradient, or Hessian evaluation can be passed along via llkargs. They are then made available in self.llkargs.
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
pars (int) – Number of parameters of the likelihood.
links ([Link]) – List of Link functions for each parameter of the likelihood, e.g., links=[Identity(),LOG()].
- Variables:
extra_coef (int, optional) – Number of extra coefficients required by a specific family, or None. By default set to None and changed to an int by specific families requiring this.
- get_resid(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array], **kwargs) ndarray | None
Get standardized residuals for a GSMM model.
Any implementation of this function should return a vector that looks like what could be expected from taking independent draws from \(N(0,1)\). Any additional arguments required by a specific implementation can be passed along via kwargs.
Note: Families for which no residuals are available can return None.
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.
- Returns:
a vector of shape (-1,1) containing standardized residuals under the current model (Note, the first axis will not necessarily match the dimension of any of the response vectors (this will depend on the specific Family’s implementation)) or None in case residuals are not readily available.
- Return type:
np.ndarray | None
- gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) ndarray
Function to evaluate the gradient of the llk at the current coefficient estimate coef.
By default relies on numerical differentiation as implemented in scipy to approximate the gradient from the implemented log-likelihood function. See the link in the references for more details.
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
scipy.optimize.approx_fprime: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.approx_fprime.html
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.
- Returns:
The gradient of the log-likelihood evaluated at coef, as a numpy array of shape (-1,1).
- Return type:
np.ndarray
- hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) csc_array | None
Function to evaluate the Hessian of the llk at the current coefficient estimate coef.
Only has to be implemented if full Newton is to be used to estimate coefficients. If the L-qEFS update by Krause et al. (submitted) is to be used instead, this method does not have to be implemented.
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
scipy.optimize.approx_fprime: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.approx_fprime.html
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.
- Returns:
The Hessian of the log-likelihood evaluated at coef.
- Return type:
scp.sparse.csc_array
- init_coef(models: list[Callable]) ndarray
(Optional) Function to initialize the coefficients of the model.
Can return None, in which case random initialization will be used.
- Parameters:
models ([mssm.models.GAMM]) – A list of mssm.models.GAMM’s, each based on one of the formulas provided to a model.
- Returns:
A numpy array of shape (-1,1), holding initial values for all model coefficients.
- Return type:
np.ndarray
- init_lambda(penalties: list[Callable]) list[float]
(Optional) Function to initialize the smoothing parameters of the model.
Can return None, in which case random initialization will be used.
- Parameters:
penalties ([mssm.src.python.penalties.LambdaTerm]) – A list of all penalties to be estimated by the model.
- Returns:
A list, holding - for each \(\lambda\) parameter to be estimated - an initial value.
- Return type:
[float]
- llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) float
log-probability of data under given model.
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations (each of shape (-1,1)) passed as lhs.variable to the formulas. Note: by convention mssm expects that the actual observed data is passed along via the first formula (so it is stored in ys[0]). If multiple formulas have the same lhs.variable as this first formula, then ys contains None at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.
- Returns:
The log-likelihood evaluated at coef.
- Return type:
float
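To illustrate the minimal contract, here is a hedged sketch of a GSMM family that implements only llk() (sufficient for the L-qEFS update; gradient() and hessian() then fall back to the defaults described above). The two-parameter Gaussian likelihood used here is purely illustrative:
import numpy as np
import scipy.stats as scpstats
from mssm.src.python.exp_fam import GSMMFamily, Identity, LOG

class NormalGSMM(GSMMFamily):
    def __init__(self):
        super().__init__(2, [Identity(), LOG()])  # mean + log-scale predictors

    def llk(self, coef, coef_split_idx, ys, Xs):
        # Split the stacked coefficient vector into per-parameter blocks:
        split_coef = np.split(coef, coef_split_idx)
        eta_mu = Xs[0] @ split_coef[0]  # linear predictor for the mean
        eta_sd = Xs[1] @ split_coef[1]  # linear predictor for log(sd)
        mu = eta_mu                     # Identity link: mu = eta
        sd = np.exp(eta_sd)             # inverse of the LOG link
        return float(np.sum(scpstats.norm.logpdf(ys[0], loc=mu, scale=sd)))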
- class mssm.src.python.exp_fam.Gamma(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>, scale: float = None)
Bases:
Family
Gamma Family.
We assume: \(Y_i \sim \Gamma(\mu_i,\phi)\). The Gamma distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and rate parameter, called \(\alpha\) and \(\beta\) respectively. Wood (2017) provides \(\alpha = 1/\phi\). With this we can obtain \(\beta = 1/\phi/\mu\) (see the source code of the lp() method for details).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.
scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.
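A quick numeric check of the mean/scale to shape/rate conversion described above (scipy parameterizes the Gamma via shape a and scale = 1/rate):
import scipy.stats as scpstats

mu, phi = 2.0, 0.5
alpha = 1 / phi         # shape
beta = 1 / (phi * mu)   # rate

dist = scpstats.gamma(a=alpha, scale=1 / beta)
print(dist.mean())  # 2.0 (= mu = alpha/beta)
print(dist.var())   # 2.0 (= phi * mu**2 = alpha/beta**2)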
- D(y: ndarray, mu: ndarray) ndarray
Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
A N-dimensional vector containing the contribution of each data-point to the overall model deviance.
- Return type:
np.ndarray
- V(mu: ndarray) ndarray
Variance function for the Gamma family.
The variance of random variable \(Y\) is proportional to its mean raised to the second power.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – a N-dimensional vector of the model prediction/the predicted mean
- Returns:
mu raised to the power of 2
- Return type:
np.ndarray
- dVy1(mu: ndarray) ndarray
The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
- Return type:
np.ndarray
- deviance(y: ndarray, mu: ndarray) float
Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
The model deviance.
- Return type:
float
- llk(y: ndarray, mu: ndarray, scale: float = 1) float
log-probability of data under given model. Essentially the sum over all elements in the vector returned by the lp() method.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1
- Returns:
The log-probability of observing all data under the current model.
- Return type:
float
- lp(y: ndarray, mu: ndarray, scale: float = 1) ndarray
Log-probability of observing every value in \(\mathbf{y}\) under their respective Gamma with mean = \(\boldsymbol{\mu}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1
- Returns:
a N-dimensional vector containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
- class mssm.src.python.exp_fam.Gaussian(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.Identity object>, scale: float = None)
Bases:
Family
Normal/Gaussian Family.
We assume: \(Y_i \sim N(\mu_i,\sigma)\) - i.e., each of the \(N\) observations is generated from a normally distributed RV with observation-specific mean and shared scale parameter \(\sigma\). Equivalent to the assumption that the observed residual vector - the difference between the model prediction and the observed data - should look like what could be expected from drawing \(N\) independent samples from a Normal with mean zero and standard deviation equal to \(\sigma\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
link (Link) – The link function to be used by the model of the mean of this family. By default set to the canonical identity link.
scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.
- D(y: ndarray, mu: ndarray) ndarray
Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
A N-dimensional vector containing the contribution of each data-point to the overall model deviance.
- Return type:
np.ndarray
- V(mu: ndarray) ndarray
Variance function for the Normal family.
Not really a function since the link between variance and mean of the RVs is assumed constant for this model.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – a N-dimensional vector of the model prediction/the predicted mean
- Returns:
a N-dimensional vector of shape (-1,1) containing 1s, since the variance function evaluates to 1 for each mean
- Return type:
np.ndarray
- dVy1(mu: ndarray) ndarray
The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
- Return type:
np.ndarray
- deviance(y: ndarray, mu: ndarray) float
Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
The model deviance.
- Return type:
float
- llk(y: ndarray, mu: ndarray, sigma: float = 1) float
log-probability of data under given model. Essentially the sum over all elements in the vector returned by the lp() method.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (float, optional) – The (estimated) sigma parameter, defaults to 1
- Returns:
The log-probability of observing all data under the current model.
- Return type:
float
- lp(y: ndarray, mu: ndarray, sigma: float = 1) ndarray
Log-probability of observing every value in \(\mathbf{y}\) under their respective Normal with mean = \(\boldsymbol{\mu}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
sigma (float, optional) – The (estimated) sigma parameter, defaults to 1
- Returns:
a N-dimensional vector containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
- class mssm.src.python.exp_fam.Identity
Bases:
Link
Identity Link function. \(\boldsymbol{\mu}=\boldsymbol{\eta}\) and so this link is trivial.
- dy1(mu: ndarray) ndarray
First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- dy2(mu: ndarray) ndarray
Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).
References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- f(mu: ndarray) ndarray
Canonical link for normal distribution with \(\boldsymbol{\eta} = \boldsymbol{\mu}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- fi(eta: ndarray) ndarray
For the identity link, \(\boldsymbol{\eta} = \boldsymbol{\mu}\), so the inverse is also just the identity. See Faraway (2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
- class mssm.src.python.exp_fam.InvGauss(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>, scale: float | None = None)
Bases:
Family
Inverse Gaussian Family.
We assume: \(Y_i \sim IG(\mu_i,\phi)\). The Inverse Gaussian distribution is usually not expressed in terms of the mean and scale (\(\phi\)) parameter but rather in terms of a shape and scale parameter, called \(\nu\) and \(\lambda\) respectively (see the scipy implementation). We can simply set \(\nu=\mu\) (compare the scipy density to the one in Table 3.1 of Wood, 2017). Wood (2017) shows that \(\phi=1/\lambda\), so this provides \(\lambda=1/\phi\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.invgauss.html
- Parameters:
link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.
scale (float or None, optional) – Known scale parameter for this family - by default set to None so that the scale parameter is estimated.
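A hedged numeric sketch of this parameterization, using scipy’s invgauss (whose distribution with shape m and scale s has mean m*s):
import scipy.stats as scpstats

mu, phi = 2.0, 0.5
lam = 1 / phi  # lambda = 1/phi

# IG with mean mu and shape lambda corresponds to invgauss(mu/lam, scale=lam):
dist = scpstats.invgauss(mu / lam, scale=lam)
print(dist.mean())  # 2.0 (= mu)
print(dist.var())   # 4.0 (= phi * mu**3 - variance proportional to mu cubed)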
- D(y: ndarray, mu: ndarray) ndarray
Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
A N-dimensional vector containing the contribution of each data-point to the overall model deviance.
- Return type:
np.ndarray
- V(mu: ndarray) ndarray
Variance function for the Inverse Gaussian family.
The variance of random variable \(Y\) is proportional to its mean raised to the third power.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – a N-dimensional vector of the model prediction/the predicted mean
- Returns:
mu raised to the power of 3
- Return type:
np.ndarray
- dVy1(mu: ndarray) ndarray
The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
- Return type:
np.ndarray
- deviance(y: ndarray, mu: ndarray) float
Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
The model deviance.
- Return type:
float
- llk(y: ndarray, mu: ndarray, scale: float = 1) float
log-probability of data under given model. Essentially the sum over all elements in the vector returned by the lp() method.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1
- Returns:
The log-probability of observing all data under the current model.
- Return type:
float
- lp(y: ndarray, mu: ndarray, scale: float = 1) ndarray
Log-probability of observing every value in \(\mathbf{y}\) under their respective inverse Gaussian with mean = \(\boldsymbol{\mu}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
scale (float, optional) – The (estimated) scale parameter, defaults to 1
- Returns:
a N-dimensional vector containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
- class mssm.src.python.exp_fam.LOG
Bases:
Link
Log Link function. \(log(\boldsymbol{\mu}) = \boldsymbol{\eta}\).
- dy1(mu: ndarray) ndarray
First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- dy2(mu: ndarray) ndarray
Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).
References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- f(mu: ndarray) ndarray
Non-canonical link for Gamma distribution with \(log(\boldsymbol{\mu}) = \boldsymbol{\eta}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- fi(eta: ndarray) ndarray
For the log link, \(\boldsymbol{\eta} = log(\boldsymbol{\mu})\), so \(exp(\boldsymbol{\eta})=\boldsymbol{\mu}\). See Faraway (2016).
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
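The log link’s methods reduce to simple element-wise operations; a direct numpy transcription (the derivative values follow from basic calculus, not from mssm’s source):
import numpy as np

mu = np.array([[0.5], [1.0], [2.0]])
eta = np.log(mu)                      # f(mu)
assert np.allclose(np.exp(eta), mu)   # fi inverts f
dy1 = 1 / mu                          # f'(mu), used for Fisher scoring/PIRLS
dy2 = -1 / mu**2                      # f''(mu), used for GAMMLSS models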
- class mssm.src.python.exp_fam.LOGb(b: float)
Bases:
Link
Log + b Link function. \(log(\boldsymbol{\mu} + b) = \boldsymbol{\eta}\).
- Parameters:
b (float) – The constant to add to \(\mu\) before taking the log.
- dy1(mu: ndarray) ndarray
First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- dy2(mu: ndarray) ndarray
Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).
References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- f(mu: ndarray) ndarray
\(log(\boldsymbol{\mu} + b) = \boldsymbol{\eta}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- fi(eta: ndarray) ndarray
For the logb link, \(\boldsymbol{\eta} = log(\boldsymbol{\mu} + b)\), so \(exp(\boldsymbol{\eta})-b =\boldsymbol{\mu}\)
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
- class mssm.src.python.exp_fam.Link
Bases:
object
Link function base class. To be implemented by any link function used for GAMMs and GAMMLSS models. Only links used by GAMMLSS models require implementing the dy2 function. Note that care must be taken that every method returns only valid values. Specifically, no returned element may be numpy.nan or numpy.inf.
- dy1(mu: ndarray) ndarray
First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- dy2(mu: ndarray) ndarray
Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).
References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- f(mu: ndarray) ndarray
Link function \(f()\) mapping mean \(\boldsymbol{\mu}\) of an exponential family to the model prediction \(\boldsymbol{\eta}\), so that \(f(\boldsymbol{\mu}) = \boldsymbol{\eta}\). See Wood (2017, 3.1.2) and Faraway (2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- fi(eta: ndarray) ndarray
Inverse of the link function mapping \(\boldsymbol{\eta} = f(\boldsymbol{\mu})\) to the mean \(fi(\boldsymbol{\eta}) = fi(f(\boldsymbol{\mu})) = \boldsymbol{\mu}\). See Faraway (2016) and the Link.f function.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
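Since the four methods above make up the entire interface (with dy2 only needed for GAMMLSS models), a custom link can be sketched as below. The Identity class here is purely illustrative - mssm may well ship its own identity link:

import numpy as np
from mssm.src.python.exp_fam import Link

class Identity(Link):  # illustrative custom link: f(mu) = mu
    def f(self, mu: np.ndarray) -> np.ndarray:
        return mu                 # eta = mu
    def fi(self, eta: np.ndarray) -> np.ndarray:
        return eta                # mu = eta
    def dy1(self, mu: np.ndarray) -> np.ndarray:
        return np.ones_like(mu)   # d f(mu) / d mu = 1
    def dy2(self, mu: np.ndarray) -> np.ndarray:
        return np.zeros_like(mu)  # second derivative is 0

All four methods return only finite values, satisfying the requirement stated above.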
- class mssm.src.python.exp_fam.Logit
Bases:
Link
Logit Link function, which is canonical for the binomial model. \(\boldsymbol{\eta}\) = log-odds of success.
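A minimal, illustrative sketch (assuming Logit can be imported directly from mssm.src.python.exp_fam):

import numpy as np
from mssm.src.python.exp_fam import Logit

link = Logit()
mu = np.array([0.25, 0.5, 0.75])
eta = link.f(mu)                                   # log-odds: log(mu/(1 - mu))
assert np.allclose(link.fi(eta), mu)               # inverse logit recovers mu
assert np.allclose(link.dy1(mu), 1/(mu*(1 - mu)))  # Faraway's simplification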
- dy1(mu: ndarray) ndarray
First derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for Fisher scoring/PIRLS (Wood, 2017):
\[f(\mu) = log(\mu / (1 - \mu)) = log(\mu) - log(1 - \mu)\]
\[\partial f(\mu) / \partial \mu = 1/\mu + 1/(1 - \mu)\]
Faraway (2016) simplifies this to: \(\partial f(\mu)/ \partial \mu = 1 / (\mu - \mu^2) = 1/((1-\mu)\mu)\)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- dy2(mu: ndarray) ndarray
Second derivative of \(f(\boldsymbol{\mu})\) with respect to \(\boldsymbol{\mu}\). Needed for GAMMLSS models (Wood, 2017).
References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- f(mu: ndarray) ndarray
Canonical link for binomial distribution with \(\boldsymbol{\mu}\) holding the probabilities of success, so that the model prediction \(\boldsymbol{\eta}\) is equal to the log-odds.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- fi(eta: ndarray) ndarray
For the logit link and the binomial model, \(\boldsymbol{\eta}\) = log-odds, so the inverse to go from \(\boldsymbol{\eta}\) to \(\boldsymbol{\mu}\) is \(\boldsymbol{\mu} = exp(\boldsymbol{\eta}) / (1 + exp(\boldsymbol{\eta}))\). See Faraway (2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
eta (np.ndarray) – A numpy array containing the model prediction corresponding to each observation.
- class mssm.src.python.exp_fam.MULNOMLSS(pars: int)
Bases:
GAMLSSFamily
Family for a Multinomial GAMMLSS model (Rigby & Stasinopoulos, 2005).
This Family assumes that each observation \(y_i\) corresponds to one of \(K\) classes (labeled as 0, …, \(K-1\)) and reflects a realization of an independent RV \(Y_i\) with observation-specific probability mass function defined over the \(K\) classes. These \(K\) probabilities - that \(Y_i\) takes on class 1, …, \(K\) - are modeled as additive combinations of smooth functions of covariates and other parametric terms.
As an example, consider a visual search experiment where \(K-1\) distractors are presented on a computer screen together with a single target and subjects are instructed to find the target and fixate it. With a Multinomial model we can estimate how the probability of looking at each of the \(K\) stimuli on the screen changes (smoothly) over time and as a function of other predictor variables of interest (e.g., contrast of stimuli, depending on whether participants are instructed to be fast or accurate).
References:
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
pars (int) – K-1, i.e., the number of classes minus 1, which equals the number of linear predictors.
- get_resid(y: ndarray, *mus: list[ndarray]) None
Placeholder function for residuals of a Multinomial model - yet to be implemented.
- Parameters:
y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.
mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.
- Returns:
Currently None - since no residuals are implemented
- llk(y: ndarray, *mus: list[ndarray])
log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.
mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.
- Returns:
The log-probability of observing all data under the current model.
- Return type:
float
- lp(y: ndarray, *mus: list[ndarray]) ndarray
Log-probability of observing class k under current model.
Our DV consists of K classes but we essentially enforce a sum-to-zero constraint on the DV so that we end up modeling only K-1 (non-normalized) probabilities of observing class k (for all k except k==K) as an additive combination of smooth functions of our covariates and other parametric terms. The probability of observing class K as well as the normalized probabilities of observing each other class can readily be computed from these K-1 non-normalized probabilities. This is explained quite well on Wikipedia (see refs).
Specifically, the probability of the outcome being class k is simply:
\(p(Y_i == k) = \mu_k / (1 + \sum_j^{K-1} \mu_j)\) where \(\mu_k\) is the aforementioned non-normalized probability of observing class \(k\) - which is simply set to 1 for class \(K\) (this follows from the sum-to-zero constraint; see Wikipedia).
So, the log-prob of the outcome being class k is:
\(log(p(Y_i == k)) = log(\mu_k) - log(1 + \sum_j^{K-1} \mu_j)\)
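A small numeric illustration of these two formulas in plain numpy (independent of the class; the values are made up):

import numpy as np

mus = [np.array(2.0), np.array(0.5)]          # K-1 = 2 non-normalized probabilities
denom = 1 + sum(mus)                          # 1 + sum_j mu_j
p = [mu_k/denom for mu_k in mus] + [1/denom]  # normalized probabilities; mu = 1 for class K
assert np.isclose(np.sum(p), 1.0)             # probabilities sum to one
log_p0 = np.log(mus[0]) - np.log(denom)       # log p(Y_i == first class)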
References:
Wikipedia. https://en.wikipedia.org/wiki/Multinomial_logistic_regression
gamlss.dist on Github (see Rigby & Stasinopoulos, 2005). https://github.com/gamlss-dev/gamlss.dist/blob/main/R/MN4.R
Rigby, R. A., & Stasinopoulos, D. M. (2005). Generalized Additive Models for Location, Scale and Shape.
- Parameters:
y (np.ndarray) – A numpy array containing each observed class, every element must be larger than or equal to 0 and smaller than self.n_par + 1.
mus ([np.ndarray]) – A list containing K-1 (self.n_par) lists, each containing the non-normalized probabilities of observing class k for every observation.
- Returns:
a N-dimensional vector containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
- class mssm.src.python.exp_fam.Poisson(link: ~mssm.src.python.exp_fam.Link = <mssm.src.python.exp_fam.LOG object>)
Bases:
Family
Poisson Family.
We assume: \(Y_i \sim P(\lambda)\). We can simply set \(\lambda=\mu\) (compare scipy density to the one in table 3.1 of Wood, 2017) and treat the scale parameter of a GAMM (\(\phi\)) as fixed/known at 1.
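A fitting sketch mirroring the Binomial example shown for GAMM (hedged: it assumes that sim3 from mssmViz can also simulate from a Poisson family, as it does for the Binomial family):

from mssm.models import *
from mssmViz.sim import *

# Simulate Poisson data (assumption: sim3 supports family=Poisson())
Poisdat = sim3(1000, 2, family=Poisson(), seed=20)

formula = Formula(lhs("y"), [i(), f(["x0"]), f(["x1"]), f(["x2"]), f(["x3"])], data=Poisdat)
model = GAMM(formula, Poisson())  # log link by default
model.fit()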
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html
- Parameters:
link (Link) – The link function to be used by the model of the mean of this family. By default set to the log link.
- D(y: ndarray, mu: ndarray) ndarray
Contribution of each observation to model Deviance (Wood, 2017; Faraway, 2016)
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
A N-dimensional vector containing the contribution of each data-point to the overall model deviance.
- Return type:
np.ndarray
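For the Poisson family the textbook per-observation contribution (Wood, 2017) is \(d_i = 2(y_i log(y_i/\mu_i) - (y_i - \mu_i))\), with \(y_i log(y_i/\mu_i)\) taken as 0 when \(y_i = 0\). A plain-numpy sketch of that formula (not necessarily mssm's exact implementation):

import numpy as np

y = np.array([1.0, 3.0, 0.0])
mu = np.array([1.5, 2.5, 0.5])
with np.errstate(divide="ignore", invalid="ignore"):
    ylogy = np.where(y > 0, y*np.log(y/mu), 0.0)  # y*log(y/mu), 0 when y == 0
d = 2*(ylogy - (y - mu))                          # per-observation deviance contributions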
- V(mu: ndarray) ndarray
Variance function for the Poisson family.
The variance of random variable \(Y\) is equal to its mean.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – a N-dimensional vector of the model prediction/the predicted mean
- Returns:
mu
- Return type:
np.ndarray
- dVy1(mu: ndarray) ndarray
The first derivative of the variance function (of the mean; see Wood, 2017, 3.1.2) with respect to the mean.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing the first derivative of the variance function with respect to each mean
- Return type:
np.ndarray
- deviance(y: ndarray, mu: ndarray) float
Deviance of the model under this family: 2 * (llk_max - llk_c) * scale (Wood, 2017; Faraway, 2016).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
The model deviance.
- Return type:
float
- init_mu(y: ndarray) ndarray
Function providing initial \(\boldsymbol{\mu}\) vector for Poisson GAMM.
We shrink extreme observed counts towards the mean.
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
- Returns:
a N-dimensional vector of shape (-1,1) containing an initial estimate of the mean of the response variable
- Return type:
np.ndarray
- llk(y: ndarray, mu: ndarray) float
log-probability of data under given model. Essentially sum over all elements in the vector returned by the lp() method.
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observation.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
The log-probability of observing all data under the current model.
- Return type:
float
- lp(y: ndarray, mu: ndarray) ndarray
Log-probability of observing every value in \(\mathbf{y}\) under their respective Poisson with mean = \(\boldsymbol{\mu}\).
References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – A numpy array containing each observed value.
mu (np.ndarray) – A numpy array containing the predicted mean for the response distribution corresponding to each observation.
- Returns:
a N-dimensional vector containing the log-probability of observing each data-point under the current model.
- Return type:
np.ndarray
- class mssm.src.python.exp_fam.PropHaz(ut: ndarray, r: ndarray)
Bases:
GSMMFamily
Family for proportional Hazard model - a type of General Smooth model as discussed by Wood, Pya, & Säfken (2016).
Based on Supplementary materials G in Wood, Pya, & Säfken (2016). The dependent variable passed to the mssm.src.python.formula.Formula needs to hold delta, indicating whether the event was observed or not (i.e., only values in {0,1}).
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,correlate=False)

# Prep everything for prophaz model
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"), [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])], data=sim_dat)
PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Can plot the estimated effects on the scale of the linear predictor (i.e., log hazard) via mssmViz
plot(model)
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
- Parameters:
ut (np.ndarray) – Unique event time vector (each time represented as int) as described by WPS (2016), holding unique event times in decreasing order.
r (np.ndarray) – Index vector as described by WPS (2016), holding for each data-point (i.e., for each row in Xs[0]) the index to its corresponding event time in ut.
- get_baseline_hazard(coef: ndarray, delta: ndarray, Xs: list[csc_array]) ndarray
Get the cumulative baseline hazard function as defined by Wood, Pya, & Säfken (2016).
The function is evaluated for all k unique event times that were available in the data.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,correlate=False)

# Prep everything for prophaz model
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"), [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])], data=sim_dat)
PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Now get cumulative baseline hazard estimate
H = PropHaz_fam.get_baseline_hazard(model.coef,sim_formula_m.y_flat[sim_formula_m.NOT_NA_flat],model.get_mmat())

# And plot it
plt.plot(ut,H)
plt.xlabel("Time")
plt.ylabel("Cumulative Baseline Hazard")
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
coef (np.ndarray) – Coefficient vector as numpy array of shape (-1,1).
Xs ([scp.sparse.csc_array]) – The list of model matrices (here holding a single model matrix) obtained from mssm.models.GAMMLSS.get_mmat().
delta (np.ndarray) – Dependent variable passed to mssm.src.python.formula.Formula(), holds (for each row in Xs[0]) a value in {0,1}, indicating whether for that observation the event was observed or not.
- Returns:
numpy array, holding k baseline hazard function estimates
- Return type:
np.ndarray
- get_resid(coef, coef_split_idx, ys, Xs, resid_type: str = 'Martingale', reorder: ndarray | None = None) ndarray
Get Martingale or Deviance residuals for a proportional Hazard model.
See the PropHaz.get_survival() function for examples.
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.
ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.
Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.
resid_type (str, optional) – The type of residual to compute, supported are “Martingale” and “Deviance”.
reorder (np.ndarray) – A flattened np.ndarray containing for each data point the original index in the data-set before sorting. Used to re-order the residual vector into the original order. If this is set to None, the residual vector is not re-ordered and instead returned in the order of the sorted data-frame passed to the model formula.
- Returns:
The residual vector of shape (-1,1)
- Return type:
np.ndarray
- get_survival(coef: ndarray, Xs: list[csc_array], delta: ndarray, t: int, x: ndarray | csc_array, V: csc_array, compute_var: bool = True) tuple[ndarray, ndarray | None]
Compute survival function + variance at time-point t, given k optional covariate vector(s) x as defined by Wood, Pya, & Säfken (2016).
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt

# Simulate some data
sim_dat = sim3(500,2,c=1,seed=0,family=PropHaz([0],[0]),binom_offset = 0.1,correlate=False)

# Prep everything for prophaz model
# Create index variable for residual ordering
sim_dat["index"] = np.arange(sim_dat.shape[0])

# Now sort
sim_dat = sim_dat.sort_values(['y'],ascending=[False])
sim_dat = sim_dat.reset_index(drop=True)
print(sim_dat.head(),np.mean(sim_dat["delta"]))

u,inv = np.unique(sim_dat["y"],return_inverse=True)
ut = np.flip(u)
r = np.abs(inv - max(inv))
res_idx = np.argsort(sim_dat["index"].values)

# Now specify formula and model
sim_formula_m = Formula(lhs("delta"), [f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])], data=sim_dat)
PropHaz_fam = PropHaz(ut,r)
model = GSMM([copy.deepcopy(sim_formula_m)],PropHaz_fam)

# Fit with Newton
model.fit()

# Now get estimate of survival function and see how it changes with x0
new_dat = pd.DataFrame({"x0":np.linspace(0,1,5),
                        "x1":np.linspace(0,1,5),
                        "x2":np.linspace(0,1,5),
                        "x3":np.linspace(0,1,5)})

# Get model matrix using only f0
_,Xt,_ = model.predict(use_terms=[0],n_dat=new_dat)

# Now iterate over all time-points and obtain the predicted survival function + standard error estimate
# for all 5 values of x0:
S = np.zeros((len(ut),Xt.shape[0]))
VS = np.zeros((len(ut),Xt.shape[0]))

for idx,ti in enumerate(ut):
    # Su and VSu are of shape (5,1) here but will generally be of shape (Xt.shape[0],1)
    Su,VSu = PropHaz_fam.get_survival(model.coef,model.get_mmat(),sim_formula_m.y_flat[sim_formula_m.NOT_NA_flat],
                                      ti,Xt,model.lvi.T@model.lvi)
    S[idx,:] = Su.flatten()
    VS[idx,:] = VSu.flatten()

# Now we can plot the estimated survival functions + approximate cis:
for xi in range(Xt.shape[0]):
    plt.fill([*ut,*np.flip(ut)],
             [*(S[:,xi] + 1.96*VS[:,xi]),*np.flip(S[:,xi] - 1.96*VS[:,xi])],alpha=0.5)
    plt.plot(ut,S[:,xi],label=f"x0 = {new_dat['x0'][xi]}")

plt.legend()
plt.xlabel("Time")
plt.ylabel("Survival")
plt.show()

# Note how the main effect of x0 is reflected in the plot above:
plot(model,which=[0])

# Residual plots can be created via `plot_val` from `mssmViz` - by default Martingale residuals are returned (see Wood, 2017)
fig = plt.figure(figsize=(10,3),layout='constrained')
axs = fig.subplots(1,3,gridspec_kw={"wspace":0.2})

# Note the use of `gsmm_kwargs_pred={}` to ensure that the re-ordering is not applied to the plot against predicted values
plot_val(model,gsmm_kwargs={"reorder":res_idx},gsmm_kwargs_pred={},ar_lag=25,axs=axs)

# Can also get Deviance residuals:
fig = plt.figure(figsize=(10,3),layout='constrained')
axs = fig.subplots(1,3,gridspec_kw={"wspace":0.2})
plot_val(model,gsmm_kwargs={"reorder":res_idx,"resid_type":"Deviance"},gsmm_kwargs_pred={"resid_type":"Deviance"},ar_lag=25,axs=axs)
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
coef (np.ndarray) – Coefficient vector as numpy array of shape (-1,1).
Xs ([scp.sparse.csc_array]) – The list of model matrices (here holding a single model matrix) obtained from mssm.models.GAMMLSS.get_mmat().
delta (np.ndarray) – Dependent variable passed to mssm.src.python.formula.Formula(), holds (for each row in Xs[0]) a value in {0,1}, indicating whether for that observation the event was observed or not.
t (int) – Time-point at which to evaluate the survival function.
x (np.ndarray or scp.sparse.csc_array) – Optional vector (or matrix - can also be sparse) of covariate values. Needs to be of shape (k,len(coef)).
V (scp.sparse.csc_array) – Estimated co-variance matrix of posterior for coef.
compute_var (bool, optional) – Whether to compute the variance estimate of the survival as well. Otherwise None will be returned as the second argument.
- Returns:
Two arrays, the first holds k survival function estimates, the latter holds k variance estimates for each of the survival function estimates. The second argument will be None instead if compute_var = False.
- Return type:
tuple[np.ndarray, np.ndarray | None]
- gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) ndarray
Gradient as defined by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.
ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.
Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.
- Returns:
The Gradient of the log-likelihood evaluated at coef as numpy array of shape (-1,1).
- Return type:
np.ndarray
- hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) csc_array
Hessian as defined by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.
ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.
Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.
- Returns:
The Hessian of the log-likelihood evaluated at coef.
- Return type:
scp.sparse.csc_array
- init_coef(models: list[Callable]) ndarray
Function to initialize the coefficients of the model.
- Parameters:
models ([mssm.models.GAMM]) – A list of GAMMs, each based on one of the formulas provided to a model.
- Returns:
A numpy array of shape (-1,1), holding initial values for all model coefficients.
- Return type:
np.ndarray
- llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) float
Log-likelihood function as defined by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via np.split()) the coef into the sub-sets associated with each parameter of the llk - not required by this family, which has a single parameter.
ys ([np.ndarray]) – List containing the delta vector at the first and only index - see description of the model family.
Xs ([scp.sparse.csc_array]) – A list containing the sparse model matrix at the first and only index.
- Returns:
The log-likelihood evaluated at coef.
- Return type:
float
- mssm.src.python.exp_fam.est_scale(res: ndarray, rows_X: int, total_edf: float) float
Scale estimate from Wood & Fasiolo (2017).
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models.
- Parameters:
res (np.ndarray) – A numpy array containing the difference between the model prediction and the (pseudo) data.
rows_X (int) – The number of observations collected.
total_edf (float) – The expected degrees of freedom for the model.
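If this implements the familiar residual-based estimator (a hedged assumption - check the source for the exact form), it corresponds to \(\hat{\phi} = \sum_i res_i^2 / (rows\_X - total\_edf)\):

import numpy as np

def est_scale_sketch(res: np.ndarray, rows_X: int, total_edf: float) -> float:
    # illustrative textbook form: residual sum of squares over residual
    # degrees of freedom; not necessarily mssm's exact implementation
    return float(np.sum(res**2) / (rows_X - total_edf))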
mssm.src.python.file_loading module
- mssm.src.python.file_loading.clear_cache(cache_dir: str, should_cache: bool) None
Clear up cache for row-subsets of model matrix.
- Parameters:
cache_dir (str) – path to cache directory
should_cache (bool) – whether or not the directory should actually be created
- mssm.src.python.file_loading.read_cor_cov_single(y: str, x: str, file: str, file_loading_kwargs: dict) ndarray
Read values of covariate x from file, correcting for NaNs in y.
- Parameters:
y (str) – name of covariate potentially having NaNs
x (str) – covariate name
file (str) – file name
file_loading_kwargs (dict) – Any optional file loading key-word arguments.
- Returns:
numpy array holding values in x for which y is not NaN
- Return type:
np.ndarray
- mssm.src.python.file_loading.read_cov(y: str, x: str, files: list[str], nc: int, file_loading_kwargs: dict) ndarray
Read values of covariate x from files, correcting for NaNs in y.
- Parameters:
y (str) – name of covariate potentially having NaNs
x (str) – covariate name
files (list[str]) – list of file names
nc (int) – Number of cores to use to read in parallel
file_loading_kwargs (dict) – Any optional file loading key-word arguments.
- Returns:
numpy array holding values in x for which y is not NaN
- Return type:
np.ndarray
- mssm.src.python.file_loading.read_cov_no_cor(x: str, files: list[str], nc: int, file_loading_kwargs: dict) ndarray
Read values of covariate x from files.
- Parameters:
x (str) – covariate name
files (list[str]) – list of file names
nc (int) – Number of cores to use to read in parallel
file_loading_kwargs (dict) – Any optional file loading key-word arguments.
- Returns:
numpy array holding values in x
- Return type:
np.ndarray
- mssm.src.python.file_loading.read_dtype(column: str, file: str, file_loading_kwargs: dict) dtype
Read datatype of variable column in file.
- Parameters:
column (str) – Name of covariate
file (str) – file name
file_loading_kwargs (dict) – Any optional file loading key-word arguments.
- Returns:
Datatype (numpy) of column
- Return type:
np.dtype
- mssm.src.python.file_loading.read_no_cor_cov_single(x: str, file: str, file_loading_kwargs: dict) ndarray
Read values of covariate x from file.
- Parameters:
x (str) – covariate name
file (str) – file name
file_loading_kwargs (dict) – Any optional file loading key-word arguments.
- Returns:
numpy array holding values in x
- Return type:
np.ndarray
- mssm.src.python.file_loading.read_unique(x: str, files: list[str], nc: int, file_loading_kwargs: dict) ndarray
Read unique values of covariate x from files.
- Parameters:
x (str) – covariate name
files (list[str]) – list of file names
nc (int) – Number of cores to use to read in parallel
file_loading_kwargs (dict) – Any optional file loading key-word arguments.
- Returns:
numpy array holding unique values
- Return type:
np.ndarray
- mssm.src.python.file_loading.read_unique_single(x: str, file: str, file_loading_kwargs: dict) ndarray
Read unique values of covariate x from file.
- Parameters:
x (str) – covariate name
file (str) – file name
file_loading_kwargs (dict) – Any optional file loading key-word arguments.
- Returns:
numpy array holding unique values
- Return type:
np.ndarray
- mssm.src.python.file_loading.setup_cache(cache_dir: str, should_cache: bool) None
Set up cache for row-subsets of model matrix.
- Parameters:
cache_dir (str) – path to cache directory
should_cache (bool) – whether or not the directory should actually be created
- Raises:
ValueError – if the directory already exists
mssm.src.python.formula module
- class mssm.src.python.formula.Formula(lhs: lhs, terms: list[GammTerm], data: DataFrame, series_id: str | None = None, codebook: dict | None = None, print_warn: bool = True, keep_cov: bool = False, find_nested: bool = True, file_paths: list[str] = [], file_loading_nc: int = 1, file_loading_kwargs: dict = {'header': 0, 'index_col': False})
Bases:
object
The formula of a regression equation.
Note: The class implements multiple get_* functions to access attributes stored in instance variables. The get functions always return a copy of the instance variable and the results are thus safe to manipulate.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssm.src.python.formula import build_penalties,build_model_matrix

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# Now with a tensor smooth
formula = Formula(lhs("y"),[i(),f(["x0","x1"],te=True),f(["x2"]),f(["x3"])],data=Binomdat)

# Now with a tensor smooth anova style
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x0","x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

######## Stream data from file and set up custom codebook #########

file_paths = [f'https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat_cond_{cond}.csv' for cond in ["a","b"]]

# Set up specific coding for factor 'cond'
codebook = {'cond':{'a': 0, 'b': 1}}

formula = Formula(lhs=lhs("y"), # The dependent variable - here y!
                  terms=[i(), # The intercept, a
                         l(["cond"]), # For cond='b'
                         f(["time"],by="cond"), # two-way interaction between time and cond; one smooth over time per cond level
                         f(["x"],by="cond"), # two-way interaction between x and cond; one smooth over x per cond level
                         f(["time","x"],by="cond"), # three-way interaction
                         fs(["time"],rf="sub")], # Random non-linear effect of time - one smooth per level of factor sub
                  data=None, # No data frame!
                  file_paths=file_paths, # Just a list with paths to files.
                  print_warn=False,
                  codebook=codebook)

# Alternative:
formula = Formula(lhs=lhs("y"),
                  terms=[i(),
                         l(["cond"]),
                         f(["time"],by="cond"),
                         f(["x"],by="cond"),
                         f(["time","x"],by="cond"),
                         fs(["time"],rf="sub")],
                  data=None,
                  file_paths=file_paths,
                  print_warn=False,
                  keep_cov=True, # Keep encoded data structure in memory
                  codebook=codebook)

########## preparing for ar1 model (with resets per time-series) and data type requirements ##########

dat = pd.read_csv('https://raw.githubusercontent.com/JoKra1/mssmViz/main/data/GAMM/sim_dat.csv')

# mssm requires that the data-type for variables used as factors is 'O'=object
dat = dat.astype({'series': 'O', 'cond':'O', 'sub':'O'})

formula = Formula(lhs=lhs("y"),
                  terms=[i(),
                         l(["cond"]),
                         f(["time"],by="cond"),
                         f(["x"],by="cond"),
                         f(["time","x"],by="cond")],
                  data=dat,
                  print_warn=False,
                  series_id='series') # 'series' variable identifies individual time-series
- Parameters:
lhs – The lhs object defining the dependent variable.
terms ([GammTerm]) – A list of the terms which should be added to the model. See mssm.src.python.terms for info on which terms can be added.
data (pd.DataFrame or None) – A pandas dataframe (with header!) of the data which should be used to estimate the model. The variable specified for lhs as well as all variables included for a term in terms need to be present in the data, otherwise the call to Formula will throw an error.
series_id (str, optional) – A string identifying the individual experimental units. Usually a unique trial identifier. Only necessary if approximate derivative computations are to be utilized for random smooth terms or if you need to estimate an ‘ar1’ model for multiple time-series data.
codebook (dict or None) – Codebook - keys should correspond to factor variable names specified in terms. Values should again be a dict, with a key for each of the K levels of the factor and the value corresponding to an integer in {0,K}.
print_warn (bool,optional) – Whether warnings should be printed. Useful when fitting models from terminal. Defaults to True.
keep_cov (bool,optional) – Whether or not the internal encoding structure of all predictor variables should be created when forming \(\mathbf{X}^T\mathbf{X}\) iteratively instead of forming \(\mathbf{X}\) directly. Can speed up estimation but increases memory footprint. Defaults to False.
find_nested (bool,optional) – Whether or not to check for nested smooth terms. This only has an effect if you include at least one smooth term with more than two variables. Additionally, this check is often not necessary if you correctly use the te key-word of smooth terms and ensure that the marginals used to construct ti smooth terms have far fewer basis functions than the “main effect” univariate smooths. Thus, if you know what you’re doing and you’re working with large models, you might want to disable this (i.e., set to False) because this check can get quite expensive for larger models. Defaults to True.
file_paths ([str],optional) – A list of paths to .csv files from which \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively. Setting this to a non-empty list will prevent fitting \(\mathbf{X}\) as a whole. data should then be set to None. Defaults to an empty list.
file_loading_nc (int,optional) – How many cores to use to a) accumulate \(\mathbf{X}\) in parallel (if data is not None and file_paths is an empty list) or b) to accumulate \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) (and \(\mathbf{\eta}\) during estimation) (if data is None and file_paths is a non-empty list). For case b, this should really be set to the maximum number of cores available. For case a, this only really speeds up accumulating \(\mathbf{X}\) if \(\mathbf{X}\) has very many columns and/or rows. Defaults to 1.
file_loading_kwargs (dict,optional) – Any key-word arguments to pass to pandas.read_csv when \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) should be created iteratively (if data is None and file_paths is a non-empty list). Defaults to {"header":0,"index_col":False}.
- Variables:
lhs (lhs) – The left-hand side object of the regression formula passed to the constructor. Initialized at construction.
terms ([GammTerm]) – The list of terms passed to the constructor. Initialized at construction.
data (pd.DataFrame) – The dataframe passed to the constructor. Initialized at construction.
coef_per_term ([int]) – A list containing the number of coefficients corresponding to each term included in terms. Initialized at construction.
coef_names ([str]) – A list containing a named identifier (e.g., “Intercept”) for each coefficient estimated by the model. Initialized at construction.
n_coef (int) – The number of coefficients estimated by the model in total. Initialized at construction.
unpenalized_coef (int) – The number of un-penalized coefficients estimated by the model. Initialized at construction.
y_flat (np.ndarray or None) – An array, containing all values on the dependent variable (i.e., specified by lhs.variable) in order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths=None, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.
cov_flat (np.ndarray or None) – An array, containing all (encoded, in case of categorical predictors) values on each predictor (each column of cov_flat corresponds to a different predictor) variable included in any of the terms in order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths=None, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.
NOT_NA_flat (np.ndarray or None) – An array, containing an indication (as bool) for each value on the dependent variable (i.e., specified by lhs.variable) whether the corresponding value is not a number (“NA”) or not. In order of the data-frame passed to data. This variable will be initialized at construction but only if file_paths=None, i.e., in case \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}^T\mathbf{y}\) are not created iteratively.
- encode_data(data: DataFrame, prediction: bool = False) tuple[ndarray | None, ndarray, ndarray | None, list[ndarray] | None, list[ndarray] | None, list[ndarray] | None, ndarray | None]
Encodes data, which needs to be a pd.DataFrame, and by default (if prediction==False) builds an index of which rows in data are NA in the column of the dependent variable described by self.lhs.
- Parameters:
data (pd.DataFrame) – The data to encode.
prediction (bool, optional) – Whether or not a NA index and a column for the dependent variable should be generated.
- Returns:
A tuple with 7 (optional) entries: the dependent variable described by self.lhs, the encoded predictor variables as a (N,k) array (number of rows matches the number of rows of the first entry returned, the number of columns matches the number of k variables present in the formula), an indication for each row whether the dependent variable described by self.lhs is NA, like the first entry but split into a list of lists by self.series_id, like the second entry but split into a list of lists by self.series_id, like the third entry but split into a list of lists by self.series_id, start and end points for the splits used to split the previous three elements (identifying the start and end point of every level of self.series_id).
- Return type:
(np.ndarray|None, np.ndarray, np.ndarray|None, list[np.ndarray]|None, list[np.ndarray]|None, list[np.ndarray]|None, np.ndarray|None)
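A minimal usage sketch of unpacking the 7-tuple, given a formula like the ones built in the Examples above (the variable names on the left are illustrative, not mssm's):

# encode the training data and inspect the encoded predictor matrix
y_flat, cov_flat, NAs, y_s, cov_s, NAs_s, sid = formula.encode_data(formula.get_data())
print(cov_flat.shape)  # (N, k) - one column per variable used by the formula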
- get_coding_factors() dict
Get a copy of the factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).
- get_data() DataFrame
Get a copy of the data specified for this formula.
- get_depvar() ndarray
Get a copy of the encoded dependent variable (defined via self.lhs).
- get_factor_codings() dict
Get a copy of the factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the levels (str) of the factor and the values to their encoded levels (int).
- get_factor_levels() dict
Get a copy of the factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
- get_has_intercept() bool
Does this formula include an intercept or not.
- get_ir_smooth_term_idx() list[int]
Get a copy of the list of indices that identify impulse response terms in self.terms.
- get_linear_term_idx() list[int]
Get a copy of the list of indices that identify linear terms in self.terms.
- get_n_coef() int
Get the number of coefficients that are implied by the formula.
- get_notNA() ndarray
Get a copy of the encoded ‘not a NA’ vector for the dependent variable (defined via self.lhs).
- get_random_term_idx() list[int]
Get a copy of the list of indices that identify random terms in self.terms.
- get_smooth_term_idx() list[int]
Get a copy of the list of indices that identify smooth terms in self.terms.
- get_subgroup_variables() list
Returns a copy of sub-group variables for factor smooths.
- get_term_names() list[str]
Returns a copy of the list with the names of the terms specified for this formula.
- get_var_map() dict
Get a copy of the var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix returned by self.encode_data.
- get_var_maxs() dict
Get a copy of the var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on in self.data for continuous variables or None for categorical variables.
- get_var_mins() dict
Get a copy of the var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on in self.data for continuous variables or None for categorical variables.
- get_var_mins_maxs() tuple[dict, dict]
Get a tuple containing copies of both the mins and maxs dictionaries. See self.get_var_mins and self.get_var_maxs.
- get_var_types() dict
Get a copy of the var types dictionary. Keys are variables in the data, values are either VarType.NUMERIC for continuous variables or VarType.FACTOR for categorical variables.
- has_ir_terms() bool
Does this formula include impulse response terms or not.
- mssm.src.python.formula.build_model_matrix(formula: Formula, pool: Pool | None = None, use_only: list[int] | None = None, tol: float = 0) csc_array
Function to build the model matrix implied by formula.
Important: A small selection of smooth terms requires that the penalty matrices are built at least once before the model matrix can be built. For this reason, you generally must call build_penalties(formula) before calling build_model_matrix(formula) (internally, mssm checks whether formula.built_penalties==True). See the example below.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssmViz.plot import *
import matplotlib.pyplot as plt
from mssm.src.python.formula import build_penalties,build_model_matrix

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# First extract the penalties
penalties = build_penalties(formula)

# Then the model matrix:
X = build_model_matrix(formula)
- Parameters:
formula (Formula) – A Formula
pool (mp.pool.Pool | None, optional) – An instance of a multiprocessing pool, defaults to None
use_only (list[int] | None, optional) – A list of indices corresponding to which terms should actually be built. If None, then all terms are built. Terms not built are set to zero columns, defaults to None
tol (float, optional) – Optional tolerance. Absolute values in the model matrix smaller than this are set to actual zeroes, defaults to 0
- Raises:
ValueError – If formula.built_penalties == False - i.e., it is required that build_penalties(formula) was called before calling build_model_matrix(formula).
NotImplementedError – If the formula was set up to read data from file, rather than from a pd.DataFrame.
- Returns:
The model matrix implied by a Formula and cov_flat.
- Return type:
scp.sparse.csc_array
- mssm.src.python.formula.build_penalties(formula) list[LambdaTerm]
Function to build all penalty matrices required by a Formula.
The function is called whenever it is needed, but the example below shows you how to use it in case you want to extract the penalties directly.
Examples:
from mssm.models import *
from mssmViz.sim import *
from mssm.src.python.formula import build_penalties

# Get some data and formula
Binomdat = sim3(10000,0.1,family=Binomial(),seed=20)
formula = Formula(lhs("y"),[i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])],data=Binomdat)

# Now extract the penalties
penalties = build_penalties(formula)
print(penalties)
- Parameters:
formula (Formula) – A Formula
- Raises:
KeyError – If an un-penalized irf term is included in the formula after penalized terms.
KeyError – If an un-penalized smooth term is included in the formula after penalized terms.
ValueError – If no start index has been defined by the formula. For testing only.
- Returns:
A list of all penalties (encoded as LambdaTerm) required by the formula
- Return type:
list[LambdaTerm]
- mssm.src.python.formula.build_sparse_matrix_from_formula(terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat: ndarray, cov: ndarray | None, pool: Pool | None = None, use_only: list[int] | None = None, tol: float = 0) csc_array
Build model matrix from formula properties.
This function is used internally to construct model matrices from Formula objects. For greater convenience see the build_model_matrix() function.
Important: make sure to only ever call this when formula.built_penalties==True - see the build_model_matrix() function description.
- Parameters:
has_intercept (bool) – Indicator of whether the Formula has an intercept or not
ltx (list[int]) – Linear term indices
irstx (list[int]) – Impulse response function term indices
stx (list[int]) – Smooth term indices
rtx (list[int]) – Random term indices
var_types (dict) – Dictionary holding variable types
var_map (dict) – Dictionary mapping variable names to column indices in the encoded data
var_mins (dict) – Dictionary with variable minimums
var_maxs (dict) – Dictionary with variable maximums
factor_levels (dict) – Dictionary with levels associated with each factor
cov_flat (np.ndarray) – Encoded data
cov (np.ndarray | None, optional) – Encoded data split by levels of the factor in Formula.series_id
pool (mp.pool.Pool | None, optional) – An instance of a multiprocessing pool, defaults to None
use_only (list[int] | None, optional) – A list of indices corresponding to which terms should actually be built. If None, then all terms are built. Terms not built are set to zero columns, defaults to None
tol (float, optional) – Optional tolerance. Absolute values in the model matrix smaller than this are set to actual zeroes, defaults to 0
- Returns:
The model matrix implied by a Formula and cov_flat.
- Return type:
scp.sparse.csc_array
- class mssm.src.python.formula.lhs(variable: str, f: Callable = None)
Bases:
object
The Left-hand side of a regression equation.
See the Formula class for examples.
- Parameters:
variable (str) – The name of the dependent/response variable in the dataframe passed to a Formula. Can point to continuous and categorical variables. For mssm.models.GSMM models, the variable can also be set to any placeholder variable in the data, since not every Formula will be associated with a particular response variable.
f (Callable, optional) – A function that will be applied to the variable before fitting. For example: np.log(). By default no function is applied to the variable.
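A minimal usage sketch (the column name "rt" is illustrative):

import numpy as np
from mssm.src.python.formula import lhs

log_lhs = lhs("rt", f=np.log)  # model log(rt) rather than rt itself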
mssm.src.python.gamm_solvers module
- mssm.src.python.gamm_solvers.PIRLS_newton_weights(y: ndarray, mu: ndarray, eta: ndarray, family: Family) tuple[ndarray, ndarray, ndarray]
Internal function. Compute pseudo-data and newton weights for Penalized Reweighted Least Squares iteration (Wood, 2017, 6.1.1 and 3.1.2)
Calculation reflects full Newton scoring!
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – vector of observations
mu (np.ndarray) – vector of mean estimates
eta (np.ndarray) – vector of linear predictors
family (Family) – Family of model
- Raises:
ValueError – If not a single observation provided information for newton weights.
- Returns:
the pseudo-data, weights, and a boolean array indicating invalid weights/pseudo-observations
- Return type:
tuple[np.ndarray,np.ndarray,np.ndarray]
- mssm.src.python.gamm_solvers.PIRLS_pdat_weights(y: ndarray, mu: ndarray, eta: ndarray, family: Family) tuple[ndarray, ndarray, ndarray]
Internal function. Compute pseudo-data and weights for Penalized Reweighted Least Squares iteration (Wood, 2017, 6.1.1)
Calculation is based on a(mu) = 1, so reflects Fisher scoring!
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – vector of observations
mu (np.ndarray) – vector of mean estimates
eta (np.ndarray) – vector of linear predictors
family (Family) – Family of model
- Raises:
ValueError – If not a single observation provided information for Fisher weights.
- Returns:
the pseudo-data, weights, and a boolean array indicating invalid weights/pseudo-observations
- Return type:
tuple[np.ndarray,np.ndarray,np.ndarray]
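The textbook Fisher-scoring quantities underlying this function (Wood, 2017, 6.1.1) can be sketched in plain numpy for a Poisson model with log link - illustrative only, not mssm's exact internals:

import numpy as np

y = np.array([1.0, 3.0, 2.0]).reshape(-1, 1)
mu = np.array([1.5, 2.5, 2.0]).reshape(-1, 1)
eta = np.log(mu)        # linear predictor under the log link

dy1 = 1/mu              # f'(mu) for the log link
V = mu                  # Poisson variance function
z = eta + (y - mu)*dy1  # pseudo-data
w = 1/(V*dy1**2)        # Fisher weights (a(mu) = 1)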
- mssm.src.python.gamm_solvers.apply_eigen_perm(Pr: list[int], InvCholXXSP: csc_array) csc_array
Internal function. Unpivots columns of InvCholXXSP (usually the inverse of a Cholesky factor) and returns the unpivoted version.
- Parameters:
Pr (list[int]) – List of column indices
InvCholXXSP (scp.sparse.csc_array) – Pivoted matrix
- Returns:
Unpivoted matrix
- Return type:
scp.sparse.csc_array
- mssm.src.python.gamm_solvers.back_track_alpha(coef: ndarray, step: ndarray, llk_fun: Callable, grad_fun: Callable, *llk_args, alpha_max: float = 1, c1: float = 0.0001, max_iter: int = 100) float | None
Simple step-size backtracking function that enforces the Armijo condition (Nocedal & Wright, 2006).
- References:
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
- Parameters:
coef (np.ndarray) – coefficient estimate
step (np.ndarray) – step to take to update coefficients
llk_fun (Callable) – llk function
grad_fun (Callable) – function to evaluate gradient of llk
alpha_max (float, optional) – Parameter by Nocedal & Wright, defaults to 1
c1 (float, optional) – 2nd Parameter by Nocedal & Wright, defaults to 1e-4
max_iter (int, optional) – Number of maximum iterations, defaults to 100
- Returns:
The step-length meeting the Armijo condition or None in case none such was found
- Return type:
float | None
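A generic textbook version of such a backtracking routine for maximizing the llk (illustrative only - not mssm's exact implementation):

import numpy as np

def armijo_backtrack(coef, step, llk_fun, grad_fun, alpha_max=1.0, c1=1e-4, max_iter=100):
    f0 = llk_fun(coef)
    slope = float(np.sum(grad_fun(coef)*step))  # directional derivative along step
    alpha = alpha_max
    for _ in range(max_iter):
        # Armijo (sufficient increase) condition for maximization
        if llk_fun(coef + alpha*step) >= f0 + c1*alpha*slope:
            return alpha
        alpha *= 0.5  # halve the step-length and try again
    return None  # no admissible step-length found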
- mssm.src.python.gamm_solvers.calculate_edf(LP: csc_array | None, Pr: list[int], InvCholXXS: csc_array | LinearOperator | None, penalties: list[LambdaTerm], lgdetDs: list[float] | None, colsX: int, n_c: int, drop: list[int] | None, S_emb: csc_array) tuple[float, list[float], list[csc_array]]
Internal function. Follows steps outlined by Wood & Fasiolo (2017) to compute total degrees of freedom by the model.
Generates the B matrix also required for the derivative of the log-determinant of X.T@X+S_lambda. This is either done exactly - as described by Wood & Fasiolo (2017) - or approximately. The latter is much faster.
Also implements the L-qEFS trace computations described by Krause et al. (submitted) based on a quasi-newton approximation to the negative hessian of the log-likelihood.
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
LP (scp.sparse.csc_array | None) – Pivoted Cholesky of negative penalized hessian or None
Pr (list[int]) – Permutation list of LP
InvCholXXS (scp.sparse.csc_array | scp.sparse.linalg.LinearOperator | None) – Unpivoted inverse of LP, or a quasi-newton approximation of it (for the L-qEFS update), or None
penalties (list[LambdaTerm]) – list of penalties
lgdetDs (list[float]) – list of derivatives of \(log(|\mathbf{H} + S_\lambda|)\) (\(\mathbf{H}\) is the negative hessian of the penalized llk) with respect to lambda.
colsX (int) – Number of columns of model matrix
n_c (int) – Number of cores to use for computations
drop (list[int]) – List of dropped coefficients - can be None
S_emb (scp.sparse.csc_array) – Total penalty matrix
- Returns:
A tuple containing the total estimated degrees of freedom, the amount of parameters penalized away by individual penalties in a list, and a list of the aforementioned B matrices
- Return type:
tuple[float,list[float],list[scp.sparse.csc_array]]
- mssm.src.python.gamm_solvers.calculate_term_edf(penalties: list[LambdaTerm], param_penalized: list[float]) list[float]
Internal function. Computes the smooth-term (and random term) specific estimated degrees of freedom.
See Wood (2017) for a definition and Wood & Fasiolo (2017) for the computations.
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
penalties (list[LambdaTerm]) – List of penalties
param_penalized (list[float]) – List holding the amount of parameters penalized away by individual penalties - obtained from calculate_edf().
- Returns:
A list holding the estimated degrees of freedom per smooth/random term in the model
- Return type:
list[float]
- mssm.src.python.gamm_solvers.check_drop_valid_gammlss(y: ndarray, coef: ndarray, coef_split_idx: list[int], Xs: list[csc_array], S_emb: csc_array, keep: list[int], family: GAMLSSFamily) tuple[bool, float]
Checks whether an identified set of coefficients to be dropped from the model results in a valid log-likelihood.
- Parameters:
y (np.ndarray) – Vector of response variable
coef (np.ndarray) – Vector of coefficients
coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution
Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
S_emb (scp.sparse.csc_array) – Total penalty matrix
keep (list[int]) – List of coefficients to retain
family (GAMLSSFamily) – Model family
- Returns:
tuple holding bool indicating if likelihood is valid and penalized log-likelihood under dropped set.
- Return type:
tuple[bool,float]
- mssm.src.python.gamm_solvers.check_drop_valid_gensmooth(ys: list[ndarray], coef: ndarray, Xs: list[csc_array], S_emb: csc_array, keep: list[int], family: GSMMFamily) tuple[bool, float | None]
Checks whether an identified set of coefficients to be dropped from the model results in a valid log-likelihood.
- Parameters:
ys (list[np.ndarray]) – List holding vectors of observations
coef (np.ndarray) – Vector of coefficients
Xs (list[scp.sparse.csc_array]) – List of model matrices - one per parameter
S_emb (scp.sparse.csc_array) – Total Penalty matrix
keep (list[int]) – List of coefficients to retain
family (GSMMFamily) – Model family
- Returns:
tuple holding bool indicating if likelihood is valid and penalized log-likelihood under dropped set.
- Return type:
tuple[bool,float|None]
- mssm.src.python.gamm_solvers.compute_S_emb_pinv_det(col_S: int, penalties: list[LambdaTerm], pinv: str, root: bool = False) tuple[csc_array, csc_array, csc_array | None, list[bool]]
Internal function. Compute the total embedded penalty matrix, a generalized inverse of the former, and optionally a root of the total penalty matrix, and determine for which EFS updates the rank rather than the generalized inverse should be used.
- Parameters:
col_S (int) – Number of columns of total penalty matrix
penalties (list[LambdaTerm]) – List of penalties
pinv (str) – Strategy to use to compute the generalized inverse. Set this to ‘svd’.
root (bool, optional) – Whether to compute a root of the generalized inverse, defaults to False
- Returns:
A tuple holding total embedded penalty matrix, a generalized inverse of the former, optionally a root of the total penalty matrix, and a list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used
- Return type:
tuple[scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array|None, list[bool]]
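A hedged sketch of the two main ingredients, assuming each LambdaTerm exposes its lambda value as lam and its embedded penalty as S_J_emb (attribute names as used elsewhere in these docs, but treat them as assumptions here):

    import numpy as np
    import scipy.sparse as scp

    def sketch_S_emb_and_pinv(col_S, penalties):
        # Total embedded penalty: sum of lambda-weighted embedded penalty blocks.
        S_emb = scp.csc_array((col_S, col_S))
        for lTerm in penalties:
            S_emb += lTerm.lam * lTerm.S_J_emb
        # 'svd' strategy for the generalized inverse: invert only singular
        # values exceeding a rank tolerance.
        U, s, Vt = np.linalg.svd(S_emb.toarray())
        tol = s.max() * col_S * np.finfo(float).eps
        s_inv = np.where(s > tol, 1 / np.where(s > tol, s, 1), 0.0)
        S_pinv = scp.csc_array((Vt.T * s_inv) @ U.T)
        return S_emb, S_pinv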
- mssm.src.python.gamm_solvers.compute_eigen_perm(Pr: list[int]) csc_array
Internal function. Computes column permutation matrix obtained from Eigen.
- Parameters:
Pr (list[int]) – List of column indices
- Returns:
Permutation matrix as sparse array
- Return type:
scp.sparse.csc_array
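The construction itself is a one-liner; a minimal sketch of building a permutation matrix P with P[i, Pr[i]] = 1 as a sparse array:

    import numpy as np
    import scipy.sparse as scp

    def sketch_perm(Pr):
        nP = len(Pr)
        # One unit entry per row: row i picks column Pr[i].
        return scp.csc_array((np.ones(nP), (np.arange(nP), np.asarray(Pr))), shape=(nP, nP))

    # (sketch_perm(Pr) @ x)[i] == x[Pr[i]] for a vector x.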
- mssm.src.python.gamm_solvers.compute_lgdetD_bsb(rank: int | None, cLam: float, gInv: csc_array, emb_SJ: csc_array, cCoef: ndarray) tuple[float, float]
Internal function. Computes derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.
See Wood, Li, Shaddick, & Augustin (2017), Wood & Fasiolo (2017), Wood (2011), and Wood (2017).
- References:
Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
rank (int | None) – Known rank of penalty matrix or None (should only be set to int for single penalty terms)
cLam (float) – Current lambda value
gInv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix
emb_SJ (scp.sparse.csc_array) – Embedded penalty matrix
cCoef (np.ndarray) – coefficient vector
- Returns:
Tuple; the first element is the aforementioned derivative, the second is cCoef.T@emb_SJ@cCoef
- Return type:
tuple[float,float]
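A hedged sketch of the quantities involved (whether emb_SJ carries the lambda weight is an implementation detail of mssm; the sketch assumes it does not):

    import numpy as np

    def sketch_lgdetD_bsb(rank, cLam, gInv, emb_SJ, cCoef):
        if rank is not None:
            # Single-penalty term: d log|S_lambda|_+ / d lambda = rank / lambda.
            lgdetD = rank / cLam
        else:
            # General case: tr(S_lambda^- @ S_J), approximated via the
            # generalized inverse gInv of the total penalty.
            lgdetD = (gInv @ emb_SJ).diagonal().sum()
        bSb = float(cCoef.T @ emb_SJ @ cCoef)
        return lgdetD, bSb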
- mssm.src.python.gamm_solvers.computetrVS3(t1: ndarray | None, t2: ndarray | None, t3: ndarray | None, lTerm: LambdaTerm, V0: csc_array) float
Internal function. Computes tr(V@lTerm.S_j) from the linear operator of V obtained from the L-BFGS-B optimizer. Relies on equation 3.13 in Byrd, Nocedal & Schnabel (1994). Adapted to ensure the positive semi-definiteness required by the EFS update.
- References:
Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063
- Parameters:
t1 (np.ndarray or None) – nCoef*2m matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then V is treated like an identity matrix.
t2 (np.ndarray or None) – 2m*2m matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then V is treated like an identity matrix.
t3 (np.ndarray or None) – 2m*nCoef matrix from Byrd, Nocedal & Schnabel (1994). If t2 is None, then t1 is treated like an identity matrix.
lTerm (LambdaTerm) – Current lambda term for which to compute the trace.
V0 (scipy.sparse.csc_array) – Initial estimate for the inverse of the hessian of the negative penalized likelihood.
- Returns:
trace
- Return type:
float
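Assuming the compact low-rank form V = V0 + t1 @ t2 @ t3 from Byrd et al., the trace can be computed without ever forming V; a sketch (S_j below stands in for the term's embedded penalty, an assumption):

    import numpy as np

    def sketch_trVS(t1, t2, t3, S_j, V0):
        tr0 = (V0 @ S_j).diagonal().sum()  # contribution of the sparse initial estimate
        if t2 is None:
            return tr0  # V is treated like V0 alone in this case
        # Cyclic property of the trace:
        # tr(t1 @ t2 @ t3 @ S_j) = tr(t2 @ (t3 @ S_j @ t1)).
        return tr0 + float(np.trace(t2 @ (t3 @ S_j @ t1)))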
- mssm.src.python.gamm_solvers.correct_coef_step(coef: ndarray, n_coef: ndarray, dev: float, pen_dev: float, c_dev_prev: float, family: Family, eta: ndarray, mu: ndarray, y: ndarray, X: csc_array, n_pen: float, S_emb: csc_array, formula: Formula, n_c: int, offset: float | ndarray) tuple[float, float, ndarray, ndarray, ndarray]
Internal function. Performs step-length control on the coefficient vector.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
coef (np.ndarray) – Current coefficient estimate
n_coef (np.ndarray) – New coefficient estimate
dev (float) – new deviance
pen_dev (float) – new penalized deviance
c_dev_prev (float) – previous penalized deviance
family (Family) – Family of model
eta (np.ndarray) – vector of linear predictors - under new coefficient estimate
mu (np.ndarray) – vector of mean estimates - under new coefficient estimate
y (np.ndarray) – vector of observations of the working model
X (scp.sparse.csc_array) – Model matrix of working model
n_pen (float) – total penalty under new coefficient estimate
S_emb (scp.sparse.csc_array) – Total penalty matrix
formula (Formula) – Formula of model
n_c (int) – Number of cores
offset (float | np.ndarray) – Offset (fixed effect) to add to eta
- Returns:
Updated versions of dev, pen_dev, mu, eta, coef
- Return type:
tuple[float,float,np.ndarray,np.ndarray,np.ndarray]
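The control logic reduces to step halving; a minimal sketch, with compute_pen_dev standing in (as a hypothetical closure) for re-evaluating eta, mu and the penalized deviance at a candidate coefficient vector:

    def sketch_halve_step(coef, n_coef, pen_dev, c_dev_prev, compute_pen_dev, max_halves=30):
        # Keep halving the step from coef towards n_coef until the penalized
        # deviance no longer exceeds the previous one (or we give up).
        n_halves = 0
        while pen_dev > c_dev_prev and n_halves < max_halves:
            n_coef = (coef + n_coef) / 2
            pen_dev = compute_pen_dev(n_coef)
            n_halves += 1
        return n_coef, pen_dev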
- mssm.src.python.gamm_solvers.correct_coef_step_gammlss(family: GAMLSSFamily, y: ndarray, Xs: list[csc_array], coef: ndarray, next_coef: ndarray, coef_split_idx: list[int], c_llk: float, S_emb: csc_array, a: float) tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], float, float, float]
Apply step size correction to Newton update for GAMLSS models, as discussed by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
family (GAMLSSFamily) – Family of model
y (np.ndarray) – Vector of observations
Xs (list[scp.sparse.csc_array]) – List of model matrices
coef (np.ndarray) – Current coefficient estimate
next_coef (np.ndarray) – Updated coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
c_llk (float) – Current log likelihood
S_emb (scp.sparse.csc_array) – Total penalty matrix
a (float) – Step length for gradient descent update
- Returns:
A tuple containing the corrected coefficient estimate next_coef, next_coef split via coef_split_idx, next mus, next etas, next llk, next penalized llk, and the updated step length for the next gradient update
- Return type:
tuple[np.ndarray,list[np.ndarray],list[np.ndarray],list[np.ndarray],float,float,float]
- mssm.src.python.gamm_solvers.correct_coef_step_gen_smooth(family: GSMMFamily, ys: list[ndarray], Xs: list[csc_array], coef: ndarray, next_coef: ndarray, coef_split_idx: list[int], c_llk: float, S_emb: csc_array, a: float) tuple[ndarray, float, float, float]
Apply step size correction to Newton update for general smooth models, as discussed by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
family (GSMMFamily) – Model family
ys (list[np.ndarray]) – List of vectors of observations
Xs (list[scp.sparse.csc_array]) – List of model matrices
coef (np.ndarray) – Coefficient estimate
next_coef (np.ndarray) – Proposed next coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
c_llk (float) – Current log likelihood
S_emb (scp.sparse.csc_array) – Total penalty matrix
a (float) – Step length for gradient descent update
- Returns:
A tuple containing the corrected coefficient estimate next_coef, next llk, next penalized llk, and the updated step length for the next gradient update
- Return type:
tuple[np.ndarray,float,float,float]
- mssm.src.python.gamm_solvers.correct_lambda_step(y: ndarray, yb: ndarray, z: ndarray, Wr: csc_array, rowsX: int, colsX: int, X: csc_array, Xb: csc_array, coef: ndarray, Lrhoi: csc_array | None, family: Family, col_S: int, S_emb: csc_array, penalties: list[LambdaTerm], was_extended: list[bool], pinv: str, lam_delta: ndarray, extend_by: dict, o_iter: int, dev_check: float, n_c: int, control_lambda: int, extend_lambda: bool, exclude_lambda: bool, extension_method_lam: str, formula: Formula, form_Linv: bool, method: str, offset: float | ndarray, max_inner: int) tuple[ndarray, csc_array, ndarray, csc_array, ndarray, ndarray, ndarray, csc_array, csc_array | None, float, list[float], float, ndarray, ndarray, dict, list[LambdaTerm], list[bool], csc_array, int, list[int] | None, list[int] | None]
Performs step-length control for lambda.
The lambda update is based on the EFS update by Wood & Fasiolo (2017); step-length control is partially based on Wood et al. (2017) - Krause et al. (submitted) describe the specific implementation.
- References:
Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – vector of observations
yb (np.ndarray) – vector of observations of the working model
z (np.ndarray) – pseudo-data (can have NaNs for invalid observations)
Wr (scp.sparse.csc_array) – diagonal sparse matrix holding the root of the Fisher weights
rowsX (int) – Rows of model matrix
colsX (int) – Cols of model matrix
X (scp.sparse.csc_array) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
coef (np.ndarray) – Current coefficient estimate
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model
family (Family) – Model family
col_S (int) – Columns of total penalty matrix
S_emb (scp.sparse.csc_array) – Total penalty matrix
penalties (list[LambdaTerm]) – List of penalties
was_extended (list[bool]) – List holding an indication per lambda parameter of whether it was extended or not
pinv (str) – Method to use to compute the generalized inverse of the total penalty, set to ‘svd’!
lam_delta (np.ndarray) – Proposed update to lambda parameters
extend_by (dict) – Extension info dictionary
o_iter (int) – Outer iteration index
dev_check (float) – Multiple of previous deviance used for convergence check
n_c (int) – Number of cores to use
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 2 by default.
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.
exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
extension_method_lam (str) – Experimental - do not change! Which method to use to extend lambda proposals. Set to ‘nesterov’ by default.
formula (Formula) – Formula of model
form_Linv (bool) – Whether to form the inverse of the cholesky of the negative penalized hessian or not
method (str) – Which method to use to solve for the coefficients (“Chol” or “Qr”)
offset (float | np.ndarray) – Offset (fixed effect) to add to eta
max_inner (int) – Maximum number of iterations to use to update the coefficient estimate
- Returns:
Tuple containing updated values for yb, Xb, z, Wr, eta, mu, n_coef, the cholesky of the penalized hessian CholXXS, the inverse of the former InvCholXXS, total edf, term-wise edfs, updated scale, working residuals, the accepted update to lambda, extend_by, penalties, was_extended, updated S_emb, the number of lambda updates, an optional list of the coefficients to keep, and an optional list of the estimated coefficients to drop
- Return type:
tuple[np.ndarray, scp.sparse.csc_array, np.ndarray, scp.sparse.csc_array, np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array|None, float, list[float], float, np.ndarray, np.ndarray, dict, list[LambdaTerm], list[bool], scp.sparse.csc_array, int, list[int]|None, list[int]|None]
- mssm.src.python.gamm_solvers.correct_lambda_step_gamlss(family: GAMLSSFamily, mus: list[ndarray], y: ndarray, Xs: list[csc_array], S_norm: csc_array, n_coef: int, form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], gamlss_pen: list[LambdaTerm], lam_delta: ndarray, extend_by: dict, was_extended: list[bool], c_llk: float, fit_info: Fit_info, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: list[list[int], list[int]] | None, extend_lambda: bool, extension_method_lam: str, control_lambda: int, repara: bool, n_c: int) tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], csc_array, csc_array, csc_array, float, float, float, list[int], list[int], csc_array, list[LambdaTerm], float, list[float], ndarray]
Updates and performs step-length control for the vector of lambda parameters of a GAMMLSS model. Essentially completes the steps described in section 3.3 of the paper by Krause et al. (submitted).
Based on steps outlined by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
family (GAMLSSFamily) – Family of model
mus (list[np.ndarray]) – List of estimated means
y (np.ndarray) – Vector of observations
Xs ([scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).
n_coef (int) – Number of coefficients
form_n_coef (list[int]) – List of number of coefficients per formula
form_up_coef (list[int]) – List of un-penalized number of coefficients per formula
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
gamlss_pen (list[LambdaTerm]) – List of penalties
lam_delta (np.ndarray) – Update to vector of lambda parameters
extend_by (dict) – Extension info dictionary
was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not
c_llk (float) – Current llk
fit_info (Fit_info) – A Fit_info object
outer (int) – Index of outer iteration
max_inner (int) – Maximum number of inner iterations
min_inner (int) – Minimum number of inner iterations
conv_tol (float) – Convergence tolerance
method (str) – Method to use to estimate coefficients
piv_tol (float) – Deprecated
keep_drop (list[list[int],list[int]] | None) – Set of previously dropped coefficients or None
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary
extension_method_lam (str) – Which method to use to extend lambda proposals.
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML.
repara (bool) – Whether to apply a stabilizing re-parameterization to the model
n_c (int) – Number of cores to use
- Returns:
coef estimate under corrected lambda, split version of next coef estimate, next mus, next etas, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former, new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop, the new total penalty matrix, the new list of penalties, total edf, term-wise edfs, the update to the lambda vector
- Return type:
tuple[np.ndarray, list[np.ndarray], list[np.ndarray], list[np.ndarray], scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array, float, float, float, list[int], list[int], scp.sparse.csc_array, list[LambdaTerm], float, list[float], np.ndarray]
- mssm.src.python.gamm_solvers.correct_lambda_step_gen_smooth(family: GSMMFamily, ys: list[ndarray], Xs: list[csc_array], S_norm: csc_array, n_coef: int, form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], smooth_pen: list[LambdaTerm], lam_delta: ndarray, extend_by: dict, was_extended: list[bool], c_llk: float, fit_info: Fit_info, outer: int, max_inner: int, min_inner: int, conv_tol: float, gamma: float, method: str, qEFSH: str, overwrite_coef: bool, qEFS_init_converge: bool, optimizer: str, __old_opt: LinearOperator | None, use_grad: bool, __neg_pen_llk: Callable, __neg_pen_grad: Callable, piv_tol: float, keep_drop: list[list[int], list[int]] | None, extend_lambda: bool, extension_method_lam: str, control_lambda: int, repara: bool, n_c: int, init_bfgs_options: dict, bfgs_options: dict) tuple[ndarray, csc_array | None, csc_array | None, csc_array | LinearOperator, csc_array | None, float, float, LinearOperator | None, list[int], list[int], csc_array, list[LambdaTerm], float, list[float], ndarray]
Updates and performs step-length control for the vector of lambda parameters of a GSMM model. Essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).
Based on steps outlined by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
family (GSMMFamily) – Model family
ys (list[np.ndarray]) – List of observation vectors
Xs (list[scp.sparse.csc_array]) – List of model matrices
S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).
n_coef (int) – Number of coefficients
form_n_coef (list[int]) – List of number of coefficients per formula
form_up_coef (list[int]) – List of un-penalized number of coefficients per formula
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
smooth_pen (list[LambdaTerm]) – List of penalties
lam_delta (np.ndarray) – Update to vector of lambda parameters
extend_by (dict) – Extension info dictionary
was_extended (list[bool]) – List holding indication per lambda parameter whether it was extended or not
c_llk (float) – Current llk
fit_info (Fit_info) – A
Fit_info
objectouter (int) – Index of outer iteration
max_inner (int) – Maximum number of inner iterations
min_inner (int) – Minimum number of inner iterations
conv_tol (float) – Convergence tolerance
gamma (float) – Weight factor determining whether we should look for smoother or less smooth models
method (str) – Method to use to estimate coefficients (and lambda parameter)
qEFSH (str) – Whether the hessian approximation should use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation or the standard BFGS update (qEFSH='BFGS')
overwrite_coef (bool) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS'. Setting this to False will be useful when passing coefficients from a simpler model to initialize a more complex one. Only has an effect when qEFS_init_converge=True.
qEFS_init_converge (bool) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS'.
optimizer (str) – Deprecated
__old_opt (scp.sparse.linalg.LinearOperator | None) – If the L-qEFS update is used to estimate coefficients/lambda parameters, then this is the previous state of the quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood
use_grad (bool) – Deprecated
__neg_pen_llk (Callable) – Function to evaluate negative penalized log-likelihood
__neg_pen_grad (Callable) – Function to evaluate gradient of negative penalized log-likelihood
piv_tol (float) – Deprecated
keep_drop (list[list[int],list[int]] | None) – Set of previously dropped coefficients or None
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary
extension_method_lam (str) – Which method to use to extend lambda proposals.
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note that the gradient is based on quasi-Newton approximations as well and is thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed.
repara (bool) – Whether to apply a stabilizing re-parameterization to the model
n_c (int) – Number of cores to use
init_bfgs_options (dict) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options, but passed to the optimizer of the un-penalized problem. Only has an effect when qEFS_init_converge=True.
bfgs_options (dict) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS'.
- Returns:
coef estimate under corrected lambda, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former (or another instance of scp.sparse.linalg.LinearOperator representing the new quasi-Newton approximation), covariance matrix of coefficients, next llk, next penalized llk, if the L-qEFS update is used to estimate coefficients/lambda parameters a scp.sparse.linalg.LinearOperator holding the previous quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop, new total penalty matrix, new list of penalties, total edf, term-wise edfs, the update to the lambda vector
- Return type:
tuple[np.ndarray, scp.sparse.csc_array|None, scp.sparse.csc_array|None, scp.sparse.csc_array|scp.sparse.linalg.LinearOperator, scp.sparse.csc_array|None, float, float, scp.sparse.linalg.LinearOperator|None, list[int], list[int], scp.sparse.csc_array, list[LambdaTerm], float, list[float], np.ndarray]
- mssm.src.python.gamm_solvers.deriv_transform_eta_beta(d1eta: list[ndarray], d2eta: list[ndarray], d2meta: list[ndarray], Xs, only_grad=False)
Further transforms derivatives of the llk with respect to eta to get derivatives of the llk with respect to the coefficients. Based on section 3.2 and Appendix A in Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- mssm.src.python.gamm_solvers.deriv_transform_mu_eta(y: ndarray, means: list[ndarray], family: GAMLSSFamily) tuple[list[ndarray], list[ndarray], list[ndarray]]
Computes derivatives (first and second order) of the llk with respect to each linear predictor based on their respective mean for all observations, following steps outlined by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
y (np.ndarray) – Vector of observations
means (list[np.ndarray]) – List holding vectors of mean estimates
family (GAMLSSFamily) – Family of the model
- Returns:
A tuple containing a list of the first-order partial derivatives with respect to each parameter, the same for pure second derivatives, and a list of mixed derivatives
- Return type:
tuple[list[np.ndarray],list[np.ndarray],list[np.ndarray]]
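For the first-order case the transformation is just the chain rule per linear predictor: since g(mu) = eta, dl/deta = (dl/dmu) / g'(mu). A sketch, assuming the family exposes per-parameter derivative functions d1 and each link exposes its derivative dy1 (both accessor names are assumptions here):

    def sketch_d1_eta(y, means, family):
        d1eta = []
        for i, mu in enumerate(means):
            d1mu = family.d1[i](y, *means)                # dl/dmu_i
            d1eta.append(d1mu / family.links[i].dy1(mu))  # dmu/deta = 1/g'(mu)
        return d1eta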
- mssm.src.python.gamm_solvers.drop_terms_S(penalties: list[LambdaTerm], keep: list[int]) list[LambdaTerm]
Zeros out rows and cols of penalty matrices corresponding to dropped terms. Roots are re-computed as well.
- Parameters:
penalties (list[LambdaTerm]) – List of Lambda terms included in the model formula
keep (list[int]) – List of columns/rows to keep.
- Returns:
List of updated penalties - a copy is made.
- Return type:
list[LambdaTerm]
- mssm.src.python.gamm_solvers.drop_terms_X(Xs: list[csc_array], keep: list[int]) tuple[list[csc_array], list[int]]
Drops cols of model matrices corresponding to dropped terms.
- Parameters:
Xs (list[scp.sparse.csc_array]) – List of model matrices included in the model formula.
keep (list[int]) – List of columns to keep.
- Returns:
Tuple containing a list of updated model matrices - a copy is made - and a new list containing the indices by which to split the coefficient vector.
- Return type:
tuple[list[scp.sparse.csc_array],list[int]]
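Column dropping amounts to fancy-indexing each sparse matrix with the retained indices; a sketch, under the assumption that keep holds global column indices over the concatenated coefficient vector:

    import numpy as np

    def sketch_drop_cols(Xs, keep):
        keep = np.asarray(keep)
        new_Xs, offset = [], 0
        for X in Xs:
            # Global indices that fall into this matrix's column range.
            local = keep[(keep >= offset) & (keep < offset + X.shape[1])] - offset
            new_Xs.append(X[:, local])
            offset += X.shape[1]
        # New split indices follow from the reduced per-matrix column counts.
        new_split_idx = list(np.cumsum([X.shape[1] for X in new_Xs])[:-1])
        return new_Xs, new_split_idx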
- mssm.src.python.gamm_solvers.extend_lambda_step(lti: int, lam: float, dLam: float, extend_by: dict, was_extended: list[bool], method: str) tuple[float, dict, bool]
Internal function. Performs an update to the lambda parameter, ideally extending the step taken without overshooting the objective.
- Parameters:
lti (int) – Penalty index
lam (float) – Current lambda value
dLam (float) – The lambda update
extend_by (dict) – Extension info dictionary
was_extended (list[bool]) – List holding an indication per lambda parameter of whether it was extended or not
method (str) – Extension method to use.
- Raises:
ValueError – If requested method is not implemented
- Returns:
Updated values for dLam, extend_by, was_extended
- Return type:
tuple[float,dict,bool]
- mssm.src.python.gamm_solvers.form_cross_prod_mp(should_cache: bool, cache_dir: str, file: str, fi: int, y_flat: ndarray, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) tuple[csc_array, ndarray]
Computes X.T@X and X.T@y based on the data in file.
- Parameters:
should_cache (bool) – whether or not the directory should actually be created
cache_dir (str) – path to cache directory
file (str) – File name
fi (int) – File index in all files
y_flat (np.ndarray) – Observation vector
terms (list[GammTerm]) – List of terms in model formula
has_intercept (bool) – Whether the formula has an intercept or not
ltx (list[int]) – Linear term indices
irstx (list[int]) – Impulse response function term indices
stx (list[int]) – Smooth term indices
rtx (list[int]) – Random term indices
var_types (dict) – Dictionary holding variable types
var_map (dict) – Dictionary mapping variable names to column indices in the encoded data
var_mins (dict) – Dictionary with variable minimums
var_maxs (dict) – Dictionary with variable maximums
factor_levels (dict) – Dictionary with levels associated with each factor
cov_flat_file (np.ndarray) – Encoded data based on file
cov (list[np.ndarray]) – Essentially [cov_flat_file]
- Returns:
X.T@X, X.T@y
- Return type:
tuple[scp.sparse.csc_array,np.ndarray]
- mssm.src.python.gamm_solvers.form_eta_mp(should_cache: bool, cache_dir: str, file: str, fi: int, coef: ndarray, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) ndarray
Computes X@coef, where X is the model matrix for file.
- Parameters:
should_cache (bool) – whether or not the directory should actually be created
cache_dir (str) – path to cache directory
file (str) – File name
fi (int) – File index in all files
coef (np.ndarray) – Current coefficient estimate
terms (list[GammTerm]) – List of terms in model formula
has_intercept (bool) – Whether the formula has an intercept or not
ltx (list[int]) – Linear term indices
irstx (list[int]) – Impulse response function term indices
stx (list[int]) – Smooth term indices
rtx (list[int]) – Random term indices
var_types (dict) – Dictionary holding variable types
var_map (dict) – Dictionary mapping variable names to column indices in the encoded data
var_mins (dict) – Dictionary with variable minimums
var_maxs (dict) – Dictionary with variable maximums
factor_levels (dict) – Dictionary with levels associated with each factor
cov_flat_file (np.ndarray) – Encoded data based on file
cov (list[np.ndarray]) – Essentially [cov_flat_file]
- Returns:
X@coef for this file
- Return type:
np.ndarray
- mssm.src.python.gamm_solvers.gd_coef_smooth(coef: ndarray, grad: ndarray, S_emb: csc_array, a: float) ndarray
Follows sections 3.1.2 and 3.14 in Wood, Pya, & Säfken (2016) to update the coefficients of a GAMLSS/GSMM model via a gradient descent (ascent actually) step.
1) Computes the gradient of the penalized likelihood (grad - S_emb@coef)
2) Uses this to compute the update
3) Step-size control - happens outside
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
coef (np.ndarray) – Current coefficient estimate
grad (np.ndarray) – gradient of llk with respect to coef
S_emb (scp.sparse.csc_array) – Total penalty matrix
a (float) – Step length for gradient descent update
- Returns:
An updated estimate of the coefficients
- Return type:
np.ndarray
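The update itself is a single line of linear algebra; a sketch of steps 1 and 2 (step-size control happens outside, as noted):

    def sketch_gd_step(coef, grad, S_emb, a):
        pgrad = grad - S_emb @ coef  # gradient of the penalized llk
        return coef + a * pgrad      # ascent step; a is adapted by the caller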
- mssm.src.python.gamm_solvers.grad_lambda(lgdet_deriv: float, ldet_deriv: float, bSb: float, scale: float) ndarray
Internal function. Computes the gradient of the REML criterion with respect to all lambda parameters.
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
lgdet_deriv (float) – Derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.
ldet_deriv (float) – Derivative of \(log(|\mathbf{H} + S_\lambda|)\) (where \(\mathbf{H}\) is the negative hessian of the llk) with respect to lambda.
bSb (float) – cCoef.T@emb_SJ@cCoef where cCoef is the current coefficient estimate
scale (float) – Optional scale parameter (or 1)
- Returns:
The gradient of the REML criterion
- Return type:
np.ndarray
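Assembling the documented pieces, the gradient for one lambda parameter is (a sketch; the exact sign/halving convention is an assumption):

    def sketch_grad_lambda(lgdet_deriv, ldet_deriv, bSb, scale):
        return lgdet_deriv / 2 - ldet_deriv / 2 - bSb / (2 * scale)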
- mssm.src.python.gamm_solvers.handle_drop_gammlss(family: GAMLSSFamily, y: ndarray, coef: ndarray, keep: list[int], Xs: list[csc_array], S_emb: csc_array) tuple[ndarray, list[ndarray], list[int], list[csc_array], csc_array, list[ndarray], list[ndarray], float, float]
Drop coefficients and make sure this is reflected in the model matrices, total penalty, llk, and penalized llk.
- Parameters:
family (GAMLSSFamily) – Model family
y (np.ndarray) – Vector of observations
coef (np.ndarray) – Vector of coefficients
keep (list[int]) – List of parameter indices to keep.
Xs (list[scp.sparse.csc_array]) – List of model matrices
S_emb (scp.sparse.csc_array) – Total penalty matrix.
- Returns:
A tuple holding: reduced coef vector, split version of the reduced coef vector, a new list of indices determining where to split the reduced coef vector, list with reduced model matrices, reduced total penalty matrix, updated etas, mus, llk, and penalized llk
- Return type:
tuple[np.ndarray, list[np.ndarray], list[int], list[scp.sparse.csc_array], scp.sparse.csc_array, list[np.ndarray], list[np.ndarray], float, float]
- mssm.src.python.gamm_solvers.handle_drop_gsmm(family: GSMMFamily, ys: list[ndarray], coef: ndarray, keep: list[int], Xs: list[csc_array], S_emb: csc_array) tuple[ndarray, list[int], list[csc_array], csc_array, float, float]
Drop coefficients and make sure this is reflected in the model matrices, total penalty, llk, and penalized llk.
- Parameters:
family (GSMMFamily) – Model family
ys (list[np.ndarray]) – List with vector of observations
coef (np.ndarray) – Vector of coefficients
keep (list[int]) – List of parameter indices to keep.
Xs (list[scp.sparse.csc_array]) – List of model matrices
S_emb (scp.sparse.csc_array) – Total penalty matrix.
- Returns:
A tuple holding: reduced coef vector, a new list of indices determining where to split the reduced coef vector, list with reduced model matrices, reduced total penalty matrix, updated llk, and penalized llk
- Return type:
tuple[np.ndarray, list[int], list[scp.sparse.csc_array], scp.sparse.csc_array, float, float]
- mssm.src.python.gamm_solvers.identify_drop(H: csc_array, S_scaled: csc_array, method: str = 'QR') tuple[list[int] | None, list[int] | None]
Routine to (approximately) identify the rank of the scaled negative hessian of the penalized likelihood based on a rank revealing QR decomposition or the methods by Foster (1986) and Gotsman & Toledo (2008).
If method=="QR", a rank revealing QR decomposition is performed for the scaled penalized Hessian. The latter has to be transformed to a dense matrix for this. This is essentially the approach by Wood et al. (2016) and is the most accurate. Alternatively, we can rely on a variant of Foster’s method, which is used when method=="LU" or method=="Direct". method=="LU" requires p LU decompositions - where p is approximately the Kernel size of the matrix. It continues to find vectors forming a basis of the Kernel of the balanced penalized Hessian from the upper matrix of the LU decomposition and successively drops columns corresponding to the maximum absolute value of the Kernel vectors (see Foster, 1986). This is repeated until we can form a cholesky of the scaled penalized Hessian which has an acceptable condition number. If method=="Direct", the same procedure is completed, but Kernel vectors are found directly based on the balanced penalized Hessian, which can be less precise.
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Foster (1986). Rank and null space calculations using matrix decomposition without column interchanges.
Gotsman & Toledo (2008). On the Computation of Null Spaces of Sparse Rectangular Matrices.
mgcv source code, in particular: https://github.com/cran/mgcv/blob/master/R/gam.fit4.r
- Parameters:
H (scp.sparse.csc_array) – Estimate of the hessian of the log-likelihood.
S_scaled (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).
method (str, optional) – Which method to use to check for rank deficiency, defaults to ‘QR’
- Returns:
A tuple containing lists of the coefficients to keep and to drop, both of which are None when we don’t need to drop any.
- Return type:
tuple[list[int]|None,list[int]|None]
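A hedged sketch of the method=="QR" branch (how mssm balances H against S_scaled is an implementation detail; the sketch simply adds them):

    import numpy as np
    import scipy.linalg as scl

    def sketch_identify_drop_qr(H, S_scaled, rank_tol=np.finfo(float).eps**0.7):
        A = (-H + S_scaled).toarray()   # penalized Hessian, densified for the QR
        _, R, P = scl.qr(A, pivoting=True)
        rdiag = np.abs(np.diag(R))
        rank = int(np.sum(rdiag > rdiag[0] * rank_tol))
        if rank == A.shape[1]:
            return None, None           # full rank: nothing needs to be dropped
        keep = sorted(P[:rank].tolist())
        drop = sorted(P[rank:].tolist())
        return keep, drop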
- mssm.src.python.gamm_solvers.init_step_gam(y: ndarray, yb: ndarray, mu: ndarray, eta: ndarray, rowsX: int, colsX: int, X: csc_array, Xb: csc_array, family: Family, col_S: int, penalties: list[LambdaTerm], pinv: str, n_c: int, formula: Formula, form_Linv: bool, method: str, offset: float | ndarray, Lrhoi: csc_array | None) tuple[float, float, ndarray, ndarray, ndarray, csc_array, csc_array, float, list[float], float, ndarray, ndarray, csc_array]
Internal function. Gets initial estimates for a GAM model for coefficients and proposes first lambda update.
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
y (np.ndarray) – vector of observations
yb (np.ndarray) – vector of observations of the working model
mu (np.ndarray) – vector of mean estimates
eta (np.ndarray) – vector of linear predictors
rowsX (int) – Rows of model matrix
colsX (int) – Cols of model matrix
X (scp.sparse.csc_array) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
family (Family) – Family of model
col_S (int) – Cols of penalty matrix
penalties (list[LambdaTerm]) – List of penalties
pinv (str) – Method to use to compute the generalized inverse of the total penalty, set to ‘svd’!
n_c (int) – Number of cores to use
formula (Formula) – Formula of the model
form_Linv (bool) – Whether to form the inverse of the cholesky of the negative penalized hessian or not
method (str) – Which method to use to solve for the coefficients (“Chol” or “Qr”)
offset (float | np.ndarray) – Offset (fixed effect) to add to eta
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model
- Returns:
A tuple containing the deviance dev, penalized deviance pen_dev, eta, mu, coef, CholXXS, InvCholXXS, total_edf, term_edfs, scale, wres, lam_delta, S_emb
- Return type:
tuple[float, float, np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, list[float], float, np.ndarray, np.ndarray, scp.sparse.csc_array]
- mssm.src.python.gamm_solvers.initialize_extension(method: str, penalties: list[LambdaTerm]) dict
Internal function. Initializes a dictionary holding all the information necessary to compute the lambda extensions at every iteration of the fitting procedure.
- Parameters:
method (str) – Which extension method to use
penalties (list[LambdaTerm]) – List of penalties
- Returns:
extension info dictionary
- Return type:
dict
- mssm.src.python.gamm_solvers.keep_XTX(cov_flat: ndarray, y_flat: ndarray, formula: Formula, nc: int, progress_bar: bool) tuple[csc_array, ndarray]
Computes X.T@X and X.T@y in blocks.
- Parameters:
cov_flat (np.ndarray) – Encoded data as np.array
y_flat (np.ndarray) – vector of observations
formula (Formula) – Formula of model
nc (int) – Number of cores to use
progress_bar (bool) – Whether to print progress or not
- Returns:
X.T@X, X.T@y
- Return type:
tuple[scp.sparse.csc_array,np.ndarray]
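The blockwise strategy never forms the full model matrix; a generic sketch in which build_X_block is a hypothetical callback producing the sparse model-matrix rows for a range of observations:

    def sketch_xtx_blocks(build_X_block, y_flat, n_obs, block_size=10000):
        XtX, Xty = None, None
        for lo in range(0, n_obs, block_size):
            hi = min(lo + block_size, n_obs)
            Xb = build_X_block(lo, hi)          # rows lo..hi of the model matrix
            bXtX, bXty = Xb.T @ Xb, Xb.T @ y_flat[lo:hi]
            XtX = bXtX if XtX is None else XtX + bXtX
            Xty = bXty if Xty is None else Xty + bXty
        return XtX, Xty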
- mssm.src.python.gamm_solvers.keep_eta(formula: Formula, coef: ndarray, nc: int) ndarray
Computes X@coef in parallel, where X is the overall model matrix and coef is the current coefficient estimate.
- mssm.src.python.gamm_solvers.newton_coef_smooth(coef: ndarray, grad: ndarray, H: csc_array, S_emb: csc_array) tuple[ndarray, csc_array, csc_array, float]
Follows sections 3.1.2 and 3.14 in Wood, Pya, & Säfken (2016) to update the coefficients of a GAMLSS/GSMM model via a Newton step.
1) Computes the gradient of the penalized likelihood (grad - S_emb@coef)
2) Computes the negative Hessian of the penalized likelihood (-1*H + S_emb) and its inverse
3) Uses these two to compute the Newton step
4) Step-size control - happens outside
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
mgcv source code, in particular: https://github.com/cran/mgcv/blob/master/R/gam.fit4.r
- Parameters:
coef (np.ndarray) – Current coefficient estimate
grad (np.ndarray) – gradient of llk with respect to coef
H (scp.sparse.csc_array) – hessian of the llk
S_emb (scp.sparse.csc_array) – Total penalty matrix
- Returns:
A tuple containing an estimate of the coefficients, the un-pivoted cholesky of the penalized negative hessian, the inverse of the former, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible
- Return type:
tuple[np.ndarray,scp.sparse.csc_array,scp.sparse.csc_array,float]
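A dense sketch of the stabilized step: ridge the penalized negative Hessian until a cholesky succeeds, then solve for the update (the actual routine works with sparse factorizations; the ridge schedule below is an assumption):

    import numpy as np

    def sketch_newton_step(coef, grad, H, S_emb):
        pgrad = grad - S_emb @ coef              # gradient of the penalized llk
        nH = (-H + S_emb).toarray()              # negative penalized Hessian
        eps = 0.0
        while True:
            try:
                L = np.linalg.cholesky(nH + eps * np.eye(nH.shape[0]))
                break
            except np.linalg.LinAlgError:
                # Increase the ridge until the matrix becomes positive definite.
                eps = max(eps * 10, 1e-8 * np.abs(nH.diagonal()).max())
        delta = np.linalg.solve(L.T, np.linalg.solve(L, pgrad))
        return coef + delta, eps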
- mssm.src.python.gamm_solvers.read_XTX(file: str, formula: Formula, nc: int) tuple[csc_array, ndarray, int]
Computes X.T@X and X.T@y for this file in parallel, reading data from file.
- Parameters:
file (str) – File name
formula (Formula) – Formula of model
nc (int) – Number of cores to use
- Returns:
X.T@X, X.T@y
- Return type:
tuple[scp.sparse.csc_array,np.ndarray,int]
- mssm.src.python.gamm_solvers.read_eta(file, formula: Formula, coef: ndarray, nc: int) ndarray
Computes X@coef in parallel, where X is the model matrix based on this file and coef is the current coefficient estimate.
- mssm.src.python.gamm_solvers.read_mmat(should_cache: bool, cache_dir: str, file: str, fi: int, terms: list[GammTerm], has_intercept: bool, ltx: list[int], irstx: list[int], stx: list[int], rtx: list[int], var_types: dict, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, cov_flat_file: ndarray, cov: list[ndarray]) csc_array
Creates model matrix for that dataset. The model-matrix is either cached or not. If the former is the case, the matrix is read in on subsequent calls to this function.
- Parameters:
should_cache (bool) – whether or not the directory should actually be created
cache_dir (str) – path to cache directory
file (str) – File name
fi (int) – File index in all files
terms (list[GammTerm]) – List of terms in model formula
has_intercept (bool) – Whether the formula has an intercept or not
ltx (list[int]) – Linear term indices
irstx (list[int]) – Impulse response function term indices
stx (list[int]) – Smooth term indices
rtx (list[int]) – Random term indices
var_types (dict) – Dictionary holding variable types
var_map (dict) – Dictionary mapping variable names to column indices in the encoded data
var_mins (dict) – Dictionary with variable minimums
var_maxs (dict) – Dictionary with variable maximums
factor_levels (dict) – Dictionary with levels associated with each factor
cov_flat_file (np.ndarray) – Encoded data based on file
cov (list[np.ndarray]) – Essentially [cov_flat_file]
- Returns:
model matrix associated with this file
- Return type:
scp.sparse.csc_array
- mssm.src.python.gamm_solvers.restart_coef(coef: ndarray, c_llk: float, c_pen_llk: float, n_coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array], S_emb: csc_array, family: GSMMFamily, outer: int, restart_counter: int) tuple[ndarray, float, float]
Shrinks coef towards a random vector to restart the algorithm if it gets stuck.
- Parameters:
coef (np.ndarray) – Coefficient estimate
c_llk (float) – Current llk
c_pen_llk (float) – Current penalized llk
n_coef (np.ndarray) – Number of coefficients
coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution
ys (list[np.ndarray]) – List of observation vectors
Xs (list[scp.sparse.csc_array]) – List of model matrices
S_emb (scp.sparse.csc_array) – Total penalty matrix
family (GSMMFamily) – Model family
outer (int) – Outer iteration index
restart_counter (int) – Number of restarts already handled previously
- Returns:
Updates for coef, c_llk, c_pen_llk
- Return type:
tuple[np.ndarray, float, float]
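The restart is a convex combination of the current estimate and a random vector, shrinking less aggressively on later restarts; a sketch (the weight schedule is an assumption):

    import numpy as np

    def sketch_restart(coef, restart_counter, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        w = 0.5 / (restart_counter + 1)  # weaker shrinkage on later restarts
        return (1 - w) * coef + w * rng.standard_normal(coef.shape)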
- mssm.src.python.gamm_solvers.restart_coef_gammlss(coef: ndarray, split_coef: list[ndarray], c_llk: float, c_pen_llk: float, etas: list[ndarray], mus: list[ndarray], n_coef: int, coef_split_idx: list[int], y: ndarray, Xs: list[csc_array], S_emb: csc_array, family: GAMLSSFamily, outer: int, restart_counter: int) tuple[ndarray, list[ndarray], float, float, list[ndarray], list[ndarray]]
Shrinks coef towards a random vector to restart the algorithm if it gets stuck.
- Parameters:
coef (np.ndarray) – Coefficient estimate
split_coef (list[np.ndarray]) – Split of coefficient estimate
c_llk (float) – Current llk
c_pen_llk (float) – Current penalized llk
etas (list[np.ndarray]) – List of linear predictors
mus (list[np.ndarray]) – List of estimated means
n_coef (int) – Number of coefficients
coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution
y (np.ndarray) – Vector of observations
Xs ([scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
S_emb (scp.sparse.csc_array) – Total penalty matrix
family (GAMLSSFamily) – Model family
outer (int) – Outer iteration index
restart_counter (int) – Number of restarts already handled previously
- Returns:
Updates for coef, split_coef, c_llk, c_pen_llk, etas, mus
- Return type:
tuple[np.ndarray, list[np.ndarray], float, float, list[np.ndarray], list[np.ndarray]]
- mssm.src.python.gamm_solvers.solve_gamm_sparse(mu_init: ndarray, y: ndarray, X: csc_array, penalties: list[LambdaTerm], col_S: int, family: Family, maxiter: int = 10, max_inner: int = 100, pinv: str = 'svd', conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 1, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', form_Linv: bool = True, method: str = 'Chol', check_cond: int = 2, progress_bar: bool = False, n_c: int = 10, offset: int = 0, Lrhoi: csc_array | None = None) tuple[ndarray, ndarray, ndarray, csc_array, csc_array, float, csc_array, float, list[float], float, Fit_info]
Estimates a Generalized Additive Mixed model. Implements the algorithms discussed in section 3.2 of the paper by Krause et al. (submitted).
Relies on methods proposed by Wood et al. (2017), Wood & Fasiolo (2017), Wood (2011), and Wood (2017).
- References:
Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
mu_init (np.ndarray) – Initial values for means
y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Model matrix
penalties (list[LambdaTerm]) – List of penalties
col_S (int) – Columns of total penalty matrix
family (Family) – Family of model
maxiter (int, optional) – Maximum number of iterations for outer algorithm updating lambda, defaults to 10
max_inner (int, optional) – Maximum number of iterations for inner algorithm updating coefficients, defaults to 100
pinv (str, optional) – Method to use to compute the generalized inverse of the total penalty, defaults to “svd”
conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 1 by default.
exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov”
form_Linv (bool, optional) – Whether to form the inverse of the cholesky of the negative penalized hessian or not, defaults to True
method (str, optional) – Which method to use to solve for the coefficients (“Chol” or “Qr”), defaults to “Chol”
check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()). When check_cond=2, an estimate of the condition number will be computed for each new system (at each iteration of the algorithm) and an error will be raised if the condition number is estimated as too high given the chosen method. Defaults to 2
progress_bar (bool, optional) – Whether to print progress or not, defaults to False
n_c (int, optional) – Number of cores to use, defaults to 10
offset (int, optional) – Offset (fixed effect) to add to eta, defaults to 0
Lrhoi (scp.sparse.csc_array | None, optional) – Optional covariance matrix of an ar1 model, defaults to None
- Raises:
ArithmeticError – _description_
ArithmeticError – _description_
ArithmeticError – _description_
ArithmeticError – _description_
warnings.warn – _description_
- Returns:
An estimate of the coefficients coef, the linear predictor eta, the working residuals wres, the root of the Fisher weights as matrix Wr, the matrix with Newton weights at convergence WN, an estimate of the scale parameter, an inverse of the cholesky of the penalized negative hessian InvCholXXS, total edf, term-wise edf, total penalty, a Fit_info object
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, scp.sparse.csc_array, float, list[float], float, Fit_info]
- mssm.src.python.gamm_solvers.solve_gamm_sparse2(formula: Formula, penalties: list[LambdaTerm], col_S: int, family: Family, maxiter: int = 10, pinv: str = 'svd', conv_tol: float = 1e-07, extend_lambda: bool = False, control_lambda: int = 1, exclude_lambda: bool = False, extension_method_lam: str = 'nesterov', form_Linv: bool = True, progress_bar: bool = False, n_c: int = 10) tuple[ndarray, ndarray, ndarray, csc_array, float, csc_array | None, float, list[float], float, Fit_info]
Estimates an Additive Mixed model. Implements the algorithms discussed in section 3.1 of the paper by Krause et al. (submitted).
Relies on methods proposed by Wood et al. (2017), Wood & Fasiolo (2017), Wood (2011), and Wood (2017). In addition, this function builds the products involving the model matrix only once (iteratively) as described by Wood et al. (2015).
- References:
Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. https://doi.org/10.1080/01621459.2016.1195744
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models: Estimation of Semiparametric Generalized Linear Models. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N., Goude, Y., & Shaw, S. (2015). Generalized additive models for large data sets. Journal of the Royal Statistical Society: Series C (Applied Statistics), 64(1), 139–155. https://doi.org/10.1111/rssc.12068
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
formula (Formula) – Formula of the model
penalties (list[LambdaTerm]) – List of penalties
col_S (int) – Columns of total penalty matrix
family (Family) – Family of model
maxiter (int, optional) – Maximum number of iterations for outer algorithm updating lambda, defaults to 10
pinv (str, optional) – Method to use to compute the generalized inverse of the total penalty, defaults to “svd”
conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7
extend_lambda (bool) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary. Disabled by default.
control_lambda (int) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML. Set to 1 by default.
exclude_lambda (bool) – Whether selective lambda terms should be excluded heuristically from updates. Can make each iteration a bit cheaper but is problematic when using additional Kernel penalties on terms. Thus, disabled by default.
extension_method_lam (str, optional) – Which method to use to extend lambda proposals., defaults to “nesterov”
form_Linv (bool, optional) – Whether to form the inverse of the cholesky of the negative penalized hessian or not, defaults to True
progress_bar (bool, optional) – Whether to print progress or not, defaults to False
n_c (int, optional) – Number of cores to use, defaults to 10
- Returns:
An estimate of the coefficients coef, the linear predictor eta, the working residuals wres, the negative hessian, the estimated scale, an inverse of the cholesky of the negative penalized hessian, total edf, term-wise edfs, total penalty, a Fit_info object
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array, float, scp.sparse.csc_array|None, float, list[float], float, Fit_info]
- mssm.src.python.gamm_solvers.solve_gammlss_sparse(family: GAMLSSFamily, y: ndarray, Xs: list[csc_array], form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], gamlss_pen: list[LambdaTerm], max_outer: int = 50, max_inner: int = 30, min_inner: int = 1, conv_tol: float = 1e-07, extend_lambda: bool = True, extension_method_lam: str = 'nesterov2', control_lambda: int = 1, method: str = 'Chol', check_cond: int = 1, piv_tol: float = 0.175, repara: bool = True, should_keep_drop: bool = True, prefit_grad: bool = False, progress_bar: bool = True, n_c: int = 10) tuple[ndarray, list[ndarray], list[ndarray], ndarray, csc_array, csc_array, float, list[float], float, list[LambdaTerm], Fit_info]
Fits a GAMLSS model - essentially completes the steps discussed in section 3.3 of the paper by Krause et al. (submitted).
Based on steps outlined by Wood, Pya, & Säfken (2016)
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
family (GAMLSSFamily) – Model family
y (np.ndarray) – Vector of observations
Xs ([scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
form_n_coef (list[int]) – List of number of coefficients per formula
form_up_coef (list[int]) – List of un-penalized number of coefficients per formula
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
gamlss_pen (list[LambdaTerm]) – List of penalties
max_outer (int, optional) – Maximum number of outer iterations, defaults to 50
max_inner (int, optional) – Maximum number of inner iterations, defaults to 30
min_inner (int, optional) – Minimum number of inner iterations, defaults to 1
conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7
extend_lambda (bool, optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary, defaults to True
extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov2”
control_lambda (int, optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. Setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded. Setting it to 2 means that steps will be halved if they fail to increase the approximate REML, defaults to 1
method (str, optional) – Method to use to estimate coefficients, defaults to “Chol”
check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()), defaults to 1
piv_tol (float, optional) – Deprecated, defaults to 0.175
repara (bool, optional) – Whether to apply a stabilizing re-parameterization to the model, defaults to True
should_keep_drop (bool, optional) – If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations, defaults to True
prefit_grad (bool, optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients., defaults to False
progress_bar (bool, optional) – Whether progress should be displayed, defaults to True
n_c (int, optional) – Number of cores to use, defaults to 10
- Returns:
coef estimate, etas, mus, working residuals, the negative hessian of the log-likelihood, inverse of the Cholesky of the negative hessian of the penalized log-likelihood, total edf, term-wise edfs, total penalty, final list of penalties, and a Fit_info object
- Return type:
tuple[np.ndarray, list[np.ndarray], list[np.ndarray], np.ndarray, scp.sparse.csc_array, scp.sparse.csc_array, float, list[float], float, list[LambdaTerm], Fit_info]
- mssm.src.python.gamm_solvers.solve_generalSmooth_sparse(family: GSMMFamily, ys: list[ndarray], Xs: list[csc_array], form_n_coef: list[int], form_up_coef: list[int], coef: ndarray, coef_split_idx: list[int], smooth_pen: list[LambdaTerm], max_outer: int = 50, max_inner: int = 50, min_inner: int = 50, conv_tol: float = 1e-07, extend_lambda: bool = True, extension_method_lam: str = 'nesterov2', control_lambda: int = 1, optimizer: str = 'Newton', method: str = 'Chol', check_cond: int = 1, piv_tol: float = 0.175, repara: bool = True, should_keep_drop: bool = True, form_VH: bool = True, use_grad: bool = False, gamma: float = 1, qEFSH: str = 'SR1', overwrite_coef: bool = True, max_restarts: int = 0, qEFS_init_converge: bool = True, prefit_grad: bool = False, progress_bar: bool = True, n_c: int = 10, init_bfgs_options: dict = {'ftol': 1e-09, 'gtol': 1e-09, 'maxcor': 30, 'maxfun': 10000000.0, 'maxls': 100}, bfgs_options: dict = {'ftol': 1e-09, 'gtol': 1e-09, 'maxcor': 30, 'maxfun': 10000000.0, 'maxls': 100}) tuple[ndarray, csc_array | None, csc_array | LinearOperator, LinearOperator | None, float, list[float], float, list[LambdaTerm], Fit_info]
Fits a general smooth model. Essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).
Based on steps outlined by Wood, Pya, & Säfken (2016). An even more general version of solve_gammlss_sparse() that can use the L-qEFS update by Krause et al. (submitted) to estimate the coefficients and lambda parameters. The update requires only a function to compute the log-likelihood and a function to compute the gradient of said likelihood with respect to the coefficients. Alternatively, full Newton can be used - requiring a function to compute the hessian as well.
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
family (GSMMFamily) – Model family
ys (list[np.ndarray]) – List of observation vectors
Xs (list[scp.sparse.csc_array]) – List of model matrices
form_n_coef (list[int]) – List of number of coefficients per formula
form_up_coef (list[int]) – List of un-penalized number of coefficients per formula
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
smooth_pen (list[LambdaTerm]) – List of penalties
max_outer (int, optional) – Maximum number of outer iterations, defaults to 50
max_inner (int, optional) – Maximum number of inner iterations, defaults to 50
min_inner (int, optional) – Minimum number of inner iterations, defaults to 50
conv_tol (float, optional) – Convergence tolerance, defaults to 1e-7
extend_lambda (bool, optional) – Whether lambda proposals should be accelerated or not. Can lower the number of new smoothing penalty proposals necessary, defaults to True
extension_method_lam (str, optional) – Which method to use to extend lambda proposals, defaults to “nesterov2”
control_lambda (int, optional) – Whether lambda proposals should be checked (and if necessary decreased) for whether or not they (approximately) increase the Laplace approximate restricted maximum likelihood of the model. For method != 'qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the step will never be smaller than the original EFS update but extensions will be removed in case the objective was exceeded (only has an effect when setting extend_lambda=True). Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion. For method=='qEFS' the following options are available: setting this to 0 disables control. Setting it to 1 means the check described by Krause et al. (submitted) will be performed to control updates to lambda. Setting it to 2 means that steps will generally be halved when they fail to increase the approximate REML criterion (note that the gradient is based on quasi-Newton approximations as well and thus less accurate). Setting it to 3 means both checks (i.e., 1 and 2) are performed, defaults to 1
optimizer (str, optional) – Deprecated, defaults to “Newton”
method (str, optional) – Which method to use to estimate the coefficients (and lambda parameters), defaults to “Chol”
check_cond (int, optional) – Whether to obtain an estimate of the condition number for the linear system that is solved. When check_cond=0, no check will be performed. When check_cond=1, an estimate of the condition number for the final system (at convergence) will be computed and warnings will be issued based on the outcome (see mssm.src.python.gamm_solvers.est_condition()), defaults to 1
piv_tol (float, optional) – Deprecated, defaults to 0.175
repara (bool, optional) – Whether to apply a stabilizing re-parameterization to the model, defaults to True
should_keep_drop (bool, optional) – If set to True, any coefficients that are dropped during fitting are permanently excluded from all subsequent iterations, defaults to True
form_VH (bool, optional) – Whether to explicitly form matrix V - the estimated inverse of the negative Hessian of the penalized likelihood - and H - the estimate of the Hessian of the log-likelihood - when using the qEFS method, defaults to True
use_grad (bool, optional) – Deprecated, defaults to False
gamma (float, optional) – Setting this to a value larger than 1 promotes more complex (less smooth) models. Setting this to a value smaller than 1 (but must be > 0) promotes smoother models, defaults to 1
qEFSH (str, optional) – Should the hessian approximation use a symmetric rank 1 update (qEFSH='SR1') that is forced to result in positive semi-definiteness of the approximation or the standard BFGS update (qEFSH='BFGS'), defaults to ‘SR1’
overwrite_coef (bool, optional) – Whether the initial coefficients passed to the optimization routine should be over-written by the solution obtained for the un-penalized version of the problem when method='qEFS', defaults to True
max_restarts (int, optional) – How often to shrink the coefficient estimate back to a random vector when convergence is reached and when method='qEFS'. The optimizer might get stuck in local minima, so it can be helpful to set this to 1-3: whenever convergence is reached, the coefficients are shrunk back to a random vector and optimization continues once more, defaults to 0
qEFS_init_converge (bool, optional) – Whether to optimize the un-penalized version of the model and to use the hessian (and optionally coefficients, if overwrite_coef=True) to initialize the q-EFS solver. Ignored if method!='qEFS', defaults to True
prefit_grad (bool, optional) – Whether to rely on Gradient Descent to improve the initial starting estimate for coefficients, defaults to False
progress_bar (bool, optional) – Whether progress should be printed or not, defaults to True
n_c (int, optional) – Number of cores to use, defaults to 10
init_bfgs_options (dict, optional) – An optional dictionary holding the same key:value pairs that can be passed to bfgs_options, but passed to the optimizer of the un-penalized problem, defaults to {"gtol":1e-9,"ftol":1e-9,"maxcor":30,"maxls":100,"maxfun":1e7}
bfgs_options (dict, optional) – An optional dictionary holding arguments that should be passed on to the call of scipy.optimize.minimize() if method=='qEFS', defaults to {"gtol":1e-9,"ftol":1e-9,"maxcor":30,"maxls":100,"maxfun":1e7}
- Returns:
coef estimate, the negative hessian of the log-likelihood, inverse of the Cholesky of the negative hessian of the penalized log-likelihood, if method=='qEFS' an instance of scp.sparse.linalg.LinearOperator representing the new quasi-Newton approximation, total edf, term-wise edfs, total penalty, final list of penalties, and a Fit_info object
- Return type:
tuple[np.ndarray, scp.sparse.csc_array|None, scp.sparse.csc_array|scp.sparse.linalg.LinearOperator, scp.sparse.linalg.LinearOperator|None, float, list[float], float, list[LambdaTerm], Fit_info]
- mssm.src.python.gamm_solvers.step_fellner_schall_sparse(lgdet_deriv: float, ldet_deriv: float, bSb: float, cLam: float, scale: float) float
Internal function. Compute a generalized Fellner Schall update step for a lambda term. This update rule is discussed in Wood & Fasiolo (2017).
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
lgdet_deriv (float) – Derivative of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambda.
ldet_deriv (float) – Derivative of \(log(|\mathbf{H} + \mathbf{S}_\lambda|)\), where \(\mathbf{H}\) is the negative hessian of the log-likelihood, with respect to lambda.
bSb (float) – cCoef.T@emb_SJ@cCoef where cCoef is the current coefficient estimate
cLam (float) – Current lambda value
scale (float) – Optional scale parameter (or 1)
- Returns:
The additive update to cLam
- Return type:
float
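To make the rule concrete, below is a minimal sketch of this update in terms of the function's arguments (the function name is illustrative; the actual implementation may additionally safeguard the proposed step, e.g., against unstable or negative lambda proposals):

def fs_update_sketch(lgdet_deriv: float, ldet_deriv: float, bSb: float, cLam: float, scale: float) -> float:
    # Generalized Fellner-Schall proposal (Wood & Fasiolo, 2017):
    # lambda* = scale * (d log(|S_lambda|_+)/d lambda - d log(|H + S_lambda|)/d lambda) / (coef' @ S_J @ coef) * lambda
    nLam = scale * (lgdet_deriv - ldet_deriv) / bSb * cLam
    # Return the additive step, so that the caller forms cLam + dLam
    return nLam - cLam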
- mssm.src.python.gamm_solvers.test_SR1(sk: ndarray, yk: ndarray, rho: ndarray, sks: ndarray, yks: ndarray, rhos: ndarray) bool
Test whether SR1 update is well-defined for both V and H.
Relies on steps discussed by Byrd, Nocedal & Schnabel (1994).
- References:
Byrd, R. H., Nocedal, J., & Schnabel, R. B. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(1), 129–156. https://doi.org/10.1007/BF01582063
- Parameters:
sk (np.ndarray) – New update vector sk
yk (np.ndarray) – New update vector yk
rho (np.ndarray) – New rho
sks (np.ndarray) – Previous update vectors sk
yks (np.ndarray) – Previous update vectors yk
rhos (np.ndarray) – Previous rhos
- Returns:
True if the SR1 update is well-defined for both V and H, False otherwise.
- Return type:
bool
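For intuition, the standard safeguard for a single SR1 update of an approximation B skips the update whenever the curvature denominator is close to zero (Nocedal & Wright, 2006, eq. 6.26). A minimal single-pair sketch of that condition - not the limited-memory variant implemented here - looks as follows:

import numpy as np

def sr1_well_defined_sketch(sk: np.ndarray, yk: np.ndarray, Bk: np.ndarray, r: float = 1e-8) -> bool:
    # SR1 denominator: (yk - Bk @ sk)' @ sk must be sufficiently far from zero
    diff = yk - Bk @ sk
    return np.abs(diff @ sk) >= r * np.linalg.norm(sk) * np.linalg.norm(diff)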
- mssm.src.python.gamm_solvers.undo_extension_lambda_step(lti: int, lam: float, dLam: float, extend_by: dict, was_extended: list[bool], method: str, family: Family) tuple[float, float]
Internal function. Deals with resetting any extension terms.
- Parameters:
lti (int) – Penalty index
lam (float) – Current lambda value
dLam (float) – The lambda update
extend_by (dict) – Extension info dictionary
was_extended (list[bool]) – List indicating per lambda parameter whether it was extended or not
method (str) – Extension method to use.
family (Family) – model family
- Raises:
ValueError – If requested method is not implemented
- Returns:
Updated values for lam and dLam
- Return type:
tuple[float,float]
- mssm.src.python.gamm_solvers.update_PIRLS(y: ndarray, yb: ndarray, mu: ndarray, eta: ndarray, X: csc_array, Xb: csc_array, family: Family, Lrhoi: csc_array | None) tuple[ndarray, csc_array, ndarray | None, csc_array | None]
Internal function. Updates the pseudo-weights, the observation vector yb, and the model matrix Xb of the working model.
Note: Dimensions of yb and Xb might not match those of y and X, since rows of invalid pseudo-data observations are dropped here.
- Parameters:
y (np.ndarray) – vector of observations
yb (np.ndarray) – vector of observations of the working model
mu (np.ndarray) – vector of mean estimates
eta (np.ndarray) – vector of linear predictors
X (scp.sparse.csc_array) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
family (Family) – Family of model
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model
- Returns:
Updated observation vector yb and model matrix Xb of the working model, pseudo-weights, and a diagonal sparse matrix holding the root of the Fisher weights. The latter two are None for strictly additive models.
- Return type:
tuple[np.ndarray,scp.sparse.csc_array,np.ndarray|None,scp.sparse.csc_array|None]
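As a reminder of what the working model looks like, below is a minimal dense sketch of the textbook pseudo-data and Fisher weight construction (e.g., Wood, 2017). The names g_prime (derivative of the link) and V (variance function) are illustrative; the actual function additionally handles invalid rows, the optional ar1 model, and sparse matrices:

import numpy as np

def pirls_working_model_sketch(y, mu, eta, X, V, g_prime):
    # Pseudo-data: z = eta + (y - mu) * g'(mu)
    z = eta + (y - mu) * g_prime(mu)
    # Fisher weights: w = 1 / (g'(mu)**2 * V(mu))
    w = 1 / (np.power(g_prime(mu), 2) * V(mu))
    Wr = np.diag(np.sqrt(w))  # diagonal root of the Fisher weights
    # Weighted observation vector and model matrix of the working model
    yb = Wr @ z
    Xb = Wr @ X
    return yb, Xb, w, Wr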
- mssm.src.python.gamm_solvers.update_coef(yb: ndarray, X: csc_array, Xb: csc_array, family: Family, S_emb: csc_array, S_root: csc_array | None, n_c: int, formula: Formula | None, offset: float | ndarray) tuple[ndarray, ndarray, ndarray, list[int], csc_array, csc_array, list[int] | None, list[int] | None]
Internal function. Estimates the coefficients of the model and updates the linear predictor and mean estimates.
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
yb (np.ndarray) – vector of observations of the working model
X (scp.sparse.csc_array) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
family (Family) – Family of Model
S_emb (scp.sparse.csc_array) – Total penalty matrix
S_root (scp.sparse.csc_array | None) – Root of total penalty matrix or None
n_c (int) – Number of cores
formula (Formula | None) – Formula of model or None
offset (float | np.ndarray) – Offset (fixed effect) to add to eta
- Returns:
A tuple containing the linear predictor eta, the estimated means mu, the estimated coefficients, the column permutation indices Pr, the column permutation matrix P, the Cholesky of the pivoted penalized negative hessian, an optional list of the coefficients to keep, and an optional list of the estimated coefficients to drop
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray, list[int], scp.sparse.csc_array, scp.sparse.csc_array, list[int]|None, list[int]|None]
- mssm.src.python.gamm_solvers.update_coef_and_scale(y: ndarray, yb: ndarray, z: ndarray, Wr: csc_array, rowsX: int, colsX: int, X: csc_array, Xb: csc_array, Lrhoi: csc_array | None, family, S_emb: csc_array, S_root: csc_array | None, S_pinv: csc_array, FS_use_rank: list[bool], penalties: list[LambdaTerm], n_c: int, formula: Formula, form_Linv: bool, offset: float | ndarray) tuple[ndarray, ndarray, ndarray, csc_array | None, list[float], list[float], float, list[float], list[csc_array], float, ndarray, list[int] | None, list[int] | None]
Internal function to update the coefficients and (optionally) scale parameter of the model.
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – vector of observations
yb (np.ndarray) – vector of observations of the working model
z (np.ndarray) – vector of pseudo-data (can contain NaNs for invalid observations)
Wr (scp.sparse.csc_array) – diagonal sparse matrix holding the root of the Fisher weights
rowsX (int) – Rows of model matrix
colsX (int) – Cols of model matrix
X (scp.sparse.csc_array) – Model matrix
Xb (scp.sparse.csc_array) – Model matrix of working model
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model
family (Family) – Family of model
S_emb (scp.sparse.csc_array) – Total penalty matrix
S_root (scp.sparse.csc_array | None) – Root of total penalty matrix or None
S_pinv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix
FS_use_rank (list[bool]) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used
penalties (list[LambdaTerm]) – List of penalties
n_c (int) – Number of cores
formula (Formula) – Formula of the model
form_Linv (bool) – Whether to form the inverse of the cholesky of the negative penalzied hessian or not
offset (float | np.ndarray) – Offset (fixed effect) to add to eta
- Returns:
A tuple containing the linear predictor eta, the estimated means mu, the estimated coefficients, the unpivoted Cholesky of the penalized negative hessian, the inverse of the former (optional), derivatives of \(log(|\mathbf{S}_\lambda|_+)\) with respect to lambdas, cCoef.T@emb_SJ@cCoef for each SJ, total edf, term-wise edf, Bs, scale estimate, working residuals, an optional list of the coefficients to keep, and an optional list of the estimated coefficients to drop
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray, scp.sparse.csc_array|None, list[float], list[float], float, list[float], list[scp.sparse.csc_array], float, np.ndarray, list[int]|None, list[int]|None]
- mssm.src.python.gamm_solvers.update_coef_gammlss(family: GAMLSSFamily, mus: list[ndarray], y: ndarray, Xs, coef: ndarray, coef_split_idx: list[int], S_emb: csc_array, S_norm: csc_array, S_pinv: csc_array, FS_use_rank: list[bool], gammlss_penalties: list[LambdaTerm], c_llk: float, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: list[list[int], list[int]] | None) tuple[ndarray, list[ndarray], list[ndarray], list[ndarray], csc_array, csc_array, csc_array, float, float, float, list[int] | None, list[int] | None]
Repeatedly perform Newton update with step length control to the coefficient vector - essentially implements algorithm 3 from the paper by Krause et al. (submitted).
Based on steps outlined by Wood, Pya, & Säfken (2016). Checks for rank deficiency when method != "Chol".
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
family (GAMLSSFamily) – Family of model
mus (list[np.ndarray]) – List of estimated means
y (np.ndarray) – Vector of observations
Xs ([scp.sparse.csc_array]) – List of model matrices - one per parameter of response distribution
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – List with indices to split coef - one per parameter of response distribution
S_emb (scp.sparse.csc_array) – Total penalty matrix
S_norm (scp.sparse.csc_array) – Total penalty matrix - normalized/scaled for rank checks
S_pinv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix
FS_use_rank (list[bool]) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used
gammlss_penalties (list[LambdaTerm]) – List of penalties
c_llk (float) – Current llk
outer (int) – Index of outer iteration
max_inner (int) – Maximum number of inner iterations
min_inner (int) – Minimum number of inner iterations
conv_tol (float) – Convergence tolerance
method (str) – Method to use to estimate coefficients
piv_tol (float) – Deprecated
keep_drop (list[list[int],list[int]] | None) – Set of previously dropped coefficients or None
- Returns:
A tuple containing an estimate of all coefficients, a split version of the former, updated values for mus, etas, the negative hessian of the log-likelihood, cholesky of negative hessian of the penalized log-likelihood, inverse of the former, new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional list of the coefficients to keep, an optional list of the estimated coefficients to drop
- Return type:
tuple[np.ndarray, list[np.ndarray], list[np.ndarray], list[np.ndarray], scp.sparse.csc_array, scp.sparse.csc_array, scp.sparse.csc_array, float, float, float, list[int] | None, list[int] | None]
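The core of this routine is a Newton update with step-length control on the penalized log-likelihood. A minimal dense sketch of one such controlled step is shown below (llk_fun is an illustrative stand-in for the family's log-likelihood; the actual implementation works with sparse matrices, checks for rank deficiency, and adds a diagonal correction when the penalized hessian is not invertible):

import numpy as np

def newton_step_sketch(coef, grad, nH_pen, llk_fun, S_emb, c_pllk, max_halves=30):
    # Newton direction for the penalized problem: nH_pen @ step = penalized gradient
    step = np.linalg.solve(nH_pen, grad - S_emb @ coef)
    # Halve the step until the penalized llk no longer decreases
    for _ in range(max_halves):
        n_coef = coef + step
        n_pllk = llk_fun(n_coef) - 0.5 * n_coef.T @ S_emb @ n_coef
        if n_pllk >= c_pllk:
            return n_coef, n_pllk
        step /= 2
    return coef, c_pllk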
- mssm.src.python.gamm_solvers.update_coef_gen_smooth(family: GSMMFamily, ys: list[ndarray], Xs: list[csc_array], coef: ndarray, coef_split_idx: list[int], S_emb: csc_array, S_norm: csc_array, S_pinv: csc_array, FS_use_rank: list[bool], smooth_pen: list[LambdaTerm], c_llk: float, outer: int, max_inner: int, min_inner: int, conv_tol: float, method: str, piv_tol: float, keep_drop: list[list[int], list[int]] | None, opt_raw: LinearOperator | None) tuple[ndarray, csc_array | None, csc_array | None, csc_array | LinearOperator, float, float, float, list[int] | None, list[int] | None]
Repeatedly perform Newton/Gradient/L-qEFS update with step length control to the coefficient vector - essentially completes the steps discussed in sections 3.3 and 4 of the paper by Krause et al. (submitted).
Based on steps outlined by Wood, Pya, & Säfken (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Krause et al. (submitted). The Mixed-Sparse-Smooth-Model Toolbox (MSSM): Efficient Estimation and Selection of Large Multi-Level Statistical Models. https://doi.org/10.48550/arXiv.2506.13132
- Parameters:
family (GSMMFamily) – Model family
ys (list[np.ndarray]) – List of observation vectors
Xs (list[scp.sparse.csc_array]) – List of model matrices
coef (np.ndarray) – Coefficient estimate
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter of llk.
S_emb (scp.sparse.csc_array) – Total penalty matrix
S_norm (scp.sparse.csc_array) – Scaled version of the penalty matrix (i.e., unweighted total penalty divided by its norm).
S_pinv (scp.sparse.csc_array) – Generalized inverse of total penalty matrix
FS_use_rank (list[bool]) – A list of bools indicating for which EFS updates the rank rather than the generalized inverse should be used
smooth_pen (list[LambdaTerm]) – List of penalties
c_llk (float) – Current llk
outer (int) – Index of outer iteration
max_inner (int) – Maximum number of inner iterations
min_inner (int) – Minimum number of inner iterations
conv_tol (float) – Convergence tolerance
method (str) – Method to use to estimate coefficients
piv_tol (float) – Deprecated
keep_drop (list[list[int],list[int]] | None) – Set of previously dropped coefficients or None
opt_raw (scp.sparse.linalg.LinearOperator | None) – If the L-qEFS update is used to estimate coefficients/lambda parameters, then this is the previous state of the quasi-Newton approximations to the (inverse) of the hessian of the log-likelihood
- Returns:
A tuple containing an estimate of all coefficients, the negative hessian of the log-likelihood, the Cholesky of the negative hessian of the penalized log-likelihood, the inverse of the former (or another instance of scp.sparse.linalg.LinearOperator representing the new quasi-Newton approximation), new llk, new penalized llk, the multiple (float) added to the diagonal of the negative penalized hessian to make it invertible, an optional list of the coefficients to keep, and an optional list of the estimated coefficients to drop
tuple[np.ndarray, scp.sparse.csc_array|None, scp.sparse.csc_array|None, scp.sparse.csc_array|scp.sparse.linalg.LinearOperator, float, float, float, list[int]|None, list[int]|None]
- mssm.src.python.gamm_solvers.update_scale_edf(y: ndarray, z: ndarray, eta: ndarray, Wr: csc_array, rowsX: int, colsX: int, LP: csc_array | None, InvCholXXSP: csc_array | None, Pr: list[int], lgdetDs: list[float], Lrhoi: csc_array | None, family: Family, penalties: list[LambdaTerm], keep: list[int] | None, drop: list[int], n_c: int) tuple[ndarray, csc_array | None, float, list[float], list[csc_array], float]
Internal function. Updates the scale of the model. The edf are computed in the process and returned as well, since they are needed for the lambda step.
- References:
Wood, S. N., & Fasiolo, M. (2017). A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. https://doi.org/10.1111/biom.12666
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
y (np.ndarray) – vector of observations
z (np.ndarray) – vector of pseudo-data (can contain NaNs for invalid observations)
eta (np.ndarray) – vector of linear predictors
Wr (scp.sparse.csc_array) – diagonal sparse matrix holding the root of the Fisher weights
rowsX (int) – Rows of model matrix
colsX (int) – Cols of model matrix
LP (scp.sparse.csc_array | None) – Pivoted Cholesky of the negative penalized hessian or None
InvCholXXSP (scp.sparse.csc_array | None) – Inverse of LP, or None
Pr (list[int]) – Permutation list of LP
lgdetDs (list[float]) – List of derivatives of \(log(|\mathbf{S}_\lambda|_+)\), the log of the “Generalized determinant”, with respect to lambdas.
Lrhoi (scp.sparse.csc_array | None) – Optional covariance matrix of an ar1 model
family (Family) – Family of model
penalties (list[LambdaTerm]) – List of penalties
keep (list[int] | None) – List of coefficients to keep, can be None -> keep all
drop (list[int]) – List of coefficients to drop
n_c (int) – Number of cores to use
- Returns:
A tuple containing the working residuals, optionally the unpivoted inverse of LP, total edf, term-wise edf, Bs, and the scale estimate
tuple[np.ndarray, scp.sparse.csc_array|None, float, list[float], list[scp.sparse.csc_array], float]
mssm.src.python.matrix_solvers module
- mssm.src.python.matrix_solvers.compute_B(L: csc_array, P: csc_array, lTerm: LambdaTerm, n_c: int = 10, drop: list[int] | None = None) float | tuple[float, float]
Solves L @ B = P @ lTerm.D_J_emb for B, then returns B.power(2).sum() or two approximations of this (for very big factor smooth models).
- Parameters:
L (scp.sparse.csc_array) – Lower triangular sparse matrix
P (scp.sparse.csc_array) – Permutation matrix
lTerm (LambdaTerm) – Current penalty term
n_c (int, optional) – Number of cores, defaults to 10
drop (list[int] | None, optional) – Any parameters (columns/rows of lTerm.D_J_emb) to drop, defaults to None
- Returns:
B.power(2).sum() or sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights), with cluster weights obtained from mssm.src.python.formula.__cluster_discretize().
float | tuple[float, float]
- mssm.src.python.matrix_solvers.compute_Linv(L: csc_array, n_c: int = 10) csc_array
Solves L @ inv(L) = I for inv(L), optionally parallelizing over column blocks of I.
- Parameters:
L (scp.sparse.csc_array) – Lower triangular sparse matrix
n_c (int, optional) – Number of cores to use, defaults to 10
- Returns:
inv(L)
- Return type:
scp.sparse.csc_array
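Conceptually this is just a triangular forward solve against the identity; a minimal (non-parallel, dense right-hand side) sketch with scipy - the actual implementation parallelizes over column blocks of I and returns a sparse result:

import numpy as np
from scipy.sparse.linalg import spsolve_triangular

def compute_Linv_sketch(L):
    # Forward-solve L @ inv(L) = I; L must be lower triangular
    I = np.eye(L.shape[0])
    return spsolve_triangular(L.tocsr(), I, lower=True)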
Solves L @ B = T for B via forward solving and based on shared memory for L, then computes and returns B.power(2).sum().
- Parameters:
address_dat (str) – Address to data array of L
address_ptr (str) – Address to pointer array of L
address_idx (str) – Address to indices array of L
shape_dat (tuple) – Shape of data array of L
shape_ptr (tuple) – Shape of pointer array of L
rows (int) – Number of rows of L
cols (int) – Number of cols of L
nnz (int) – Number of non-zero elements in L
T (scp.sparse.csc_array) – Target matrix
- Returns:
B.power(2).sum()
- Return type:
float
Solves L @ B = T for B via forward solving and based on shared memory for L, then computes and returns sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights).
- Parameters:
address_dat (str) – Address to data array of L
address_ptr (str) – Address to pointer array of L
address_idx (str) – Address to indices array of L
shape_dat (tuple) – Shape of data array of L
shape_ptr (tuple) – Shape of pointer array of L
rows (int) – Number of rows of L
cols (int) – Number of cols of L
nnz (int) – Number of non-zero elements in L
T (scp.sparse.csc_array) – Target matrix
cluster_weights (list[float]) – Cluster weights obtained from mssm.src.python.formula.__cluster_discretize().
- Returns:
sum(B.power(2).sum()*cluster_weights) and B.power(2).sum()*len(cluster_weights)
- Return type:
tuple[float,float]
Solves L@B = T where L is available in shared memory and T is a column subset of the identity matrix.
- Parameters:
address_dat (str) – Address to data array of L
address_ptr (str) – Address to pointer array of L
address_idx (str) – Address to indices array of L
shape_dat (tuple) – Shape of data array of L
shape_ptr (tuple) – Shape of pointer array of L
rows (int) – Number of rows of L
cols (int) – Number of cols of L
nnz (int) – Number of non-zero elements in L
T (scp.sparse.csc_array) – Target matrix
- Returns:
B
- Return type:
scp.sparse.csc_array
- mssm.src.python.matrix_solvers.cpp_backsolve_tr(A: csc_array, C: csc_array) csc_array
Solves A@B=C, where A is sparse and upper triangular. This can be utilized to obtain B = inv(A) when C is the identity.
- Parameters:
A (scp.sparse.csc_array) – Upper triangular sparse matrix
C (scp.sparse.csc_array) – Sparse potentially rectangular matrix
- Returns:
B
- Return type:
scp.sparse.csc_array
- mssm.src.python.matrix_solvers.cpp_chol(A: csc_array) tuple[csc_array, int]
Computes the Cholesky of A.
- Parameters:
A (scp.sparse.csc_array) – Some square symmetric matrix
- Returns:
Returns Cholesky and code indicating success
- Return type:
tuple[scp.sparse.csc_array,int]
- mssm.src.python.matrix_solvers.cpp_cholP(A: csc_array) tuple[csc_array, list[int], int]
Computes the pivoted Cholesky of A.
- Parameters:
A (scp.sparse.csc_array) – Some square symmetric matrix
- Returns:
Returns pivoted Cholesky, pivoted column order, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,list[int],int]
- mssm.src.python.matrix_solvers.cpp_dqrr(A: ndarray) tuple[list[int], int]
Computes pivoted QR decomposition of dense matrix A.
- Parameters:
A (np.ndarray) – Some matrix
- Returns:
column pivot order for rank estimation, estimated rank
- Return type:
tuple[list[int],int]
- mssm.src.python.matrix_solvers.cpp_qr(A: csc_array) tuple[csc_array, csc_array, list[int], int]
Computes pivoted QR decomposition of A.
- Parameters:
A (scp.sparse.csc_array) – Some matrix
- Returns:
Matrices Q, R, pivoted column order, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,scp.sparse.csc_array,list[int],int]
- mssm.src.python.matrix_solvers.cpp_qrr(A: csc_array) tuple[csc_array, list[int], int, int]
Computes pivoted QR decomposition of A and returns a rank estimate.
- Parameters:
A (scp.sparse.csc_array) – Some matrix
- Returns:
Matrices Q, R, pivoted column order, estimated rank, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,list[int],int,int]
- mssm.src.python.matrix_solvers.cpp_solve_L(X: csc_array, S: csc_array) tuple[csc_array, list[int], int]
Solves (X.T@X + S)@B=I for B, where (X.T@X + S) is sparse, symmetric, and full rank and I is an identity matrix of suitable dimension, via Cholesky decomposition.
- Parameters:
X (scp.sparse.csc_array) – Some rectangular sparse matrix
S (scp.sparse.csc_array) – Sparse square matrix
- Returns:
B (inverse of pivoted X.T@X + S), list of pivot indices, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,list[int],int]
- mssm.src.python.matrix_solvers.cpp_solve_LXX(A: csc_array) tuple[csc_array, list[int], int]
Solves A@B=I for B, where A is sparse, symmetric, and full rank and I is an identity matrix of suitable dimension, via Cholesky decomposition.
- Parameters:
A (scp.sparse.csc_array) – Some sparse symmetric matrix
- Returns:
B (inverse of pivoted A), list of pivot indices, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,list[int],int]
- mssm.src.python.matrix_solvers.cpp_solve_am(y: ndarray, X: csc_array, S: csc_array) tuple[csc_array, list[int], ndarray, int]
Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition and computes the inverse of the pivoted Cholesky of X.T@X + S.
- Parameters:
y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Some rectangular sparse matrix
S (scp.sparse.csc_array) – Sparse square matrix
- Returns:
Inverse of pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,list[int],np.ndarray,int]
- mssm.src.python.matrix_solvers.cpp_solve_coef(y: ndarray, X: csc_array, S: csc_array) tuple[csc_array, list[int], ndarray, int]
Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition.
- Parameters:
y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Some rectangular sparse matrix
S (scp.sparse.csc_array) – Sparse square matrix
- Returns:
Pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,list[int],np.ndarray,int]
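For intuition, the dense analogue of this solve is a Cholesky-based solution of the penalized normal equations; a minimal sketch with scipy (the actual routine works on sparse matrices and pivots for sparsity):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_coef_sketch(y, X, S):
    # Penalized normal equations: (X'X + S) @ b = X'y
    XXS = X.T @ X + S
    c, low = cho_factor(XXS)
    return cho_solve((c, low), X.T @ y)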
- mssm.src.python.matrix_solvers.cpp_solve_coefXX(Xy: ndarray, XXS: csc_array) tuple[csc_array, list[int], ndarray, int]
Solves (X.T@X + S)@b = X.T@y for b via sparse Cholesky decomposition, with (X.T@X + S) and X.T@y pre-computed.
- Parameters:
Xy (np.ndarray) – Holds X.T@y
XXS (scp.sparse.csc_array) – Holds (X.T@X + S)
- Returns:
Pivoted Cholesky of X.T@X + S, column pivot indices in a list, b, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,list[int],np.ndarray,int]
- mssm.src.python.matrix_solvers.cpp_solve_coef_pqr(y: ndarray, X: csc_array, E: csc_array) tuple[csc_array, list[int], list[int], ndarray, int, int]
Solves (X.T@X + S)@b = X.T@y for b via sparse QR decomposition, where E.T@E=S.
Does not form X.T@X + S for the solve. Potentially pivots twice - once for sparsity (always) and then once more whenever the algorithm detects a diagonal element that is too small.
Examples:
# Solve
RP,Pr1,Pr2,coef,rank,code = cpp_solve_coef_pqr(yb,Xb,S_root.T.tocsc())

# Need to get overall pivot...
P1 = compute_eigen_perm(Pr1)
P2 = compute_eigen_perm(Pr2)
P = P2.T@P1.T

# Need to insert zeroes in case of rank deficiency - first insert nans so that we
# can then easily find dropped coefs.
if rank < S_emb.shape[1]:
    coef = np.concatenate((coef,[np.nan for _ in range(S_emb.shape[1]-rank)]))

# Can now unpivot coef
coef = coef @ P

# And identify which coef was dropped
idx = np.arange(len(coef))
drop = idx[np.isnan(coef)]
keep = idx[np.isnan(coef)==False]

# Now actually set dropped ones to zero
coef[drop] = 0

# Convert R so that rest of code can just continue as with Chol (i.e., L)
LP = RP.T.tocsc()

# Keep only columns of Pr/P that belong to identifiable params. So P.T@LP is Cholesky of negative penalized Hessian
# of model without unidentifiable coef. Important: LP and Pr/P no longer match dimensions of embedded penalties
# after this! So we need to keep track of that in the appropriate functions (i.e., `calculate_edf` which calls
# `compute_B` when called with only LP and not Linv).
P = P[:,keep]
_,Pr,_ = translate_sparse(P.tocsc())
P = compute_eigen_perm(Pr)
- Parameters:
y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Some rectangular sparse matrix
E (scp.sparse.csc_array) – Sparse square matrix
- Returns:
Pivoted Cholesky of X.T@X + S, first column pivot indices in a list, second column pivot indices in a list, b, estimated rank, and code indicating success.
- Return type:
tuple[scp.sparse.csc_array,list[int],list[int],np.ndarray,int,int]
- mssm.src.python.matrix_solvers.cpp_solve_qr(A: csc_array) tuple[csc_array, int, int]
Solves A@B=I for B, where A is sparse, square, and full rank and I is an identity matrix of suitable dimension, via QR decomposition.
- Parameters:
A (scp.sparse.csc_array) – Some sparse square matrix
- Returns:
B (inverse of A), estimated rank, and code indicating success
- Return type:
tuple[scp.sparse.csc_array,int,int]
- mssm.src.python.matrix_solvers.cpp_solve_tr(A: csc_array, C: csc_array) csc_array
Solves A@B=C, where A is sparse and lower triangular. This can be utilized to obtain B = inv(A) when C is the identity.
- Parameters:
A (scp.sparse.csc_array) – Lower triangular sparse matrix
C (scp.sparse.csc_array) – Sparse potentially rectangular matrix
- Returns:
B
- Return type:
scp.sparse.csc_array
- mssm.src.python.matrix_solvers.cpp_symqr(A: csc_array, tol: float) tuple[csc_array, list[int], list[int], int, int]
Computes pivoted QR decomposition of symmetric matrix A.
- Parameters:
A (scp.sparse.csc_array) – Some symmetric matrix
tol (float) – tolerance for rank estimation
- Returns:
Matrix R, column pivot order for sparsity, column pivot order for rank estimation, rank estimate, code indicating success
- Return type:
tuple[scp.sparse.csc_array,list[int],list[int],int,int]
- mssm.src.python.matrix_solvers.est_condition(L: csc_array, Linv: csc_array, seed: int | None = 0, verbose: bool = True) tuple[float, float, float, int]
Estimate the condition number K - the ratio of the largest to smallest singular value - of matrix A, where A.T@A = L@L.T.
L and Linv can either be obtained by Cholesky decomposition, i.e., A.T@A = L@L.T, or by QR decomposition A=Q@R where R=L.T.
If verbose=True (default), separate warnings will be issued in case K>(1/(0.5*sqrt(epsilon))) and K>(1/(0.5*epsilon)). If the former warning is raised, this indicates that computing L via a Cholesky decomposition is likely unstable and should be avoided. If the second warning is raised as well, obtaining L via QR decomposition (of A) is also likely to be unstable (see Golub & Van Loan, 2013).
- References:
Cline et al. (1979). An Estimate for the Condition Number of a Matrix.
Golub & Van Loan (2013). Matrix computations, 4th edition.
- Parameters:
L (scp.sparse.csc_array) – Cholesky or any other root of A.T@A as a sparse matrix.
Linv (scp.sparse.csc_array) – Inverse of Cholesky (or any other root) of A.T@A.
verbose (bool) – Whether or not warnings should be printed. Defaults to True.
- Returns:
A tuple containing the estimate of the condition number K, an estimate of the largest singular value of A, an estimate of the smallest singular value of A, and a code. The latter will be zero in case no warning was raised, 1 in case the first warning described above was raised, and 2 if the second warning was raised as well.
- Return type:
tuple[float,float,float,int]
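Since A.T@A = L@L.T implies that A and L share their singular values, the extremes can be estimated from L and Linv directly. A minimal sketch of this idea is shown below (how the actual implementation estimates the extreme singular values and issues warnings may differ):

import scipy.sparse.linalg as spla

def est_condition_sketch(L, Linv, seed=0):
    # Largest singular value of L estimates the largest singular value of A;
    # 1 / (largest singular value of Linv) estimates the smallest one.
    smax = spla.svds(L, k=1, return_singular_vectors=False, random_state=seed)[0]
    smin = 1.0 / spla.svds(Linv, k=1, return_singular_vectors=False, random_state=seed)[0]
    return smax / smin, smax, smin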
- mssm.src.python.matrix_solvers.map_csc_to_eigen(X: csc_array) tuple[int, int, int, ndarray, ndarray, ndarray]
Pybind11 comes with copy overhead for sparse matrices, so instead of passing the sparse matrix to c++, I pass the data, indices, and indptr arrays as buffers to c++. see: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html.
An Eigen mapping can then be used to refer to these, without requiring an extra copy. see: https://eigen.tuxfamily.org/dox/classEigen_1_1Map_3_01SparseMatrixType_01_4.html
The mapping needs to assume compressed storage, since then we can use the indices, indptr, and data arrays directly for the valuepointer, innerPointer, and outerPointer fields of the sparse array map constructor. see: https://eigen.tuxfamily.org/dox/group__TutorialSparse.html (section sparse matrix format).
I got this idea from the NumpyEigen project, which also uses such a map! see: https://github.com/fwilliams/numpyeigen/blob/master/src/npe_sparse_array.h#L74
- Parameters:
X (scp.sparse.csc_array) – Some sparse matrix
- Returns:
Number of rows in X, Number of cols in X, Number of non-zero elements in X, X.data, X.indptr.astype(np.int64), X.indices.astype(np.int64)
- Return type:
tuple[int,int,int,np.ndarray,np.ndarray,np.ndarray]
- mssm.src.python.matrix_solvers.map_csr_to_eigen(X: csr_array) tuple[int, int, int, ndarray, ndarray, ndarray]
See: map_csc_to_eigen()
- Parameters:
X (scp.sparse.csr_array) – Some sparse matrix
- Returns:
Number of rows in X, Number of cols in X, Number of non-zero elements in X, X.data, X.indptr.astype(np.int64), X.indices.astype(np.int64)
- Return type:
tuple[int,int,int,np.ndarray,np.ndarray,np.ndarray]
- mssm.src.python.matrix_solvers.translate_sparse(mat: csc_array) tuple[ndarray, ndarray, ndarray]
Translate canonical sparse csc matrix representation into data, row, col representation
- Parameters:
mat (scp.sparse.csc_array) – sparse matrix
- Returns:
data, rows, cols of sparse matrix
- Return type:
tuple[np.ndarray,np.ndarray,np.ndarray]
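A minimal sketch of what this conversion amounts to, via scipy's COO format:

def translate_sparse_sketch(mat):
    # CSC -> triplet (data, row, col) representation
    coo = mat.tocoo()
    return coo.data, coo.row, coo.col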
mssm.src.python.penalties module
- class mssm.src.python.penalties.DifferencePenalty
Bases:
Penalty
Difference Penalty class. Generates penalty matrices for smooth terms.
- Variables:
pen_type (PenType.DIFFERENCE) – Type of the penalty matrix.
- constructor(n: int, constraint: ConstType | None, m: int = 2) tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]
Creates difference (order=m) n*n penalty matrix + root of the penalty. Based on code in Eilers & Marx (1996) and Wood (2017).
- References:
Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
n (int) – Dimension of square penalty matrix
constraint (ConstType|None) – Any constraint to absorb by the penalty or None if no constraint is required
m (int, optional) – Differencing order to apply to the identity matrix to get the penalty (this will also be the dimension of the penalty’s Kernel), defaults to 2
- Returns:
penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty
- Return type:
tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]
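The construction itself is compact; a minimal dense sketch of an order-m difference penalty (Eilers & Marx, 1996), with D playing the role of the penalty root:

import numpy as np

n, m = 10, 2
D = np.diff(np.eye(n), n=m, axis=0)  # (n-m) x n order-m difference matrix (root)
S = D.T @ D                          # n x n penalty matrix with rank n - m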
- class mssm.src.python.penalties.IdentityPenalty(pen_type: PenType)
Bases:
Penalty
Identity Penalty class. Generates penalty matrices for smooth terms and random terms.
- Parameters:
pen_type (PenType) – Type of the penalty matrix
- Variables:
pen_type (PenType) – Type of the penalty matrix passed to init method.
- constructor(n: int, constraint: ConstType | None, f: Callable | None = None) tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]
Creates identity matrix penalty + root in case f is None.
Note: This penalty never absorbs marginal constraints. It always returns an identity matrix but just decreases n by 1 if constraint is not None to ensure that the returned penalty matrix is of suitable dimensions.
- Parameters:
n (int) – Dimension of square penalty matrix
constraint (ConstType|None) – Any constraint to absorb by the penalty or None if no constraint is required
f (Callable|None, optional) – Any kind of function to apply to the diagonal elements of the penalty, defaults to None
- Returns:
penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty
- Return type:
tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]
- class mssm.src.python.penalties.Penalty(pen_type: PenType)
Bases:
object
Penalty base-class. Generates penalty matrices for smooth terms.
- Parameters:
pen_type (PenType) – Type of the penalty matrix
- Variables:
pen_type (PenType) – Type of the penalty matrix passed to the init method.
- constructor(n: int, constraint: ConstType | None, *args, **kwargs) tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]
Creates penalty matrix + root of the penalty and returns both in list form (data, row indices, col indices).
- Parameters:
n (int) – Dimension of square penalty matrix
constraint (ConstType | None) – Any constraint to absorb by the penalty or None if no constraint is required
- Returns:
penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty
- Return type:
tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]
- mssm.src.python.penalties.TP_pen(S_j: csc_array, D_j: csc_array, j: int, ks: list[int], constraint: ConstType | None) tuple[list[float], list[int], list[int], list[float], list[int], list[int], int]
Computes a tensor smooth penalty + root as defined in section 5.6 of Wood (2017), based on marginal penalty matrix S_j.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
S_j (scp.sparse.csc_array) – Marginal penalty matrix
D_j (scp.sparse.csc_array) – Root of marginal penalty matrix
j (int) – Index for current marginal
ks (list[int]) – List of number of basis functions of all marginals
constraint (ConstType | None) – Any constraint to absorb by the final penalty or None if no constraint is required
- Returns:
penalty data, penalty row indices, penalty column indices, root of penalty data, root of penalty row indices, root of penalty column indices, rank of penalty
- Return type:
tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]
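The underlying construction is the usual Kronecker-product embedding of the marginal penalty (Wood, 2017, section 5.6). A minimal sketch is given below (the actual function additionally handles the penalty root, constraint absorption, and the list-based return format):

from scipy.sparse import csc_array, identity, kron

def tp_pen_sketch(S_j, j, ks):
    # Embed the j-th marginal penalty: I_{k_1} x ... x S_j x ... x I_{k_J},
    # where ks[i] is the number of basis functions of the i-th marginal.
    S = S_j if j == 0 else identity(ks[0], format="csc")
    for i in range(1, len(ks)):
        S = kron(S, S_j if i == j else identity(ks[i], format="csc"), format="csc")
    return csc_array(S)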
- mssm.src.python.penalties.adjust_pen_drop(dat: list[float], rows: list[int], cols: list[int], drop: list[int], offset: int = 0) tuple[list[float], list[int], list[int], int]
Adjusts penalty matrix (represented via dat, rows, and cols) by dropping rows and columns indicated by drop.
Optionally, offset is added to the elements in rows and cols, which is useful when indices in drop do not start at zero.
- Parameters:
dat ([float]) – List of elements in penalty matrix.
rows ([int]) – List of row indices of penalty matrix.
cols ([int]) – List of column indices of penalty matrix.
drop ([int]) – Rows and columns to drop from penalty matrix. Might actually contain indices corresponding to rows + offset and cols + offset, which can be corrected for via the offset argument.
offset (int, optional) – An optional offset to add to rows and cols to adjust for the indexing in drop, defaults to 0
- Returns:
A tuple with 4 elements: the data, rows, and cols of the adjusted penalty matrix excluding dropped elements and the number of excluded elements.
- Return type:
tuple[list[float],list[int],list[int],int]
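A minimal sketch of the drop-and-reindex logic (illustrative only; the actual implementation may differ in detail):

import numpy as np

def adjust_pen_drop_sketch(dat, rows, cols, drop, offset=0):
    dat = np.asarray(dat)
    rows = np.asarray(rows) + offset
    cols = np.asarray(cols) + offset
    drop = np.sort(np.asarray(drop))
    # Keep only triplets touching neither a dropped row nor a dropped column
    keep = ~(np.isin(rows, drop) | np.isin(cols, drop))
    # Each surviving index shifts down by the number of dropped indices below it
    new_rows = rows[keep] - np.searchsorted(drop, rows[keep])
    new_cols = cols[keep] - np.searchsorted(drop, cols[keep])
    return list(dat[keep]), list(new_rows), list(new_cols), int((~keep).sum())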
- mssm.src.python.penalties.embed_in_S_sparse(pen_data: list[float], pen_rows: list[int], pen_cols: list[int], S_emb: csc_array | None, S_col: int, SJ_col: int, cIndex: int) tuple[csc_array, int]
Embed a term-specific penalty matrix SJ (provided as three lists: pen_data, pen_rows, and pen_cols) into the total penalty matrix S_emb (see Wood, 2017).
- Parameters:
pen_data (list[float]) – Data of SJ
pen_rows (list[int]) – Row indices of SJ
pen_cols (list[int]) – Column indices of SJ
S_emb (scp.sparse.csc_array | None) – Total penalty matrix or None in case S_emb will be initialized by the function.
S_col (int) – Columns of total penalty matrix
SJ_col (int) – Columns of SJ
cIndex (int) – Current row and column index indicating the top left cell of the (SJ_col * SJ_col) block SJ should take up in S_emb
- Returns:
S_emb with SJ embedded, and the updated cIndex (i.e., cIndex + SJ_col)
tuple[scp.sparse.csc_array,int]
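Conceptually, the embedding offsets SJ's triplets to the current diagonal block of the total penalty. A minimal sketch covering the initialization case (S_emb is None; the actual function can also add into an existing S_emb):

import numpy as np
from scipy.sparse import csc_array

def embed_sketch(pen_data, pen_rows, pen_cols, S_col, SJ_col, cIndex):
    # Shift SJ's triplets so its block starts at (cIndex, cIndex) in S_emb
    rows = np.asarray(pen_rows) + cIndex
    cols = np.asarray(pen_cols) + cIndex
    S_emb = csc_array((pen_data, (rows, cols)), shape=(S_col, S_col))
    return S_emb, cIndex + SJ_col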
- mssm.src.python.penalties.embed_in_Sj_sparse(pen_data: list[float], pen_rows: list[int], pen_cols: list[int], Sj: csc_array | None, SJ_col: int) csc_array
Parameterize a term-specific penalty matrix SJ (provided as three lists: pen_data, pen_rows, and pen_cols).
- Parameters:
pen_data (list[float]) – Data of SJ
pen_rows (list[int]) – Row indices of SJ
pen_cols (list[int]) – Column indices of SJ
Sj (scp.sparse.csc_array | None) – A sparse matrix or None. In the latter case, SJ is simply initialized by the function. If not, then the function returns SJ + Sj. The latter is useful if a term penalty is a sum of individual penalty matrices.
SJ_col (int) – Columns of SJ
- Returns:
SJ, which might actually be SJ + Sj.
scp.sparse.csc_array
Embed penalties from individual formulas into overall penalties for GAMMLSS/GSMM models.
- Parameters:
shared_penalties (list[list[LambdaTerm]]) – Nested list, with the inner one containing the penalties associated with an individual formula in formulas.
formulas (list) – List of mssm.src.python.formula.Formula objects
extra_coef (int) – Number of extra coefficients required by the model's family. Will result in the shared penalties being padded by an extra block of extra_coef zeroes.
- Returns:
A list of the embedded penalties required by a GAMMLSS or GSMM model.
- Return type:
list[LambdaTerm]
mssm.src.python.repara module
- mssm.src.python.repara.reparam(X: csc_array | None, S: list[LambdaTerm], cov: ndarray | None, option: int = 1, n_bins: int = 30, QR: bool = False, identity: bool = False, scale: bool = False) tuple
Options 1 - 3 are natural reparameterization discussed in Wood (2017; 5.4.2) with different strategies for the QR computation of \(\mathbf{X}\). Option 4 helps with stabilizing the REML computation and is from Appendix B of Wood (2011) and section 6.2.7 in Wood (2017):
Form complete matrix \(\mathbf{X}\) based on entire covariate.
Form matrix \(\mathbf{X}\) only based on unique covariate values.
Form matrix \(\mathbf{X}\) on a sample of the values making up the covariate. The covariate is split up into n_bins equally wide bins. The number of covariate values per bin is then calculated. Subsequently, the ratio relative to the minimum bin size is computed and each ratio is rounded to the nearest integer. Then ratio samples are obtained from each bin. That way, imbalance in the covariate is approximately preserved when forming the QR.
Transform term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) based on Appendix B of Wood (2011) and section 6.2.7 in Wood (2017) so that they are full-rank and their log-determinant can be computed safely. In that case, only S needs to be provided and it has to be a list holding the penalties to be transformed. If the transformation is to be applied to model matrices, coefficients, hessian, and covariance matrices, X should be set to something other than None (it does not matter what; it can, for example, be the first model matrix). The mssm.src.python.gamm_solvers.reparam_model() function can be used to apply the transformation and also returns the required transformation matrices to reverse it.
For Options 1-3:
If QR==True then \(\mathbf{X}\) is decomposed into \(\mathbf{Q}\mathbf{R}\) directly via QR decomposition. Alternatively, we first form \(\mathbf{X}^T\mathbf{X}\) and then compute the Cholesky \(\mathbf{L}\) of this product - note that \(\mathbf{L}^T = \mathbf{R}\). Overall, the latter strategy is much faster (in particular if option==1), but the increased loss of precision in \(\mathbf{L}^T = \mathbf{R}\) might not be acceptable for some applications.
After transformation, \(\mathbf{S}\) only contains elements on its diagonal and \(\mathbf{X}\) holds the transformed functions. As discussed in Wood (2017), the transformed functions are decreasingly flexible - so the elements on the diagonal of \(\mathbf{S}\) become smaller and eventually zero, for elements that are in the kernel of the original \(\mathbf{S}\) (un-penalized == not flexible).
For a similar transformation (based solely on \(\mathbf{S}\)), Wood et al. (2013) show how to further reduce the diagonally transformed \(\mathbf{S}\) to an even simpler identity penalty. As also discussed in Wood (2017), the same behavior of decreasing flexibility when all entries on the diagonal of \(\mathbf{S}\) are 1 can only be maintained if the transformed functions are multiplied by a weight related to their wiggliness. Specifically, more flexible functions need to become smaller in amplitude - so that for the same level of penalization they are removed earlier than less flexible ones. To achieve this, Wood et al. (2013) post-multiply the transformed matrix \(\mathbf{X}'\) with a matrix that contains on its diagonal the reciprocal of the square root of the transformed penalty matrix (and 1s in the last cells corresponding to the kernel). This is done here if identity=True.
In mgcv, the transformed model matrix and penalty can optionally be scaled by the root mean square value of the transformed model matrix (see the nat.param function in mgcv). This is done here if scale=True.
For Option 4:
Option 4 enforces re-parameterization of term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) based on Appendix B of Wood (2011) and section 6.2.7 in Wood (2017). In mssm, multiple penalties can be placed on individual terms (i.e., tensor terms, random smooths, Kernel penalty), but it is not always the case that the term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\) - i.e., the sum over all those individual penalties multiplied with their \(\lambda\) parameters - is of full rank. If we need to form the inverse of the term-specific \(\mathbf{S}_{\boldsymbol{\lambda}}\), this is problematic. It is also problematic, as discussed by Wood (2011), if the different \(\lambda\) are all of different magnitude, in which case forming the term-specific \(log(|\mathbf{S}_{\boldsymbol{\lambda}}|_+)\) becomes numerically difficult.
The strategy by Wood (2011) could be applied to form an overall - not just term-specific - \(\mathbf{S}_{\boldsymbol{\lambda}}\) with these properties. However, this does not work for general smooth models as defined by Wood et al. (2016). Hence, mssm opts for the blockwise strategy. Note that in mssm penalties currently cannot overlap, so this is not necessary at the moment.
- References:
Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Scheipl, F., & Faraway, J. J. (2013). Straightforward intermediate rank tensor product smoothing in mixed models.
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
mgcv source code (accessed 2024). smooth.R file, nat.param function.
- Parameters:
X (scp.sparse.csc_array | None) – Model/Term matrix or None
S (list[LambdaTerm]) – List of penalties
cov (np.ndarray | None) – covariate array associated with a specific term or None
option (int, optional) – Which re-parameterization to compute, defaults to 1
n_bins (int, optional) – Number of bins to use as part of option 3, defaults to 30
QR (bool, optional) – Whether to rely on a QR decomposition (if False, a Cholesky decomposition is used instead) as part of options 1-3, defaults to False
identity (bool, optional) – Whether the penalty matrix should be transformed to identity as part of options 1-3, defaults to False
scale (bool, optional) – Whether the penalty matrix and term matrix should be scaled as part of options 1-3, defaults to False
- Returns:
Return object content depends on
option
but will usually hold the information needed to apply/undo the required re-parameterization, as well as already re-parameterized objects.- Return type:
tuple
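To illustrate the effect of the diagonal transformation discussed above, here is a minimal numpy sketch (illustrative only, assuming a single second-order difference penalty on a generic term matrix; this is not mssm's implementation):

import numpy as np

# Illustrative sketch of the diagonal re-parameterization for a single
# penalty: eigendecompose S and rotate X so the transformed penalty is
# diagonal, with the un-penalized kernel functions ordered last.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))        # term matrix (assumed)
D2 = np.diff(np.eye(10), n=2, axis=0)     # second-order difference matrix
S = D2.T @ D2                             # penalty with a 2-dim kernel

evals, U = np.linalg.eigh(S)              # S = U @ diag(evals) @ U.T
order = np.argsort(evals)[::-1]           # penalized functions first
evals, U = evals[order], U[:, order]

X_rp = X @ U                              # transformed functions
S_rp = np.diag(evals)                     # diagonal penalty
print(np.round(evals, 8))                 # last two entries are ~0 (the kernel)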
- mssm.src.python.repara.reparam_model(dist_coef: list[int], dist_up_coef: list[int], coef: ndarray, split_coef_idx: list[int], Xs: list[csc_array], penalties: list[LambdaTerm], form_inverse: bool = True, form_root: bool = True, form_balanced: bool = True, n_c: int = 1) tuple[ndarray, list[csc_array], list[LambdaTerm], csc_array, csc_array | None, csc_array | None, csc_array | None, csc_array, list[csc_array]]
Relies on the transformation strategy from Appendix B of Wood (2011) to re-parameterize the model.
Coefficients, model matrices, and penalties are all transformed. The transformation is applied to each term separately as explained by Wood et al., (2016).
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
- Parameters:
dist_coef ([int]) – List of the number of coefficients per formula/linear predictor/distribution parameter of the model.
dist_up_coef ([int]) – List of the number of unpenalized (i.e., fixed-effect) coefficients per formula/linear predictor/distribution parameter of the model.
coef (numpy.array) – Vector of coefficients (numpy.array of dim (-1,1)).
split_coef_idx ([int]) – List with indices to split
coef
vector into separate versions per linear predictor.Xs ([scp.sparse.csc_array]) – List of model matrices obtained for example via
model.get_mmat()
.penalties ([LambdaTerm]) – List of penalties for model.
form_inverse (bool, optional) – Whether or not an inverse of the transformed penalty matrices should be formed. Useful for computing the EFS update, defaults to True
form_root (bool, optional) – Whether or not to form a root of the total penalty, defaults to True
form_balanced (bool, optional) – Whether or not to form the “balanced” penalty as described by Wood et al. (2016) after the re-parameterization, defaults to True
n_c (int, optional) – Number of cores to use to compute the inverse when
form_inverse=True
, defaults to 1
- Raises:
ValueError – Raises a value error if one of the inverse computations fails.
- Returns:
A tuple with 9 elements: the re-parameterized coefficient vector, a list with the re-parameterized model matrices, a list of the penalties after re-parameterization, the total re-parameterized penalty matrix, optionally the balanced version of the former, optionally a root of the re-parameterized total penalty matrix, optionally the inverse of the re-parameterized total penalty matrix, the transformation matrix
Q
so thatQ.T@S_emb@Q = S_emb_rp
whereS_emb
andS_emb_rp
are the total penalty matrix before and after re-parameterization, a list of transformation matricesQD
so thatXD@QD=XD_rp
whereXD
andXD_rp
are the model matrix of the Dth linear predictor before and after re-parameterization.- Return type:
tuple[np.ndarray, list[scp.sparse.csc_array], list[LambdaTerm], scp.sparse.csc_array, scp.sparse.csc_array | None, scp.sparse.csc_array | None, scp.sparse.csc_array | None, scp.sparse.csc_array, list[scp.sparse.csc_array]]
mssm.src.python.smooths module
- mssm.src.python.smooths.B_spline_basis(cov: ndarray, event_onset: int | None, nk: int, min_c: float | None = None, max_c: float | None = None, drop_outer_k: bool = False, convolve: bool = False, deg: int = 3) ndarray
Computes B-spline basis of degree
deg
givenknots
.Based on code and definitions in “Splines, Knots, and Penalties” by Eilers & Marx (2010) and adapted to allow for convolving B-spline bases.
- References:
Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125
- Parameters:
cov (np.ndarray) – Flattened covariate array (i.e., of shape (-1,))
event_onset (int | None) – Sample on which to place a dirac delta with which the B-spline bases should be convolved - ignored if
convolve==False
.nk (int) – Number of basis functions to create
min_c (float | None, optional) – Minimum covariate value, defaults to None
max_c (float | None, optional) – Maximum covariate value, defaults to None
drop_outer_k (bool, optional) – Deprecated, defaults to False
convolve (bool, optional) – Whether basis functions should be convolved (i.e., time-shifted) with an impulse response function triggered at
event_onset
, defaults to False.deg (int, optional) – Degree of basis, defaults to 3
- Returns:
An array of shape
(-1,nk)
holding the nk basis functions evaluated over cov, optionally convolved with an impulse response function triggered at event_onset
- Return type:
np.ndarray
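A minimal usage sketch (assuming mssm is installed; the call follows the signature documented above):

import numpy as np
from mssm.src.python.smooths import B_spline_basis

# Evaluate nk=10 cubic B-spline basis functions over a covariate in [0, 1]:
cov = np.linspace(0, 1, 100)
B = B_spline_basis(cov, None, nk=10, min_c=0.0, max_c=1.0)
print(B.shape)  # (100, 10) - i.e., the documented (-1, nk) shape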
- mssm.src.python.smooths.TP_basis_calc(cTP: ndarray, nB: ndarray) ndarray
Computes the row-wise Kronecker product between
cTP
andnB
. Useful to create a tensor smooth basis. See Wood (2017), sections 5.6.1 and B.4.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
cTP (np.ndarray) – Marginal basis or partially accumulated tensor smooth basis
nB (np.ndarray) – Marginal basis to include in the tensor smooth
- Returns:
The row-wise Kronecker product between
cTP
andnB
- Return type:
np.ndarray
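What the row-wise Kronecker product computes can be sketched in plain numpy (illustrative; not mssm's implementation):

import numpy as np

# Each row of the tensor basis is the Kronecker product of the
# corresponding rows of the two marginal bases:
cTP = np.arange(6.0).reshape(3, 2)   # marginal basis with 2 functions
nB = np.arange(9.0).reshape(3, 3)    # marginal basis with 3 functions
TP = np.einsum('ij,ik->ijk', cTP, nB).reshape(3, -1)  # shape (3, 6)
assert np.allclose(TP[0], np.kron(cTP[0], nB[0]))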
- mssm.src.python.smooths.bbase(x: ndarray, knots: ndarray, dx: float, deg: int) ndarray
Computes B-spline basis of degree
deg
givenknots
and interval spacingdx
.Function taken from “Splines, Knots, and Penalties” by Eilers & Marx (2010)
- References:
Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125
- Parameters:
x (np.ndarray) – Covariate
knots (np.ndarray) – knot location vector
dx (float) – Interval spacing
(xr-xl) / ndx
wherexr
andxl
are max and min ofx
andndx=nk-deg
wherenk
is the number of basis functions.deg (int) – Degree of basis
- Returns:
numpy array of shape (-1, nk)
- Return type:
np.ndarray
- mssm.src.python.smooths.convolve_event(f: ndarray, pulse_location: int) ndarray
Convolution of function
f
with a dirac delta spike centered at sample pulse_location. Based on code by Wierda et al. (2012)
- References:
Wierda, S. M., van Rijn, H., Taatgen, N. A., & Martens, S. (2012). Pupil dilation deconvolution reveals the dynamics of attention at high temporal resolution. https://doi.org/10.1073/pnas.1201858109
- Parameters:
f (np.ndarray) – Function evaluated over some samples
pulse_location (int) – Location of spike (in sample)
- Returns:
Convolved function as array
- Return type:
np.ndarray
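Conceptually, convolving with a dirac delta simply time-shifts the function; a plain numpy sketch (illustrative; not mssm's implementation):

import numpy as np

# Convolving f with a spike at pulse_location delays f by that many samples:
f = np.exp(-0.5 * ((np.arange(50) - 10) / 5) ** 2)  # some response shape
pulse_location = 20
delta = np.zeros(50)
delta[pulse_location] = 1.0
shifted = np.convolve(f, delta)[:50]
assert np.allclose(shifted[pulse_location:], f[:50 - pulse_location])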
- mssm.src.python.smooths.tpower(x: ndarray, t: ndarray, p: int) ndarray
Computes the truncated p-th power function of x - t. Function taken from “Splines, Knots, and Penalties” by Eilers & Marx (2010)
- References:
Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125
- Parameters:
x (np.ndarray) – Covariate
t (np.ndarray) – knot location vector
p (int) – Degree of the spline basis
- Returns:
np.power(x - t,p) * (x > t)
- Return type:
np.ndarray
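The relation between tpower and the B-spline construction of Eilers & Marx (2010) can be sketched in plain numpy (an illustrative re-implementation mirroring the paper, not mssm's exact code):

import numpy as np
from math import factorial

def tpower_np(x, t, p):
    # truncated p-th power function: np.power(x - t, p) * (x > t)
    return np.power(x - t, p) * (x > t)

deg, nk = 3, 10
x = np.linspace(0, 1, 200)
ndx = nk - deg                                 # number of inner intervals
dx = (x.max() - x.min()) / ndx                 # interval spacing
knots = x.min() + np.arange(-deg, ndx + deg + 1) * dx
P = tpower_np(x[:, None], knots[None, :], deg) # truncated power basis
D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (factorial(deg) * dx**deg)
B = (-1) ** (deg + 1) * P @ D.T                # scaled (deg+1)-th differences
print(B.shape)                                 # (200, 10) -> nk B-splines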
mssm.src.python.terms module
- class mssm.src.python.terms.GammTerm(variables: list[str], type: TermType, is_penalized: bool, penalty: list[Penalty], pen_kwargs: list[dict])
Bases:
object
Base-class implemented by the terms passed to
mssm.src.python.formula.Formula
.- Parameters:
variables ([str]) – List of variables as strings.
type (TermType) – Type of term as enum
is_penalized (bool) – Whether the term is penalized/can be penalized or not
penalty ([Penalty]) – The default penalties associated with a term.
pen_kwargs ([dict]) – A list of dictionaries, each with key-word arguments passed to the construction of the corresponding
Penalty
inpenalty
.
- build_matrix(*args, **kwargs)
Builds the design/term/model matrix associated with this term and returns it represented as a list of values, a list of row indices, and a list of column indices.
This method is implemented by every implementation of the
GammTerm
class. The returned lists can then be used to create a sparse matrix for this term. Also returns the number of additional columns that would be added to the total model matrix by this term.
- build_penalty(penalties: list[LambdaTerm], cur_pen_idx: int, *args, **kwargs) tuple[list[LambdaTerm], int]
Builds a penalty matrix associated with this term and returns an updated
penalties
list including it.This method is implemented by most implementations of the
GammTerm
class. Two arguments need to be returned: the updatedpenalties
list including the new penalty implemented as aLambdaTerm
and the updatedcur_pen_idx
. The latter simply needs to be incremented for every penalty added topenalties
.- Parameters:
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in
penalties
.
- Returns:
Updated
penalties
list including the new penalty implemented as aLambdaTerm
and the updatedcur_pen_idx
- Return type:
tuple[list[LambdaTerm],int]
- get_coef_info(*args, **kwargs)
Returns the total number of coefficients associated with this term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.
This method is implemented by every implementation of the
GammTerm
class.
- mssm.src.python.terms.build_ir_smooth_series(irsterm: irf, s_cov: ndarray, s_event: int, var_map: dict, var_mins: dict, var_maxs: dict, by_levels: ndarray | None) ndarray
Function to build the impulse response matrix for a single time-series.
- Parameters:
irsterm (irf) – Impulse response smooth term
s_cov (np.ndarray) – covariate array associated with
irsterm
s_event (int) – Onset of impulse response function
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or
None
for categorical variables.var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on for continuous variables or
None
for categorical variables.by_levels (np.ndarray | None) – Numpy array holding the levels of the factor associated with the
irsterm
term (viairsterm.by
) or None
- Returns:
The term matrix associated with the particular event at
s_event
- Return type:
np.ndarray
- mssm.src.python.terms.build_linear_term(lTerm: l | rs, has_intercept: bool, ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]
Builds the design/term/model matrix associated with a linear/random term and returns it represented as a list of values, a list of row indices, and a list of column indices.
- Parameters:
lTerm – Linear or random slope term
has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either
VarType.NUMERIC
for continuous variables orVarType.FACTOR
for categorical variables.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non-NaN rows in the data.
cov_flat (np.ndarray) – An array, containing all (encoded, in case of categorical predictors) values on each predictor (each columns of
cov_flat
corresponds to a different predictor) variable included in any of the terms in order of the data-frame passed to the Formula.use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to
None
so that no terms are excluded.
- Returns:
matrix data, matrix row indices, matrix column indices, added columns
- Return type:
tuple[list[float],list[int],list[int],int]
- class mssm.src.python.terms.f(variables: list, by: str = None, by_cont: str = None, binary: tuple[str, str] | None = None, id: int = None, nk: int | list[int] = None, te: bool = False, rp: int = 0, constraint: ~mssm.src.python.custom_types.ConstType = ConstType.QR, identifiable: bool = True, basis: ~collections.abc.Callable = <function B_spline_basis>, basis_kwargs: dict = {}, is_penalized: bool = True, penalize_null: bool = False, penalty: list[~mssm.src.python.penalties.Penalty] | None = None, pen_kwargs: list[dict] | None = None)
Bases:
GammTerm
A univariate or tensor interaction smooth term. If
variables
only contains a single variable \(x\), this term will represent a univariate \(f(x)\) in a model:\[\mu_i = a + f(x_i)\]For example, the model below in
mgcv
:bam(y ~ s(x,k=10) + s(z,k=20))
would be expressed as follows in
mssm
:GAMM(Formula(lhs("y"),[i(),f(["x"],nk=9),f(["z"],nk=19)]),Gaussian())
If
variables
contains two variables \(x\) and \(z\), then this term will either represent the tensor interaction \(f(x,z)\) in model:\[\mu_i = a + f(x_i) + f(z_i) + f(x_i,z_i)\]or in model:
\[\mu_i = a + f(x_i,z_i)\]The first behavior is achieved by setting
te=False
. In that case it is necessary to add ‘main effect’f
terms for \(x\) and \(z\). In other words, the behavior then mimics the ti()
term available inmgcv
(Wood, 2017). Ifte=True
, the term instead behaves like ate()
term inmgcv
, so no separate smooth effects for the main effects need to be included.For example, the model below in
mgcv
:bam(y ~ te(x,z,k=10))
would be expressed as follows in
mssm
:GAMM(Formula(lhs("y"),[i(),f(["x","z"],nk=9,te=True)]),Gaussian())
In addition, the model below in
mgcv
:bam(y ~ s(x,k=10) + s(z,k=20) + ti(x,z,k=10))
would be expressed as follows in
mssm
:GAMM(Formula(lhs("y"),[i(),f(["x"],nk=9),f(["z"],nk=19),f(["x","z"],nk=9,te=False)]),Gaussian())
By default a B-spline basis is used with
nk=9
basis functions (after removing identifiability constraints). This is equivalent to mgcv
’s default behavior of using 10 basis functions (before removing identifiability constraints). In case variables
contains more than one variable, nk
can either be set to a single value or to a list containing the number of basis functions that should be used to set up the spline matrix for every variable. The former implies that the same number of coefficients should be used for all variables. Keyword arguments that change the computation of the spline basis can be passed along via a dictionary to the basis_kwargs
argument. Importantly, if multiple variables are present and a list is passed tonk
, a list of dictionaries with keyword arguments of the same length needs to be passed tobasis_kwargs
as well.Multiple penalties can be placed on every term by adding
Penalty
to thepenalties
argument. In casevariables
contains multiple variables a separate tensor penalty (see Wood, 2017) will be created for every penalty included inpenalties
. Again, key-word arguments that alter the behavior of the penalty creation need to be passed as dictionaries topen_kwargs
for every penalty included inpenalties
. By default, a univariate term is penalized with a difference penalty of order 2 (Eilers & Marx, 2010).References:
Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125
Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372–2387. https://doi.org/10.1016/j.csda.2011.02.004
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in
data
passed toFormula
. Need to be continuous.by (str, optional) – A string corresponding to a factor in
data
passed toFormula
. Separate f(variables
) (and smoothness penalties) will be estimated per level ofby
.by_cont (str, optional) – A string corresponding to a numerical variable in
data
passed toFormula
. The model matrix for the estimated smooth term f(variables
) will be multiplied by the column of this variable. Can be used to estimate ‘varying coefficient’ models but also to set up binary smooths or to only estimate a smooth term for specific levels of a factor (i.e., what is possible for ordered factors in R & mgcv).binary ([str,str], optional) – A list containing two strings. The first string corresponds to a factor in
data
passed toFormula
. A separate f(variables
) will be estimated for the level of this factor corresponding to the second string.id (int, optional) – Only useful in combination with specifying a
by
variable. Ifid
is set to any integer the penalties placed on the separate f(variables
) will share a single smoothness penalty.nk (int or list[int], optional) – Number of basis functions to use. Even if
identifiable
is true, this number will reflect the final number of basis functions for this term (i.e., mssm acts like you would have asked for 10 basis functions ifnk=9
and identifiable=True; the default).te (bool, optional) – For tensor interaction terms only. If set to false, the term mimics the behavior of
ti()
in mgcv (Wood, 2017). Otherwise, the term behaves like ate()
term in mgcv - i.e., the marginal basis functions are not removed from the interaction.rp (int, optional) – Experimental - will currently break for tensor smooths or in case
by
is provided. Whether or not to re-parameterize the term - seemssm.src.python.formula.reparam()
for details. Defaults to no re-parameterization.constraint (mssm.src.constraints.ConstType, optional) – What kind of identifiability constraints should be absorbed by the terms (if they are to be identifiable). Either QR-based constraints (default, well-behaved), by means of column-dropping (no infill, not so well-behaved), or by means of difference re-coding (little infill, not so well behaved either).
identifiable (bool, optional) – Whether or not the constant should be removed from the space of functions this term can fit. Achieved by enforcing that \(\mathbf{1}^T \mathbf{X} = 0\) (\(\mathbf{X}\) here is the spline matrix computed for the observed data; see Wood, 2017 for details). Necessary in most cases to keep the model identifiable.
basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in
mssm.src.smooths.B_spline_basis()
.basis_kwargs (dict, optional) – A list containing one or multiple dictionaries specifying how the basis should be computed. Consult the docstring of the function computing the basis you want. For the default B-spline basis for example see the
mssm.src.smooths.B_spline_basis()
function. The default arguments set by any basis function should work for most cases though.is_penalized (bool, optional) – Should the term be left unpenalized or not. There are rarely good reasons to set this to False.
penalize_null (bool, optional) – Should a separate null-space penalty (Marra & Wood, 2011) be placed on the term. By default, the term here will leave a linear f(variables) un-penalized! Thus, there is no way for the penalty to achieve f(variables) = 0, even if that would be supported by the data. Adding a null-space penalty gives the penalty that power. This can be used for model selection instead of hypothesis testing and is the preferred way in
mssm
(see Marra & Wood, 2011 for details).penalty (list[Penalty], optional) – A list of penalty types to be placed on the term.
pen_kwargs (list[dict], optional) – A list containing one or multiple dictionaries specifying how the penalty should be created. Consult the docstring of the
Penalty.constructor()
method of the specificPenalty
you want to use for details.
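A hedged usage sketch for the by argument, assuming the sim1 data from mssmViz.sim (which includes a factor "fact" and continuous predictors "time" and "x"):

from mssm.models import *
from mssmViz.sim import *

sim_dat,_ = sim1(100,random_seed=100)

# Estimate a separate f(["time"]) - each with its own smoothness penalty -
# per level of "fact"; setting id=1 in addition would share a single penalty.
formula = Formula(lhs("y"),[i(),l(["fact"]),f(["time"],by="fact")],data=sim_dat)
model = GAMM(formula,Gaussian())
model.fit()
model.print_smooth_terms()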
- absorb_repara(rpidx, X, cov)
Computes all terms necessary to absorb a re-parameterization into the term and penalty matrix.
- Parameters:
rpidx (int) – Index to a specific re-parameterization object. There must be a 1-to-1 relationship between re-parameterization objects and the number of marginals required by this smooth (i.e., the number of variables).
X (scipy.sparse.csc_array) – Design matrix associated with this term.
cov (np.ndarray) – The covariate this term is a function of as a flattened numpy array.
- Raises:
ValueError – If this method is called with
rpidx
exceeding the number of this term’s RP objects (i.e., whenrpidx > (len(self.RP) - 1)
) or ifself.rp
is equal to a value for which no re-parameterization is implemented.
- build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: list[int], cov_flat: ndarray, use_only: list[int], tol: int = 0) tuple[list[float], list[int], list[int], int]
Builds the design/term/model matrix for this smooth term.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or
None
for categorical variables.var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on in for continuous variables or
None
for categorical variables.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non-NaN rows in the data.
cov_flat (np.ndarray) – An array, containing all (encoded, in case of categorical predictors) values on each predictor (each columns of
cov_flat
corresponds to a different predictor) variable included in any of the terms in order of the data-frame passed to the Formula.use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to
None
so that no terms are excluded.tol (int, optional) – A tolerance that can be used to prune values that are close to (but not exactly) zero from the term matrix. Defaults to strictly zero.
- Returns:
matrix data, matrix row indices, matrix column indices, added columns
- Return type:
tuple[list[float],list[int],list[int],int]
- build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]
Builds a penalty matrix associated with this smooth term and returns an updated
penalties
list including it.This method is implemented by most implementations of the
GammTerm
class. Two arguments need to be returned: the updatedpenalties
list including the new penalty implemented as aLambdaTerm
and the updatedcur_pen_idx
. The latter simply needs to be incremented for every penalty added topenalties
.- Parameters:
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in
penalties
penid (int) – If a term is subjected to multiple penalties, then
penid
indexes which of those penalties is currently implemented. Otherwise can be set to zero.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.
- Returns:
Updated
penalties
list including the new penalty implemented as a LambdaTerm
and the updatedcur_pen_idx
- Return type:
tuple[list[LambdaTerm],int]
- get_coef_info(factor_levels: dict) tuple[int, int, list[str]]
Returns the total number of coefficients associated with this smooth term, the number of unpenalized coefficients associated with this smooth term, and a list with names for each of the coefficients associated with this smooth term.
- Parameters:
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
- Returns:
Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
- Return type:
tuple[int,int,list[str]]
- class mssm.src.python.terms.fs(variables: list, rf: str = None, nk: int = 9, m: int = 1, rp: int = 1, by_cont: str | None = None, by_subgroup: tuple[str, str] | None = None, approx_deriv: dict | None = None, basis: ~collections.abc.Callable = <function B_spline_basis>, basis_kwargs: dict = {})
Bases:
f
Essentially a
f
term withby=rf
,id != None
,penalize_null= True
,pen_kwargs = [{"m":1}]
, andrp=1
.This term approximates the “factor-smooth interaction” basis “fs” with
m= 1
available inmgcv
(Wood, 2017). For example, the term below frommgcv
:s(x,sub,bs="fs"))
would approximately correspond to the following term in
mssm
:fs(["x"],rf="sub")
They are however not equivalent (mgcv by default uses a different basis for which the
m
key-word has a different functionality).Specifically, here
m= 1
implies that the only function left unpenalized by the default (difference) penalty is the constant (Eilers & Marx, 2010). Thus, a linear basis is penalized by the same default penalty that also penalizes smoothness (and not by a separate penalty as is the case inmgcv
whenm=1
for the default basis)! Any constant basis is penalized by the null-space penalty (in bothmgcv
andmssm
; see Marra & Wood, 2011) - the term thus shrinks towards zero (Wood, 2017). The factor smooth basis in mgcv allows the penalty to differ between the levels of an additional factor (by additionally specifying the
by
argument for a smooth with basis “fs”). I.e.,s(Time,Subject,by='condition',bs='fs')
in
mgcv
would estimate a non-linear random smooth of “time” per level of the “subject” & “condition” interaction - with the same penalty being placed on all random smooth terms within the same “condition” level.This can be achieved in
mssm
by adding multiplefs
terms to theFormula
and utilising theby_subgroup
argument. This needs to be set to a list where the first element identifies the additional factor variable (e.g., “condition”) and the second element corresponds to a level of said factor variable. E.g., to approximate the aforementionedmgcv
term we have to add:*[fs(["Time"],rf="subject_cond",by_subgroup=["cond",cl]) for cl in np.unique(dat["cond"])]
to the
Formula
terms
list. Importantly, “subject_cond” is the interaction of “subject” and “condition” - not just the “subject” variable in the data. Model estimation can become quite expensive for
fs
terms, when the factor variable forrf
has many levels (> 10,000). In that case, approximate derivative evaluation can speed things up considerably. To enable this, the approx_deriv
argument needs to be specified with a dict, having the following structure:{"no_disc":[str],"excl":[str],"split_by":[str],"restarts":int,"seed":None or int}
. “no_disc” should usually be set to an empty list and should in general only contain names of continuous variables included in the formula. Any variable mentioned here will not be discretized before clustering - this makes the approximation a bit more accurate but also requires more time. Similarly, “excl” specifies any continuous variables that should be excluded from clustering entirely. “split_by” should generally be set to a list containing all categorical variables present in the formula. “restarts” indicates the number of times to repeat the clustering (40 seems to be a good number). “seed” can either be set to None or to an integer - in the latter case, the random cluster initialization will use that seed, ensuring that the clustering outcome (and hence the model fit) is replicable.References:
Eilers, P., & Marx, B. (2010). Splines, knots, and penalties. https://doi.org/10.1002/WICS.125
Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models.Computational Statistics & Data Analysis, 55(7), 2372–2387. https://doi.org/10.1016/j.csda.2011.02.004
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.). Chapman and Hall/CRC.
- Parameters:
variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in
data
passed toFormula
. Need to be continuous.rf (str, optional) – A string corresponding to a (random) factor in
data
passed toFormula
. Separate f(variables
) (but a shared smoothness penalty!) will be estimated per level ofrf
.nk (int or list[int], optional) – Number of basis functions minus one to use. I.e., if
nk=9
(the default), the term will use 10 basis functions. By defaultf()
has identifiability constraints applied and we act as if nk + 1 coefficients were requested. The fs()
term needs no identifiability constraints, so if the same number of coefficients used for a f()
term is requested (the desired approach), one coefficient is added to compensate for the lack of identifiability constraints. This is the opposite of how this is handled in mgcv: specifying nk=10
for “fixed” univariate smooths results in 9 basis functions being available. However, for a smooth in mgcv with basis=’fs’, 10 basis functions will remain available.basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in
mssm.src.smooths.B_spline_basis()
.basis_kwargs (dict, optional) – A list containing one or multiple dictionaries specifying how the basis should be computed. For the B-spline basis the following arguments (with default values) are available:
convolve=False, min_c=None, max_c=None, deg=3
. Seemssm.src.smooths.B_spline_basis()
for details.by_cont (str, optional) – A string corresponding to a numerical variable in
data
passed toFormula
. The model matrix for the estimated smooth term will be multiplied by the column of this variable. Can be used as an alternative to estimate separate random smooth terms per level of another factor (which is also possible with by_subgroup).by_subgroup ([str,str], optional) – List including a factor variable and a specific level of said variable. Allows for separate penalties as described above.
approx_deriv (dict, optional) – Dict holding important info for the clustering algorithm. Structure:
{"no_disc":[str],"excl":[str],"split_by":[str],"restarts":int}
- build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int], tol: int = 0) tuple[list[float], list[int], list[int], int]
Builds the design/term/model matrix for this factor smooth term.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or
None
for categorical variables.var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on in for continuous variables or
None
for categorical variables.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non-NaN rows in the data.
cov_flat (np.ndarray) – An array, containing all (encoded, in case of categorical predictors) values on each predictor (each columns of
cov_flat
corresponds to a different predictor) variable included in any of the terms in order of the data-frame passed to the Formula.use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to
None
so that no terms are excluded.tol (int, optional) – A tolerance that can be used to prune values that are close to (but not exactly) zero from the term matrix. Defaults to strictly zero.
- Returns:
matrix data, matrix row indices, matrix column indices, added columns
- Return type:
tuple[list[float],list[int],list[int],int]
- build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]
Builds a penalty matrix associated with this factor smooth term and returns an updated
penalties
list including it.This method is implemented by most implementations of the
GammTerm
class. Two arguments need to be returned: the updatedpenalties
list including the new penalty implemented as aLambdaTerm
and the updatedcur_pen_idx
. The latter simply needs to be incremented for every penalty added topenalties
.- Parameters:
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in
penalties
.penid (int) – If a term is subjected to multiple penalties, then
penid
indexes which of those penalties is currently implemented. Otherwise can be set to zero.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.
- Returns:
Updated
penalties
list including the new penalty implemented as a LambdaTerm
and the updatedcur_pen_idx
- Return type:
tuple[list[LambdaTerm],int]
- get_coef_info(factor_levels: dict) tuple[int, int, list[str]]
Returns the total number of coefficients associated with this factor smooth term, the number of unpenalized coefficients associated with this factor smooth term, and a list with names for each of the coefficients associated with this factor smooth term.
- Parameters:
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
- Returns:
Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
- Return type:
tuple[int,int,list[str]]
- mssm.src.python.terms.get_linear_coef_info(lTerm: l | rs, has_intercept: bool, var_types: dict, factor_levels: dict, coding_factors: dict) tuple[int, int, list[str]]
Returns the total number of coefficients associated with a linear or random term, the number of unpenalized coefficients associated with such a term, and a list with names for each of the coefficients associated with it.
- Parameters:
lTerm – Linear or random slope term
has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either
VarType.NUMERIC
for continuous variables orVarType.FACTOR
for categorical variables.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).
- Returns:
Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
- Return type:
tuple[int,int,list[str]]
- class mssm.src.python.terms.i
Bases:
GammTerm
An intercept/offset term. In a model
\[\mu_i = a + f(x_i)\]it reflects \(a\).
- build_matrix(ci: int, ti: int, ridx: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]
Builds the design/term/model matrix for an intercept term.
- Parameters:
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
ridx (np.ndarray) – Array of non-NaN rows in the data.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to
None
so that no terms are excluded.
- Returns:
matrix data, matrix row indices, matrix column indices, added columns
- Return type:
tuple[list[float],list[int],list[int],int]
- get_coef_info() tuple[int, int, list[str]]
Returns the total number of coefficients associated with this term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.
- Returns:
Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
- Return type:
tuple[int,int,list[str]]
- class mssm.src.python.terms.irf(variables: list[str], event_onset: list[int], basis_kwargs: list[dict], by: str = None, id: int = None, nk: int = 10, basis: ~collections.abc.Callable = <function B_spline_basis>, is_penalized: bool = True, penalty: list[~mssm.src.python.penalties.Penalty] | None = None, pen_kwargs: list[dict] | None = None)
Bases:
GammTerm
A simple impulse response term, designed to correct for events with overlapping responses in multi-level time-series modeling.
The idea (see Ehinger & Dimigen, 2019, for a detailed introduction to this kind of deconvolution analysis) is that some kind of event happens during each recorded time-series (e.g., stimulus onset, distractor display, mask onset, etc.) which is assumed to affect the recorded signal in the next X ms in some way. The moment of event onset can differ between recorded time-series. In other words, the event is believed to act like an impulse which triggers a delayed response in the signal. This term class can be used to estimate the shape of this impulse response. Multiple
irf
terms can be included in aFormula
if multiple events happen, potentially with overlapping responses.Example:
# Simulate time-series based on two events that elicit responses which vary in their overlap.
# The summed responses + a random intercept + noise is then the signal.
overlap_dat,onsets1,onsets2 = sim7(100,1,2,seed=20)

# The model below tries to recover the shape of the two responses in the 200 ms after event onset (max_c=200) + the random intercepts.
# For models with irf terms, the column in the data identifying unique series needs to be specified via series_id:
overlap_formula = Formula(lhs("y"),
                          [irf(["time"],onsets1,nk=15,basis_kwargs=[{"max_c":200,"min_c":0,"convolve":True}]),
                           irf(["time"],onsets2,nk=15,basis_kwargs=[{"max_c":200,"min_c":0,"convolve":True}]),
                           ri("factor")],
                          data=overlap_dat,
                          series_id="series")

model = GAMM(overlap_formula,Gaussian())
model.fit()
Note that care needs to be taken when predicting for models including
irf
terms, because the onset of events can differ between time-series. Hence, model predictions + standard errors should first be obtained for the entire data-set that was also used to train the model; series-specific predictions can then be extracted from the model matrix as follows:

# Get model matrix for the entire data-set but only based on the estimated shape for the first irf term:
_,pred_mat,ci_b = model.predict([0],overlap_dat,ci=True)

# Now extract the prediction + approximate ci boundaries for a single series:
s = 8
s_pred = pred_mat[overlap_dat["series"] == s,:]@model.coef
s_ci = ci_b[overlap_dat["series"] == s]

# Now the estimated response following the onset of the first event can be visualized + an approximate CI:
from matplotlib import pyplot as plt
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred,color='blue')
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred+s_ci,color='blue',linestyle='dashed')
plt.plot(overlap_dat["time"][overlap_dat["series"] == s],s_pred-s_ci,color='blue',linestyle='dashed')
References:
Ehinger, B. V., & Dimigen, O. (2019). Unfold: an integrated toolbox for overlap correction, non-linear modeling, and regression-based EEG analysis. https://doi.org/10.7717/peerj.7838
- Parameters:
variables (list[str]) – A list of the variables (strings) of which the term is a function. Need to exist in
data
passed toFormula
. Need to be continuous.event_onset ([int]) – A
np.array
containing, for each individual time-series, the index corresponding to the sample/time-point at which the event eliciting the response to be estimated by this term happened.basis_kwargs (dict) – A list containing one or multiple dictionaries specifying how the basis should be computed. For
irf
terms, theconvolve
argument has to be set to True! Also,min_c
andmax_c
must be specified.min_c
corresponds to the assumed min. delay of the response after event onset and can usually be set to 0.max_c
corresponds to the assumed max. delay of the response (in ms) after which the response is believed to have returned to a zero base-line.by (str, optional) – A string corresponding to a factor in
data
passed toFormula
. Separate irf(variables
) (and smoothness penalties) will be estimated per level ofby
.id (int, optional) – Only useful in combination with specifying a
by
variable. Ifid
is set to any integer the penalties placed on the separate irf(variables
) will share a single smoothness penalty.nk (int, optional) – Number of basis functions to use. I.e., if
nk=10
(the default), the term will use 10 basis functions (Note that these terms are not made identifiable by absorbing any kind of constraint).basis (Callable, optional) – The basis functions to use to construct the spline matrix. By default a B-spline basis (Eilers & Marx, 2010) implemented in
src.smooths.B_spline_basis
.is_penalized (bool, optional) – Should the term be left unpenalized or not. There are rarely good reasons to set this to False.
penalty (list[Penalty], optional) – A list of penalty types to be placed on the term.
pen_kwargs (list[dict], optional) – A list containing one or multiple dictionaries specifying how the penalty should be created. For the default difference penalty (Eilers & Marx, 2010) the only keyword argument (with default value) available is:
m=2
. This reflects the order of the difference penalty. Note, that while a higherm
permits penalizing towards smoother functions it also leads to an increased dimensionality of the penalty Kernel (the set of f[variables
] which will not be penalized). In other words, increasingly more complex functions will be left un-penalized for higherm
(except ifpenalize_null
is set to True).m=2
is usually a good choice and thus the default but see Eilers & Marx (2010) for details.
- build_matrix(ci: int, ti: int, var_map: dict, var_mins: dict, var_maxs: dict, factor_levels: dict, ridx: ndarray, cov: list[ndarray], use_only: list[int], pool, tol: int = 0) tuple[list[float], list[int], list[int], int]
Builds the design/term/model matrix associated with this impulse response smooth term and returns it represented as a list of values, a list of row indices, and a list of column indices.
This method is implemented by every implementation of the
GammTerm
class. The returned lists can then be used to create a sparse matrix for this term. Also returns an updatedci
column index, reflecting how many additional columns would be added to the total model matrix.- Parameters:
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_mins (dict) – Var mins dictionary. Keys are variables in the data, values are either the minimum value the variable takes on for continuous variables or
None
for categorical variables.var_maxs (dict) – Var maxs dictionary. Keys are variables in the data, values are either the maximum value the variable takes on in for continuous variables or
None
for categorical variables.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non-NaN rows in the data.
cov ([np.ndarray]) – A list containing a separate array per time-series included in the data and indicated to the formula. The array contains, for the particular time-series, all (encoded, in case of categorical predictors) values on each predictor (each column of the array corresponds to a different predictor) variable included in any of the terms in order of the data-frame passed to the Formula.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to
None
so that no terms are excluded.pool (Any) – A multiprocessing pool for parallel matrix construction parts
tol (int, optional) – A tolerance that can be used to prune values that are close to (but not exactly) zero from the term matrix. Defaults to strictly zero.
- Returns:
matrix data, matrix row indices, matrix column indices, added columns
- Return type:
tuple[list[float],list[int],list[int],int]
- build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, penid: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]
Builds a penalty matrix associated with this impulse response smooth term and returns an updated
penalties
list including it.This method is implemented by most implementations of the
GammTerm
class. Two arguments need to be returned: the updatedpenalties
list including the new penalty implemented as aLambdaTerm
and the updatedcur_pen_idx
. The latter simply needs to be incremented for every penalty added topenalties
.- Parameters:
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in
penalties
.penid (int) – If a term is subjected to multiple penalties, then
penid
indexes which of those penalties is currently implemented. Otherwise can be set to zero.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.
- Returns:
Updated
penalties
list including the new penalty implemented as a LambdaTerm
and the updatedcur_pen_idx
- Return type:
tuple[list[LambdaTerm],int]
- get_coef_info(ti: int, factor_levels: dict) tuple[int, int, list[str]]
Returns the total number of coefficients associated with this impulse response smooth term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.
- Parameters:
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
- Returns:
Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
- Return type:
tuple[int,int,list[str]]
- class mssm.src.python.terms.l(variables: list)
Bases:
GammTerm
Adds a parametric (linear) term to the model formula. The model \(\mu_i = a + b*x_i\) can for example be achieved by adding
[i(), l(['x'])]
to theterm
argument of aFormula
. The coefficient \(b\) estimated for the term will then correspond to the slope of \(x\). This class can also be used to add predictors for categorical variables. If the formula includes an intercept, binary coding will be utilized to add reference-level adjustment coefficients for the remaining k-1 levels of any additional factor variable.If more than one variable is included in
variables
the model will only add the len(variables
)-interaction to the model! Lower order interactions and main effects will not be included by default (seeli()
function instead, which automatically includes all lower-order interactions and main effects).Example: The interaction effect of the factor variable “cond”, with two levels “1” and “2”, and a continuous variable “x” on the dependent variable “y” is of interest. To estimate such a model, the following formula can be used:
formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"]),l(["cond","x"])])
This formula will estimate the following model:
\[\mu_i = a + b_1*c_i + b_2*x_i + b_3*c_i*x_i\]Here, \(c\) is a binary predictor variable created so that it is 1 if “cond”=2 else 0 and \(b_3\) is the coefficient that is added because
l(["cond","x"])
is included in the terms (i.e., the interaction effect).To get a model with only main effects for “cond” and “x”, the following formula could be used:
formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"])])
This formula will estimate:
\[\mu_i = a + b_1*c_i + b_2*x_i\]- Parameters:
variables ([str]) – A list of the variables (strings) for which linear predictors should be included
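A hedged end-to-end sketch of the interaction model above, assuming the sim1 data from mssmViz.sim (factor "fact", continuous "x"):

from mssm.models import *
from mssmViz.sim import *

sim_dat,_ = sim1(100,random_seed=100)

# Main effects of "fact" and "x" plus their interaction:
formula = Formula(lhs("y"),terms=[i(),l(["fact"]),l(["x"]),l(["fact","x"])],data=sim_dat)
model = GAMM(formula,Gaussian())
model.fit()
model.print_parametric_terms()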
- build_matrix(has_intercept: bool, ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]
Builds the design/term/model matrix associated with this linear term and returns it represented as a list of values, a list of row indices, and a list of column indices.
This method is implemented by every implementation of the
GammTerm
class. The returned lists can then be used to create a sparse matrix for this term. Also returns an updatedci
column index, reflecting how many additional columns would be added to the total model matrix.- Parameters:
has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either
VarType.NUMERIC
for continuous variables orVarType.FACTOR
for categorical variables.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non-NaN rows in the data.
cov_flat (np.ndarray) – An array, containing all (encoded, in case of categorical predictors) values on each predictor (each columns of
cov_flat
corresponds to a different predictor) variable included in any of the terms in order of the data-frame passed to the Formula.use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to
None
so that no terms are excluded.
- Returns:
matrix data, matrix row indices, matrix column indices, added columns
- Return type:
tuple[list[float],list[int],list[int],int]
- get_coef_info(has_intercept: bool, var_types: dict, factor_levels: dict, coding_factors: dict) tuple[int, int, list[str]]
Returns the total number of coefficients associated with this linear term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.
- Parameters:
has_intercept (bool) – Whether or not the formula of which this term is part includes an intercept term.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either
VarType.NUMERIC
for continuous variables orVarType.FACTOR
for categorical variables.factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).
- Returns:
Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
- Return type:
tuple[int,int,list[str]]
- mssm.src.python.terms.li(variables: list[str])
Behaves like the
l
class but automatically includes all lower-order interactions and main effects.Example: The interaction effect of the factor variable “cond”, with two levels “1” and “2”, and a continuous variable “x” on the dependent variable “y” is of interest. To estimate such a model, the following formula can be used:
formula = Formula(lhs("y"),terms=[i(),*li(["cond","x"])])
Note the use of the
*
operator to unpack the individual terms returned from li!This formula will still (see
l
) estimate the following model:\[\mu_i = a + b_1*c_i + b_2*x_i + b_3*c_i*x_i\]with \(c\) corresponding to a binary predictor variable created so that it is 1 if “cond”=2 else 0.
To get a model with only main effects for “cond” and “x”
li
cannot be used andl
needs to be used instead:formula = Formula(lhs("y"),terms=[i(),l(["cond"]),l(["x"])])
This formula will estimate:
\[\mu_i = a + b_1*c_i + b_2*x_i\]- Parameters:
variables (list[str]) – A list of the variables (strings) for which linear predictors should be included
- class mssm.src.python.terms.ri(variable: str)
Bases:
GammTerm
Adds a random intercept for the factor
variable
to the model. The random intercepts \(b_i\) are assumed to be i.i.d., \(b_i \sim N(0,\sigma_b)\), i.e., normally distributed around zero - the simplest random effect supported by
.Thus, this term achieves exactly what is achieved in
mgcv
by adding the term:s(variable,bs="re")
The
variable
needs to identify a factor-variable in the data (i.e., the .dtype of the variable has to be equal to ‘O’). If you want to add more complex random effects to the model (e.g., random slopes for continuous variable “x” per level of factorvariable
) use thers
class.- Parameters:
variable (str) – The name (string) of a factor variable. For every level of this factor a random intercept will be estimated. The random intercepts are assumed to follow a normal distribution centered around zero.
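A hedged usage sketch, assuming the sim1 data from mssmViz.sim (subject factor "sub", continuous "x"):

from mssm.models import *
from mssmViz.sim import *

sim_dat,_ = sim1(100,random_seed=100)

# Linear effect of "x" plus a random intercept per level of "sub":
formula = Formula(lhs("y"),[i(),l(["x"]),ri("sub")],data=sim_dat)
model = GAMM(formula,Gaussian())
model.fit()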
- build_matrix(ci: int, ti: int, var_map: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]
Builds the design/term/model matrix associated with this random intercept term and returns it represented as a list of values, a list of row indices, and a list of column indices.
This method is implemented by every implementation of the
GammTerm
class. The returned lists can then be used to create a sparse matrix for this term. Also returns an updatedci
column index, reflecting how many additional columns would be added to the total model matrix.- Parameters:
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non-NaN rows in the data.
cov_flat (np.ndarray) – An array, containing all (encoded, in case of categorical predictors) values on each predictor (each columns of
cov_flat
corresponds to a different predictor) variable included in any of the terms in order of the data-frame passed to the Formula.use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to
None
so that no terms are excluded.
- Returns:
matrix data, matrix row indices, matrix column indices, added columns
- Return type:
tuple[list[float],list[int],list[int],int]
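The returned triplets can be assembled into a scipy sparse array roughly as follows (a generic sketch; the variable names and the exact handling of the returned column count are illustrative assumptions):
from scipy.sparse import csc_array
# values, rows, cols, added = term.build_matrix(ci, ti, var_map, factor_levels, ridx, cov_flat, use_only=None)
# n_rows: number of (NaN-filtered) observations; 'added' reflects the columns this term contributes
X_term = csc_array((values, (rows, cols)), shape=(n_rows, ci + added))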
- build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]
Builds a penalty matrix associated with this random intercept term and returns an updated ``penalties`` list including it.
This method is implemented by most implementations of the ``GammTerm`` class. Two arguments need to be returned: the updated ``penalties`` list, including the new penalty implemented as a ``LambdaTerm``, and the updated ``cur_pen_idx``. The latter simply needs to be incremented for every penalty added to ``penalties``.
- Parameters:
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in ``penalties``.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.
- Returns:
Updated ``penalties`` list including the new penalty implemented as a ``LambdaTerm`` and the updated ``cur_pen_idx``
- Return type:
tuple[list[LambdaTerm],int]
- get_coef_info(factor_levels: dict, coding_factors: dict) tuple[int, int, list[str]]
Returns the total number of coefficients associated with this random intercept term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.
- Parameters:
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).
- Returns:
Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
- Return type:
tuple[int,int,list[str]]
- class mssm.src.python.terms.rs(variables: list[str], rf: str)
Bases: ``GammTerm``
Adds random slopes for the effects of ``variables`` for each level of the random factor ``rf``. The type of random slope created depends on the content of ``variables``.
If ``len(variables)==1`` and the string in ``variables`` identifies a categorical variable in the data, then a random offset adjustment (for every level of the categorical variable, so without binary coding!) will be estimated for every level of the random factor ``rf``.
Example: The factor variable “cond”, with two levels “1” and “2”, is assumed to have a general effect on the DV “y”. However, data was collected from multiple subjects (random factor ``rf`` = “subject”) and it is reasonable to assume that the effect of “cond” is slightly different for every subject (it is also assumed that all subjects took part in both conditions identified by “cond”). A model that accounts for this is estimated via:
formula = Formula(lhs("y"),terms=[i(),l(["cond"]),rs(["cond"],rf="subject")])
This formula will estimate the following model:
\[\mu = a + b_1*c_i + a_{j(i),cc(i)}\]
Here, \(c\) is again a binary predictor variable created so that it is 1 if “cond”=2 for observation i else 0, \(cc(i)\) indexes the level of “cond” at observation \(i\), \(j(i)\) indexes the level of “subject” at observation \(i\), and \(a_{j,cc(i)}\) identifies the random offset estimated for subject \(j\) at the level of “cond” indicated by \(cc(i)\). The \(a_{j,cc(i)}\) are assumed to be i.i.d. \(\sim N(0,\sigma_a)\). Note that the fixed effect structure uses binary coding but the random effect structure does not!
Hence, ``rs(["cond"],rf="subject")`` in ``mssm`` corresponds to adding the term below to a ``mgcv`` model: ``s(cond,subject,bs="re")``
If all the strings in ``variables`` identify continuous variables in the data, then a random slope for the ``len(variables)``-way interaction (which will simplify to a slope for a single continuous variable if ``len(variables) == 1``) will be estimated for every level of the random factor ``rf``.
Example: The continuous variable “x” is assumed to have a general effect on the DV “y”. However, data was collected from multiple subjects (random factor ``rf`` = “subject”) and it is reasonable to assume that the effect of “x” is slightly different for every subject. A model that accounts for this is estimated via:
formula = Formula(lhs("y"),terms=[i(),l(["x"]),rs(["x"],rf="subject")])
This formula will estimate the following model:
\[\mu = a + b*x_i + b_{j(i)} * x_i\]
where \(j(i)\) again indexes the level of “subject” at observation \(i\), \(b_{j(i)}\) identifies the random slope (the subject-specific slope adjustment for \(b\)) for variable “x” estimated for subject \(j\), and the \(b_{j(i)}\) are again assumed to be i.i.d. realizations from a single normal distribution \(N(0,\sigma_b)\).
Note, lower-order interaction slopes (as well as main effects) are not pulled in by default! Consider the following formula:
formula = Formula(lhs("y"),terms=[i(),*li(["x","z"]),rs(["x","z"],rf="subject")])
with another continuous variable “z”. This corresponds to the model:
\[\mu = a + b_1*x_i + b_2*z_i + b_3*x_i*z_i + b_{j(i)}*x_i*z_i\]
With \(j(i)\) again indexing the level of “subject” at observation i, \(b_{j(i)}\) identifying the random slope (the subject-specific slope adjustment for \(b_3\)) for the interaction of variables \(x\) and \(z\) estimated for subject \(j\). The \(b_{j(i)}\) are again assumed to be i.i.d. realizations from a single \(\sim N(0,\sigma_b)\).
To add random slopes for the main effects of either \(x\) or \(z\), as well as an additional random intercept, additional ``rs`` terms and a ``ri`` term would have to be added to the formula:
formula = Formula(lhs("y"),terms=[i(),*li(["x","z"]), ri("subject"), rs(["x"],rf="subject"), rs(["z"],rf="subject"), rs(["x","z"],rf="subject")])
If ``len(variables) > 1`` and at least one string in ``variables`` identifies a categorical variable in the data, then random slopes for the ``len(variables)``-way interaction will be estimated for every level of the random factor ``rf``. Separate distribution parameters (the \(\sigma\) of the Normal) will be estimated for every level of the resulting interaction.
Example: The continuous variable “x” and the factor variable “cond”, with two levels “1” and “2”, are assumed to have a general interaction effect on the DV “y”. However, data was collected from multiple subjects (random factor ``rf`` = “subject”) and it is reasonable to assume that their interaction effect is slightly different for every subject. A model that accounts for this is estimated via:
formula = Formula(lhs("y"),terms=[i(),*li(["x","cond"]),rs(["x","cond"],rf="subject")])
This formula will estimate the following model:
\[\mu = a + b_1*c_i + b_2*x_i + b_3*x_i*c_i + b_{j(i),cc(i)}*x_i\]
with \(c\) corresponding to a binary predictor variable created so that it is 1 if “cond”=2 for observation \(i\) else 0, \(cc(i)\) corresponds to the level of “cond” at observation \(i\), \(j(i)\) corresponds to the level of “subject” at observation \(i\), and \(b_{j(i),cc(i)}\) identifies the random slope for variable \(x\) at “cond” = \(cc(i)\) estimated for subject \(j\). That is: the \(b_{j,cc(i)}\) where \(cc(i)=1\) are assumed to be i.i.d. realizations from normal distribution \(N(0,\sigma_{b_1})\) and the \(b_{j,cc(i)}\) where \(cc(i)=2\) are assumed to be i.i.d. realizations from a separate normal distribution \(N(0,\sigma_{b_2})\).
Hence, adding ``rs(["x","cond"],rf="subject")`` to a ``mssm`` model is equivalent to adding the term below to a ``mgcv`` model: ``s(x,subject,by=cond,bs="re")``
Correlations between random effects cannot be taken into account by means of parameters (this is possible for example in ``lme4``).
- Parameters:
variables ([str]) – A list of variables. Can point to continuous and categorical variables.
rf (str) – A factor variable. Identifies the random factor in the data.
- build_matrix(ci: int, ti: int, var_map: dict, var_types: dict, factor_levels: dict, ridx: ndarray, cov_flat: ndarray, use_only: list[int]) tuple[list[float], list[int], list[int], int]
Builds the design/term/model matrix associated with this random slope term and returns it represented as a list of values, a list of row indices, and a list of column indices.
This method is implemented by every implementation of the ``GammTerm`` class. The returned lists can then be used to create a sparse matrix for this term. Also returns an updated ``ci`` column index, reflecting how many additional columns would be added to the total model matrix.
- Parameters:
ci (int) – Current column index.
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
var_map (dict) – Var map dictionary. Keys are variables in the data, values their column index in the encoded predictor matrix.
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either ``VarType.NUMERIC`` for continuous variables or ``VarType.FACTOR`` for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
ridx (np.ndarray) – Array of non-NaN rows in the data.
cov_flat (np.ndarray) – An array containing all (encoded, in case of categorical predictors) values of each predictor variable included in any of the terms (each column of ``cov_flat`` corresponds to a different predictor), in order of the data frame passed to the Formula.
use_only ([int]) – A list holding term indices for which the matrix should be formed. For terms not included in this list a zero matrix will be returned. Can be set to ``None`` so that no terms are excluded.
- Returns:
matrix data, matrix row indices, matrix column indices, added columns
- Return type:
tuple[list[float],list[int],list[int],int]
- build_penalty(ti: int, penalties: list[LambdaTerm], cur_pen_idx: int, factor_levels: dict, col_S: int) tuple[list[LambdaTerm], int]
Builds a penalty matrix associated with this random slope term and returns an updated ``penalties`` list including it.
This method is implemented by most implementations of the ``GammTerm`` class. Two arguments need to be returned: the updated ``penalties`` list, including the new penalty implemented as a ``LambdaTerm``, and the updated ``cur_pen_idx``. The latter simply needs to be incremented for every penalty added to ``penalties``.
- Parameters:
ti (int) – Index corresponding to the position the current term (i.e., self) takes on in the list of terms of the Formula.
penalties ([LambdaTerm]) – List of previously created penalties.
cur_pen_idx (int) – Index of the last element in ``penalties``.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
col_S (int) – Number of columns of the total penalty matrix.
- Returns:
Updated ``penalties`` list including the new penalty implemented as a ``LambdaTerm`` and the updated ``cur_pen_idx``
- Return type:
tuple[list[LambdaTerm],int]
- get_coef_info(var_types: dict, factor_levels: dict, coding_factors: dict) tuple[int, int, list[str]]
Returns the total number of coefficients associated with this random slope term, the number of unpenalized coefficients associated with this term, and a list with names for each of the coefficients associated with this term.
- Parameters:
var_types (dict) – Var types dictionary. Keys are variables in the data, values are either ``VarType.NUMERIC`` for continuous variables or ``VarType.FACTOR`` for categorical variables.
factor_levels (dict) – Factor levels dictionary. Keys are factor variables in the data, values are np.arrays holding the unique levels (as str) of the corresponding factor.
coding_factors (dict) – Factor coding dictionary. Keys are factor variables in the data, values are dictionaries, where the keys correspond to the encoded levels (int) of the factor and the values to their levels (str).
- Returns:
Number of coefficients associated with term, number of un-penalized coefficients associated with term, coef names
- Return type:
tuple[int,int,list[str]]
mssm.src.python.utils module
- class mssm.src.python.utils.DummyRhoPrior(a=np.float64(-16.11809565095832), b=np.float64(16.11809565095832))
Bases: ``RhoPrior``
Simple uniform prior for rho - the log-smoothing penalty parameters.
- logpdf(rho: ndarray) ndarray
Returns an array holding zeroes for all log(lambda) parameters within ``self.a`` and ``self.b``, otherwise ``-np.inf``.
- Parameters:
rho (np.ndarray) – Array of log(lambda) parameters
- Returns:
Log-density array as described above
- Return type:
np.ndarray
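For example (a small sketch; note that the default bounds ``a`` and ``b`` correspond to \(log(10^{-7})\) and \(log(10^{7})\)):
import numpy as np
prior = DummyRhoPrior(b=np.log(1e12))  # widen the upper bound to log(1e12)
rho = np.array([[0.0, 5.0],    # both log-lambda candidates within [a,b]
                [0.0, 40.0]])  # 40.0 exceeds log(1e12)
print(prior.logpdf(rho))       # zeroes for in-bound parameters, -np.inf where out of bounds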
- class mssm.src.python.utils.GAMLSSGSMMFamily(pars: int, gammlss_family: GAMLSSFamily)
Bases: ``GSMMFamily``
Implementation of the ``GSMMFamily`` class that uses only information about the likelihood to estimate any implemented GAMMLSS model. This allows estimating any GAMMLSS as a GSMM via the L-qEFS & Newton update. Example:
# Simulate 500 data points
sim_dat = sim3(500,2,c=1,seed=0,family=Gaussian(),binom_offset = 0, correlate=False)

# We need to model the mean: mu_i
formula_m = Formula(lhs("y"), [i(),f(["x0"]),f(["x1"]),f(["x2"]),f(["x3"])], data=sim_dat)

# And for sd - here constant
formula_sd = Formula(lhs("y"), [i()], data=sim_dat)

# Collect both formulas
formulas = [formula_m,formula_sd]
links = [Identity(),LOG()]

# Now define the general family + model
gsmm_fam = GAMLSSGSMMFamily(2,GAUMLSS(links))
model = GSMM(formulas=formulas,family=gsmm_fam)

# Fit with SR1
bfgs_opt = {"gtol":1e-9, "ftol":1e-9, "maxcor":30, "maxls":200, "maxfun":1e7}
model.fit(init_coef=None,method='qEFS',extend_lambda=False,
          control_lambda=0,max_outer=200,max_inner=500,min_inner=500,
          seed=0,qEFSH='SR1',max_restarts=5,overwrite_coef=False,
          qEFS_init_converge=False,prefit_grad=True,
          progress_bar=True,**bfgs_opt)

################### Or for a multinomial model: ###################

formulas = [Formula(lhs("y"), [i(),f(["x0"])], data=sim5(1000,seed=91)) for k in range(4)]

# Create family - again specifying K-1 pars - here 4!
family = MULNOMLSS(4)

# Get the family's links
links = family.links

# Now again define the general family + model
gsmm_fam = GAMLSSGSMMFamily(4,family)
model = GSMM(formulas=formulas,family=gsmm_fam)

# And fit with SR1
bfgs_opt = {"gtol":1e-9, "ftol":1e-9, "maxcor":30, "maxls":200, "maxfun":1e7}
model.fit(init_coef=None,method='qEFS',extend_lambda=False,
          control_lambda=0,max_outer=200,max_inner=500,min_inner=500,
          seed=0,qEFSH='SR1',max_restarts=0,overwrite_coef=False,
          qEFS_init_converge=False,prefit_grad=True,
          progress_bar=True,**bfgs_opt)
- References:
Wood, Pya, & Säfken (2016). Smoothing Parameter and Model Selection for General Smooth Models.
Nocedal & Wright (2006). Numerical Optimization. Springer New York.
- Parameters:
pars (int) – Number of parameters of the likelihood.
gammlss_family (GAMLSSFamily) – Any implemented member of the ``GAMLSSFamily`` class. Available in ``self.llkargs[0]``.
- gradient(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) ndarray
Function to evaluate gradient of GAMM(LSS) model when estimated via GSMM.
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via ``np.split()``) the ``coef`` into the sub-sets associated with each parameter of the llk (see the sketch after this method).
ys ([np.ndarray or None]) – List containing the vectors of observations passed as ``lhs.variable`` to the formulas. Note: by convention ``mssm`` expects that the actual observed data is passed along via the first formula (so it is stored in ``ys[0]``). If multiple formulas have the same ``lhs.variable`` as this first formula, then ``ys`` contains ``None`` at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.
- Returns:
The gradient of the log-likelihood evaluated at ``coef`` as a numpy array of shape (-1,1).
- Return type:
np.ndarray
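The ``coef_split_idx`` convention can be illustrated with plain numpy (a sketch with made-up shapes):
import numpy as np
coef = np.arange(10.0).reshape(-1, 1)  # 10 coefficients, shape (-1,1)
coef_split_idx = [6, 8]                # split after the 6th and 8th coefficient
# np.split yields one coefficient block per parameter of the likelihood;
# the linear predictor for parameter k is then Xs[k] @ blocks[k]
blocks = np.split(coef, coef_split_idx)
print([b.shape for b in blocks])       # [(6, 1), (2, 1), (2, 1)]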
- hessian(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) csc_array
Function to evaluate Hessian of GAMM(LSS) model when estimated via GSMM.
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via ``np.split()``) the ``coef`` into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations passed as ``lhs.variable`` to the formulas. Note: by convention ``mssm`` expects that the actual observed data is passed along via the first formula (so it is stored in ``ys[0]``). If multiple formulas have the same ``lhs.variable`` as this first formula, then ``ys`` contains ``None`` at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.
- Returns:
The Hessian of the log-likelihood evaluated at ``coef``.
- Return type:
scp.sparse.csc_array
- llk(coef: ndarray, coef_split_idx: list[int], ys: list[ndarray], Xs: list[csc_array]) float
Function to evaluate log-likelihood of GAMM(LSS) model when estimated via GSMM.
- Parameters:
coef (np.ndarray) – The current coefficient estimate (as np.array of shape (-1,1) - so it must not be flattened!).
coef_split_idx ([int]) – A list used to split (via ``np.split()``) the ``coef`` into the sub-sets associated with each parameter of the llk.
ys ([np.ndarray or None]) – List containing the vectors of observations passed as ``lhs.variable`` to the formulas. Note: by convention ``mssm`` expects that the actual observed data is passed along via the first formula (so it is stored in ``ys[0]``). If multiple formulas have the same ``lhs.variable`` as this first formula, then ``ys`` contains ``None`` at their indices to save memory.
Xs ([scp.sparse.csc_array]) – A list of sparse model matrices per likelihood parameter.
- Returns:
The log-likelihood evaluated at ``coef``.
- Return type:
float
- mssm.src.python.utils.REML(llk: float, nH: csc_array, coef: ndarray, scale: float, penalties: list[LambdaTerm], keep: list[int] | None = None) float | ndarray
Based on Wood (2011). Exact REML for Gaussian GAMs, Laplace approximate REML (Wood, 2016) for everything else. Evaluated after applying the stabilizing reparameterization discussed by Wood (2011).
Important: the dimensions of the output depend on the shape of ``coef``. If ``coef`` is flattened, then the output will be a float. If ``coef`` is of shape (-1,1), the output will be [[float]].
- References:
Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
- Parameters:
llk (float) – log-likelihood of model
nH (scp.sparse.csc_array) – negative hessian of log-likelihood of model
coef (np.ndarray) – Estimated vector of coefficients of shape (-1,1)
scale (float) – (Estimated) scale parameter - can be set to 1 for GAMLSS or GSMMs.
penalties ([LambdaTerm]) – List of penalties that were part of the model.
keep (list[int]|None, optional) – Optional list of indices corresponding to identifiable coefficients. Coefficients not in this list (not identifiable) are dropped from the negative hessian of the penalized log-likelihood. Can also be set to ``None`` (default) in which case all coefficients are treated as identifiable.
- Returns:
(Approximate) REML score
- Return type:
float|np.ndarray
- class mssm.src.python.utils.RhoPrior(*args, **kwargs)
Bases: ``object``
Base class demonstrating the functionality that any prior passed to the ``correct_VB()`` function has to implement.
- logpdf(rho: ndarray)
Compute log density for log smoothing penalty parameters included in rho under this prior.
- Parameters:
rho (np.ndarray) – Numpy array of shape (nR,nrho) containing nR proposed candidate vectors for the nrho log-smoothing parameters.
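A custom prior therefore only needs to provide ``logpdf()``. A minimal sketch of an i.i.d. normal prior (hypothetical; the element-wise return shape mirrors ``DummyRhoPrior`` and is an assumption):
import numpy as np
from scipy.stats import norm

class NormalRhoPrior(RhoPrior):
    # Hypothetical i.i.d. normal prior on the log-smoothing parameters
    def __init__(self, mean: float = 0.0, sd: float = 5.0):
        self.mean = mean
        self.sd = sd

    def logpdf(self, rho: np.ndarray) -> np.ndarray:
        # rho has shape (nR, nrho); return the log-density element-wise
        return norm.logpdf(rho, loc=self.mean, scale=self.sd)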
- mssm.src.python.utils.adjust_CI(model, n_ps: int, b: ndarray, predi_mat: csc_array, use_terms: list[int] | None, alpha: float, seed: int | None, par: int = 0) ndarray
Internal function to adjust a point-wise CI to behave like a whole-function interval (based on Wood, 2017, section 6.10.2, and Simpson, 2016): ``model.coef +- b`` gives the point-wise interval, and for the interval to cover the whole function, ``1-alpha`` % of posterior samples should be expected to fall completely within these boundaries.
From section 6.10 in Wood (2017) we have that \(\boldsymbol{\beta} | \mathbf{y}, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}},\mathbf{V})\). \(\mathbf{V}\) is the covariance matrix of this conditional posterior, and can be obtained by evaluating ``model.lvi.T @ model.lvi * model.scale`` (``model.scale`` should be set to 1 for ``mssm.models.GAMMLSS`` and ``mssm.models.GSMM``).
The implication of this result is that we can also expect the deviations \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\) to follow \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}} | \mathbf{y}, \boldsymbol{\lambda} \sim N(0,\mathbf{V})\). In line with the whole-function interval definition above, ``1-alpha`` % of ``predi_mat@[*coef - coef]`` (where ``[*coef - coef]`` represents the deviations \(\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}\)) should fall within ``[b,-b]``. Wood (2017) suggests to find ``a`` so that ``[a*b,a*-b]`` achieves this.
To do this, we find ``a`` for every ``predi_mat@[*coef - coef]`` and then select the final one so that ``1-alpha`` % of samples had an equal or lower one. The consequence: ``1-alpha`` % of samples drawn should fall completely within the modified boundaries (see the sketch after this function).
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Simpson, G. (2016). Simultaneous intervals for smooths revisited.
- Parameters:
model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to compute p values.
n_ps (int) – Number of samples to obtain from posterior.
b (np.ndarray) – CI boundary of the point-wise CI.
predi_mat (scp.sparse.csc_array) – Model matrix for a particular smooth term or additive combination of parameters evaluated usually at a representative sample of predictor variables.
use_terms (list[int] | None) – The indices corresponding to the terms that should be used to obtain the prediction, or ``None`` in which case all terms will be used.
alpha (float) – The alpha level to use for the whole-function interval adjustment calculation as outlined above.
seed (int | None) – Can be used to provide a seed for the posterior sampling.
par (int, optional) – The index corresponding to the parameter of the log-likelihood for which samples are to be obtained for the coefficients, defaults to 0.
- Returns:
The adjusted vector ``b``
- Return type:
np.ndarray
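Conceptually, the adjustment amounts to the following computation (a sketch with simulated deviations, not the internal implementation):
import numpy as np

rng = np.random.default_rng(0)
sample_dev = rng.normal(size=(1000, 50))  # stand-in for predi_mat @ (sampled coef - coef)
b = np.full(50, 2.0)                      # point-wise CI half-widths

# per sample: smallest a so that the whole deviation curve fits inside [-a*b, a*b]
a_per_sample = np.max(np.abs(sample_dev) / b, axis=1)

# choose a so that 1-alpha of sampled curves fall completely inside [-a*b, a*b]
alpha = 0.05
a = np.quantile(a_per_sample, 1 - alpha)
b_adjusted = a * b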
- mssm.src.python.utils.approx_smooth_p_values(model, par: int = 0, n_sel: int = 100000.0, edf1: bool = True, force_approx: bool = False, seed: int = 0) tuple[list[float], list[float]]
Function to compute approximate p-values for smooth terms, testing whether \(\mathbf{f}=\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\) based on the algorithm by Wood (2013).
Wood (2013, 2017) generalize the \(\boldsymbol{\beta}_j^T\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\boldsymbol{\beta}_j\) test-statistic for parametric terms (computed by the function ``mssm.models.print_parametric_terms()``) to the coefficient vector \(\boldsymbol{\beta}_j\) parameterizing smooth functions. \(\mathbf{V}\) here is the covariance matrix of the posterior distribution for \(\boldsymbol{\beta}\) (see Wood, 2017). The idea is to replace \(\mathbf{V}_{\boldsymbol{\beta}_j}^{-1}\) with a rank \(r\) pseudo-inverse (smooth blocks in \(\mathbf{V}\) are usually rank deficient). Wood (2013, 2017) suggest to base \(r\) on the estimated degrees of freedom for the smooth term in question - but \(r\) is usually not an integer.
They provide a generalization that addresses the realness of \(r\), resulting in a test statistic \(T_r\), which follows a weighted Chi-square distribution under the Null. Following the recommendation in Wood (2013) we here approximate the reference distribution under the Null by means of the computations outlined in the paper by Davies (1980). If this fails, we fall back on a Gamma distribution with \(\alpha=r/2\) and \(\phi=2\).
In case of a two-parameter distribution (i.e., estimated scale parameter \(\phi\)), the Chi-square reference distribution needs to be corrected, again resulting in a weighted chi-square distribution which should behave something like an F distribution with DoF1 = \(r\) and DoF2 = \(\epsilon_{DoF}\) (i.e., the residual degrees of freedom), which would be the reference distribution for \(T_r/r\) if \(r\) were an integer and \(\mathbf{V}_{\boldsymbol{\beta}_j}\) were full rank. We again follow the recommendations by Wood (2013) and rely on the methods by Davies (1980) to compute the p-value under this reference distribution. If this fails, we approximate the reference distribution for \(T_r/r\) with a Beta distribution, with \(\alpha=r/2\) and \(\beta=\epsilon_{DoF}/2\) (see Wikipedia for the specific transformation applied to \(T_r/r\) so that the resulting transformation is approximately beta distributed) - which is similar to the Gamma approximation used for the Chi-square distribution in the no-scale parameter case.
Warning: The resulting p-values are approximate. They should only be treated as indicative.
Note: Just like in ``mgcv``, the returned p-value is an average: two p-values are computed because of an ambiguity in forming \(T_r\) and averaged to get the final one. For \(T_r\) we return the max of the two alternatives.
Davies, R. B. (1980). Algorithm AS 155: The Distribution of a Linear Combination of χ2 Random Variables.
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N. (2013). On p-values for smooth components of an extended generalized additive model.
``testStat`` function in mgcv, see: https://github.com/cran/mgcv/blob/master/R/mgcv.r#L3780
- Parameters:
model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to compute p values.
par (int, optional) – Distribution parameter for which to compute p-values. Ignored when ``model`` is a GAMM. Defaults to 0
n_sel (int, optional) – Maximum number of rows of the model matrix. For models with more observations a random sample of ``n_sel`` rows is obtained. Defaults to 1e5
edf1 (bool, optional) – Whether or not the estimated degrees of freedom should be corrected for smoothness bias. Doing so results in more accurate p-values but can be expensive for large models for which the difference is anyway likely to be marginal. Defaults to True
force_approx (bool, optional) – Whether or not the p-value should be forced to be approximated based on a Gamma/Beta distribution. Only use for testing - in practice you want to keep this at ``False``. Defaults to False
seed (int, optional) – Random seed determining the random sample computation. Defaults to 0
- Returns:
Tuple containing two lists: the first list holds approximate p-values for all smooth terms, the second list holds the test statistics.
- Return type:
tuple[list[float],list[float]]
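For example (assuming ``model`` is a fitted GAMM as in the earlier examples; see also ``print_smooth_terms()`` below):
# Compute approximate p-values and test statistics per smooth term ...
ps, Trs = approx_smooth_p_values(model, seed=0)
# ... and include them in the smooth-term summary
print_smooth_terms(model, ps=ps, Trs=Trs)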
- mssm.src.python.utils.computeAr1Chol(formula: Formula, rho: float) tuple[csc_array, float]
Computes the inverse of the Cholesky of the (scaled) variance matrix of an ar1 model.
- Parameters:
formula (Formula) – Formula of the model
rho (float) – ar1 weight.
- Returns:
Tuple containing the banded inverse Cholesky as a scipy array and the correction needed to get the likelihood of the ar1 model.
- Return type:
tuple[scp.sparse.csc_array,float]
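The banded structure of this inverse Cholesky can be illustrated for an ar1 correlation matrix with plain numpy (a self-contained check, independent of ``mssm``):
import numpy as np

rho, n = 0.7, 5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # ar1 correlation matrix

# Banded inverse Cholesky: first row is e_1, remaining rows hold (-rho, 1)/sqrt(1 - rho**2)
C = np.eye(n)
C[np.arange(1, n), np.arange(1, n)] = 1 / np.sqrt(1 - rho**2)
C[np.arange(1, n), np.arange(n - 1)] = -rho / np.sqrt(1 - rho**2)

print(np.allclose(C @ Sigma @ C.T, np.eye(n)))  # True: C is the inverse Cholesky of Sigma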
- mssm.src.python.utils.compute_REML_candidate_GSMM(family: GAMLSSFamily | GSMMFamily, y: ndarray | list[ndarray], Xs: list[csc_array], penalties: list[LambdaTerm], coef: ndarray, n_coef: int, coef_split_idx: list[int], method: str = 'Chol', conv_tol: float = 1e-07, n_c: int = 10, bfgs_options: dict = {}, origNH: csc_array | None = None) tuple[float, csc_array, csc_array, ndarray, float, float]
Allows evaluating the REML criterion (e.g., Wood, 2011; Wood, 2016) efficiently for a set of lambda values for a GSMM or GAMMLSS.
Internal function used for computing the correction applied to the edf for the GLRT - based on Wood (2017) and Wood et al., (2016).
See the ``REML()`` function for more details.
- Parameters:
family (GAMLSSFamily | GSMMFamily) – Model Family
y (np.ndarray | list[np.ndarray]) – Vector of observations or list of vectors (for GSMM)
Xs (list[scp.sparse.csc_array]) – List of model matrices
penalties (list[LambdaTerm]) – List of penalties
coef (np.ndarray) – Final coefficient estimate obtained from estimation - used for initialization.
n_coef (int) – Number of coefficients
coef_split_idx (list[int]) – The indices at which to split the overall coefficient vector into separate lists - one per parameter.
method (str, optional) – Method to use to solve for the coefficients (lambda parameters in case this is set to ‘qEFS’), defaults to “Chol”
conv_tol (float, optional) – Tolerance, defaults to 1e-7
n_c (int, optional) – Number of cores to use, defaults to 10
bfgs_options (dict, optional) – An optional dictionary holding arguments that should be passed on to the call of ``scipy.optimize.minimize()`` if ``method=='qEFS'``, defaults to {}
origNH (scp.sparse.csc_array | None, optional) – Optional external hessian matrix, defaults to None
- Returns:
reml criterion, conditional covariance matrix of coefficients for this lambda, un-pivoted inverse of the pivoted Cholesky of the negative hessian of the penalized llk, coefficients, total edf, llk
- Return type:
tuple[float, scp.sparse.csc_array, scp.sparse.csc_array, np.ndarray, float, float]
- mssm.src.python.utils.compute_Vb_corr_WPS(Vbr: csc_array, Vpr, Vr, H: csc_array, S_emb: csc_array, penalties: list[LambdaTerm], coef: ndarray, scale: float = 1) tuple[ndarray, ndarray]
Computes both correction terms for ``Vb`` or \(\mathbf{V}_{\boldsymbol{\beta}}\), which is the covariance matrix for the conditional posterior of \(\boldsymbol{\beta}\) so that \(\boldsymbol{\beta} | y, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}},\mathbf{V}_{\boldsymbol{\beta}})\), described by Wood, Pya, & Säfken (2016).
- References:
Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
Vbr (scp.sparse.csc_array) – Transpose of root for the estimate for the (unscaled) covariance matrix of \(\boldsymbol{\beta} | y, \boldsymbol{\lambda}\) - the coefficients estimated by the model.
Vpr (np.ndarray) – A (regularized) estimate of the covariance matrix of \(\boldsymbol{\rho}\) - the log smoothing penalties.
Vr (np.ndarray) – Transpose of root of un-regularized covariance matrix of \(\boldsymbol{\rho}\) - the log smoothing penalties.
H (scp.sparse.csc_array) – The Hessian of the log-likelihood
S_emb (scp.sparse.csc_array) – The weighted penalty matrix.
penalties ([LambdaTerm]) – A list holding the ``LambdaTerm`` penalties estimated for the model.
coef (np.ndarray) – An array holding the estimated regression coefficients. Has to be of shape (-1,1)
scale (float) – Any scale parameter estimated as part of the model. Can be omitted for more generic models beyond GAMMs. Defaults to 1.
- Raises:
ArithmeticError – Will throw an error when the negative Hessian of the penalized likelihood is ill-scaled so that a Cholesky decomposition fails.
- Returns:
A tuple containing ``Vc`` and ``Vcc``. ``Vbr.T@Vbr*scale + Vc + Vcc`` is then approximately the correction devised by WPS (2016), as sketched below.
- Return type:
tuple[np.ndarray, np.ndarray]
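Putting the Returns description into code (a sketch; the inputs are assumed to come from prior computations on a fitted model):
# Vc and Vcc are the two correction terms; combined with the conditional
# covariance Vbr.T @ Vbr * scale they approximate the WPS (2016) correction
Vc, Vcc = compute_Vb_corr_WPS(Vbr, Vpr, Vr, H, S_emb, penalties, coef, scale=scale)
V_corrected = (Vbr.T @ Vbr) * scale + Vc + Vcc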
- mssm.src.python.utils.compute_Vp_WPS(Vbr: csc_array, H: csc_array, S_emb: csc_array, penalties: list[LambdaTerm], coef: ndarray, scale: float = 1) tuple[ndarray, ndarray, ndarray, ndarray, ndarray, ndarray]
Computes the inverse of what is approximately the negative Hessian of the Laplace approximate REML criterion with respect to the log smoothing penalties.
The derivatives computed are only exact for Gaussian additive models and canonical generalized additive models. For all other models they are inexact, in that they assume that the hessian of the log-likelihood does not depend on \(\lambda\) (or \(log(\lambda)\)), so they are essentially the PQL derivatives of Wood et al. (2017). The inverse computed here acts as an approximation to the covariance matrix of the log smoothing parameters.
- References:
Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Wood, S. N., Li, Z., Shaddick, G., & Augustin, N. H. (2017). Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data.
- Parameters:
Vbr (scp.sparse.csc_array) – Transpose of root for the estimate for the (unscaled) covariance matrix of \(\boldsymbol{\beta} | y, \boldsymbol{\lambda}\) - the coefficients estimated by the model.
H (scp.sparse.csc_array) – The Hessian of the log-likelihood
S_emb (scp.sparse.csc_array) – The weighted penalty matrix.
penalties ([LambdaTerm]) – A list holding the ``LambdaTerm`` penalties estimated for the model.
coef (np.ndarray) – An array holding the estimated regression coefficients. Has to be of shape (-1,1)
scale (float) – Any scale parameter estimated as part of the model. Can be omitted for more generic models beyond GAMMs. Defaults to 1.
- Returns:
Generalized inverse of negative hessian of approximate REML criterion, regularized version of the former, root of generalized inverse, root of regularized generalized inverse, hessian of approximate REML criterion, np.array of shape (len(coef), len(penalties)) containing in each row the partial derivative of the coefficients with respect to an individual lambda parameter
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]
- mssm.src.python.utils.compute_bias_corrected_edf(model, overwrite: bool = False) None
This function computes and assigns smoothing bias corrected (term-wise) estimated degrees of freedom.
For a definition of smoothing bias-corrected estimated degrees of freedom see Wood (2017).
Note: This function modifies ``model``, setting the ``edf1`` and ``term_edf1`` attributes.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – Model for which to compute the bias corrected edf.
overwrite (bool, optional) – Whether previously computed bias corrected edf should be overwritten. Otherwise this function immediately terminates if ``model.edf1 is not None``, defaults to False
- Return type:
None
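For example (assuming a fitted ``model``):
# Computes and assigns the bias-corrected (term-wise) edf
compute_bias_corrected_edf(model)
print(model.edf1)       # bias-corrected total edf
print(model.term_edf1)  # bias-corrected edf per smooth term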
- mssm.src.python.utils.compute_reml_candidate_GAMM(family: Family, y: ndarray, X: csc_array, penalties: list[LambdaTerm], n_c: int = 10, offset: float | ndarray = 0, init_eta: ndarray | None = None, method: str = 'Chol', compute_inv: bool = False, origNH: float | None = None) tuple[float, csc_array | None, csc_array, list[int], ndarray, float, float, float]
Allows evaluating the REML criterion (e.g., Wood, 2011; Wood, 2016) efficiently for a set of lambda values for a GAMM model.
Internal function used for computing the correction applied to the edf for the GLRT - based on Wood (2017) and Wood et al., (2016).
See the ``REML()`` function for more details.
- Parameters:
family (Family) – Family of the model
y (np.ndarray) – vector of observations
X (scp.sparse.csc_array) – Model matrix
penalties (list[LambdaTerm]) – List of penalties
n_c (int, optional) – Number of cores to use, defaults to 10
offset (float | np.ndarray, optional) – Fixed offset to add to eta, defaults to 0
init_eta (np.ndarray | None, optional) – Initial vector for linear predictor, defaults to None
method (str, optional) – Method to use to solve for coefficients, defaults to ‘Chol’
compute_inv (bool, optional) – Whether to compute the inverse of the pivoted Cholesky of the negative hessian of the penalized llk, defaults to False
origNH (float | None, optional) – Optional external scale parameter, defaults to None
- Returns:
reml criterion, un-pivoted inverse of the pivoted Cholesky of the negative hessian of the penalized llk, pivoted Cholesky, pivot column indices, coefficients, estimated scale, total edf, llk
- Return type:
tuple[float, scp.sparse.csc_array|None, scp.sparse.csc_array, list[int], np.ndarray, float, float, float]
- mssm.src.python.utils.correct_VB(model, nR: int = 250, grid_type: str = 'JJJ1', a: float = 1e-07, b: float = 10000000.0, df: int = 40, n_c: int = 10, form_t1: bool = False, verbose: bool = False, drop_NA: bool = True, method: str = 'Chol', only_expected_edf: bool = False, Vp_fidiff: bool = False, use_importance_weights: bool = True, prior: Callable | None = None, recompute_H: bool = False, seed: int | None = None, compute_Vcc: bool = True, **bfgs_options) tuple[csc_array | None, csc_array | None, ndarray | None, ndarray | None, ndarray | None, float | None, ndarray | None, float | None, float, ndarray]
Estimate \(\tilde{\mathbf{V}}\), the covariance matrix of the marginal posterior \(\boldsymbol{\beta} | y\) to account for smoothness uncertainty.
Wood et al. (2016) and Wood (2017) show that when basing conditional versions of model selection criteria or hypothesis tests on \(\mathbf{V}\), which is the co-variance matrix for the normal approximation to the conditional posterior of \(\boldsymbol{\beta}\) so that \(\boldsymbol{\beta} | y, \boldsymbol{\lambda} \sim N(\hat{\boldsymbol{\beta}},\mathbf{V})\), the tests are severely biased. To correct for this they show that uncertainty in \(\boldsymbol{\lambda}\) needs to be accounted for. Hence they suggest to base these tests on \(\tilde{\mathbf{V}}\), the covariance matrix of the normal approximation to the marginal posterior \(\boldsymbol{\beta} | y\). They show how to obtain an estimate of \(\tilde{\mathbf{V}}\), but this requires \(\mathbf{V}^{\boldsymbol{\rho}}\) - an estimate of the covariance matrix of the normal approximation to the posterior of \(\boldsymbol{\rho}=log(\boldsymbol{\lambda})\). Computing \(\mathbf{V}^{\boldsymbol{\rho}}\) requires derivatives that are not available when using the efs update.
This function implements multiple strategies to approximately correct for smoothing parameter uncertainty, based on the proposals by Wood et al. (2016) and Greven & Scheipl (2016). The most straightforward strategy (``grid_type = 'JJJ1'``) is to obtain a PQL or finite difference approximation for \(\mathbf{V}^{\boldsymbol{\rho}}\) and to then compute approximately the Wood et al. (2016) correction, assuming that higher-order derivatives of the llk are zero (this will be exact for Gaussian additive or canonical Generalized models). This is too costly for large sparse multi-level models and not exact for more generic models. The MC based alternative available via ``grid_type = 'JJJ2'`` addresses the first problem (Important: set ``use_importance_weights=False`` and ``only_expected_edf=True``). The second MC based alternative, available via ``grid_type = 'JJJ3'``, is most appropriate for more generic models (the ``prior`` argument can be used to specify any prior to be placed on \(\boldsymbol{\rho}\); also, you will need to set ``use_importance_weights=True`` and ``only_expected_edf=False``). Both strategies use a PQL or finite difference approximation to \(\mathbf{V}^{\boldsymbol{\rho}}\) to obtain ``nR`` samples from the (normal approximation to the) posterior of \(\boldsymbol{\rho}\). From these samples mssm then estimates \(\tilde{\mathbf{V}}\) as described in more detail by Krause et al. (in preparation).
Note: If you set ``only_expected_edf=True``, only the last two output arguments will be non-zero.
Example:
# Simulate some data for a Gaussian model
sim_fit_dat = sim3(n=500,scale=2,c=1,family=Gaussian(),seed=21)

# Now fit the model
sim_fit_formula = Formula(lhs("y"),
                          [i(), f(["x0"],nk=20), f(["x1"],nk=20), f(["x2"],nk=20), f(["x3"],nk=20)],
                          data=sim_fit_dat,
                          print_warn=False)
model = GAMM(sim_fit_formula,Gaussian())
model.fit(exclude_lambda=False,progress_bar=False,max_outer=100)

# Compute correction from Wood et al. (2016) - will be approximate for more generic models
# V will be approximate covariance matrix of marginal posterior of coefficients
# LV is Cholesky of the former
# Vp is approximate covariance matrix of log regularization parameters
# Vpr is regularized version of the former
# edf is vector of estimated degrees of freedom (uncertainty corrected) per coefficient
# total_edf is sum of former (but subjected to upper bounds so might not be exactly the same)
# edf2 is optionally smoothness bias corrected version of edf
# total_edf2 is optionally bias corrected version of total_edf (subjected to upper bounds)
# expected_edf is None here but for MC strategies (i.e., ``grid_type!='JJJ1'``) will be an estimate
# of total_edf (**without being subjected to upper bounds**) that does not require forming
# V (only computed when ``only_expected_edf=True``).
# mean_coef is None here but for MC strategies will be an estimate of the mean of the
# marginal posterior of coefficients, only computed when setting ``recompute_H=True``
V,LV,Vp,Vpr,edf,total_edf,edf2,total_edf2,expected_edf,mean_coef = correct_VB(model,grid_type="JJJ1",verbose=True,seed=20)

# Compute MC estimate for generic model and given prior
prior = DummyRhoPrior(b=np.log(1e12)) # Set up uniform prior
V_MC,LV_MC,Vp_MC,Vpr_MC,edf_MC,total_edf_MC,edf2_MC,total_edf2_MC,expected_edf_MC,mean_coef_MC = correct_VB(model,grid_type="JJJ3",verbose=True,seed=20,df=10,prior=prior,recompute_H=True)
- References:
Wood, S. N., (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.
Wood, S. N., Pya, N., Saefken, B., (2016). Smoothing Parameter and Model Selection for General Smooth Models
Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GAMM, GAMMLSS, or GSMM model (which has been fitted) for which to estimate \(\mathbf{V}\)
nR (int, optional) – In case ``grid_type!="JJJ1"``, ``nR`` samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250
grid_type (str, optional) – How to compute the smoothness uncertainty correction - see above for details, defaults to ‘JJJ1’
a (float, optional) – Any of the \(\lambda\) estimates obtained from ``model`` (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}^{\boldsymbol{\rho}})\) used to sample ``nR`` candidates) which are smaller than this are set to this value as well, defaults to 1e-7, the minimum possible estimate
b (float, optional) – Any of the \(\lambda\) estimates obtained from ``model`` (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}^{\boldsymbol{\rho}})\) used to sample ``nR`` candidates) which are larger than this are set to this value as well, defaults to 1e7, the maximum possible estimate
df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample the next set of candidates. Setting this to ``np.inf`` means a multivariate normal is used for sampling, defaults to 40
n_c (int, optional) – Number of cores to use during parallel parts of the correction, defaults to 10
form_t1 (bool, optional) – Whether or not the smoothness uncertainty + smoothness bias corrected edf should be computed, defaults to False
verbose (bool, optional) – Whether to print progress information or not, defaults to False
drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on a Cholesky decomposition. This is extremely efficient but, numerically speaking, in principle less stable. For maximum numerical stability set this to “QR/Chol”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky decomposition is then used, with the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to ``'qEFS'``, then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the hessian. This only requires first derivative information. Defaults to “Chol”.
only_expected_edf (bool,optional) – Whether to compute edf. by explicitly forming the covariance matrix (``only_expected_edf=False``) or not. The latter is much more efficient for sparse models at the cost of access to the covariance matrix and the ability to compute an upper bound on the smoothness uncertainty corrected edf. Only makes sense when ``grid_type!='JJJ1'``. Defaults to False
. Defaults to FalseVp_fidiff (bool,optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}^{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)
use_importance_weights (bool,optional) – Whether to rely on importance weights to compute the numerical integration when ``grid_type != 'JJJ1'`` or on the log-densities of \(\mathbf{V}^{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to True (importance weights are used)
prior (Callable|None, optional) – An (optional) instance of an arbitrary class that has a ``.logpdf()`` method to compute the prior log density of a sampled candidate. If this is set to ``None``, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when ``use_importance_weights=False``. Defaults to None
recompute_H (bool, optional) – Whether or not to re-compute the Hessian of the log-likelihood at an estimate of the mean of the Bayesian posterior \(\boldsymbol{\beta}|y\) before computing the (uncertainty/bias corrected) edf. Defaults to False
compute_Vcc (bool, optional) – Whether to compute the second correction term when strategy=’JJJ1’ (or when computing the lower-bound for the remaining strategies) or only the first one. In contrast to the second one, the first correction term is substantially cheaper to compute - so setting this to False for larger models will speed up the correction considerably. Defaults to True
seed (int|None,optional) – Seed to use for random parts of the correction. Defaults to None
bfgs_options (key=value,optional) – Any additional keyword arguments that should be passed on to the call of ``scipy.optimize.minimize()``. If none are provided, the ``gtol`` argument will be initialized to 1e-3. Note also that in any case the ``maxiter`` argument is automatically set to 100. Defaults to None.
- Returns:
A tuple containing: ``V`` - an estimate of the unconditional covariance matrix, ``LV`` - the Cholesky of the former, ``Vp`` - an estimate of the covariance matrix for \(\boldsymbol{\rho}\), ``Vpr`` - a regularized version of the former, ``edf`` - smoothness uncertainty corrected coefficient-wise edf, ``total_edf`` - smoothness uncertainty corrected total (i.e., model) edf, ``edf2`` - smoothness uncertainty + smoothness bias corrected coefficient-wise edf, ``total_edf2`` - smoothness uncertainty + smoothness bias corrected total (i.e., model) edf, ``expected_edf`` - an optional estimate of total_edf that does not require forming ``V``, ``mean_coef`` - an optional estimate of the mean of the posterior of the coefficients
- Return type:
tuple[scp.sparse.csc_array|None, scp.sparse.csc_array|None, np.ndarray|None ,np.ndarray|None, np.ndarray|None, float|None, np.ndarray|None, float|None, float, np.ndarray]
- mssm.src.python.utils.estimateVp(model, nR: int = 250, grid_type: str = 'JJJ1', a: float = 1e-07, b: float = 10000000.0, df: int = 40, n_c: int = 10, drop_NA: bool = True, method: str = 'Chol', Vp_fidiff: bool = False, use_importance_weights: bool = True, prior: Callable | None = None, seed: int | None = None, **bfgs_options) tuple[ndarray, ndarray, ndarray, ndarray, ndarray]
Estimate covariance matrix \(\mathbf{V}^{\boldsymbol{\rho}}\) of posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\).
Either \(\mathbf{V}^{\boldsymbol{\rho}}\) is based on a finite difference approximation or on a PQL approximation (see the ``grid_type`` parameter), or it is estimated via numerical integration similar to what is done in the ``correct_VB()`` function (this is done when ``grid_type=='JJJ2'``; see the aforementioned function for details).
Example:
# Simulate some data for a Gaussian model
sim_fit_dat = sim3(n=500,scale=2,c=1,family=Gaussian(),seed=21)

# Now fit the model
sim_fit_formula = Formula(lhs("y"),
                          [i(),f(["x0"],nk=20,rp=0),f(["x1"],nk=20,rp=0),f(["x2"],nk=20,rp=0),f(["x3"],nk=20,rp=0)],
                          data=sim_fit_dat,
                          print_warn=False)
model = GAMM(sim_fit_formula,Gaussian())
model.fit(exclude_lambda=False,progress_bar=False,max_outer=100)

# Compute correction from Wood et al. (2016) - will be approximate for more generic models
# Vp is approximate covariance matrix of log regularization parameters
# Vpr is regularized version of the former
# Ri is a root of covariance matrix of log regularization parameters
# Rir is a root of regularized version of covariance matrix of log regularization parameters
# ep will be an estimate of the mean of the marginal posterior of log regularization parameters
# (for ``grid_type="JJJ1"`` this will simply be the log of the estimated regularization parameters)
Vp, Vpr, Ri, Rir, ep = estimateVp(model,grid_type="JJJ1",seed=20)

# Compute MC estimate for generic model and given prior
prior = DummyRhoPrior(b=np.log(1e12)) # Set up uniform prior
Vp_MC, Vpr_MC, Ri_MC, Rir_MC, ep_MC = estimateVp(model,grid_type="JJJ2",seed=20,use_importance_weights=True,prior=prior)
- References:
https://en.wikipedia.org/wiki/Estimation_of_covariance_matrices
Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models
- Parameters:
model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GAMM, GAMMLSS, or GSMM model (which has been fitted) for which to estimate \(\mathbf{V}\)
nR (int, optional) – In case ``grid_type!="JJJ1"``, ``nR`` samples/reml scores are generated/computed to numerically evaluate the expectations necessary for the uncertainty correction, defaults to 250
grid_type (str, optional) – How to compute the smoothness uncertainty correction. Setting ``grid_type="JJJ1"`` means a PQL or finite difference approximation is obtained. Setting ``grid_type="JJJ2"`` means numerical integration is performed - see ``correct_VB()`` for details, defaults to ‘JJJ1’
a (float, optional) – Any of the \(\lambda\) estimates obtained from ``model`` (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}^{\boldsymbol{\rho}})\) used to sample ``nR`` candidates) which are smaller than this are set to this value as well, defaults to 1e-7, the minimum possible estimate
b (float, optional) – Any of the \(\lambda\) estimates obtained from ``model`` (used to define the mean for the posterior of \(\boldsymbol{\rho}|y \sim N(log(\hat{\boldsymbol{\rho}}),\mathbf{V}^{\boldsymbol{\rho}})\) used to sample ``nR`` candidates) which are larger than this are set to this value as well, defaults to 1e7, the maximum possible estimate
df (int, optional) – Degrees of freedom used for the multivariate t distribution used to sample the next set of candidates. Setting this to ``np.inf`` means a multivariate normal is used for sampling, defaults to 40
n_c (int, optional) – Number of cores to use during parallel parts of the correction, defaults to 10
drop_NA (bool,optional) – Whether to drop rows in the model matrices corresponding to NAs in the dependent variable vector. Defaults to True.
method (str,optional) – Which method to use to solve for the coefficients (and smoothing parameters). The default (“Chol”) relies on a Cholesky decomposition. This is extremely efficient but, numerically speaking, in principle less stable. For maximum numerical stability set this to “QR/Chol”. In that case a QR decomposition is used - which is first pivoted to maximize sparsity in the resulting decomposition but also pivots for stability in order to get an estimate of rank deficiency. A Cholesky decomposition is then used, with the combined pivoting strategy obtained from the QR. This takes substantially longer. If this is set to ``'qEFS'``, then the coefficients are estimated via quasi-Newton and the smoothing penalties are estimated from the quasi-Newton approximation to the hessian. This only requires first derivative information. Defaults to “Chol”.
Vp_fidiff (bool,optional) – Whether to rely on a finite difference approximation to compute \(\mathbf{V}^{\boldsymbol{\rho}}\) or on a PQL approximation. The latter is exact for Gaussian and canonical GAMs and far cheaper if many penalties are to be estimated. Defaults to False (PQL approximation)
use_importance_weights (bool,optional) – Whether to rely on importance weights to compute the numerical integration when ``grid_type != 'JJJ1'`` or on the log-densities of \(\mathbf{V}^{\boldsymbol{\rho}}\) - the latter assumes that the unconditional posterior is normal. Defaults to True (importance weights are used)
prior (Callable|None, optional) – An (optional) instance of an arbitrary class that has a ``.logpdf()`` method to compute the prior log density of a sampled candidate. If this is set to ``None``, the prior is assumed to coincide with the proposal distribution, simplifying the importance weight computation. Ignored when ``use_importance_weights=False``. Defaults to None
seed (int|None,optional) – Seed to use for random parts of the correction. Defaults to None
bfgs_options (key=value,optional) – Any additional keyword arguments that should be passed on to the call of ``scipy.optimize.minimize()``. If none are provided, the ``gtol`` argument will be initialized to 1e-3. Note also that in any case the ``maxiter`` argument is automatically set to 100. Defaults to None.
- Returns:
A tuple with 5 elements: an estimate of the covariance matrix of the posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\), a regularized version of the former, a root of the covariance matrix, a root of the regularized covariance matrix, and an estimate of the mean of the posterior
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray]
- mssm.src.python.utils.print_parametric_terms(model, par: int = 0) None
Prints summary output for linear/parametric terms in the model of a specific parameter, not unlike the one returned in R when using the ``summary`` function for ``mgcv`` models. If the model has not been estimated yet, it prints the term names instead.
For each coefficient, the named identifier and estimated value are returned. In addition, for each coefficient a p-value is returned, testing the null-hypothesis that the corresponding coefficient \(\beta=0\). Under the assumption that this is true, the Null distribution follows a t-distribution for models in which an additional scale parameter was estimated (e.g., Gaussian, Gamma) and a standardized normal distribution for models in which the scale parameter is known or was fixed (e.g., Binomial). For the former case, the t-statistic, degrees of freedom of the Null distribution (DoF.), and the p-value are printed as well. For the latter case, only the z-statistic and the p-value are printed. See Wood (2017) sections 6.12 and 1.3.3 for more details.
Note that un-penalized coefficients that are part of a smooth function are not covered by this function.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
- Parameters:
model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GSMM, GAMMLSS, or GAMM model
par (int, optional) – Parameter of the likelihood/family for which to print terms, defaults to 0
- Raises:
NotImplementedError – Will throw an error when called for a model for which the model matrix was never formed completely.
- Return type:
None
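Example - a minimal sketch, using the sim1 simulator from mssmViz to generate data for a model with parametric (linear and random intercept) terms:

from mssm.models import *
from mssmViz.sim import *
from mssm.src.python.utils import print_parametric_terms

# Simulate data and fit a model with a three-way linear interaction and a random intercept
sim_dat, _ = sim1(100, random_seed=100)
formula = Formula(lhs("y"), [i(), *li(["fact", "x", "time"]), ri("sub")], data=sim_dat)
model = GAMM(formula, Gaussian())
model.fit()

# Print identifiers, estimates, test statistics, and p-values for all parametric terms
print_parametric_terms(model)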
- mssm.src.python.utils.print_smooth_terms(model, par: int = 0, pen_cutoff: float = 0.2, ps: list[float] | None = None, Trs: list[float] | None = None) → None
Prints the names of the smooth terms included in the model for a given parameter.
After fitting, the estimated degrees of freedom per term are printed as well. Smooth terms with edf. < pen_cutoff will be highlighted. This only makes sense when extra Kernel penalties are placed on smooth terms to enable penalizing them to a constant zero. In that case, edf. < pen_cutoff can be taken as evidence that the smooth has all but disappeared from the model, i.e., it does not contribute meaningfully to the model fit. This can be used as an alternative form of model selection - see Marra & Wood (2011).
- References:
Marra & Wood (2011). Practical variable selection for generalized additive models.
- Parameters:
model (mssm.models.GSMM | mssm.models.GAMMLSS | mssm.models.GAMM) – GSMM, GAMMLSS, or GAMM model
par (int, optional) – Distribution parameter for which to compute p-values. Ignored when model is a GAMM. Defaults to 0
pen_cutoff (float, optional) – The edf. cut-off below which smooth terms should be marked as “effectively removed”, defaults to 0.2
ps ([float], optional) – Optional list of p-values per smooth term if these should be printed, defaults to None
Trs ([float], optional) – Optional list of test statistics (based on which the ps were computed) per smooth term if these should be printed, defaults to None
- Return type:
None
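Example - a minimal sketch, using the sim3 simulator from mssmViz to generate data:

from mssm.models import *
from mssmViz.sim import *
from mssm.src.python.utils import print_smooth_terms

sim_dat = sim3(n=500, scale=2, c=0, seed=20)
formula = Formula(lhs("y"), [i(), f(["x0"]), f(["x1"]), f(["x2"]), f(["x3"])], data=sim_dat)
model = GAMM(formula, Gaussian())
model.fit()

# Print per-term estimated degrees of freedom; terms with edf. < pen_cutoff
# are flagged as effectively removed from the model
print_smooth_terms(model, pen_cutoff=0.2)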
- mssm.src.python.utils.sample_MVN(n: int, mu: int | ndarray, scale: float, P: csc_array | None, L: csc_array | None, LI: csc_array | None = None, use: list[int] | None = None, seed: int | None = None) → ndarray
Draw n samples from a multivariate normal with mean \(\boldsymbol{\mu}\) (mu) and covariance matrix \(\boldsymbol{\Sigma}\).
\(\boldsymbol{\Sigma}\) does not need to be provided. Rather, the function expects either L (\(\mathbf{L}\) in what follows) or LI (\(\mathbf{L}^{-1}\) in what follows) and scale (\(\phi\) in what follows). These relate to \(\boldsymbol{\Sigma}\) so that \(\boldsymbol{\Sigma}/\phi = \mathbf{L}^{-T}\mathbf{L}^{-1}\) or \(\mathbf{L}\mathbf{L}^T = [\boldsymbol{\Sigma}/\phi]^{-1}\), so that \(\mathbf{L}*(1/\phi)^{0.5}\) is the Cholesky factor of the precision matrix corresponding to \(\boldsymbol{\Sigma}\).
Notably, for models available in mssm, L (and LI) have usually been computed for a permuted matrix, e.g., \(\mathbf{P}[\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]\mathbf{P}^T\) (see Wood & Fasiolo, 2017). Hence, for sampling, we often need to correct for the permutation matrix \(\mathbf{P}\) (P). If LI is provided, then P can be omitted and is assumed to have been used to un-pivot LI already.
This function is used, for example, to sample the uncorrected posterior \(\boldsymbol{\beta} | \mathbf{y}, \boldsymbol{\lambda} \sim N(\boldsymbol{\mu} = \hat{\boldsymbol{\beta}},[\mathbf{X}^T\mathbf{X} + \mathbf{S}_{\lambda}]^{-1}\phi)\) for a GAMM (see Wood, 2017). Based on section 7.4 in Gentle (2009), assuming \(\boldsymbol{\Sigma}\) is \(p*p\) and the covariance matrix of the uncorrected posterior, samples \(\boldsymbol{\beta}\) are obtained by computing:
\[\boldsymbol{\beta} = \hat{\boldsymbol{\beta}} + [\mathbf{P}^T \mathbf{L}^{-T}*\phi^{0.5}]\mathbf{z}\ \text{where}\ z_i \sim N(0,1)\ \forall i = 1,...,p\]
Alternatively, relying on the equivalence:
\[[\mathbf{L}^T*(1/\phi)^{0.5}]\mathbf{P}[\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}] = \mathbf{z}\]
we can first solve for \(\mathbf{y}\) in:
\[[\mathbf{L}^T*(1/\phi)^{0.5}] \mathbf{y} = \mathbf{z}\]
followed by computing:
\[\mathbf{y} = \mathbf{P}[\boldsymbol{\beta} - \hat{\boldsymbol{\beta}}]\]
\[\boldsymbol{\beta} = \hat{\boldsymbol{\beta}} + \mathbf{P}^T\mathbf{y}\]
The latter avoids forming \(\mathbf{L}^{-1}\) (which, unlike \(\mathbf{L}\), might not benefit from the sparsity-preserving permutation \(\mathbf{P}\)). If LI is None, L will thus be used for sampling as outlined in these alternative steps.
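To make these steps concrete, here is a self-contained numpy/scipy sketch with a toy precision matrix and an identity permutation (an illustration only, not code from mssm itself):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

rng = np.random.default_rng(20)
p, n, phi = 4, 1000, 2.0
beta_hat = rng.standard_normal((p, 1))

# Toy precision matrix [Sigma/phi]^{-1} and its Cholesky factor L (L @ L.T = precision)
A = rng.standard_normal((p, p))
L = np.linalg.cholesky(A @ A.T + p * np.eye(p))
P = sp.identity(p, format="csr")  # identity permutation for this sketch

z = rng.standard_normal((p, n))
# Solve [L^T * (1/phi)^0.5] y = z for y (upper-triangular solve), ...
y = spsolve_triangular(sp.csr_array(L.T * (1.0 / phi) ** 0.5), z, lower=False)
# ... then un-pivot and shift by beta_hat: columns of beta are draws from N(beta_hat, Sigma)
beta = beta_hat + P.T @ y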
Often we care only about a handful of elements in mu (e.g., the first ones, corresponding to “fixed effects” in a GAMM). In that case, we can generate samples only for this sub-set of interest by using only a sub-block of rows of \(\mathbf{L}\) or \(\mathbf{L}^{-1}\) (all columns remain). Argument use can be a np.array containing the indices of elements in mu that should be sampled. Because this only works efficiently when LI is available, an error is raised when use is not None and LI is None.
If mu is set to any integer (i.e., not a Numpy array/list), it is automatically treated as 0. For mssm.models.GAMMLSS or mssm.models.GSMM models, scale can be set to 1.
- References:
Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition (2nd ed.).
Gentle, J. (2009). Computational Statistics.
- Parameters:
n (int) – Number of samples to generate
mu (int | np.ndarray) – mean of normal distribution as described above
scale (float) – scaling parameter of covariance matrix as described above
P (scp.sparse.csc_array | None) – Permutation matrix or None.
L (scp.sparse.csc_array | None) – Cholesky factor of the precision of the scaled covariance matrix as described above.
LI (scp.sparse.csc_array | None, optional) – Inverse of the Cholesky factor of the precision of the scaled covariance matrix as described above.
use (list[int] | None, optional) – Indices of parameters in mu for which to generate samples, defaults to None in which case all parameters will be sampled
seed (int | None, optional) – Seed to use for random sample generation, defaults to None
- Returns:
Samples from the multivariate normal distribution. In case use is not provided, the returned array will be of shape (p,n) where p==LI.shape[1]. Otherwise, the returned array will be of shape (len(use),n).
- Return type:
np.ndarray
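Example - a hedged sketch of drawing coefficient samples from the uncorrected posterior of a fitted GAMM via LI. model.lvi and model.coef are documented model attributes; model.scale is assumed here to hold the estimated scale parameter:

from mssm.models import *
from mssmViz.sim import *
from mssm.src.python.utils import sample_MVN

sim_dat = sim3(n=500, scale=2, c=0, seed=20)
formula = Formula(lhs("y"), [i(), f(["x0"]), f(["x1"]), f(["x2"]), f(["x3"])], data=sim_dat)
model = GAMM(formula, Gaussian())
model.fit()

# LI = model.lvi is already un-pivoted, so P can be omitted (None)
bs = sample_MVN(n=100, mu=model.coef.flatten(), scale=model.scale,  # model.scale assumed
                P=None, L=None, LI=model.lvi, seed=20)
# bs has shape (p, 100): one column per posterior sample of the coefficients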
- mssm.src.python.utils.updateVp(ep: ndarray, ws: ndarray, rGrid: ndarray) → ndarray
Update the covariance matrix of the posterior for \(\boldsymbol{\rho} = log(\boldsymbol{\lambda})\). REML scores are used to approximate the expectation, similar to what was suggested by Greven & Scheipl (2016).
- References:
https://en.wikipedia.org/wiki/Estimation_of_covariance_matrices
Greven, S., & Scheipl, F. (2016). Comment on: Smoothing Parameter and Model Selection for General Smooth Models
- Parameters:
ep (np.ndarray) – Model estimate of log(lambda), i.e., the expectation over rGrid
ws (np.ndarray) – Weight associated with each log(lambda) value, used for numerical integration
rGrid (np.ndarray) – A 2d array holding all log(lambda) samples considered so far. Each row is one sample
- Returns:
An estimate of the covariance matrix of log(lambda) - a 2d array of shape len(ep)*len(ep).
- Return type:
np.ndarray
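For intuition, the update amounts to a weighted covariance estimate of the log(lambda) samples around ep. A minimal numpy sketch of the analogous computation (an illustration of the idea, not the actual implementation):

import numpy as np

rng = np.random.default_rng(20)
rGrid = rng.normal(size=(200, 3))  # 200 samples of 3 log(lambda) parameters
ws = np.ones(200) / 200            # normalized integration weights
ep = ws @ rGrid                    # expectation over the grid

# Weighted covariance of the samples around ep - analogous to what updateVp estimates
res = rGrid - ep
Vp = (res * ws[:, None]).T @ res   # shape (3, 3)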