Programmer’s Guide

Extending the functionality of mssm is quite easy. Adding new families generally only requires implementing a specific base class and this is also the case for penalty matrices. Implementing a new smooth basis can be achieved by adding a single function. Below we discuss these steps and provide code examples, outlining how each of them can be achieved.

Adding a New `Family`

Below we list the families already implemented, which can be estimated via the GAMM class:

Gaussian (mssm.src.python.exp_fam.Gaussian())
Gamma (mssm.src.python.exp_fam.Gamma())
Inverse Gaussian (mssm.src.python.exp_fam.InvGauss())
Binomial (mssm.src.python.exp_fam.Binomial())
Poisson (mssm.src.python.exp_fam.Poisson())

Now, if you want to add a new Family, your new family will have to implement the mssm.src.python.exp_fam.Family() base or template class. The documentation of the latter provides details on all the methods your family will have to implement. You can also check the source code of the Families already implemented to get examples!

Adding a New `ExtendedFamily`

Below we list the extended families already implemented, which can be estimated via the GAMM class:

scaled T (mssm.src.python.exp_fam.ScaledT())

Now, if you want to add a new ExtendedFamily, your new family will have to implement the mssm.src.python.exp_fam.ExtendedFamily() base or template class. This class actually inherits from the mssm.src.python.exp_fam.Family() class and requires only a few additional methods that are necessary to estimate any additional parameters of the log-likelihood of the family. Again, the documentation provides details on all the methods your family will have to implement.

Adding a New `GAMLSSFamily`

Below we list the GAMMLSS families already implemented, which can be estimated via the GAMMLSS class:

Gaussian models of location and scale (mssm.src.python.exp_fam.GAUMLSS())
Gamma models of location and scale (mssm.src.python.exp_fam.GAMMALS())
Multinomial models (mssm.src.python.exp_fam.MULNOMLSS())

Now, if you want to add a new GAMLSSFamily, your new family will have to implement the mssm.src.python.exp_fam.GAMLSSFamily() base or template class. The documentation of the latter provides details on all the methods your family will have to implement. You can also check the source code of the Families already implemented to get examples!

Adding a New `GSMMFamily`

Below we list the GSMM families already implemented, which can be estimated via the GSMM class:

Cox proportional Hazard models (mssm.src.python.exp_fam.PropHaz())
Multivariate additive models (mssm.src.python.exp_fam.MultiGauss())

To implement a new member of the most general kind of smooth model, you will only need to implement the mssm.src.python.exp_fam.GSMMFamily() template class - mssm even supports completely derivative-free estimation. You can check the mssm.models.GSMM documentation for an example or the tutorial included with this documentation - it contains step-by-step instructions on how to implement this family.

Adding a New Marginal Smooth Basis

Adding a new marginal smooth basis only requires adding a single new function. This function interacts with mssm at three points: it is passed to the constructor of an instance of either the mssm.src.python.terms.f class, the mssm.src.python.terms.fs class, or the mssm.src.python.terms.irf class. Each of those takes a basis keyword argument, accepting a Callable - so a function. By default the basis argument is set to basis=mssm.src.python.smooths.B_spline_basis - i.e., it receives the mssm.src.python.smooths.B_spline_basis() function as argument and calls it whenever the basis needs to be evaluated.

Speaking of evaluation. Every basis function needs to accept a couple of mandatory function arguments. Specifically, the function header needs to look something like this:

def my_basis(cov:np.ndarray, event_onset:int|None, nk:int, min_c:float|None=None, max_c:float|None=None, **kwargs) -> np.ndarray:

Let’s take a look at those mandatory arguments:

cov: This is set to the flattened covariate numpy array (i.e., of shape (-1,)) by mssm.
event_onset: This is an integer or None. If it’s an integer, it reflects the sample on which to place a dirac delta with which the bases should be convolved - this is required if your basis is to work with the mssm.src.python.terms.irf class. The mssm.src.python.terms.f and mssm.src.python.terms.fs classes always pass None to this argument.
nk: This is an integer corresponding to the number of basis functions to create.
min_c: This is the minimum covariate value, as a float, passed along by the mssm.src.python.terms.f and mssm.src.python.terms.fs classes. The mssm.src.python.terms.irf class first checks if this argument is specified in basis_kwargs (more on that in a minute) and if so, passes along the value specified there.
max_c: Maximum covariate value, handled exactly like min_c.

You can also set up your basis function to accept optional keyword arguments. For example, the function header of the default B-spline basis looks like this:

B_spline_basis(cov:np.ndarray, event_onset:int|None, nk:int, min_c:float|None=None, max_c:float|None=None, drop_outer_k:bool=False, convolve:bool=False, deg:int=3) -> np.ndarray:

deg here for example corresponds to the degree of the B-spline basis that should be created. How do these extra arguments get passed to the basis function? The mssm.src.python.terms.f, mssm.src.python.terms.fs, and mssm.src.python.terms.irf classes all accept a dictionary for the basis_kwargs argument which can be filled with values for the optional extra arguments. For example, the code below creates a smooth function of variable “x1” (i.e., an instance of mssm.src.python.terms.f) that relies on a B-spline basis of deg=3:

f(["x1"],basis=mssm.src.python.smooths.B_spline_basis,basis_kwargs={"deg":3})

Now, let’s talk about the expected output from a basis function. The function header definition above informs us that a basis function should return a np.ndarray. Specifically, it needs to be a two-dimensional numpy array. The first dimension needs to be of the same length as cov (the covariate over which to evaluate the function) and the second dimension needs to be of length nk. In other words, the output array needs to hold in each column one of the nk basis functions, each evaluated over all values in cov.

Adding a New Penalty Matrix

To create a new type of Penalty matrix, you need to implement the mssm.src.python.penalties.Penalty class. Currently, mssm supports the following implementations: mssm.src.python.penalties.IdentityPenalty and mssm.src.python.penalties.DifferencePenalty. This penalty class interacts with mssm at three points: it is passed to the constructor of an instance of either the mssm.src.python.terms.f class, the mssm.src.python.terms.fs class, or the mssm.src.python.terms.irf class. Each of those takes a penalty keyword argument, accepting a list of instances of the mssm.src.python.penalties.Penalty class (a list because terms might have more than one penalty applied to them).

The implementation of the mssm.src.python.penalties.Penalty class is quite simple, and looks like this:

class Penalty:

    def __init__(self,pen_type:PenType) -> None:
        self.type = pen_type

    def constructor(self,n:int,constraint:Constraint|None,*args,**kwargs) -> tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]:
        pass

The __init__ method receives only a single argument, a mssm.src.python.custom_types.PenType (see the documentation for supported values). For example, the pen_type of the mssm.src.python.penalties.DifferencePenalty is simply set to PenType.DIFFERENCE, while the __init__ method of mssm.src.python.penalties.DifferencePenalty accepts both PenType.IDENTITY and PenType.DISTANCE. If you want to implement a derivative-based penalty, there is a PenType for that: PenType.DERIVATIVE.

The actual construction of the penalty matrix is then handled by the constructor method. This is were you will want to implement the code that sets up your penalty matrix. The method receives two mandatory arguments:

n: An integer, corresponding to the dimension of the the square penalty matrix
constraint: Any constraint to absorb by the penalty or None if no constraint is required. If this argument is not None, it will be an instance of the mssm.src.python.custom_types.Constraint class, which holds all the information you need to absorb the constraint into the penalty (see the documentation).

Your constructor method can also accept additional arguments and key-word arguments. For example, the method header of the constructor method of the mssm.src.python.penalties.DifferencePenalty class looks like this:

constructor(self, n:int, constraint:Constraint|None, m:int=2) -> tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]:

m here corresponds to the order of the difference penalty. How do these extra arguments get passed to the basis function? The mssm.src.python.terms.f, mssm.src.python.terms.fs, and mssm.src.python.terms.irf classes all accept a list of dictionaries for the pen_kwargs argument - one for each penalty included in the list passed to the terms penalty argument. Each dictionary can be filled with values for the optional extra arguments that should be passed to the constructor method of the corresponding instance of the mssm.src.python.penalties.Penalty class. For example, the code below creates a smooth function of variable “x1” (i.e., an instance of mssm.src.python.terms.f) that relies on a B-spline basis of deg=3 and is subjected to a single difference penalty (i.e., an instance of mssm.src.python.penalties.DifferencePenalty) of order m=2:

f(["x1"],basis=mssm.src.python.smooths.B_spline_basis,basis_kwargs={"deg":3},
  penalty=[DifferencePenalty()],pen_kwargs=[{"m":2}])

Now, let’s talk about the expected output from the constructor method. Generally, this method needs to return the penalty matrix, a (matrix) root of the penalty matrix, and the rank of the penalty matrix. The rank is simply returned as an integer, but matters are slightly more complicated for the two matrices: each of those needs to be returned in form of three lists. The first needs to hold non-zero values of the (root of the) penalty, the second needs to hold the row indices of these non-zero values, and the third needs to hold the column indices of these non-zero values. Hence, the constructor needs to return six lists in total and an integer. The first three lists need to belong to the penalty matrix, while the last three lists need to belong to the penalty matrix root.

Programmer’s Guide

Adding a New Family

Adding a New ExtendedFamily

Adding a New GAMLSSFamily

Adding a New GSMMFamily