Programmer’s Guide
Extending the functionality of mssm is quite easy. Adding new families generally only requires implementing a specific base class and this is also the case for
penalty matrices. Implementing a new smooth basis can be achieved by adding a single function. Below we discuss these steps and provide code examples, outlining how
each of them can be achieved.
Adding a New Family
Below we list the families already implemented, which can be estimated via the GAMM class:
Gaussian (
mssm.src.python.exp_fam.Gaussian())Gamma (
mssm.src.python.exp_fam.Gamma())Inverse Gaussian (
mssm.src.python.exp_fam.InvGauss())Binomial (
mssm.src.python.exp_fam.Binomial())Poisson (
mssm.src.python.exp_fam.Poisson())
Now, if you want to add a new Family, your new family will have to implement the mssm.src.python.exp_fam.Family() base or template class. The documentation of the latter
provides details on all the methods your family will have to implement. You can also check the source code of the Families already implemented to get examples!
Adding a New ExtendedFamily
Below we list the extended families already implemented, which can be estimated via the GAMM class:
scaled T (
mssm.src.python.exp_fam.ScaledT())
Now, if you want to add a new ExtendedFamily, your new family will have to implement the
mssm.src.python.exp_fam.ExtendedFamily() base or template class. This class actually
inherits from the mssm.src.python.exp_fam.Family() class and requires only a few additional
methods that are necessary to estimate any additional parameters of the log-likelihood of the
family. Again, the documentation provides details on all the methods your family will have to
implement.
Adding a New GAMLSSFamily
Below we list the GAMMLSS families already implemented, which can be estimated via the GAMMLSS class:
Gaussian models of location and scale (
mssm.src.python.exp_fam.GAUMLSS())Gamma models of location and scale (
mssm.src.python.exp_fam.GAMMALS())Multinomial models (
mssm.src.python.exp_fam.MULNOMLSS())
Now, if you want to add a new GAMLSSFamily, your new family will have to implement the mssm.src.python.exp_fam.GAMLSSFamily() base or template class. The documentation of the latter
provides details on all the methods your family will have to implement. You can also check the source code of the Families already implemented to get examples!
Adding a New GSMMFamily
Below we list the GSMM families already implemented, which can be estimated via the GSMM class:
Cox proportional Hazard models (
mssm.src.python.exp_fam.PropHaz())Multivariate additive models (
mssm.src.python.exp_fam.MultiGauss())
To implement a new member of the most general kind of smooth model, you will only need to implement the mssm.src.python.exp_fam.GSMMFamily() template class - mssm even supports completely derivative-free estimation.
You can check the mssm.models.GSMM documentation for an example or the tutorial included with this documentation - it contains step-by-step instructions on how to implement this family.
Adding a New Marginal Smooth Basis
Adding a new marginal smooth basis only requires adding a single new function. This function interacts with mssm at three points: it is passed to the constructor of an instance of either the mssm.src.python.terms.f class, the
mssm.src.python.terms.fs class, or the mssm.src.python.terms.irf class. Each of those takes a basis keyword argument, accepting a Callable - so a function. By default the basis argument
is set to basis=mssm.src.python.smooths.B_spline_basis - i.e., it receives the mssm.src.python.smooths.B_spline_basis() function as argument and calls it whenever the basis needs to be evaluated.
Speaking of evaluation. Every basis function needs to accept a couple of mandatory function arguments. Specifically, the function header needs to look something like this:
def my_basis(cov:np.ndarray, event_onset:int|None, nk:int, min_c:float|None=None, max_c:float|None=None, **kwargs) -> np.ndarray:
Let’s take a look at those mandatory arguments:
cov: This is set to the flattened covariate numpy array (i.e., of shape (-1,)) bymssm.event_onset: This is an integer orNone. If it’s an integer, it reflects the sample on which to place a dirac delta with which the bases should be convolved - this is required if your basis is to work with themssm.src.python.terms.irfclass. Themssm.src.python.terms.fandmssm.src.python.terms.fsclasses always pass None to this argument.nk: This is an integer corresponding to the number of basis functions to create.min_c: This is the minimum covariate value, as a float, passed along by themssm.src.python.terms.fandmssm.src.python.terms.fsclasses. Themssm.src.python.terms.irfclass first checks if this argument is specified inbasis_kwargs(more on that in a minute) and if so, passes along the value specified there.max_c: Maximum covariate value, handled exactly likemin_c.
You can also set up your basis function to accept optional keyword arguments. For example, the function header of the default B-spline basis looks like this:
B_spline_basis(cov:np.ndarray, event_onset:int|None, nk:int, min_c:float|None=None, max_c:float|None=None, drop_outer_k:bool=False, convolve:bool=False, deg:int=3) -> np.ndarray:
deg here for example corresponds to the degree of the B-spline basis that should be created. How do these extra arguments get passed to the basis function?
The mssm.src.python.terms.f, mssm.src.python.terms.fs, and mssm.src.python.terms.irf classes all accept a dictionary for the basis_kwargs argument which can be filled with values for the
optional extra arguments. For example, the code below creates a smooth function of variable “x1” (i.e., an instance of mssm.src.python.terms.f) that relies on a B-spline basis of deg=3:
f(["x1"],basis=mssm.src.python.smooths.B_spline_basis,basis_kwargs={"deg":3})
Now, let’s talk about the expected output from a basis function. The function header definition above informs us that a basis function should return a np.ndarray. Specifically, it needs to be a two-dimensional
numpy array. The first dimension needs to be of the same length as cov (the covariate over which to evaluate the function) and the second dimension needs to be of length nk. In other words, the output array
needs to hold in each column one of the nk basis functions, each evaluated over all values in cov.
Adding a New Penalty Matrix
To create a new type of Penalty matrix, you need to implement the mssm.src.python.penalties.Penalty class. Currently, mssm supports the following implementations: mssm.src.python.penalties.IdentityPenalty and
mssm.src.python.penalties.DifferencePenalty. This penalty class interacts with mssm at three points: it is passed to the constructor of an instance of either the mssm.src.python.terms.f class, the
mssm.src.python.terms.fs class, or the mssm.src.python.terms.irf class. Each of those takes a penalty keyword argument, accepting a list of instances of the mssm.src.python.penalties.Penalty class (a list because
terms might have more than one penalty applied to them).
The implementation of the mssm.src.python.penalties.Penalty class is quite simple, and looks like this:
class Penalty:
def __init__(self,pen_type:PenType) -> None:
self.type = pen_type
def constructor(self,n:int,constraint:Constraint|None,*args,**kwargs) -> tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]:
pass
The __init__ method receives only a single argument, a mssm.src.python.custom_types.PenType (see the documentation for supported values). For example, the pen_type of the mssm.src.python.penalties.DifferencePenalty is simply set
to PenType.DIFFERENCE, while the __init__ method of mssm.src.python.penalties.DifferencePenalty accepts both PenType.IDENTITY and PenType.DISTANCE. If you want to implement a derivative-based penalty, there is a PenType for that: PenType.DERIVATIVE.
The actual construction of the penalty matrix is then handled by the constructor method. This is were you will want to implement the code that sets up your penalty matrix. The method receives two mandatory arguments:
n: An integer, corresponding to the dimension of the the square penalty matrixconstraint: Any constraint to absorb by the penalty orNoneif no constraint is required. If this argument is notNone, it will be an instance of themssm.src.python.custom_types.Constraintclass, which holds all the information you need to absorb the constraint into the penalty (see the documentation).
Your constructor method can also accept additional arguments and key-word arguments. For example, the method header of the constructor method of the mssm.src.python.penalties.DifferencePenalty class looks like this:
constructor(self, n:int, constraint:Constraint|None, m:int=2) -> tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]:
m here corresponds to the order of the difference penalty. How do these extra arguments get passed to the basis function?
The mssm.src.python.terms.f, mssm.src.python.terms.fs, and mssm.src.python.terms.irf classes all accept a list of dictionaries for the pen_kwargs argument - one for each penalty included in the list passed to the terms penalty argument.
Each dictionary can be filled with values for the optional extra arguments that should be passed to the constructor method of the corresponding instance of the mssm.src.python.penalties.Penalty class.
For example, the code below creates a smooth function of variable “x1” (i.e., an instance of mssm.src.python.terms.f) that relies on a B-spline basis of deg=3 and is subjected to a single difference penalty (i.e., an instance of mssm.src.python.penalties.DifferencePenalty)
of order m=2:
f(["x1"],basis=mssm.src.python.smooths.B_spline_basis,basis_kwargs={"deg":3},
penalty=[DifferencePenalty()],pen_kwargs=[{"m":2}])
Now, let’s talk about the expected output from the constructor method. Generally, this method needs to return the penalty matrix, a (matrix) root of the penalty matrix, and the rank of the penalty matrix. The rank is simply returned as an integer, but matters are slightly more
complicated for the two matrices: each of those needs to be returned in form of three lists. The first needs to hold non-zero values of the (root of the) penalty, the second needs to hold the row indices of these non-zero values, and the third needs to hold the column indices of these non-zero values. Hence,
the constructor needs to return six lists in total and an integer. The first three lists need to belong to the penalty matrix, while the last three lists need to belong to the penalty matrix root.