Programmer’s Guide
Extending the functionality of mssm
is quite easy. Adding new families generally only requires implementing a specific base class and this is also the case for
penalty matrices. Implementing a new smooth basis can be achieved by adding a single function. Below we discuss these steps and provide code examples, outlining how
each of them can be achieved.
Adding a New Family
Below we list the families already implemented, which can be estimated via the GAMM class:
Gaussian (G)AMMs (
mssm.src.python.exp_fam.Gaussian()
)Gamma GAMMs (
mssm.src.python.exp_fam.Gamma()
)Inverse Gaussian GAMMs (
mssm.src.python.exp_fam.InvGauss()
)Binomial GAMMs (
mssm.src.python.exp_fam.Binomial()
)Poisson GAMMs (
mssm.src.python.exp_fam.Poisson()
)
Now, if you want to add a new Family
, your new family will have to implement the mssm.src.python.exp_fam.Family()
base or template class. The documentation of the latter
provides details on all the methods your family will have to implement. You can also check the source code of the Families already implemented to get examples!
Adding a New GAMLSSFamily
Below we list the GAMMLSS families already implemented, which can be estimated via the GAMMLSS class:
Gaussian models of location and scale (
mssm.src.python.exp_fam.GAUMLSS()
)Gamma models of location and scale (
mssm.src.python.exp_fam.GAMMALS()
)Multinomial models (
mssm.src.python.exp_fam.MULNOMLSS()
)
Now, if you want to add a new GAMLSSFamily
, your new family will have to implement the mssm.src.python.exp_fam.GAMLSSFamily()
base or template class. The documentation of the latter
provides details on all the methods your family will have to implement. You can also check the source code of the Families already implemented to get examples!
Adding a New GSMMFamily
Below we list the GSMM families already implemented, which can be estimated via the GSMM class:
Cox proportional Hazard models (
mssm.src.python.exp_fam.PropHaz()
)
To implement a new member of the most general kind of smooth model, you will only need to implement the mssm.src.python.exp_fam.GSMMFamily()
template class - mssm
even supports completely derivative-free estimation.
You can check the mssm.models.GSMM
documentation for an example or the tutorial included with this documentation - it contains step-by-step instructions on how to implement this family.
Adding a New Marginal Smooth Basis
Adding a new marginal smooth basis only requires adding a single new function. This function interacts with mssm
at three points: it is passed to the constructor of an instance of either the mssm.src.python.terms.f
class, the
mssm.src.python.terms.fs
class, or the mssm.src.python.terms.irf
class. Each of those takes a basis
keyword argument, accepting a Callable
- so a function. By default the basis
argument
is set to basis=mssm.src.python.smooths.B_spline_basis
- i.e., it receives the mssm.src.python.smooths.B_spline_basis()
function as argument and calls it whenever the basis needs to be evaluated.
Speaking of evaluation. Every basis function needs to accept a couple of mandatory function arguments. Specifically, the function header needs to look something like this:
def my_basis(cov:np.ndarray, event_onset:int|None, nk:int, min_c:float|None=None, max_c:float|None=None, **kwargs) -> np.ndarray:
Let’s take a look at those mandatory arguments:
cov
: This is set to the flattened covariate numpy array (i.e., of shape (-1,)) bymssm
.event_onset
: This is an integer orNone
. If it’s an integer, it reflects the sample on which to place a dirac delta with which the bases should be convolved - this is required if your basis is to work with themssm.src.python.terms.irf
class. Themssm.src.python.terms.f
andmssm.src.python.terms.fs
classes always pass None to this argument.nk
: This is an integer corresponding to the number of basis functions to create.min_c
: This is the minimum covariate value, as a float, passed along by themssm.src.python.terms.f
andmssm.src.python.terms.fs
classes. Themssm.src.python.terms.irf
class first checks if this argument is specified inbasis_kwargs
(more on that in a minute) and if so, passes along the value specified there.max_c
: Maximum covariate value, handled exactly likemin_c
.
You can also set up your basis function to accept optional keyword arguments. For example, the function header of the default B-spline basis looks like this:
B_spline_basis(cov:np.ndarray, event_onset:int|None, nk:int, min_c:float|None=None, max_c:float|None=None, drop_outer_k:bool=False, convolve:bool=False, deg:int=3) -> np.ndarray:
deg
here for example corresponds to the degree of the B-spline basis that should be created. How do these extra arguments get passed to the basis function?
The mssm.src.python.terms.f
, mssm.src.python.terms.fs
, and mssm.src.python.terms.irf
classes all accept a dictionary for the basis_kwargs
argument which can be filled with values for the
optional extra arguments. For example, the code below creates a smooth function of variable “x1” (i.e., an instance of mssm.src.python.terms.f
) that relies on a B-spline basis of deg=3
:
f(["x1"],basis=mssm.src.python.smooths.B_spline_basis,basis_kwargs={"deg":3})
Now, let’s talk about the expected output from a basis function. The function header definition above informs us that a basis function should return a np.ndarray
. Specifically, it needs to be a two-dimensional
numpy array. The first dimension needs to be of the same length as cov
(the covariate over which to evaluate the function) and the second dimension needs to be of length nk
. In other words, the output array
needs to hold in each column one of the nk
basis functions, each evaluated over all values in cov
.
Adding a New Penalty Matrix
To create a new type of Penalty matrix, you need to implement the mssm.src.python.penalties.Penalty
class. Currently, mssm supports the following implementations: mssm.src.python.penalties.IdentityPenalty
and
mssm.src.python.penalties.DifferencePenalty
. This penalty class interacts with mssm
at three points: it is passed to the constructor of an instance of either the mssm.src.python.terms.f
class, the
mssm.src.python.terms.fs
class, or the mssm.src.python.terms.irf
class. Each of those takes a penalty
keyword argument, accepting a list of instances of the mssm.src.python.penalties.Penalty
class (a list because
terms might have more than one penalty applied to them).
The implementation of the mssm.src.python.penalties.Penalty
class is quite simple, and looks like this:
class Penalty:
def __init__(self,pen_type:PenType) -> None:
self.type = pen_type
def constructor(self,n:int,constraint:ConstType|None,*args,**kwargs) -> tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]:
pass
The __init__
method receives only a single argument, a mssm.src.python.custom_types.PenType
(see the documentation for supported values). For example, the pen_type
of the mssm.src.python.penalties.DifferencePenalty
is simply set
to PenType.DIFFERENCE
, while the __init__
method of mssm.src.python.penalties.DifferencePenalty
accepts both PenType.IDENTITY
and PenType.DISTANCE
. If you want to implement a derivative-based penalty, there is a PenType
for that: PenType.DERIVATIVE
.
The actual construction of the penalty matrix is then handled by the constructor
method. This is were you will want to implement the code that sets up your penalty matrix. The method receives two mandatory arguments:
n
: An integer, corresponding to the dimension of the the square penalty matrixconstraint
: Any constraint to absorb by the penalty orNone
if no constraint is required. If this argument is notNone
, it will be an instance of themssm.src.python.custom_types.Constraint
class, which holds all the information you need to absorb the constraint into the penalty (see the documentation).
Your constructor
method can also accept additional arguments and key-word arguments. For example, the method header of the constructor
method of the mssm.src.python.penalties.DifferencePenalty
class looks like this:
constructor(self, n:int, constraint:ConstType|None, m:int=2) -> tuple[list[float],list[int],list[int],list[float],list[int],list[int],int]:
m
here corresponds to the order of the difference penalty. How do these extra arguments get passed to the basis function?
The mssm.src.python.terms.f
, mssm.src.python.terms.fs
, and mssm.src.python.terms.irf
classes all accept a list of dictionaries for the pen_kwargs
argument - one for each penalty included in the list passed to the terms penalty
argument.
Each dictionary can be filled with values for the optional extra arguments that should be passed to the constructor
method of the corresponding instance of the mssm.src.python.penalties.Penalty
class.
For example, the code below creates a smooth function of variable “x1” (i.e., an instance of mssm.src.python.terms.f
) that relies on a B-spline basis of deg=3
and is subjected to a single difference penalty (i.e., an instance of mssm.src.python.penalties.DifferencePenalty
)
of order m=2
:
f(["x1"],basis=mssm.src.python.smooths.B_spline_basis,basis_kwargs={"deg":3},
penalty=[DifferencePenalty()],pen_kwargs=[{"m":2}])
Now, let’s talk about the expected output from the constructor
method. Generally, this method needs to return the penalty matrix, a (matrix) root of the penalty matrix, and the rank of the penalty matrix. The rank is simply returned as an integer, but matters are slightly more
complicated for the two matrices: each of those needs to be returned in form of three lists. The first needs to hold non-zero values of the (root of the) penalty, the second needs to hold the row indices of these non-zero values, and the third needs to hold the column indices of these non-zero values. Hence,
the constructor needs to return six lists in total and an integer. The first three lists need to belong to the penalty matrix, while the last three lists need to belong to the penalty matrix root.