Title: | Cox Models for Time-to-Event Data with Nonparametric Discrete Group-Specific Frailties |
---|---|
Description: | Functions for fitting Cox proportional hazards models for grouped time-to-event data, where the shared group-specific frailties have a discrete nonparametric distribution. There are also functions for simulating from these models, and from similar models with a parametric baseline survival function. |
Authors: | Francesca Gasperoni [aut, cre], Christopher Jackson [aut] |
Maintainer: | Francesca Gasperoni <[email protected]> |
License: | GPL-3 |
Version: | 0.1 |
Built: | 2025-02-05 03:40:49 UTC |
Source: | https://github.com/fgaspe04/discfrail |
discfrail: Functions for fitting Cox proportional hazards models for grouped time-to-event data, where the shared group-specific frailties have a discrete nonparametric distribution.
The methods are fully described and illustrated in Gasperoni et al. (2018).
Groups of individuals are clustered into a number of latent populations of groups, each with a common frailty that is assumed to act multiplicatively on their hazard. Covariates can be included through proportional hazards, in the manner of the standard Cox model. The baseline hazard is left unspecified as in the Cox model, and estimation is performed by maximum partial likelihood through an EM algorithm.
There are also functions for simulating from these models, and from similar models with a parametric baseline survival function.
npdf_cox
fits nonparametric discrete frailty models. The number of latent populations of groups can either be fixed, or estimated through an automated model comparison and selection procedure.
plot.npdf
plots fitted survival or cumulative hazard curves from the fitted models.
sim_npdf
simulates from a proportional hazards model with an arbitrary baseline survival distribution and discrete shared frailty terms.
sim_weibdf
simulates from a Weibull survival model with discrete shared frailty terms.
Maintainer: Francesca Gasperoni [email protected]
Authors:
Christopher Jackson [email protected]
F. Gasperoni, F. Ieva, A.M. Paganoni, C. Jackson, L. Sharples. (2018) Nonparametric frailty Cox models for hierarchical time-to-event data. Biostatistics.
Nelson-Aalen estimates of group-specific cumulative hazard from a nonparametric discrete frailty model
nelsonaalen_npdf(x)
nelsonaalen_npdf(x)
x |
A fitted nonparametric discrete frailty model, as returned by |
A list of objects of length equal to the number of groups in the data. Each component is a list, equivalent to the output of survfit
called for the corresponding group with type="fh"
, but with two additional components:
y0
: -log
of the survival estimate
sfun0
: a step function for y0
in terms of time, estimated using stepfun
.
plot.nelsonaalen_npdf
, plot.npdf
.
x = npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 2, estK=FALSE, eps_conv=10^-4) object = nelsonaalen_npdf( x )
x = npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 2, estK=FALSE, eps_conv=10^-4) object = nelsonaalen_npdf( x )
This function fits a Cox proportional hazards model to grouped survival data, where the shared group-specific frailties have a nonparametric discrete distribution. An EM algorithm is used to maximise the marginal partial likelihood.
npdf_cox(formula, groups, data, K = 2, estK = TRUE, criterion = "BIC", eps_conv = 10^-4, se_method = c("louis", "exact"))
npdf_cox(formula, groups, data, K = 2, estK = TRUE, criterion = "BIC", eps_conv = 10^-4, se_method = c("louis", "exact"))
formula |
A formula expression in conventional R linear modelling
syntax. The response must be a survival time constructed by the
Only |
groups |
name of the variable which indicates the group in which each individual belongs (e.g. the hospital that the individual is treated in). This can be integer, factor or character. The name should be unquoted. |
data |
A data frame in which to find variables supplied in
|
K |
initial number of latent populations, or clusters of groups which have the same discrete frailty. |
estK |
If If |
criterion |
Criterion used to choose the best-fitting model to highlight when
|
eps_conv |
convergence tolerance for the EM algorithm. |
se_method |
Method or methods used to compute the standard errors. A character vector containing one or more of the following:
By default this is |
If estK=FALSE
this returns a list of class npdf
which includes information about the model fit, including estimates and standard errors.
If estK=TRUE
this returns a list of class npdflist
. This has an element models
that contains a list of length K
, with one component of class npdf
for each fitted model.
comparison
is a matrix composed of K
rows and 5 columns (K
, K_fitted
, llik
, AIC
, BIC
). K_fitted
is the number of estimated latent populations, which can be equal to or less than K
. llik
stands for log-likelihood, AIC
for Akaike Information Criterion and BIC
for Bayesian Information Criterion.
Kopt
is optimal model under each criterion.
criterion
is the preferred criterion.
In either case, the data frame used for the fit (the "model frame") is appended as a component mf
.
Gasperoni, F., Ieva, F., Paganoni, A.M., Jackson, C. and Sharples, L. (2018). Nonparametric frailty Cox models for hierarchical time-to-event data. Biostatistics.
Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association, 73(364), 805–811.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B, 44(2), 226–233.
test <- npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 4, eps_conv=10^-4) test # optimal model (by all criteria) has 2 latent populations test$models[[1]] # examine alternative model with 1 latent population
test <- npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 4, eps_conv=10^-4) test # optimal model (by all criteria) has 2 latent populations test$models[[1]] # examine alternative model with 1 latent population
Plot Nelson-Aalen estimates of group-specific cumulative hazard from a nonparametric discrete frailty model
## S3 method for class 'nelsonaalen_npdf' plot(x, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, cols = NULL, ...)
## S3 method for class 'nelsonaalen_npdf' plot(x, xlim = NULL, ylim = NULL, xlab = NULL, ylab = NULL, cols = NULL, ...)
x |
Object returned by |
xlim |
x-axis limits (vector of 2) |
ylim |
x-axis limits (vector of 2) |
xlab |
x-axis label |
ylab |
y-axis label |
cols |
vector of colour names or numbers, of the same length as the number of groups |
... |
options to pass to the generic |
x = npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 2, estK=FALSE, eps_conv=10^-4) object = nelsonaalen_npdf( x ) plot( object ) plot( object, xlim=c(0,200), ylim=c(0,2), xlab="Follow-up days", ylab="Nelson-Aalen estimate", cols=ifelse(x$belonging==2, "purple", "black"))
x = npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 2, estK=FALSE, eps_conv=10^-4) object = nelsonaalen_npdf( x ) plot( object ) plot( object, xlim=c(0,200), ylim=c(0,2), xlab="Follow-up days", ylab="Nelson-Aalen estimate", cols=ifelse(x$belonging==2, "purple", "black"))
This function plots estimates of the survival or cumulative hazard for each group, coloured according to the latent population that each group belongs to.
## S3 method for class 'npdf' plot(x, type = "km", cols = NULL, survfit_opts = NULL, na_opts = NULL, ...)
## S3 method for class 'npdf' plot(x, type = "km", cols = NULL, survfit_opts = NULL, na_opts = NULL, ...)
x |
A fitted nonparametric discrete frailty model, as returned by |
type |
character. If |
cols |
Vector of colour names or numbers, of the same length as the number of groups. If not given, this defaults to |
survfit_opts |
Optional list of additional arguments to pass to |
na_opts |
Optional list of arguments (other than |
... |
Optional arguments to pass to |
result = npdf_cox( Surv(time, status) ~ x, groups = family, data = weibdata2030, K = 2, estK = FALSE, eps_conv=10^-4) plot( result ) plot( result, type = "km" ) plot( result, cols = ifelse( result$belonging == 1, "purple", "black" ), xlim = c( 0, 150 ) ) ## use of survfit_opts. show only first 10 groups plot( result, survfit_opts = list(subset = (weibdata2030$family >= 10) )) plot( result, type = "na" ) ## use of na_opts to customise the Nelson-Aalen plot plot( result, type = "na", cols=ifelse(result$belonging==2, "purple", "black"), na_opts = list(xlim=c(0,200), ylim=c(0,2), xlab="Follow-up days", ylab="Nelson-Aalen estimate"))
result = npdf_cox( Surv(time, status) ~ x, groups = family, data = weibdata2030, K = 2, estK = FALSE, eps_conv=10^-4) plot( result ) plot( result, type = "km" ) plot( result, cols = ifelse( result$belonging == 1, "purple", "black" ), xlim = c( 0, 150 ) ) ## use of survfit_opts. show only first 10 groups plot( result, survfit_opts = list(subset = (weibdata2030$family >= 10) )) plot( result, type = "na" ) ## use of na_opts to customise the Nelson-Aalen plot plot( result, type = "na", cols=ifelse(result$belonging==2, "purple", "black"), na_opts = list(xlim=c(0,200), ylim=c(0,2), xlab="Follow-up days", ylab="Nelson-Aalen estimate"))
Survival or cumulative hazard curves from a nonparametric discrete frailty model chosen by automatic model selection
## S3 method for class 'npdflist' plot(x, K = NULL, ...)
## S3 method for class 'npdflist' plot(x, K = NULL, ...)
x |
An object returned by |
K |
The number of latent populations which identifies the specific model to plot estimates from. By default this is the model selected by the criterion specified when calling |
... |
Options to customise the plot, passed to |
result = npdf_cox( Surv(time, status) ~ x, groups = family, data = weibdata2030, K = 2, eps_conv=10^-4) plot( result, K = 2 ) plot( result, type = "na" ) plot( result, type = "na", cols=ifelse(result$belonging==2, "purple", "black"), na_opts = list(xlim=c(0,200), ylim=c(0,2), xlab="Follow-up days", ylab="Nelson-Aalen estimate"))
result = npdf_cox( Surv(time, status) ~ x, groups = family, data = weibdata2030, K = 2, eps_conv=10^-4) plot( result, K = 2 ) plot( result, type = "na" ) plot( result, type = "na", cols=ifelse(result$belonging==2, "purple", "black"), na_opts = list(xlim=c(0,200), ylim=c(0,2), xlab="Follow-up days", ylab="Nelson-Aalen estimate"))
Plot Kaplan-Meier estimates of group-specific cumulative hazard from a nonparametric discrete frailty model
## S3 method for class 'survfit_npdf' plot(x, xlab = NULL, ylab = NULL, cols = NULL, ...)
## S3 method for class 'survfit_npdf' plot(x, xlab = NULL, ylab = NULL, cols = NULL, ...)
x |
Object returned by |
xlab |
x-axis label |
ylab |
y-axis label |
cols |
vector of colour names or numbers, of the same length as the number of groups |
... |
options to pass to the generic |
x = npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 2, estK=FALSE, eps_conv=10^-4) object = survfit_npdf( x ) plot( object )
x = npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 2, estK=FALSE, eps_conv=10^-4) object = survfit_npdf( x ) plot( object )
Prints estimates, standard errors, likelihood and model comparison statistics from a fitted nonparametric discrete frailty model
## S3 method for class 'npdf' print(x, digits = NULL, ...)
## S3 method for class 'npdf' print(x, digits = NULL, ...)
x |
A fitted nonparametric discrete frailty model, as returned by |
digits |
Number of significant digits to present, by default |
... |
Further options (currently unused) |
Prints the model comparison statistics comparing models with different numbers of latent populations, and prints the estimates from the optimal model according to the criterion that was specified when calling npdf_cox
(by default, BIC criterion).
## S3 method for class 'npdflist' print(x, digits = NULL, ...)
## S3 method for class 'npdflist' print(x, digits = NULL, ...)
x |
An object returned by |
digits |
Number of significant digits to present, by default |
... |
Further options (currently unused) |
This function returns a dataset generated from a semiparametric proportional hazards model with a shared discrete frailty term, for given cumulative baseline hazard function, hazard ratios, distribution of groups among latent populations, frailty values for each latent population, and randomly-generated covariate values.
sim_npdf(J, N = NULL, beta, Lambda_0_inv, p, w_values, cens_perc)
sim_npdf(J, N = NULL, beta, Lambda_0_inv, p, w_values, cens_perc)
J |
number of groups in the data |
N |
number of individuals in each group |
beta |
vector of log hazard ratios |
Lambda_0_inv |
inverse cumulative baseline hazard function, that is, with covariate values 0 and frailty ratio 1 |
p |
vector of K elements. The kth element gives the proportion of groups in the kth latent population of groups. |
w_values |
vector of K distinct frailty values, one for each latent population. |
cens_perc |
percentage of censored events. Censoring times are assumed to be distributed as a Normal with variance equal to 1. |
A data frame with one row for each simulated individual, and the following columns:
family
: the group which the individual is in (integers 1, 2, ...)
time
: the simulated event time.
status
: the simulated survival status. Censoring times are generated from a Normal distribution with standard deviation equal to 1 and the mean is estimated in order to guarantee the determined percentage of censored events. The event time is observed (status=1) if it is less than the censoring time, and censored otherwise (status=0).
x
: matrix of covariate values, generated from a standard normal distribution.
belong
: the frailty hazard ratio corresponding to the cluster of groups in which the individual's group has been allocated.
Wan, F. (2017). Simulating survival data with predefined censoring rates for proportional hazards models. Statistics in medicine, 36(5), 838-854.
J <- 100 N <- 40 Lambda_0_inv = function( t, c=0.01, d=4.6 ) ( t^( 1/d ) )/c beta <- 1.6 p <- c( 0.8, 0.2 ) w_values <- c( 0.8, 1.6 ) cens_perc <- 0.1 data <- sim_npdf( J, N, beta, Lambda_0_inv, p, w_values, cens_perc ) head( data )
J <- 100 N <- 40 Lambda_0_inv = function( t, c=0.01, d=4.6 ) ( t^( 1/d ) )/c beta <- 1.6 p <- c( 0.8, 0.2 ) w_values <- c( 0.8, 1.6 ) cens_perc <- 0.1 data <- sim_npdf( J, N, beta, Lambda_0_inv, p, w_values, cens_perc ) head( data )
This function returns a dataset generated from a Weibull proportional hazards model with a shared discrete frailty term, for given Weibull parameters, hazard ratios, distribution of groups among latent populations, frailty values for each latent population, and randomly-generated covariate values.
sim_weibdf(J, N = NULL, lambda, rho, beta, p, w_values, cens_perc)
sim_weibdf(J, N = NULL, lambda, rho, beta, p, w_values, cens_perc)
J |
number of groups in the data |
N |
number of individuals in each group |
lambda |
Weibull baseline rate parameter (see below), interpreted as the rate parameter with covariate values of 0 and frailty ratio 1. For |
rho |
Weibull shape parameter (see below) |
beta |
covariate effects in the Weibull distribution, interpreted as log hazard ratios (see below) |
p |
vector of K elements. The kth element gives the proportion of groups in the kth latent population of groups. |
w_values |
vector of K distinct frailty values, one for each latent population. |
cens_perc |
cens_perc percentage of censored events. Censoring times are assumed to be distributed as a Normal with variance equal to 1. |
The "proportional hazards" parameterisation of the Weibull distribution is used, with survivor function . Note this is different from the "accelerated failure time" parameterisation used in, e.g.
dweibull
. Distribution functions for the proportional hazards parameterisation can be found in the flexsurv package.
A data frame with one row for each simulated individual, and the following columns:
family
: the group which the individual is in (integers 1, 2, ...)
time
: the simulated event time.
status
: the simulated survival status. Censoring times are generated from a Normal distribution with standard deviation equal to 1 and the mean is estimated in order to guarantee the determined percentage of censored events. The event time is observed (status=1) if it is less than the censoring time, and censored otherwise (status=0).
x
: matrix of covariate values, generated from a standard normal distribution.
belong
: the frailty hazard ratio corresponding to the cluster of groups in which the individual's group has been allocated.
Wan, F. (2017). Simulating survival data with predefined censoring rates for proportional hazards models. Statistics in medicine, 36(5), 838-854.
J <- 100 N <- 40 lambda <- 0.5 beta <- 1.6 rho <- 1 p <- c( 0.8, 0.2 ) w_values <- c( 0.8, 1.6 ) cens_perc <- 0.2 data <- sim_weibdf( J, N, lambda, rho, beta, p, w_values, cens_perc) head( data )
J <- 100 N <- 40 lambda <- 0.5 beta <- 1.6 rho <- 1 p <- c( 0.8, 0.2 ) w_values <- c( 0.8, 1.6 ) cens_perc <- 0.2 data <- sim_weibdf( J, N, lambda, rho, beta, p, w_values, cens_perc) head( data )
Kaplan-Meier estimates of group-specific cumulative hazard from a nonparametric discrete frailty model
survfit_npdf(x, survfit_opts = NULL)
survfit_npdf(x, survfit_opts = NULL)
x |
A fitted nonparametric discrete frailty model, as returned by |
survfit_opts |
Optional list of additional arguments to pass to |
A list of survival estimates, one for each group, as produced by survfit.formula
Hierarchical time-to-event data simulated from a Weibull baseline distribution with a nonparametric frailty
weibdata10040
weibdata10040
A data frame with 4000 rows and the following columns
Integer from 1 to 100 indicating the group membership. There are 40 individuals in each group.
Survival time
Indicator for an observed event at the survival time
Covariate value, generated from a standard normal distribution
Underlying group-specific frailty value
There are two underlying latent populations with proportion 0.8 and 0.2 with frailty values 0.8 and 1.6 respectively, thus the second population has twice the hazard of the first. The covariate is associated with a log hazard ratio of 1.6.
The baseline hazard is set to that of a Weibull model.
The percentage of censored events is equal to 20.
The dataset was generated by sim_weibdf
as follows.
sim_weibdf( J = 100, N = 40, lambda = 0.3, rho = 1.3, beta = 1.6, p = c( 0.8, 0.2 ), w_values = c( 0.8, 1.6 ), cens_perc = 0.2)
Hierarchical time-to-event data simulated from a Weibull baseline distribution with a nonparametric frailty
weibdata2030
weibdata2030
A data frame with 600 rows and the following columns
Integer from 1 to 20 indicating the group membership. Each group has 30 individuals.
Survival time
Indicator for an observed event at the survival time
Covariate value, generated from a standard normal distribution
Underlying group-specific frailty value
There are two underlying latent populations with proportion 0.3 and 0.7 with frailty values 0.8 and 1.6 respectively, thus the second population has twice the hazard of the first. The covariate is associated with a log hazard ratio of 1.6.
The baseline hazard is set to that of a Weibull model.
The percentage of censored events is equal to 10.
The dataset was generated by sim_weibdf
as follows.
sim_weibdf( J = 20, N = 30, lambda = 0.6, rho = 1.1, beta = 1.6, p = c( 0.3, 0.7 ), w_values = c( 0.8, 1.6 ), cens_perc = 0.1)