---
title: "Vignette_tneh"
author: Juste Goungounga, Judith Breaud, Olayidé Boussari, Valérie Jooste
format: html
editor: visual
output: 
  rmarkdown::html_document:
      theme: paper 
      toc: true
      toc_float: true
vignette: >
  %\VignetteIndexEntry{Vignette_tneh}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
#| echo: true
#| label: load-packages

library(curesurv)
```

# How to analyse survival data using Time-To-Null excess hazard model (TNEH model)

## Additive hazard assumption in net survival setting

When the cause of death is unknown, the most common method to estimate the cancer-related survival is net survival. Its estimation assumes that the observed hazard $\lambda_{obs}$ is equal to the sum of the known background mortality hazard in the general population $\lambda_{pop}$ (obtained from national Statistic Institutes such as INSEE in France) and the excess hazard (due to cancer) $\lambda_{exc}$. For one individual $i$, this relation can be expressed as:

$$
\lambda_{obs}(t_i|z_i) = \lambda_{pop}(t_i+a_i|z_{pop,i}) + \lambda_{exc}(t_i|z_i)
$$ where $Z_{pop}\subset Z$ are the variables of the mortality table and of the model respectively.

The cumulative observed hazard can be written as: $$
\Lambda_{obs}(t|z) = \Lambda_{pop}(t+a|z) + \Lambda_{exc}(t|z)
$$ and the net survival is obtained as following: $$
S_{n}(t|z) = \exp(-\Lambda_{exc}(t|z))
$$

# TNEH model

The TNEH model is a relatively recent excess hazard model developed by Boussari et al. $\\$

The particularity of this model is that it enables the estimation, at the same time as the classical parameters of a model of the excess rate, of a quantity which is obtained by post-estimation by the classical models: it concerns the time after which the excess rate becomes null i.e. the cure point.

## Instantaneous excess hazard

The excess hazard proposed can be expressed as following:

$$
\lambda_{exc}(t|z;\theta) = \left(\dfrac{t}{\tau(z;\tau*)}\right)^{\alpha(z;\alpha*)-1} \left(1 - \dfrac{t}{\tau(z;\tau*)}\right)^{\beta-1} 1_{\left\{0 \le t \le \tau(z;\tau*)\right\}}
$$

where : $\\$

$\tau(z;\tau*) > 0$ is the time to cure, depends on covariates z and vector of parameters $\tau*$. It corresponds to the vector of parameters fitting the time-to-null excess hazard. $\\$

$\alpha(z;\alpha*) > 0$ and $\beta > 1$ are shape parameters. With $\beta>1$, the excess hazard is forced to be null and continuous in $\tau(z;\tau*)$. $\\$

The vector of parameters to be estimated is $\theta = (\alpha*, \beta, \tau*)$ with $\alpha(z;\alpha*) > 0$ .

## Cumulative excess hazard

$$
\Lambda_{exc}(t|z;\theta) = \tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right) F_{Be} \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*) , \beta \right)
$$

where

B is the beta function $\\$ $F_{Be}$ is the cumulative distribution function (cdf) of the beta distribution

## Net survival

$$
S_n(t|z) = \exp(-\Lambda_{exc}(t|z)) = \exp\left(-\tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right) F_{Be} \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*),\beta \right)\right) 
$$

## The cure fraction $\pi$

The cure fraction corresponds to the net survival at $t = \tau$ in TNEH model. It can be expressed as:

$$
\pi(z|\theta) = \exp\left(-\Lambda_{exc}(\tau(z;\tau*)|z)\right)
        = \exp\left(-\tau(z;\tau*) B \left( \alpha(z;\alpha*), \beta \right)\right)
$$

## The probability P(t)

This quantity corresponds to the probability P(t) of being cured at a given time t after diagnosis knowing that he/she was alive up to time t. It can be expressed as following:

$$
P(t|z) = \dfrac{\pi(z|\theta)}{S_n(t|z)} 
        = \exp \left( \tau(z;\tau*) \left( B \left( \dfrac{t}{\tau(z;\tau*)} ; \alpha(z;\alpha*) , \beta \right) - B(\alpha, \beta)  \right) \right)
$$ To calculates the confidence intervals of $P(t|z)$, can be obtained using the delta method. The application of this method requires the partial derivatives of $P(t|z)$ with respect of the parameters of the model. This can be written as:

$$
\dfrac{\partial P(t|z)}{\partial \theta} = \dfrac{1}{S_n(t|z)^2} \left( \dfrac{\partial \pi(z|\theta)}{\partial \theta} S_n(t|z) - \dfrac{\partial S_n(t|z)}{\partial \theta} \pi(z|\theta)   \right)
$$

# Fit of tneh model using R

## Without covariates

There are no covariables acting on parameters $\tau$ ($\tau=\tau_0$) and $\alpha$ ($\alpha=\alpha_0$)

```{r }
#| echo: true
#| warning: false
#| message: false
#| label: nocov

fit_ad_tneh_nocov <- curesurv(Surv(time_obs, event) ~ 1,
                             pophaz = "ehazard",
                             cumpophaz = "cumehazard",
                             model = "nmixture", dist = "tneh",  
                             link_tau = "linear",
                             data = testiscancer,
                             method_opt = "L-BFGS-B")


```

```{r}
summary(fit_ad_tneh_nocov)
```

We can see that the time-to-cure $\tau=\tau_0$ is estimated at 5.102 years, and the cure proportion is estimated at 84.74%.

### Prediction

We predict for varying time after diagnosis

```{r }
#| echo: true
#| label: predictnocov
newdata1 <- with(testiscancer,
  expand.grid(event = 0, time_obs  = seq(0.1,10,0.1)))
p_28 <- predict(object = fit_ad_tneh_nocov, newdata = newdata1)

```

### Plot of different estimators (hazard, survival, probability of being cured conditionally on being alive)

```{r}
#| echo: true
#| label: plotpredictnocov
#| fig.height: 8
#| fig.width: 10
plot(p_28)
```

### Cure fraction estimation precision

The confidence intervals at $1-\alpha$ level for the cure fraction $\pi$ can be written as:

$$
\left[\hat{\pi} \pm z_{1 - \alpha / 2}  \sqrt{Var(\hat{\hat{\pi}})}\right]
$$ where $$
Var(\hat{\pi}) = \dfrac{\partial \hat{\pi}}{\partial \theta} Var(\theta) \left(\dfrac{\partial \hat{\pi}}{\partial \theta}\right)^T
$$

```{r}
p_28[1,c("lower_bound_pi_tneh","cured","upper_bound_pi_tneh")]
```

### Time-to-cure estimation

We search the time $\text{t}=\text{TTC}_i$ from which $\text{P}_i(t) = 1-\epsilon$. $\epsilon$ can be fixed to 0.95.

The variance formula can be expressed as:

$$
Var(TTC) = Var(g(\theta;z_i)) \simeq \left(\dfrac{\partial P(t|z_i;\theta)}{\partial t}_{|t = TTC}\right)^{-2} Var(P(t|z_i;\theta))_{|t=TTC}
$$

```{r}
p_28[1,c("lower_bound_TTC_tneh","time_to_cure_ttc","upper_bound_TTC_tneh")]
```

## With covariates acting both on parameters tau and alpha

We create a new age variable : age_crmin the reduced age and "centered" around the age of the youngest patient

```{r}
#| echo: true
#| warning: false
#| message: false
 testiscancer$age_crmin <- (testiscancer$age- min(testiscancer$age)) /
              sd(testiscancer$age)
```

```{r}
#| echo: false
#| warning: false
#| message: false

oldpar<-par(no.readonly = TRUE)
par(mfrow=c(1,1))
plot(testiscancer$age,testiscancer$age_crmin,type="l",xlab="age",ylab="age_crmin")
text(x=min(testiscancer$age),y=0,col="blue",round(min(testiscancer$age),2))
lines(x=rep(min(testiscancer$age),2),y=c(-1,0),col="blue",lty=2)
lines(x=c(-1,min(testiscancer$age)),y=c(0,0),col="blue",lty=2)
text(x=min(testiscancer$age)+sd(testiscancer$age),y=1.2,col="red",round(sd(testiscancer$age)+min(testiscancer$age),2))
lines(x=rep(min(testiscancer$age)+sd(testiscancer$age),2),y=c(-1,1),col="red",lty=2)
lines(x=c(-1,min(testiscancer$age)+sd(testiscancer$age)),y=c(1,1),col="red",lty=2)
par(oldpar)
```

This model has the variable age_crmin acting on both $\alpha$ and $\tau$ : ($\alpha=\alpha_0+Z_{age\_crmin}\times\alpha_1$ et $\tau=\tau_0+Z_{age\_crmin}\times\tau_1$)

```{r}
#| echo: true
#| warning: false
#| message: false
#| label: withcov

fit_m1_ad_tneh <- curesurv(Surv(time_obs, event) ~ z_tau(age_crmin) + 
                          z_alpha(age_crmin),
                          pophaz = "ehazard",
                          cumpophaz = "cumehazard",
                          model = "nmixture", dist = "tneh",
                          link_tau = "linear",
                          data = testiscancer,
                          method_opt = "L-BFGS-B")
```

```{r}
 summary(fit_m1_ad_tneh)
```

For the reference individual (that is the patient with age_crmin=0, so the youngest patient, approximately 20y old at diagnosis), the cure proportion is estimated at 96,75% and the time to null excess hazard at 3.258 year. For an individual whose age\_ is_crmin is 1 (that is they are aged the standard deviation more than the youngest, that is approximately 39y old at diagnosis), the time to null excess hazard is 6.721 year.

### Predictions :

With varying time for patient of mean age

```{r}
#| echo: true

  #time varying prediction for patient with age mean
newdata1 <- with(testiscancer, 
                 expand.grid(event = 0,
                             age_crmin = mean(age_crmin),
                             time_obs  = seq(0.001,10,0.1)))

 pred_agemean <- predict(object = fit_m1_ad_tneh, newdata = newdata1)
```

With varying time for oldest patient

```{r}
#| echo: true

#time varying prediction for patient with age max
newdata2 <- with(testiscancer, 
                  expand.grid(event = 0,
                              age_crmin = max(age_crmin),
                              time_obs  = seq(0.001,10,0.1)))

pred_agemax <- predict(object = fit_m1_ad_tneh, newdata = newdata2)
```

At 2 years after diagnostic for patients of different ages

```{r}
#| echo: true

# predictions at time 2 years with varying age

   newdata3 <- with(testiscancer,
      expand.grid(event = 0, 
                  age_crmin = seq(min(testiscancer$age_crmin), 
                                  max(testiscancer$age_crmin), 0.1),
                  time_obs  = 2))

pred_age_val <- predict(object = fit_m1_ad_tneh, newdata = newdata3)
val_age <- seq(min(testiscancer$age_crmin),
               max(testiscancer$age_crmin),
               0.1) * sd(testiscancer$age) +  min(testiscancer$age)
```

### Plot of main indicators as a function of time for two patients of age mean and age max

```{r }
#| echo: true
#| label: predict_tnehcov
#| fig.height: 10
#| fig.width: 10
par(mfrow = c(2, 2),cex = 1.0)

plot(pred_agemax$time,pred_agemax$ex_haz,type = "l",lty = 1,lwd = 2,
     xlab = "Time since diagnosis", ylab = "excess hazard")

lines(pred_agemean$time,pred_agemean$ex_haz,type = "l",lty = 2,lwd = 2)

legend("topright",horiz = FALSE,
       legend = c("hE(t) age.max = 79.9", "hE(t) age.mean = 50.8"),
       col = c("black", "black"),lty = c(1, 2, 1, 1, 2, 2))
grid()

plot(pred_agemax$time,pred_agemax$netsurv,type = "l",lty = 1,lwd = 2,
    ylim = c(0, 1),xlab = "Time since diagnosis",ylab = "net survival")

lines(pred_agemean$time,pred_agemean$netsurv,type = "l",lty = 2,lwd = 2)

legend("bottomleft",horiz = FALSE,
       legend = c("Sn(t) age.max = 79.9", "Sn(t) age.mean = 50.8"),
       col = c("black", "black"),lty = c(1, 2, 1, 1, 2, 2))
grid()

plot(pred_agemax$time,pred_agemax$pt_cure,type = "l",lty = 1,lwd = 2,ylim = c(0, 1),
     xlab = "Time since diagnosis",ylab = "probability of being cured P(t)")

lines(pred_agemean$time,pred_agemean$pt_cure,type = "l",lty = 2,lwd = 2)

abline(v = pred_agemean$tau[1],lty = 2,lwd = 2,col = "blue")
abline(v = pred_agemean$time_to_cure_ttc[1],lty = 2,lwd = 2,col = "red")
abline(v = pred_agemax$tau[1],lty = 1,lwd = 2,col = "blue")
abline(v = pred_agemax$time_to_cure_ttc[1],lty = 1,lwd = 2,col = "red")
grid()

legend("bottomright",horiz = FALSE,
       legend = c("P(t) age.max = 79.9","P(t) age.mean = 50.8",
                  "TNEH age.max = 79.9","TTC age.max = 79.9",
                 "TNEH age.mean = 50.8","TTC age.mean = 50.8"),
      col = c("black", "black", "blue", "red", "blue", "red"),
      lty = c(1, 2, 1, 1, 2, 2))
par(oldpar)
```

### Plot of the main indicators 2 years after diagnosis, for patients of varying ages

```{r}
#| echo: true
#| label: predicted_to_plot_by_age
#| fig.height: 8
#| fig.width: 10

par(mfrow=c(2,2))
plot(val_age,pred_age_val$ex_haz, 
     type = "l",lty=1, lwd=2,
     xlab = "age",ylab = "excess hazard 2y after diagnosis")
grid()

 plot(val_age,pred_age_val$netsurv, 
      type = "l", lty=1,lwd=2,ylim=c(0,1), 
      xlab = "age", ylab = "net survival 2y after diagnosis")
     grid()

 plot(val_age,pred_age_val$pt_cure, type = "l", lty=1, lwd=2,ylim=c(0,1),
     xlab = "age",ylab = "P(t) 2y after diagnosis")
     grid()
     
plot(val_age,pred_age_val$cured, type = "l", lty=1, lwd=2,ylim=c(0,1),
     xlab = "age", ylab = "cure proportion")
     grid()
     
par(oldpar)
```

## With covariates acting only on parameters adjusting the time-to-null excess hazard tau

Effects of centered and reduced age on $\tau$ ($tau=\tau_0+Z_{age\_cr}\times\tau_1$, $\alpha=\alpha_0$)

```{r }

#| echo: true
#| label: withtauonly
#| warning: false
#| message: false

fit_ad_tneh_covtau <- curesurv(
  Surv(time_obs, event) ~ z_tau(age_cr),
  pophaz = "ehazard",
  cumpophaz = "cumehazard",
  model = "nmixture",
  dist = "tneh",
  link_tau = "linear",
  data = testiscancer,
  method_opt = "L-BFGS-B"
)


```

```{r}
#| echo: true
#| label: withtauonly
summary(fit_ad_tneh_covtau)

```

This model estimates a cure proportion of 79.4% for individuals of mean age (51.05 year old at diagnosis) with a time to null excess hazard 7.44 year. For an individual of age at diagnosis 70.01 (age_cr=1), the time to null excess hazard is estimated at 9.754. This model estimates that the time to null excess hazard increases by 2.3159 when age_cr increases by (that is, when age at diagnosis increases by 18.96)

### Prediction

```{r }
#| echo: true
#| label: predicted
summary(testiscancer$age_cr)
summary(testiscancer$age)
newdata2 <- with(testiscancer,
                 expand.grid(event = 0, 
                             time_obs  = seq(0.1, 10, 0.1),
                             age_cr = c(-0.9358, 0.0276, 0.9525) ))
newdata2_1stqu <- newdata2[newdata2$age_cr==-0.9358,]
newdata2_2rdqu <- newdata2[newdata2$age_cr==0.0276,]
newdata2_3rdqu <- newdata2[newdata2$age_cr==0.9525,]

p1stqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_1stqu)
p2rdqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_2rdqu)
p3rdqu <- predict(object = fit_ad_tneh_covtau, newdata = newdata2_3rdqu)

```

### Plot of excess hazard at varying times for 3 different ages at diagnosis (33.3, 51.6, 69,1)

```{r}
#| echo: true
#| label: plot_for_specific_age
#| fig.height: 8
#| fig.width: 10

par(mfrow=c(2,2))

  plot(p1stqu, 
     main = "Excess hazard for age 33.3", 
     fun = "haz")

  plot(p2rdqu,
     fun = "haz",
     main = "Excess hazard for age 51.57")

  plot(p3rdqu,
     fun = "haz",
     main = "Excess hazard for age 69.11")

par(oldpar)
```

## With covariates acting only on scale parameter alpha

Effect of age_cr on only $\alpha$ ($\alpha=\alpha_0+Z_{age\_cr}*\alpha_1$, $\tau=\tau_0$)

```{r }

#| echo: true
#| label: only_covariate_on_alpha
#| message: false
#| warning: false

fit_ad_tneh_covalpha <-
  curesurv(
    Surv(time_obs, event) ~ z_alpha(age_cr),
    pophaz = "ehazard",
    cumpophaz = "cumehazard",
    model = "nmixture",
    dist = "tneh",
    link_tau = "linear",
    data = testiscancer,
    method_opt = "L-BFGS-B"
  )


```

```{r}
#| echo: true
#| message: false
#| warning: false

summary(fit_ad_tneh_covalpha)
```

The time to null excess hazard is estimated at 6.099 year for all individuals, and the cure proportion is estimated at 81.81% for patient of mean age (51.05 yo at diagnosis)

### Predictions

```{r }
#| echo: true
#| message: false
#| warning: false
#| include: true
p4_33.3 <- predict(object = fit_ad_tneh_covalpha,
                 newdata = newdata2_1stqu)
p4_51.6 <- predict(object = fit_ad_tneh_covalpha,
                 newdata = newdata2_2rdqu)
p4_69.1 <- predict(object = fit_ad_tneh_covalpha,
                 newdata = newdata2_3rdqu)

```

### Plot of estimation of probability P(t) of being cured at a given time t after diagnosis knowing that he/she was alive up to time t

```{r}
#| echo: true
#| label: plot_for_specific_age_2
#| fig.height: 8
#| fig.width: 10

par(mfrow=c(2,2))

  plot(p4_33.3, 
     main = "Pt_cure for age 33.3", 
     fun = "pt_cure")

  plot(p4_51.6,
     fun = "pt_cure",
     main = "Pt_cure for age 51.6")

  plot(p4_69.1,
     fun = "pt_cure",
     main = "Pt_cure for age 69.1")
par(oldpar)
```

## Comparing models

We fitted 4 models and to compare them we can use the AIC criteria

```{r}
AIC(fit_ad_tneh_nocov,fit_ad_tneh_covalpha,fit_ad_tneh_covtau,fit_m1_ad_tneh)
```

The best model is the one with the lowest AIC : it's the one with an effect of age on both \$ \tau\$ and $\alpha$.

# Session info

```{r}
date()
sessionInfo()
```