Type: | Package |
Title: | Algorithms for Quantile- And Mean-Optimal Treatment Regimes |
Author: | Yu Zhou [cre, aut], Lan Wang [ctb], Ben Sherwood [ctb], Rui Song [ctb] |
Maintainer: | Yu Zhou <zhou0269@umn.edu> |
Description: | Estimation methods for optimal treatment regimes under three different criteria, namely marginal quantile, marginal mean, and mean absolute difference. For the first two criteria, both one-stage and two-stage estimation method are implemented. A doubly robust estimator for estimating the quantile-optimal treatment regime is also included. |
Version: | 0.1.3 |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LazyData: | TRUE |
Imports: | stringr, rgenoud (≥ 5.7), quantreg (≥ 5.18), parallel, methods, Rdpack |
Depends: | R (≥ 3.2), stats, utils |
RoxygenNote: | 6.0.1 |
NeedsCompilation: | no |
Packaged: | 2018-02-05 05:24:52 UTC; yuzhou |
RdMacros: | Rdpack |
Repository: | CRAN |
Date/Publication: | 2018-02-05 05:56:14 UTC |
The Doubly Robust Estimator of the Quantile-Optimal Treatment Regime
Description
DR_Qopt
implements the doubly robust estimation method to
estimate the quantile-optimal treatment regime. The double robustness
property means that it is consistent when either the propensity score model
is correctly specified, or the conditional quantile function is correctly specified.
Both linear and nonlinear conditional quantile models are considered. See 'Examples'
for an illustrative example.
Usage
DR_Qopt(data, regimeClass, tau, moPropen = "BinaryRandom",
nlCondQuant_0 = FALSE, nlCondQuant_1 = FALSE, moCondQuant_0,
moCondQuant_1, max = TRUE, length.out = 200, s.tol, it.num = 8,
cl.setup = 1, p_level = 1, pop.size = 3000, hard_limit = FALSE,
start_0 = NULL, start_1 = NULL)
Arguments
data |
a data frame, must contain all the variables that appear in |
regimeClass |
a formula specifying the class of treatment regimes to search,
e.g. if
Polynomial arguments are also supported. See also 'Details'. |
tau |
a value between 0 and 1. This is the quantile of interest. |
moPropen |
The propensity score model for the probability of receiving
treatment level 1.
When |
nlCondQuant_0 |
Logical. When |
nlCondQuant_1 |
Logical. When |
moCondQuant_0 |
Either a formula or a string representing the parametric form of the conditional quantile function given that treatment=0. |
moCondQuant_1 |
Either a formula or a string representing the parametric form of the conditional quantile function given that treatment=1. |
max |
logical. If |
length.out |
an integer greater than 1. If one of the conditional quantile
model is set to be nonlinear, this argument will be triggered and we will fit
|
s.tol |
This is the tolerance level used by |
it.num |
integer > 1. This argument will be used in |
cl.setup |
the number of nodes. >1 indicates choosing parallel computing option in
|
p_level |
choose between 0,1,2,3 to indicate different levels of output from the genetic function. Specifically, 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug.) |
pop.size |
an integer with the default set to be 3000. This is the population number for the first generation
in the genetic algorithm ( |
hard_limit |
logical. When it is true the maximum number of generations
in |
start_0 |
a named list or named numeric vector of starting estimates for
the conditional quantile function when |
start_1 |
a named list or named numeric vector of starting estimates for
the conditional quantile function when |
Details
Standardization on covariates AND explanation on the differences between the two returned regime parameters.
Note that all estimation functions in this package use the same type of standardization on covariates. Doing so would allow us to provide a bounded domain of parameters for searching in the genetic algorithm.
This estimated parameters indexing the quantile-optimal treatment regime are returned in two scales:
The returned
coefficients
is the set of parameters after covariatesX
are standardized to be in the interval [0, 1]. To be exact, every covariate is subtracted by the smallest observed value and divided by the difference between the largest and the smallest value. Next, we carried out the algorithm in Wang 2016 to get the estimated regime parameters,coefficients
, based on the standardized data. For the identifiability issue, we force the Euclidean norm ofcoefficients
to be 1.In contrast,
coef.orgn.scale
corresponds to the original covariates, so the associated decision rule can be applied directly to novel observations. In other words, let\beta
denote the estimated parameter in the original scale, then the estimated treatment regime is:d(x)= I\{\hat{\beta}_0 + \hat{\beta}_1 x_1 + ... + \hat{\beta}_k x_k > 0\}.
The estimated
\bm{\hat{\beta}}
is returned ascoef.orgn.scale
. The same ascoefficients
, we force the Euclidean norm ofcoef.orgn.scale
to be 1.
If, for each input covariate, the smallest observed value is exactly 0 and the range (i.e. the largest number minus the smallest number) is exactly 1, then the estimated
coefficients
andcoef.orgn.scale
will render identical.Property of the doubly robust(DR) estimator. The DR estimator
DR_Qopt
is consistent if either the propensity score model or the conditional quantile regression model is correctly specified. (Wang et. al. 2016)
Value
This function returns an object with 9 objects. Both coefficients
and coef.orgn.scale
were normalized to have unit euclidean norm.
coefficients
the parameters indexing the estimated quantile-optimal treatment regime for standardized covariates.
coef.orgn.scale
the parameter indexing the estimated quantile-optimal treatment regime for the original input covariates.
tau
the quantile of interest
hatQ
the estimated marginal tau-th quantile when the treatment regime indexed by
coef.orgn.scale
is applied on everyone. See the 'details' for connection betweencoef.orgn.scale
andcoefficient
.call
the user's call.
moPropen
the user specified propensity score model
regimeClass
the user specified class of treatment regimes
moCondQuant_0
the user specified conditional quantile model for treatment 0
moCondQuant_1
the user specified conditional quantile model for treatment 1
Author(s)
Yu Zhou, zhou0269@umn.edu
References
Wang L, Zhou Y, Song R and Sherwood B (2017). “Quantile-Optimal Treatment Regimes.” Journal of the American Statistical Association.
See Also
Examples
ilogit <- function(x) exp(x)/(1 + exp(x))
GenerateData.DR <- function(n)
{
x1 <- runif(n,min=-1.5,max=1.5)
x2 <- runif(n,min=-1.5,max=1.5)
tp <- ilogit( 1 - 1*x1^2 - 1* x2^2)
a <-rbinom(n,1,tp)
y <- a * exp(0.11 - x1- x2) + x1^2 + x2^2 + a*rgamma(n, shape=2*x1+3, scale = 1) +
(1-a)*rnorm(n, mean = 2*x1 + 3, sd = 0.5)
return(data.frame(x1=x1,x2=x2,a=a,y=y))
}
regimeClass <- as.formula(a ~ x1+x2)
moCondQuant_0 <- as.formula(y ~ x1+x2+I(x1^2)+I(x2^2))
moCondQuant_1 <- as.formula(y ~ exp( 0.11 - x1 - x2)+ x1^2 + p0 + p1*x1
+ p2*x1^2 + p3*x1^3 +p4*x1^4 )
start_1 = list(p0=0, p1=1.5, p2=1, p3 =0,p4=0)
n <- 400
testdata <- GenerateData.DR(n)
## Examples below correctly specified both the propensity model and
## the conditional quantile model.
system.time(
fit1 <- DR_Qopt(data=testdata, regimeClass = regimeClass,
tau = 0.25,
moPropen = a~I(x1^2)+I(x2^2),
moCondQuant_0 = moCondQuant_0,
moCondQuant_1 = moCondQuant_1,
nlCondQuant_1 = TRUE, start_1=start_1,
pop.size = 1000))
fit1
## Go parallel for the same fit. It would save a lot of time.
### Could even change the cl.setup to larger values
### if more cores are available.
system.time(fit2 <- DR_Qopt(data=testdata, regimeClass = regimeClass,
tau = 0.25,
moPropen = a~I(x1^2)+I(x2^2),
moCondQuant_0 = moCondQuant_0,
moCondQuant_1 = moCondQuant_1,
nlCondQuant_1 = TRUE, start_1=start_1,
pop.size = 1000, cl.setup=2))
fit2
Estimation of the Optimal Treatment Regime defined as Minimizing Gini's Mean Differences
Description
IPWE_MADopt
seeks to estimated the treatment regime which minimizes
the Gini's Mean difference defined below.
Besides mean and quantile criterion, in some applications
people seek minimization of dispersion in the outcome, which, for example, can
be described by Gini's mean difference. Formally, it is defined as the absolute
differences of two random variables Y_1
and Y_2
drawn independently
from the same distribution:
MAD:= E(|Y_1-Y_2|).
Given a treatment regime d
, define the potential outcome of a subject
following the treatment recommended by d
as
Y^{*}(d)
. When d
is followed by everyone in the target population,
the Gini's mean absolute difference is
MAD(d):= E(| Y_1^{*}(d)-Y_2^{*}(d) |).
Usage
IPWE_MADopt(data, regimeClass, moPropen = "BinaryRandom", s.tol, it.num = 8,
hard_limit = FALSE, cl.setup = 1, p_level = 1, pop.size = 3000)
Arguments
data |
a data frame, containing variables in the |
regimeClass |
a formula specifying the class of treatment regimes to search,
e.g. if
Polynomial arguments are also supported. See also 'Details'. |
moPropen |
The propensity score model for the probability of receiving
treatment level 1.
When |
s.tol |
This is the tolerance level used by |
it.num |
integer > 1. This argument will be used in |
hard_limit |
logical. When it is true the maximum number of generations
in |
cl.setup |
the number of nodes. >1 indicates choosing parallel computing option in
|
p_level |
choose between 0,1,2,3 to indicate different levels of output from the genetic function. Specifically, 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug.) |
pop.size |
an integer with the default set to be 3000. This is the population number for the first generation
in the genetic algorithm ( |
Details
Note that all estimation functions in this package use the same type of standardization on covariates. Doing so would allow us to provide a bounded domain of parameters for searching in the genetic algorithm.
This estimated parameters indexing the MAD-optimal treatment regime are returned in two scales:
The returned
coefficients
is the set of parameters after covariatesX
are standardized to be in the interval [0, 1]. To be exact, every covariate is subtracted by the smallest observed value and divided by the difference between the largest and the smallest value. Next, we carried out the algorithm in Wang et al. 2017 to get the estimated regime parameters,coefficients
, based on the standardized data. For the identifiability issue, we force the Euclidean norm ofcoefficients
to be 1.In contrast,
coef.orgn.scale
corresponds to the original covariates, so the associated decision rule can be applied directly to novel observations. In other words, let\beta
denote the estimated parameter in the original scale, then the estimated treatment regime is:d(x;\bm{\hat{\beta}})= I\{\hat{\beta}_0 + \hat{\beta}_1 x_1 + ... + \hat{\beta}_k x_k > 0\}.
The estimated
\bm{\hat{\beta}}
is returned ascoef.orgn.scale
. The same ascoefficients
, we force the Euclidean norm ofcoef.orgn.scale
to be 1.
If, for every input covariate, the smallest observed value is exactly 0 and the range
(i.e. the largest number minus the smallest number) is exactly 1, then the estimated
coefficients
and coef.orgn.scale
will render identical.
Value
This function returns an object with 6 objects. Both coefficients
and coef.orgn.scale
were normalized to have unit euclidean norm.
coefficients
the parameters indexing the estimated MAD-optimal treatment regime for standardized covariates.
coef.orgn.scale
the parameter indexing the estimated MAD-optimal treatment regime for the original input covariates.
hat_MAD
the estimated MAD when a treatment regime indexed by
coef.orgn.scale
is applied on everyone. See the 'details' for connection betweencoef.orgn.scale
andcoefficient
.call
the user's call.
moPropen
the user specified propensity score model
regimeClass
the user specified class of treatment regimes
References
Wang L, Zhou Y, Song R and Sherwood B (2017). “Quantile-Optimal Treatment Regimes.” Journal of the American Statistical Association.
Examples
GenerateData.MAD <- function(n)
{
x1 <- runif(n)
x2 <- runif(n)
tp <- exp(-1+1*(x1+x2))/(1+exp(-1+1*(x1+x2)))
a<-rbinom(n = n, size = 1, prob=tp)
error <- rnorm(length(x1))
y <- (1 + a*0.6*(-1+x1+x2<0) + a*-0.6*(-1+x1+x2>0)) * error
return(data.frame(x1=x1,x2=x2,a=a,y=y))
}
# The true MAD optimal treatment regime for this generative model
# can be deduced trivially, and it is: c( -0.5773503, 0.5773503, 0.5773503).
# With correctly specified propensity model ####
n <- 400
testData <- GenerateData.MAD(n)
fit1 <- IPWE_MADopt(data = testData, regimeClass = a~x1+x2,
moPropen=a~x1+x2, cl.setup=2)
fit1
# With incorrectly specified propensity model ####
fit2 <- IPWE_MADopt(data = testData, regimeClass = a~x1+x2,
moPropen="BinaryRandom", cl.setup=2)
fit2
Estimate the Mean-optimal Treatment Regime
Description
IPWE_Mopt
aims at estimating the treatment regime which
maximizes the marginal mean of the potential outcomes.
Usage
IPWE_Mopt(data, regimeClass, moPropen = "BinaryRandom", max = TRUE,
s.tol = 1e-04, cl.setup = 1, p_level = 1, it.num = 10,
hard_limit = FALSE, pop.size = 3000)
Arguments
data |
a data frame, containing variables in the |
regimeClass |
a formula specifying the class of treatment regimes to search,
e.g. if
Polynomial arguments are also supported. See also 'Details'. |
moPropen |
The propensity score model for the probability of receiving
treatment level 1.
When |
max |
logical. If |
s.tol |
This is the tolerance level used by |
cl.setup |
the number of nodes. >1 indicates choosing parallel computing option in
|
p_level |
choose between 0,1,2,3 to indicate different levels of output from the genetic function. Specifically, 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug.) |
it.num |
integer > 1. This argument will be used in |
hard_limit |
logical. When it is true the maximum number of generations
in |
pop.size |
an integer with the default set to be 3000. This is the population number for the first generation
in the genetic algorithm ( |
Details
Note that all estimation functions in this package use the same type of standardization on covariates. Doing so would allow us to provide a bounded domain of parameters for searching in the genetic algorithm.
This functions returns the estimated parameters indexing the mean-optimal treatment regime under two scales.
The returned coefficients
is the set of parameters when covariates are
all standardized to be in the interval [0, 1] by subtracting the smallest observed
value and divided by the difference between the largest and the smallest value.
While the returned coef.orgn.scale
corresponds to the original covariates,
so the associated decision rule can be applied directly to novel observations.
In other words, let \beta
denote the estimated parameter in the original
scale, then the estimated treatment regime is:
d(x)= I\{\hat{\beta}_0 + \hat{\beta}_1 x_1 + ... + \hat{\beta}_k x_k > 0\}.
The estimated \bm{\hat{\beta}}
is returned as coef.orgn.scale
.
If, for every input covariate, the smallest observed value is exactly 0 and the range
(i.e. the largest number minus the smallest number) is exactly 1, then the estimated
coefficients
and coef.orgn.scale
will render identical.
Value
This function returns an object with 6 objects. Both coefficients
and coef.orgn.scale
were normalized to have unit euclidean norm.
coefficients
the parameters indexing the estimated mean-optimal treatment regime for standardized covariates.
coef.orgn.scale
the parameter indexing the estimated mean-optimal treatment regime for the original input covariates.
hatM
the estimated marginal mean when a treatment regime indexed by
coef.orgn.scale
is applied on everyone. See the 'details' for connection betweencoef.orgn.scale
andcoefficient
.call
the user's call.
moPropen
the user specified propensity score model
regimeClass
the user specified class of treatment regimes
Author(s)
Yu Zhou, zhou0269@umn.edu, with substantial contribution from Ben Sherwood.
References
Zhang B, Tsiatis AA, Laber EB and Davidian M (2012). “A robust method for estimating optimal treatment regimes.” Biometrics, 68(4), pp. 1010–1018.
Examples
GenerateData.test.IPWE_Mopt <- function(n)
{
x1 <- runif(n)
x2 <- runif(n)
tp <- exp(-1+1*(x1+x2))/(1+exp(-1+1*(x1+x2)))
error <- rnorm(length(x1), sd=0.5)
a <- rbinom(n = n, size = 1, prob=tp)
y <- 1+x1+x2 + a*(3 - 2.5*x1 - 2.5*x2) +
(0.5 + a*(1+x1+x2)) * error
return(data.frame(x1=x1,x2=x2,a=a,y=y))
}
n <- 500
testData <- GenerateData.test.IPWE_Mopt(n)
fit <- IPWE_Mopt(data=testData, regimeClass = a~x1+x2,
moPropen=a~x1+x2,
pop.size=1000)
fit
Estimate the Quantile-optimal Treatment Regime
Description
Estimate the Quantile-optimal Treatment Regime by inverse probability of weighting
Usage
IPWE_Qopt(data, regimeClass, tau, moPropen = "BinaryRandom", max = TRUE,
s.tol, it.num = 8, hard_limit = FALSE, cl.setup = 1, p_level = 1,
pop.size = 3000)
Arguments
data |
a data frame, containing variables in the |
regimeClass |
a formula specifying the class of treatment regimes to search,
e.g. if
Polynomial arguments are also supported. See also 'Details'. |
tau |
a value between 0 and 1. This is the quantile of interest. |
moPropen |
The propensity score model for the probability of receiving
treatment level 1.
When |
max |
logical. If |
s.tol |
This is the tolerance level used by |
it.num |
integer > 1. This argument will be used in |
hard_limit |
logical. When it is true the maximum number of generations
in |
cl.setup |
the number of nodes. >1 indicates choosing parallel computing option in
|
p_level |
choose between 0,1,2,3 to indicate different levels of output from the genetic function. Specifically, 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug.) |
pop.size |
an integer with the default set to be 3000. This is the population number for the first generation
in the genetic algorithm ( |
Details
Note that all estimation functions in this package use the same type of standardization on covariates. Doing so would allow us to provide a bounded domain of parameters for searching in the genetic algorithm.
This estimated parameters indexing the quantile-optimal treatment regime are returned in two scales:
The returned
coefficients
is the set of parameters after covariatesX
are standardized to be in the interval [0, 1]. To be exact, every covariate is subtracted by the smallest observed value and divided by the difference between the largest and the smallest value. Next, we carried out the algorithm in Wang et al. 2017 to get the estimated regime parameters,coefficients
, based on the standardized data. For the identifiability issue, we force the Euclidean norm ofcoefficients
to be 1.In contrast,
coef.orgn.scale
corresponds to the original covariates, so the associated decision rule can be applied directly to novel observations. In other words, let\beta
denote the estimated parameter in the original scale, then the estimated treatment regime is:d(x)= I\{\hat{\beta}_0 + \hat{\beta}_1 x_1 + ... + \hat{\beta}_k x_k > 0\}.
The estimated
\bm{\hat{\beta}}
is returned ascoef.orgn.scale
. The same ascoefficients
, we force the Euclidean norm ofcoef.orgn.scale
to be 1.
If, for every input covariate, the smallest observed value is exactly 0 and the range
(i.e. the largest number minus the smallest number) is exactly 1, then the estimated
coefficients
and coef.orgn.scale
will render identical.
Value
This function returns an object with 7 objects. Both coefficients
and coef.orgn.scale
were normalized to have unit euclidean norm.
coefficients
the parameters indexing the estimated quantile-optimal treatment regime for standardized covariates.
coef.orgn.scale
the parameter indexing the estimated quantile-optimal treatment regime for the original input covariates.
tau
the quantile of interest
hatQ
the estimated marginal tau-th quantile when the treatment regime indexed by
coef.orgn.scale
is applied on everyone. See the 'details' for connection betweencoef.orgn.scale
andcoefficient
.call
the user's call.
moPropen
the user specified propensity score model
regimeClass
the user specified class of treatment regimes
Author(s)
Yu Zhou, zhou0269@umn.edu with substantial contribution from Ben Sherwood.
References
Wang L, Zhou Y, Song R and Sherwood B (2017). “Quantile-Optimal Treatment Regimes.” Journal of the American Statistical Association.
Examples
GenerateData <- function(n)
{
x1 <- runif(n, min=-0.5,max=0.5)
x2 <- runif(n, min=-0.5,max=0.5)
error <- rnorm(n, sd= 0.5)
tp <- exp(-1+1*(x1+x2))/(1+exp(-1+1*(x1+x2)))
a <- rbinom(n = n, size = 1, prob=tp)
y <- 1+x1+x2 + a*(3 - 2.5*x1 - 2.5*x2) + (0.5 + a*(1+x1+x2)) * error
return(data.frame(x1=x1,x2=x2,a=a,y=y))
}
n <- 300
testData <- GenerateData(n)
# 1. Estimate the 0.25th-quantile optimal treatment regime. ###
fit1 <- IPWE_Qopt(data = testData, regimeClass = "a~x1+x2",
tau = 0.25, moPropen="a~x1+x2")
fit1
# 2. Go parallel. This saves time in calculation. ###
fit2 <- IPWE_Qopt(data = testData, regimeClass = "a~x1+x2",
tau = 0.25, moPropen="a~x1+x2", cl.setup=2)
fit2
# 3. Set a quardratic term in the class #######################
fit3 <- IPWE_Qopt(data = testData, regimeClass = "a~x1+x2+I(x1^2)",
tau = 0.25, moPropen="a~x1+x2", pop.size=1000)
fit3
# 4. Set screen prints level. #######################
# Set the p_level to be 0,
# then all screen prints from the genetic algorithm will be suppressed.
fit4 <- IPWE_Qopt(data = testData, regimeClass = "a~x1+x2",
tau = 0.25, moPropen="a~x1+x2", cl.setup=2, p_level=0)
fit4
Estimate the Two-stage Mean-Optimal Treatment Regime
Description
This function implements the estimator of two-stage mean-optimal treatment regime by inverse probability of weighting proposed by Baqun Zhang. As there are more than one stage, the second stage treatment regime could take into account the evolving status of an individual after the first stage and the treatment level received in the first stage. We assume the options at the two stages are both binary and take the form:
d_1(x_{stage1})=I\left(\beta_{10} +\beta_{11} x_{11} +...+ \beta_{1k} x_{1k} > 0\right),
d_2(x_{stage2})=I\left(\beta_{20} +\beta_{21} x_{21} +...+ \beta_{2j} x_{2j} > 0\right)
Usage
TwoStg_Mopt(data, regimeClass.stg1, regimeClass.stg2,
moPropen1 = "BinaryRandom", moPropen2 = "BinaryRandom", max = TRUE,
s.tol, cl.setup = 1, p_level = 1, it.num = 10, pop.size = 3000,
hard_limit = FALSE)
Arguments
data |
a data frame, containing variables in the |
regimeClass.stg1 |
a formula or a string specifying the Class of treatment regimes
at stage 1, e.g. |
regimeClass.stg2 |
a formula or a string specifying the Class of treatment regimes
at stage 2, e.g. |
moPropen1 |
The propensity score model for the probability of receiving
treatment level 1 at the first stage .
When |
moPropen2 |
The propensity score model for the probability of receiving
treatment level 1 at the second stage .
When |
max |
logical. If |
s.tol |
This is the tolerance level used by |
cl.setup |
the number of nodes. >1 indicates choosing parallel computing option in
|
p_level |
choose between 0,1,2,3 to indicate different levels of output from the genetic function. Specifically, 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug.) |
it.num |
integer > 1. This argument will be used in |
pop.size |
an integer with the default set to be 3000. This is the population number for the first generation
in the genetic algorithm ( |
hard_limit |
logical. When it is true the maximum number of generations
in |
Details
Note that all estimation functions in this package use the same type of standardization on covariates. Doing so would allow us to provide a bounded domain of parameters for searching in the genetic algorithm.
For every stage k
, k=1,2
, this estimated parameters indexing the two-stage mean-optimal treatment regime
are returned in two scales:
, the returned
coef.k
is the set of parameters that we estimated after standardizing every covariate available for decision-making at stagek
to be in the interval [0, 1]. To be exact, every covariate is subtracted by the smallest observed value and divided by the difference between the largest and the smallest value. Next, we carried out the algorithm in Wang 2016 to get the estimated regime parameters,coef.k
, based on the standardized data. For the identifiability issue, we force the Euclidean norm ofcoef.k
to be 1.The difference between
coef.k
andcoef.orgn.scale.k
is that the latter set of parameters correspond to the original covariates, so the associated decision rule can be applied directly to novel observations. In other words, let\beta
denote the estimated parameter in the original scale, then the estimated treatment regime is:d(x)= I\{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k > 0\},
where the
\beta
values are returned ascoef.orgn.scale.k
, and the the vector(1, x_1,...,x_k)
corresponds to the specified class of treatment regimes in thek
th stage.
If, for every input covariate, the smallest observed value is exactly 0 and the range
(i.e. the largest number minus the smallest number) is exactly 1, then the estimated
coef.k
and coef.orgn.scale.k
will render identical.
Value
This function returns an object with 6 objects. Both coef.1
, coef.2
and coef.orgn.scale.1
, coef.orgn.scale.2
were normalized to have unit euclidean norm.
coef.1
,coef.2
the set of parameters indexing the estimated mean-optimal treatment regime for standardized covariates.
coef.orgn.scale.1
,coef.orgn.scale.2
the set of parameter indexing the estimated mean-optimal treatment regime for the original input covariates.
hatM
the estimated marginal mean when the treatment regime indexed by
coef.orgn.scale.1
andcoef.orgn.scale.2
is applied on the entire population. See the 'details' for connection betweencoef.orgn.scale.k
andcoef.k
.call
the user's call.
moPropen1
,moPropen2
the user specified propensity score models for the first and the second stage respectively
regimeClass.stg1
,regimeClass.stg2
the user specified class of treatment regimes for the first and the second stage respectively
Author(s)
Yu Zhou, zhou0269@umn.edu
References
Zhang B, Tsiatis AA, Laber EB and Davidian M (2013). “Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions.” Biometrika, 100(3).
Examples
ilogit <- function(x) exp(x)/(1 + exp(x))
GenerateData.2stg <- function(n){
x1 <- runif(n)
p1 <- ilogit(-0.5+x1)
a1 <- rbinom(n, size=1, prob=p1)
x2 <- runif(n, x1, x1+1)
p2 <- ilogit(-1 + x2)
a2 <- rbinom(n, size=1, prob=p2)
mean <- 1+x1+a1*(1-3*(x1-0.2)^2) +x2 + a2*(1-x2-x1)
y <- mean + (1+a1*(x1-0.5)+0.5*a2*(x2-1))*rnorm(n,0,sd = 1)
return(data.frame(x1,a1,x2,a2,y))
}
n <- 400
testdata <- GenerateData.2stg(n)
fit <- TwoStg_Mopt(data=testdata,
regimeClass.stg1="a1~x1", regimeClass.stg2="a2~x1+a1+x2",
moPropen1="a1~x1", moPropen2="a2~x2",
cl.setup=2)
fit
fit2 <- TwoStg_Mopt(data=testdata,
regimeClass.stg1="a1~x1", regimeClass.stg2="a2~a1+x1*x2",
moPropen1="a1~x1", moPropen2="a2~x2",
cl.setup=2)
fit2
Estimate the Two-stage Quantile-optimal Treatment Regime
Description
This function implements the estimator of two-stage quantile-optimal treatment regime by inverse probability of weighting proposed by Lan Wang, et al. As there are more than one stage, the second stage treatment regime could take into account the evolving status of an individual after the first stage and the treatment level received in the first stage. We assume the options at the two stages are both binary and take the form:
d_1(x)=I\left(\beta_{10} +\beta_{11} x_{11} +...+ \beta_{1k} x_{1k} > 0\right),
d_2(x)=I\left(\beta_{20} +\beta_{21} x_{21} +...+ \beta_{2p} x_{2p} > 0\right)
Usage
TwoStg_Qopt(data, tau, regimeClass.stg1, regimeClass.stg2,
moPropen1 = "BinaryRandom", moPropen2 = "BinaryRandom", s.tol = 1e-04,
it.num = 8, max = TRUE, cl.setup = 1, p_level = 1, pop.size = 1000,
hard_limit = FALSE)
Arguments
data |
a data frame, containing variables in the |
tau |
a value between 0 and 1. This is the quantile of interest. |
regimeClass.stg1 |
a formula or a string specifying the Class of treatment regimes
at stage 1, e.g. |
regimeClass.stg2 |
a formula or a string specifying the Class of treatment regimes
at stage 2, e.g. |
moPropen1 |
The propensity score model for the probability of receiving
treatment level 1 at the first stage .
When |
moPropen2 |
The propensity score model for the probability of receiving
treatment level 1 at the second stage .
When |
s.tol |
This is the tolerance level used by |
it.num |
integer > 1. This argument will be used in |
max |
logical. If |
cl.setup |
the number of nodes. >1 indicates choosing parallel computing option in
|
p_level |
choose between 0,1,2,3 to indicate different levels of output from the genetic function. Specifically, 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug.) |
pop.size |
an integer with the default set to be 3000. This is the population number for the first generation
in the genetic algorithm ( |
hard_limit |
logical. When it is true the maximum number of generations
in |
Details
Note that all estimation functions in this package use the same type of standardization on covariates. Doing so would allow us to provide a bounded domain of parameters for searching in the genetic algorithm.
For every stage k
, k=1,2
, this estimated parameters indexing the
two-stage quantile-optimal treatment regime are returned in two scales:
, the returned
coef.k
is the set of parameters that we estimated after standardizing every covariate available for decision-making at stagek
to be in the interval [0, 1]. To be exact, every covariate is subtracted by the smallest observed value and divided by the difference between the largest and the smallest value. Next, we carried out the algorithm in Wang et. al. 2017 to get the estimated regime parameters,coef.k
, based on the standardized data. For the identifiability issue, we force the Euclidean norm ofcoef.k
to be 1.The difference between
coef.k
andcoef.orgn.scale.k
is that the latter set of parameters correspond to the original covariates, so the associated decision rule can be applied directly to novel observations. In other words, let\beta
denote the estimated parameter in the original scale, then the estimated treatment regime is:d(x)= I\{\beta_0 + \beta_1 x_1 + ... + \beta_k x_k > 0\},
where the
\beta
values are returned ascoef.orgn.scale.k
, and the the vector(1, x_1,...,x_k)
corresponds to the specified class of treatment regimes in thek
th stage.
If, for every input covariate, the smallest observed value is exactly 0 and the range
(i.e. the largest number minus the smallest number) is exactly 1, then the estimated
coef.k
and coef.orgn.scale.k
will render identical.
Value
This function returns an object with 7 objects. Both coefficients
and coef.orgn.scale
were normalized to have unit euclidean norm.
coef.1
,coef.2
the set of parameters indexing the estimated quantile-optimal treatment regime for standardized covariates.
coef.orgn.scale.1
,coef.orgn.scale.2
the set of parameter indexing the estimated quantile-optimal treatment regime for the original input covariates.
tau
the quantile of interest
hatQ
the estimated marginal quantile when the treatment regime indexed by
coef.orgn.scale.1
andcoef.orgn.scale.2
is applied on the entire population. See the 'details' for connection betweencoef.orgn.scale.k
andcoef.k
.call
the user's call.
moPropen1
,moPropen2
the user specified propensity score models for the first and the second stage respectively
regimeClass.stg1
,regimeClass.stg2
the user specified class of treatment regimes for the first and the second stage respectively
Author(s)
Yu Zhou, zhou0269@umn.edu
References
Wang L, Zhou Y, Song R and Sherwood B (2017). “Quantile-Optimal Treatment Regimes.” Journal of the American Statistical Association.
Examples
ilogit <- function(x) exp(x)/(1 + exp(x))
GenerateData.2stg <- function(n){
x1 <- runif(n)
p1 <- ilogit(-0.5+x1)
a1 <- rbinom(n, size=1, prob=p1)
x2 <- runif(n,x1,x1+1)
p2 <- ilogit(-1 + x2)
a2 <- rbinom(n, size=1, prob=p2)
mean <- 1+x1+a1*(1-3*(x1-0.2)^2) +x2 + a2*(1-x2-x1)
y <- mean + (1+a1*(x1-0.5)+0.5*a2*(x2-1))*rnorm(n,0,sd = 1)
return(data.frame(x1,a1,x2,a2,y))
}
n <- 400
testdata <- GenerateData.2stg(n)
fit <- TwoStg_Qopt(data=testdata, tau=0.2,
regimeClass.stg1=a1~x1, regimeClass.stg2=a2~x1+a1+x2,
moPropen1=a1~x1, moPropen2=a2 ~ x2,
cl.setup=2)
fit
Estimate the Gini's mean difference/mean absolute difference(MAD) for a Given Treatment Regime
Description
Estimate the MAD if the entire population follows a
treatment regime indexed by the given parameters.
This function supports the IPWE_MADopt
function.
Usage
abso_diff_est(beta, x, y, a, prob, Cnobs)
Arguments
beta |
a vector indexing the treatment regime. It indexes a linear treatment regime:
|
x |
a matrix of observed covariates from the sample.
Notice that we assumed the class of treatment regimes is linear.
This is important that columns in |
y |
a vector, the observed responses from a sample |
a |
a vector of 0s and 1s, the observed treatments from a sample |
prob |
a vector, the propensity scores of getting treatment 1 in the samples |
Cnobs |
A matrix with two columns, enumerating all possible combinations of
pairs of indexes. This can be generated by |
References
Wang L, Zhou Y, Song R and Sherwood B (2017). “Quantile-Optimal Treatment Regimes.” Journal of the American Statistical Association.
See Also
The function IPWE_MADopt
is based on this function.
Examples
library(stats)
GenerateData.MAD <- function(n)
{
x1 <- runif(n)
x2 <- runif(n)
tp <- exp(-1+1*(x1+x2))/(1+exp(-1+1*(x1+x2)))
a<-rbinom(n = n, size = 1, prob=tp)
error <- rnorm(length(x1))
y <- (1 + a*0.3*(-1+x1+x2<0) + a*-0.3*(-1+x1+x2>0)) * error
return(data.frame(x1=x1,x2=x2,a=a,y=y))
}
n <- 500
testData <- GenerateData.MAD(n)
logistic.model.tx <- glm(formula = a~x1+x2, data = testData, family=binomial)
ph <- as.vector(logistic.model.tx$fit)
Cnobs <- combn(1:n, 2)
abso_diff_est(beta=c(1,2,-1),
x=model.matrix(a~x1+x2, testData),
y=testData$y,
a=testData$a,
prob=ph,
Cnobs = Cnobs)
Generate Pseudo-Responses Based on Conditional Quantile Regression Models
Description
This function supports the DR_Qopt
function.
For every observation, we generate pseudo-observations corresponding
to treatment 0 and 1 respectively based on working conditional quantile models.
Usage
augX(raw.data, length.out = 200, txVec, moCondQuant_0, moCondQuant_1,
nlCondQuant_0 = FALSE, nlCondQuant_1 = FALSE, start_0 = NULL,
start_1 = NULL, clnodes)
Arguments
raw.data |
A data frame, must contain all the variables that appear in
|
length.out |
an integer greater than 1. If one of the conditional quantile
model is set to be nonlinear, this argument will be triggered and we will fit
|
txVec |
a numeric vector of observed treatment levels coded 0L and 1L. |
moCondQuant_0 |
A formula, used to specify the formula for the conditional quantile function when treatment = 0. |
moCondQuant_1 |
A formula, used to specify the formula for the conditional quantile function when treatment = 1. |
nlCondQuant_0 |
logical.
When |
nlCondQuant_1 |
logical.
When |
start_0 |
either a list object, providing the starting value in estimating
the parameters in the nonlinear conditional quantile model, given that treatment=0.
Default is |
start_1 |
either a list object, providing the starting value in estimating
the parameters in the nonlinear conditional quantile model, given that treatment=0.
Default is |
clnodes |
Either a cluster object to enable parallel computation or
|
Details
This function implements the algorithm to generate individual level pseudo responses for two treatment levels respectively.
For each observation, two independent random variables from
{unif}[0,1]
are generated. Denote them by u_0
and u_1
. Approximately, this function then estimates the u_0
th quantile
of this observation were treatment level 0 is applied via the conditional u_0
th quantile regression.
This estimated quantile will be the pseudo-response for treatment 0.
Similarly, this function the pseudo-response for treatment 1 will be estimated and returned.
See the reference paper for a more formal explanation.
Value
It returns a list object, consisting of the following elements:
-
y.a.0
, the vector of estimated individual level pseudo outcomes, given the treatment is 0; -
y.a.1
, the vector of estimated individual level pseudo outcomes, given the treatment is 1; -
nlCondQuant_0
, logical, indicating whether they.a.0
is generated based on a nonlinear conditional quantile model. -
nlCondQuant_1
, logical, indicating whether they.a.1
is generated based on a nonlinear conditional quantile model.
References
Wang L, Zhou Y, Song R and Sherwood B (2017). “Quantile-Optimal Treatment Regimes.” Journal of the American Statistical Association.
Examples
ilogit <- function(x) exp(x)/(1 + exp(x))
GenerateData.DR <- function(n)
{
x1 <- runif(n,min=-1.5,max=1.5)
x2 <- runif(n,min=-1.5,max=1.5)
tp <- ilogit( 1 - 1*x1^2 - 1* x2^2)
a <-rbinom(n,1,tp)
y <- a * exp(0.11 - x1- x2) + x1^2 + x2^2 + a*rgamma(n, shape=2*x1+3, scale = 1) +
(1-a)*rnorm(n, mean = 2*x1 + 3, sd = 0.5)
return(data.frame(x1=x1,x2=x2,a=a,y=y))
}
regimeClass = as.formula(a ~ x1+x2)
moCondQuant_0 = as.formula(y ~ x1+x2+I(x1^2)+I(x2^2))
moCondQuant_1 = as.formula(y ~ exp( 0.11 - x1 - x2)+ x1^2 + p0 + p1*x1
+ p2*x1^2 + p3*x1^3 +p4*x1^4 )
start_1 = list(p0=0, p1=1.5, p2=1, p3 =0,p4=0)
## Not run:
n<-200
testdata <- GenerateData.DR(n)
fit1 <- augX(raw.data=testdata, txVec = testdata$a,
moCondQuant_0=moCondQuant_0, moCondQuant_1=moCondQuant_1,
nlCondQuant_0=FALSE, nlCondQuant_1=TRUE,
start_1=start_1,
clnodes=NULL)
# How to use parallel computing in AugX(): ##
# on Mac OSX/linux
clnodes <- parallel::makeForkCluster(nnodes =getOption("mc.cores",2))
fit2 <- augX(raw.data=testdata, length.out = 5, txVec = testdata$a,
moCondQuant_0=moCondQuant_0, moCondQuant_1=moCondQuant_1,
nlCondQuant_0=FALSE, nlCondQuant_1=TRUE,
start_1=start_1,
clnodes=clnodes)
# on Windows
clnodes <- parallel::makeCluster(2, type="PSOCK")
fit3 <- augX(raw.data=testdata, length.out = 5, txVec = testdata$a,
moCondQuant_0=moCondQuant_0, moCondQuant_1=moCondQuant_1,
nlCondQuant_0=FALSE, nlCondQuant_1=TRUE,
start_1=start_1,
clnodes=clnodes)
## End(Not run)
The Doubly Robust Quantile Estimator for a Given Treatment Regime
Description
Given a fixed treatment regime, this doubly robust estimator estimates the marginal quantile of responses when it is followed by every unit in the target population. It took advantages of conditional quantile functions for different treatment levels when they are available.
Usage
dr_quant_est(beta, x, y, a, prob, tau, y.a.0, y.a.1, num_min = FALSE)
Arguments
beta |
a vector indexing the treatment regime. It indexes a linear treatment regime:
|
x |
a matrix of observed covariates from the sample.
Notice that we assumed the class of treatment regimes is linear.
This is important that columns in |
y |
a vector, the observed responses from a sample |
a |
a vector of 0s and 1s, the observed treatments from a sample |
prob |
a vector, the propensity scores of getting treatment 1 in the samples |
tau |
The quantile of interest |
y.a.0 |
Estimated conditional potential outcome given that treatment = 0,
which can be calculated by the function |
y.a.1 |
Estimated conditional potential outcome given that treatment = 1,
which can be calculated by the function |
num_min |
logical. If |
Details
The double robustness property means that it can consistently estimate the marginal quantile when either the propensity score model is correctly specified, or the conditional quantile function is correctly specified.
See Also
Get the OS from R
Description
Get the type of the operating system. The returned value is used in configuring parallel computation for the implemented algorithms.
Usage
get_os()
References
This function is adapted from https://www.r-bloggers.com/identifying-the-os-from-r/
The Inverse Probability Weighted Estimator of the Marginal Mean Given a Specific Treatment Regime
Description
Estimate the marginal mean of the response when the entire population follows a treatment regime. This function implements the inverse probability weighted estimator proposed by Baqun Zhang et. al..
This function supports the mestimate
function.
Usage
mean_est(beta, x, a, y, prob)
Arguments
beta |
a vector indexing the treatment regime. It indexes a linear treatment regime:
|
x |
a matrix of observed covariates from the sample.
Notice that we assumed the class of treatment regimes is linear.
This is important that columns in |
a |
a vector of 0s and 1s, the observed treatments from a sample |
y |
a vector, the observed responses from a sample |
prob |
a vector, the propensity scores of getting treatment 1 in the samples |
References
Zhang B, Tsiatis AA, Laber EB and Davidian M (2012). “A robust method for estimating optimal treatment regimes.” Biometrics, 68(4), pp. 1010–1018.
The Mean-Optimal Treatment Regime Wrapper Function
Description
The wrapper function for mean-optimal treatment regime that calls a genetic algorithm.
This function supports the IPWE_Mopt
function.
Usage
mestimate(x, y, a, prob, p_level, nvars, hard_limit = FALSE, max = TRUE,
cl.setup = 1, s.tol = 1e-04, it.num = 8, pop.size = 3000)
Arguments
x |
a matrix of observed covariates from the sample. Notice that we assumed the class of treatment regimes is linear. |
y |
a vector, the observed responses from a sample |
a |
a vector of 0s and 1s, the observed treatments from a sample |
prob |
a vector, the propensity scores of getting treatment 1 in the samples |
p_level |
choose between 0,1,2,3 to indicate different levels of output from the genetic function. Specifically, 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug.) |
nvars |
an integer. The number of parameters indexing a treatment regime. |
hard_limit |
logical. This logical variable determines if the max.generations variable is a binding constraint for genoud. |
max |
logical. If |
cl.setup |
the number of nodes. >1 indicates choosing parallel computing option in
|
s.tol |
This is the tolerance level used by |
it.num |
integer > 1. This argument will be used in |
pop.size |
an integer with the default set to be 3000. This is the population number for the first generation
in the genetic algorithm ( |
References
Zhang B, Tsiatis AA, Laber EB and Davidian M (2012). “A robust method for estimating optimal treatment regimes.” Biometrics, 68(4), pp. 1010–1018.
See Also
The function IPWE_Mopt
is based on this function.
The Quantile-Optimal Treatment Regime Wrapper Function
Description
The wrapper function for quantile-optimal treatment regime that calls a genetic algorithm.
This function supports the IPWE_Qopt
function.
Usage
qestimate(tau, x, y, a, prob, p_level, nvars, hard_limit, max = TRUE,
cl.setup = 1, s.tol = 1e-04, it.num = 8, pop.size = 3000)
Arguments
tau |
a numeric value between 0 and 1. The quantile level of interest. |
x |
a matrix of observed covariates from the sample. Notice that we assumed the class of treatment regimes is linear. |
y |
a vector, the observed responses from a sample |
a |
a vector of 0s and 1s, the observed treatments from a sample |
prob |
a vector, the propensity scores of getting treatment 1 in the samples |
p_level |
choose between 0,1,2,3 to indicate different levels of output from the genetic function. Specifically, 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug.) |
nvars |
an integer. The number of parameters indexing a treatment regime. |
hard_limit |
logical. This logical variable determines if the max.generations
variable is a binding constraint for |
max |
logical. If |
cl.setup |
the number of nodes. >1 indicates choosing parallel computing option in
|
s.tol |
This is the tolerance level used by |
it.num |
integer > 1. This argument will be used in |
pop.size |
an integer with the default set to be 3000. This is the population number for the first generation
in the genetic algorithm ( |
References
Wang L, Zhou Y, Song R and Sherwood B (2017). “Quantile-Optimal Treatment Regimes.” Journal of the American Statistical Association.
See Also
The function IPWE_Qopt
is based on this function.
Estimate the Marginal Quantile Given a Specific Treatment Regime
Description
Estimate the marginal quantile if the entire population follows a
treatment regime indexed by the given parameters.
This function supports the qestimate
function.
Usage
quant_est(beta, x, y, a, prob, tau)
Arguments
beta |
a vector indexing the treatment regime. It indexes a linear treatment regime:
|
x |
a matrix of observed covariates from the sample.
Notice that we assumed the class of treatment regimes is linear.
This is important that columns in |
y |
a vector, the observed responses from a sample |
a |
a vector of 0s and 1s, the observed treatments from a sample |
prob |
a vector, the propensity scores of getting treatment 1 in the samples |
tau |
a numeric value between 0 and 1. The quantile level of interest. |