Title: | Linear Regression and Logistic Regression with Missing Covariates |
Version: | 1.0.1 |
Date: | 2021-04-07 |
Description: | Estimate parameters of linear regression and logistic regression with missing covariates with missing data, perform model selection and prediction, using EM-type algorithms. Jiang W., Josse J., Lavielle M., TraumaBase Group (2020) <doi:10.1016/j.csda.2019.106907>. |
Depends: | R (≥ 3.4.0) |
Encoding: | UTF-8 |
License: | GPL-3 |
URL: | https://github.com/julierennes/misaem |
Imports: | mvtnorm, stats, MASS, norm, methods |
Suggests: | knitr, rmarkdown, mice |
LazyData: | false |
VignetteBuilder: | knitr |
RoxygenNote: | 7.1.1 |
NeedsCompilation: | no |
Author: | Wei Jiang [aut], Pavlo Mozharovskyi [ctb], Julie Josse [aut, cre], Imke Mayer [ctb] |
Maintainer: | Julie Josse <julie.josserennes@gmail.com> |
Repository: | CRAN |
Packaged: | 2021-04-09 13:00:17 UTC; imke |
Date/Publication: | 2021-04-12 08:10:02 UTC |
combinations
Description
Given all the possible patterns of missingness.
Usage
combinations(p)
Arguments
p |
Dimension of covariates. |
Value
A matrix containing all the possible missing patterns. Each row indicates a pattern of missingness. "1" means "observed", 0 means "missing".
Examples
comb = combinations(5)
Function for imputing single point for linear regression model
Description
Function for imputing single point for linear regression model
Usage
imputeEllP(point, Sigma.inv)
Arguments
point |
A single observation containing missing values. |
Sigma.inv |
Inverse of estimated |
Value
Imputed observation.
likelihood_saem
Description
Used in main function miss.saem. Calculate the observed log-likelihood for logistic regression model with missing data, using Monte Carlo version of Louis formula.
Usage
likelihood_saem(
beta,
mu,
Sigma,
Y,
X.obs,
rindic = as.matrix(is.na(X.obs)),
whichcolXmissing = (1:ncol(rindic))[apply(rindic, 2, sum) > 0],
mc.size = 2
)
Arguments
beta |
Estimated parameter of logistic regression model. |
mu |
Estimated parameter |
Sigma |
Estimated parameter |
Y |
Response vector |
X.obs |
Design matrix with missingness |
rindic |
Missing pattern of X.obs. If a component in X.obs is missing, the corresponding position in rindic is 1; else 0. |
whichcolXmissing |
The column index in covariate containing at least one missing observation. |
mc.size |
Monte Carlo sampling size. |
Value
Observed log-likelihood.
Examples
# Generate dataset
N <- 50 # number of subjects
p <- 3 # number of explanatory variables
mu.star <- rep(0,p) # mean of the explanatory variables
Sigma.star <- diag(rep(1,p)) # covariance
beta.star <- c(1, 1, 0) # coefficients
beta0.star <- 0 # intercept
beta.true = c(beta0.star,beta.star)
X.complete <- matrix(rnorm(N*p), nrow=N)%*%chol(Sigma.star) +
matrix(rep(mu.star,N), nrow=N, byrow = TRUE)
p1 <- 1/(1+exp(-X.complete%*%beta.star-beta0.star))
y <- as.numeric(runif(N)<p1)
# Generate missingness
p.miss <- 0.10
patterns <- runif(N*p)<p.miss #missing completely at random
X.obs <- X.complete
X.obs[patterns] <- NA
# Observed log-likelihood
ll_obs = likelihood_saem(beta.true,mu.star,Sigma.star,y,X.obs)
log_reg
Description
Calculate the likelihood or log-likelihood for one observation of logistic regression model .
Usage
log_reg(y, x, beta, iflog = TRUE)
Arguments
y |
Response value (0 or 1). |
x |
Covariate vector of dimension |
beta |
Estimated parameter of logistic regression model. |
iflog |
If TRUE, log_reg calculate the log-likelihood; else likelihood. |
Value
Likelihood or log-likelihood.
Examples
res = log_reg(1,c(1,2,3),c(1,-1,1))
louis_lr_saem
Description
Used in main function miss.saem. Calculate the variance of estimated parameters for logistic regression model with missing data, using Monte Carlo version of Louis formula.
Usage
louis_lr_saem(
beta,
mu,
Sigma,
Y,
X.obs,
pos_var = 1:ncol(X.obs),
rindic = as.matrix(is.na(X.obs)),
whichcolXmissing = (1:ncol(rindic))[apply(rindic, 2, sum) > 0],
mc.size = 2
)
Arguments
beta |
Estimated parameter of logistic regression model. |
mu |
Estimated parameter |
Sigma |
Estimated parameter |
Y |
Response vector |
X.obs |
Design matrix with missingness |
pos_var |
Index of selected covariates. |
rindic |
Missing pattern of X.obs. If a component in X.obs is missing, the corresponding position in rindic is 1; else 0. |
whichcolXmissing |
The column index in covariate containing at least one missing observation. |
mc.size |
Monte Carlo sampling size. |
Value
Variance of estimated \beta
.
Examples
# Generate dataset
N <- 50 # number of subjects
p <- 3 # number of explanatory variables
mu.star <- rep(0,p) # mean of the explanatory variables
Sigma.star <- diag(rep(1,p)) # covariance
beta.star <- c(1, 1, 0) # coefficients
beta0.star <- 0 # intercept
beta.true = c(beta0.star,beta.star)
X.complete <- matrix(rnorm(N*p), nrow=N)%*%chol(Sigma.star) +
matrix(rep(mu.star,N), nrow=N, byrow = TRUE)
p1 <- 1/(1+exp(-X.complete%*%beta.star-beta0.star))
y <- as.numeric(runif(N)<p1)
# Generate missingness
p.miss <- 0.10
patterns <- runif(N*p)<p.miss #missing completely at random
X.obs <- X.complete
X.obs[patterns] <- NA
# Louis formula to obtain variance of estimates
V_obs = louis_lr_saem(beta.true,mu.star,Sigma.star,y,X.obs)
Statistical Inference for Logistic Regression Models with Missing Values
Description
This function is used to perform statistical inference for logistic regression model with missing values, by algorithm SAEM.
Usage
miss.glm(formula, data, control = list(...), ...)
Arguments
formula |
an object of class " |
data |
an optional data frame containing the variables in the model. If not found in |
control |
a list of parameters for controlling the fitting process. For |
... |
arguments to be used to form the default control argument if it is not supplied directly. |
Value
An object of class "miss.glm
": a list with following components:
coefficients |
Estimated |
ll |
Observed log-likelihood. |
var.covar |
Variance-covariance matrix for estimated parameters. |
s.err |
Standard error for estimated parameters. |
mu.X |
Estimated |
Sig.X |
Estimated |
call |
the matched call. |
formula |
the formula supplied. |
Examples
# Generate dataset
N <- 100 # number of subjects
p <- 3 # number of explanatory variables
mu.star <- rep(0,p) # mean of the explanatory variables
Sigma.star <- diag(rep(1,p)) # covariance
beta.star <- c(1, 1, 0) # coefficients
beta0.star <- 0 # intercept
beta.true = c(beta0.star,beta.star)
X.complete <- matrix(rnorm(N*p), nrow=N)%*%chol(Sigma.star) +
matrix(rep(mu.star,N), nrow=N, byrow = TRUE)
p1 <- 1/(1+exp(-X.complete%*%beta.star-beta0.star))
y <- as.numeric(runif(N)<p1)
# Generate missingness
p.miss <- 0.10
patterns <- runif(N*p)<p.miss #missing completely at random
X.obs <- X.complete
X.obs[patterns] <- NA
df.obs = data.frame(y,X.obs)
# SAEM
miss.list = miss.glm(y~., data=df.obs, print_iter=FALSE,seed=100)
print(miss.list)
print(summary(miss.list))
summary(miss.list)$coef
Auxiliary for Controlling Fitting
Description
Auxiliary function for miss.glm
fitting. Typically only used internally by miss.glm.fit
.
Usage
miss.glm.control(
maxruns = 500,
tol_em = 1e-07,
nmcmc = 2,
tau = 1,
k1 = 50,
subsets = NA,
seed = NA,
print_iter = TRUE,
var_cal = TRUE,
ll_obs_cal = TRUE
)
Arguments
maxruns |
maximum number of iterations. The default is maxruns = 500. |
tol_em |
the tolerance to stop SAEM. The default is tol_em = 1e-7. |
nmcmc |
the MCMC length. The default is nmcmc = 2. |
tau |
rate |
k1 |
number of first iterations |
subsets |
Index of selected covariates if any. The default is all the covariates. |
seed |
an integer as a seed set for the random generator. |
print_iter |
logical indicating if output should be produced for each iteration. |
var_cal |
logical indicating if the variance of estimated parameters should be calculated. |
ll_obs_cal |
logical indicating if the observed log-likelihood should be calculated. |
Value
A list with components named as the arguments.
Examples
## For examples see example(miss.glm)
Fitting Logistic Regression Models with Missing Values
Description
This function is used inside miss.glm
to fit logistic regression model with missing values, by algorithm SAEM.
Usage
miss.glm.fit(x, y, control = list())
Arguments
x |
design matrix with missingness |
y |
response vector |
control |
a list of parameters for controlling the fitting process. For |
Value
a list with following components:
coefficients |
Estimated |
ll |
Observed log-likelihood. |
var.covar |
Variance-covariance matrix for estimated parameters. |
s.err |
Standard error for estimated parameters. |
mu.X |
Estimated |
Sig.X |
Estimated |
Examples
## For examples see example(miss.glm)
miss.glm.model.select
Description
Model selection for the logistic regression model with missing data.
Usage
miss.glm.model.select(Y, X, seed = NA)
Arguments
Y |
Binary response vector |
X |
Design matrix with missingness |
seed |
An integer as a seed set for the random generator. The default value is 200. |
Value
An object of class "miss.glm
".
Examples
# Generate dataset
N <- 40 # number of subjects
p <- 3 # number of explanatory variables
mu.star <- rep(0,p) # mean of the explanatory variables
Sigma.star <- diag(rep(1,p)) # covariance
beta.star <- c(1, 1, 0) # coefficients
beta0.star <- 0 # intercept
beta.true = c(beta0.star,beta.star)
X.complete <- matrix(rnorm(N*p), nrow=N)%*%chol(Sigma.star) +
matrix(rep(mu.star,N), nrow=N, byrow = TRUE)
p1 <- 1/(1+exp(-X.complete%*%beta.star-beta0.star))
Y <- as.numeric(runif(N)<p1)
# Generate missingness
p.miss <- 0.10
patterns <- runif(N*p)<p.miss #missing completely at random
X <- X.complete
X[patterns] <- NA
# model selection for SAEM
miss.model = miss.glm.model.select(Y,X,seed=100)
print(miss.model)
Statistical Inference for Linear Regression Models with Missing Values
Description
This function is used to perform statistical inference for linear regression model with missing values, by algorithm EM.
Usage
miss.lm(formula, data, control = list(...), ...)
Arguments
formula |
an object of class " |
data |
an optional data frame containing the variables in the model. If not found in |
control |
a list of parameters for controlling the fitting process. For |
... |
arguments to be used to form the default control argument if it is not supplied directly. |
Value
An object of class "miss.lm
": a list with following components:
coefficients |
Estimated |
ll |
Observed log-likelihood. |
s.resid |
Estimated standard error for residuals. |
s.err |
Standard error for estimated parameters. |
mu.X |
Estimated |
Sig.X |
Estimated |
call |
the matched call. |
formula |
the formula supplied. |
Examples
# Generate complete data
set.seed(1)
mu.X <- c(1, 1)
Sigma.X <- matrix(c(1, 1, 1, 4), nrow = 2)
n <- 50
p <- 2
X.complete <- matrix(rnorm(n*p), nrow=n)%*%chol(Sigma.X) +
matrix(rep(mu.X,n), nrow=n, byrow = TRUE)
b <- c(2, 3, -1)
sigma.eps <- 0.25
y <- cbind(rep(1, n), X.complete) %*% b + rnorm(n, 0, sigma.eps)
# Add missing values
p.miss <- 0.10
patterns <- runif(n*p)<p.miss #missing completely at random
X.obs <- X.complete
X.obs[patterns] <- NA
# Estimate regression using EM
df.obs = data.frame(y,X.obs)
miss.list = miss.lm(y~., data=df.obs)
print(miss.list)
print(summary(miss.list))
summary(miss.list)$coef
Auxiliary for Controlling Fitting
Description
Auxiliary function for miss.lm
fitting. Typically only used internally by miss.lm.fit
.
Usage
miss.lm.control(maxruns = 500, tol_em = 1e-07, print_iter = TRUE)
Arguments
maxruns |
maximum number of iterations. The default is maxruns = 500. |
tol_em |
the tolerance to stop EM. The default is tol_em = 1e-4. |
print_iter |
logical indicating if output should be produced for each iteration. |
Value
A list with components named as the arguments.
Examples
## For examples see example(miss.lm)
Fitting Linear Regression Model with Missing Values
Description
This function is used inside miss.lm
to fit linear regression model with missing values, by EM algorithm.
Usage
miss.lm.fit(x, y, control = list())
Arguments
x |
design matrix with missingness |
y |
response vector |
control |
a list of parameters for controlling the fitting process. For |
Value
a list with following components:
coefficients |
Estimated |
ll |
Observed log-likelihood. |
s.resid |
Estimated standard error for residuals. |
s.err |
Standard error for estimated parameters. |
mu.X |
Estimated |
Sig.X |
Estimated |
Examples
## For examples see example(miss.lm)
miss.lm.model.select
Description
Model selection for the linear regression model with missing data.
Usage
miss.lm.model.select(Y, X)
Arguments
Y |
Response vector |
X |
Design matrix with missingness |
Value
An object of class "miss.lm
".
Examples
# Generate complete data
set.seed(1)
mu.X <- c(1, 1)
Sigma.X <- matrix(c(1, 1, 1, 4), nrow = 2)
n <- 50
p <- 2
X.complete <- matrix(rnorm(n*p), nrow=n)%*%chol(Sigma.X) +
matrix(rep(mu.X,n), nrow=n, byrow = TRUE)
b <- c(2, 0, -1)
sigma.eps <- 0.25
y <- cbind(rep(1, n), X.complete) %*% b + rnorm(n, 0, sigma.eps)
# Add missing values
p.miss <- 0.10
patterns <- runif(n*p)<p.miss #missing completely at random
X.obs <- X.complete
X.obs[patterns] <- NA
# model selection
miss.model = miss.lm.model.select(y, X.obs)
print(miss.model)
Prediction on test with missing values for the logistic regression model.
Description
Prediction on test with missing values for the logistic regression model.
Usage
## S3 method for class 'miss.glm'
predict(object, newdata = NULL, seed = NA, method = "map", ...)
Arguments
object |
a fitted object of class inheriting from "miss.glm". |
newdata |
a data frame in which to look for variables with which to predict. It can contain missing values. |
seed |
An integer as a seed set for the random generator. |
method |
The name of method to deal with missing values in test set. It can be 'map'(maximum a posteriori) or 'impute' (imputation by conditional expectation). Default is 'map'. |
... |
Further arguments passed to or from other methods. |
Value
pr.saem |
The prediction result for logistic regression: the probability of response y=1. |
Examples
# Generate dataset
N <- 100 # number of subjects
p <- 3 # number of explanatory variables
mu.star <- rep(0,p) # mean of the explanatory variables
Sigma.star <- diag(rep(1,p)) # covariance
beta.star <- c(1, 1, 0) # coefficients
beta0.star <- 0 # intercept
beta.true = c(beta0.star,beta.star)
X.complete <- matrix(rnorm(N*p), nrow=N)%*%chol(Sigma.star) +
matrix(rep(mu.star,N), nrow=N, byrow = TRUE)
p1 <- 1/(1+exp(-X.complete%*%beta.star-beta0.star))
y <- as.numeric(runif(N)<p1)
# Generate missingness
p.miss <- 0.10
patterns <- runif(N*p)<p.miss #missing completely at random
X.obs <- X.complete
X.obs[patterns] <- NA
df.obs = data.frame(y,X.obs)
# SAEM
miss.list = miss.glm(y~., data=df.obs, print_iter=FALSE,seed=100)
# Generate new dataset for prediction
Nt <- 20
Xt <- matrix(rnorm(Nt*p), nrow=Nt)%*%chol(Sigma.star)+
matrix(rep(mu.star,Nt), nrow=Nt, byrow = TRUE)
# Generate missingness in new dataset
patterns <- runif(Nt*p)<p.miss
Xt.obs <- Xt
Xt.obs[patterns] <- NA
# Prediction with missing values
miss.prob = predict(miss.list, data.frame(Xt.obs), method='map')
print(miss.prob)
Prediction on test with missing values for the linear regression model.
Description
Prediction on test with missing values for the linear regression model.
Usage
## S3 method for class 'miss.lm'
predict(object, newdata = NULL, seed = NA, ...)
Arguments
object |
a fitted object of class inheriting from "miss.lm". |
newdata |
a data frame in which to look for variables with which to predict. It can contain missing values. |
seed |
An integer as a seed set for the random generator. |
... |
Further arguments passed to or from other methods. |
Value
pr.y |
The prediction result for linear regression. |
Examples
# Generate complete data
set.seed(1)
mu.X <- c(1, 1)
Sigma.X <- matrix(c(1, 1, 1, 4), nrow = 2)
n <- 50 # train set size
p <- 2 # number of covariates
X.complete <- matrix(rnorm(n*p), nrow=n)%*%chol(Sigma.X) +
matrix(rep(mu.X,n), nrow=n, byrow = TRUE)
b <- c(2, 3, -1)
sigma.eps <- 0.25
y <- cbind(rep(1, n), X.complete) %*% b +
rnorm(n, 0, sigma.eps)
# Add missing values
p.miss <- 0.10
patterns <- runif(n*p)<p.miss #missing completely at random
X.obs <- X.complete
X.obs[patterns] <- NA
# Estimate regression using EM
df.obs = data.frame(y ,X.obs)
miss.list = miss.lm(y~., data=df.obs)
# Generate new dataset for prediction
nt <- 20
Xt <- matrix(rnorm(nt*p), nrow=nt)%*%chol(Sigma.X)+
matrix(rep(mu.X,nt), nrow=nt, byrow = TRUE)
# Generate missingness in new dataset
patterns <- runif(nt*p)<p.miss
Xt.obs <- Xt
Xt.obs[patterns] <- NA
# Prediction with missing values
miss.pred = predict(miss.list, data.frame(Xt.obs))
print(miss.pred)
Print miss.glm
Description
Print results for class miss.glm
.
Usage
## S3 method for class 'miss.glm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
an object of class " |
digits |
minimal number of significant digits. |
... |
further arguments passed to or from other methods. |
Value
No return value, called for coefficient and standard error estimates print.
Examples
## For examples see example(miss.glm)
Print miss.lm
Description
Print results for class miss.lm
.
Usage
## S3 method for class 'miss.lm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
an object of class " |
digits |
minimal number of significant digits. |
... |
further arguments passed to or from other methods. |
Value
No return value, called for coefficient and standard error estimates print.
Examples
## For examples see example(miss.lm)
Print Summary of miss.glm
Description
Print results for class summary.miss.glm
.
Usage
## S3 method for class 'summary.miss.glm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
an object of class " |
digits |
minimal number of significant digits. |
... |
further arguments passed to or from other methods. |
Value
No return value, called for summary print.
Examples
## For examples see example(miss.glm)
Print Summary of miss.lm
Description
Print results for class summary.miss.lm
.
Usage
## S3 method for class 'summary.miss.lm'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
an object of class " |
digits |
minimal number of significant digits. |
... |
further arguments passed to or from other methods. |
Value
No return value, called for summary print.
Examples
## For examples see example(miss.lm)
Summarizing Fits for miss.glm
Description
Summary for class miss.glm
.
Usage
## S3 method for class 'miss.glm'
summary(object, ...)
Arguments
object |
an object of class " |
... |
Further arguments passed to or from other methods. |
Value
An object of class "summary.miss.glm
", a list with following components:
coefficients |
The matrix of coefficients and standard errors |
loglikelihood |
Observed log-likelihood. |
call |
the component from |
formula |
the component from |
Examples
## For examples see example(miss.glm)
Summarizing Fits for miss.lm
Description
Summary for class miss.lm
.
Usage
## S3 method for class 'miss.lm'
summary(object, ...)
Arguments
object |
an object of class " |
... |
Further arguments passed to or from other methods. |
Value
An object of class "summary.miss.lm
", a list with following components:
coefficients |
The matrix of coefficients and standard errors. |
loglikelihood |
Observed log-likelihood. |
call |
the component from |
formula |
the component from |
Examples
## For examples see example(miss.lm)