| Type: | Package | 
| Title: | Estimating the Error Variance in a High-Dimensional Linear Model | 
| Version: | 0.9.0 | 
| Maintainer: | Guo Yu <gy63@cornell.edu> | 
| Description: | Implementation of the two error variance estimation methods in high-dimensional linear models of Yu, Bien (2017) <doi:10.48550/arXiv.1712.02412>. | 
| URL: | https://arxiv.org/abs/1712.02412 | 
| BugReports: | https://github.com/hugogogo/natural/issues | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 6.0.1 | 
| Imports: | Matrix, glmnet | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | yes | 
| Packaged: | 2018-01-16 01:32:01 UTC; hugo | 
| Author: | Guo Yu [aut, cre] | 
| Repository: | CRAN | 
| Date/Publication: | 2018-01-16 10:35:43 UTC | 
natural: Natural and Organic lasso estimates of error variance in high-dimensional linear models
Description
The package implements the two error variance estimation methods introduced in Yu and Bien (2017), https://arxiv.org/abs/1712.02412.
Details
The main functions are nlasso_cv, olasso_cv, and olasso.
Get the two (theoretical) values of lambdas used in the organic lasso
Description
Get the two (theoretical) values of lambdas used in the organic lasso
Usage
getLam_olasso(x)
Arguments
| x | design matrix | 
Get the two (theoretical) values of lambdas used in scaled lasso
Description
Get the two (theoretical) values of lambdas used in scaled lasso
Usage
getLam_slasso(n, p)
Arguments
| n | number of observations | 
| p | number of features | 
Generate sparse linear model and random samples
Description
Generate design matrix and response following linear models
y = X \beta + \epsilon, where
\epsilon ~ N(0, \sigma^2), and X ~ N(0, \Sigma).
Usage
make_sparse_model(n, p, alpha, rho, snr, nsim)
Arguments
| n | the sample size | 
| p | the number of features | 
| alpha | sparsity, i.e.,  | 
| rho | pairwise correlation among features | 
| snr | signal to noise ratio, defined as  | 
| nsim | the number of simulations | 
Value
A list object containing:
- x:
- The n-by-p design matrix
- y:
- The n-by-nsim matrix of responses, each column representing one replication of the simulation
- beta:
- The true regression coefficient vector 
- sigma:
- The true error standard deviation 
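The model above can be illustrated with a small base R sketch. This is not the package's make_sparse_model(): the function name sketch_sparse_model and its arguments k (number of nonzero coefficients) and sigma (error standard deviation) are hypothetical stand-ins for alpha and snr, and the equicorrelated form of \Sigma is an assumption about what "pairwise correlation" means here.

```r
# Sketch only: NOT the package's make_sparse_model().
# `k` and `sigma` are hypothetical arguments replacing `alpha` and `snr`.
sketch_sparse_model <- function(n, p, k, rho, sigma, nsim) {
  # Rows of X ~ N(0, Sigma), assuming Sigma = (1 - rho) * I + rho * 1 1^T.
  z  <- matrix(rnorm(n * p), n, p)
  z0 <- rnorm(n)                      # factor shared by all columns of a row
  x  <- sqrt(1 - rho) * z + sqrt(rho) * z0
  # Sparse coefficients: the first k entries are nonzero (assumed layout).
  beta <- c(rep(1, k), rep(0, p - k))
  # One column of y per replicate; all replicates share x and beta.
  eps <- matrix(rnorm(n * nsim, sd = sigma), n, nsim)
  y <- as.vector(x %*% beta) + eps
  list(x = x, y = y, beta = beta, sigma = sigma)
}

set.seed(1)
sim_sketch <- sketch_sparse_model(n = 50, p = 200, k = 5, rho = 0.6,
                                  sigma = 1, nsim = 3)
dim(sim_sketch$x)
dim(sim_sketch$y)
```

The returned list mirrors the x, y, beta, and sigma components described above, so it can stand in for the package output when experimenting.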
Cross-validation for natural lasso
Description
Provide the natural lasso estimate (of the error standard deviation), using cross-validation to select the tuning parameter value. The output also includes the cross-validation results of the naive estimate and the degree-of-freedom adjusted estimate of the error standard deviation.
Usage
nlasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100,
  flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08,
  glmnet_output = NULL)
Arguments
| x | An n-by-p design matrix | 
| y | A response vector of size n | 
| lambda | A user-specified list of tuning parameter values. Defaults to NULL, in which case the program computes its own path of values from nlam and flmin | 
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE | 
| nlam | The number of lambda values. Defaults to 100 | 
| flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01 | 
| nfold | Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal  | 
| foldid | A vector of length n giving the fold assignment of each observation. Defaults to NULL | 
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 | 
| glmnet_output | Should the estimate be computed using a user-specified output from  | 
Value
A list object containing:
- n and p:
- The dimension of the problem. 
- lambda:
- The path of tuning parameter used. 
- beta:
- Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation. 
- a0:
- Estimate of intercept 
- mat_mse:
- The estimated prediction error on the test sets in cross-validation. A matrix of size nlam-by-nfold. If glmnet_output is not NULL, then mat_mse will be NULL.
- cvm:
- The averaged estimated prediction error on the test sets over K folds. 
- cvse:
- The standard error of the estimated prediction error on the test sets over K folds. 
- ibest:
- The index in lambda that attains the minimal mean cross-validated error.
- foldid:
- Fold assignment. A vector of length n.
- nfold:
- The number of folds used in cross-validation. 
- sig_obj:
- Natural lasso estimate of standard deviation of the error, with the optimal tuning parameter selected by cross-validation. 
- sig_obj_path:
- Natural lasso estimates of standard deviation of the error. A vector of length nlam.
- sig_naive:
- Naive estimate of the error standard deviation based on lasso regression, i.e., ||y - X \hat{\beta}||_2 / \sqrt n, selected by cross-validation.
- sig_naive_path:
- Naive estimates of standard deviation of the error based on lasso regression. A vector of length nlam.
- sig_df:
- Degree-of-freedom adjusted estimate of standard deviation of the error, selected by cross-validation. See Reid et al. (2016).
- sig_df_path:
- Degree-of-freedom adjusted estimates of standard deviation of the error. A vector of length nlam.
- type:
- whether the output is of a natural or an organic lasso. 
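The relationship between mat_mse, cvm, cvse, and ibest can be sketched in base R. The numbers below are made up for illustration, the fold assignment scheme is one common convention rather than the package's confirmed behaviour, and the standard error is assumed to be the across-fold standard deviation divided by the square root of the number of folds.

```r
# Sketch only: made-up test-set errors, nlam = 4 rows by nfold = 5 columns.
set.seed(1)
n <- 50
nfold <- 5

# Random, balanced fold assignment (assumed convention).
foldid <- sample(rep(seq_len(nfold), length.out = n))

# Row r holds the errors of the r-th lambda value across the folds.
mat_mse <- matrix(c(2.0, 1.5, 1.2, 1.6) + runif(4 * nfold, 0, 0.1),
                  nrow = 4, ncol = nfold)

cvm   <- rowMeans(mat_mse)                      # mean error per lambda
cvse  <- apply(mat_mse, 1, sd) / sqrt(nfold)    # assumed standard error
ibest <- which.min(cvm)                         # index of the selected lambda
ibest
```

With an actual fit, the selected estimate would then be the element of the estimate path at position ibest.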
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_cv <- nlasso_cv(x = sim$x, y = sim$y[, 1])
Fit a linear model with natural lasso
Description
Calculate a solution path of the natural lasso estimate (of error standard deviation) with a list of tuning parameter values. In particular, this function solves the lasso problems and returns the lasso objective function values as estimates of the error variance:
\hat{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1.
The output also includes a path of naive estimates and a path of degree of freedom adjusted estimates of the error standard deviation.
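For a single value of lambda, the three estimates could be computed as in the following base R sketch. The coordinate descent routine lasso_cd is a hypothetical helper written for illustration, not the solver the package uses, and the degree-of-freedom adjusted form (residual sum of squares divided by n minus the number of nonzero coefficients) is an assumption based on Reid et al. (2016).

```r
# Sketch only: plain coordinate descent, NOT the package's solver.
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Minimizes ||y - x b||_2^2 / n + 2 * lambda * ||b||_1 (no intercept).
lasso_cd <- function(x, y, lambda, thresh = 1e-8, maxit = 1000) {
  n <- nrow(x)
  p <- ncol(x)
  beta <- rep(0, p)
  r <- y                        # current residual y - x %*% beta
  xsq <- colSums(x^2) / n
  for (it in seq_len(maxit)) {
    delta <- 0
    for (j in seq_len(p)) {
      old <- beta[j]
      # x_j^T (partial residual) / n, with coordinate j added back in
      zj <- sum(x[, j] * r) / n + xsq[j] * old
      new <- soft(zj, lambda) / xsq[j]
      if (new != old) {
        r <- r - x[, j] * (new - old)
        beta[j] <- new
        delta <- max(delta, abs(new - old))
      }
    }
    if (delta < thresh) break   # largest coordinate change is small enough
  }
  beta
}

set.seed(123)
n <- 50
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] + rnorm(n)
lambda <- 0.2

beta_hat <- lasso_cd(x, y, lambda)
rss <- sum((y - x %*% beta_hat)^2)

# Natural lasso: square root of the lasso objective value at the solution.
sig_obj   <- sqrt(rss / n + 2 * lambda * sum(abs(beta_hat)))
# Naive estimate: ||y - X beta_hat||_2 / sqrt(n).
sig_naive <- sqrt(rss / n)
# Degree-of-freedom adjusted estimate (assumed form).
sig_df    <- sqrt(rss / (n - sum(beta_hat != 0)))

c(sig_obj = sig_obj, sig_naive = sig_naive, sig_df = sig_df)
```

Because the penalty term is nonnegative, the natural lasso estimate is never smaller than the naive estimate at the same lambda.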
Usage
nlasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01,
  thresh = 1e-08, intercept = TRUE, glmnet_output = NULL)
Arguments
| x | An n-by-p design matrix | 
| y | A response vector of size n | 
| lambda | A user-specified list of tuning parameter values. Defaults to NULL, in which case the program computes its own path of values from nlam and flmin | 
| nlam | The number of lambda values. Defaults to 100 | 
| flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01 | 
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 | 
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE | 
| glmnet_output | Should the estimate be computed using a user-specified output from  | 
Value
A list object containing:
- n and p:
- The dimension of the problem. 
- lambda:
- The path of tuning parameters used. 
- beta:
- Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p-by-nlam. The j-th column represents the estimate of the coefficients corresponding to the j-th tuning parameter in lambda.
- a0:
- Estimate of intercept. A vector of length nlam.
- sig_obj_path:
- Natural lasso estimates of the error standard deviation. A vector of length nlam.
- sig_naive_path:
- Naive estimates of the error standard deviation based on lasso regression, i.e., ||y - X \hat{\beta}||_2 / \sqrt n. A vector of length nlam.
- sig_df_path:
- Degree-of-freedom adjusted estimates of standard deviation of the error. A vector of length nlam. See Reid et al. (2016).
- type:
- whether the output is of a natural or an organic lasso. 
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_path <- nlasso_path(x = sim$x, y = sim$y[, 1])
Error standard deviation estimation using organic lasso
Description
Solve the organic lasso problem
\tilde{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1^2
with two pre-specified values of tuning parameter:
\lambda_1 = log p / n, and \lambda_2, which is a Monte-Carlo estimate of ||X^T e||_\infty^2 / n^2, where e is n-dimensional standard normal.
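The two tuning parameter values could be computed along the following lines. This is a sketch and not the package's getLam_olasso(): the function name sketch_lam_olasso and the argument nrep (number of Monte-Carlo draws) are hypothetical, and averaging over the draws is an assumption about how the Monte-Carlo estimate is formed.

```r
# Sketch only: NOT the package's getLam_olasso(); `nrep` is hypothetical.
sketch_lam_olasso <- function(x, nrep = 1000) {
  n <- nrow(x)
  p <- ncol(x)
  lam_1 <- log(p) / n
  # Each column of e is one n-dimensional standard normal draw.
  e <- matrix(rnorm(n * nrep), n, nrep)
  # ||X^T e||_inf^2 / n^2 for every draw.
  stat <- apply(abs(crossprod(x, e)), 2, max)^2 / n^2
  lam_2 <- mean(stat)           # averaging over draws is an assumption
  c(lam_1 = lam_1, lam_2 = lam_2)
}

set.seed(123)
x <- matrix(rnorm(50 * 200), 50, 200)
lam <- sketch_lam_olasso(x)
lam
```

Note that lam_1 depends only on the dimensions of x, whereas lam_2 depends on the design matrix itself and on the random draws.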
Usage
olasso(x, y, intercept = TRUE, thresh = 1e-08)
Arguments
| x | An n-by-p design matrix | 
| y | A response vector of size n | 
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE | 
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 | 
Value
A list object containing:
- n and p:
- The dimension of the problem. 
- lam_1, lam_2:
- log(p) / n, and a Monte-Carlo estimate of ||X^T e||_\infty^2 / n^2, where e is n-dimensional standard normal.
- a0_1, a0_2:
- Estimate of intercept, corresponding to lam_1 and lam_2.
- beta_1, beta_2:
- Organic lasso estimate of regression coefficients, corresponding to lam_1 and lam_2.
- sig_obj_1, sig_obj_2:
- Organic lasso estimate of the error standard deviation, corresponding to lam_1 and lam_2.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol <- olasso(x = sim$x, y = sim$y[, 1])
Cross-validation for organic lasso
Description
Provide the organic lasso estimate (of the error standard deviation), using cross-validation to select the tuning parameter value.
Usage
olasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100,
  flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08)
Arguments
| x | An n-by-p design matrix | 
| y | A response vector of size n | 
| lambda | A user-specified list of tuning parameter values. Defaults to NULL, in which case the program computes its own path of values from nlam and flmin | 
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE | 
| nlam | The number of lambda values. Defaults to 100 | 
| flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01 | 
| nfold | Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal  | 
| foldid | A vector of length n giving the fold assignment of each observation. Defaults to NULL | 
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 | 
Value
A list object containing:
- n and p:
- The dimension of the problem. 
- lambda:
- The path of tuning parameter used. 
- beta:
- Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation. 
- a0:
- Estimate of intercept 
- mat_mse:
- The estimated prediction error on the test sets in cross-validation. A matrix of size nlam-by-nfold.
- cvm:
- The averaged estimated prediction error on the test sets over K folds. 
- cvse:
- The standard error of the estimated prediction error on the test sets over K folds. 
- ibest:
- The index in lambda that attains the minimal mean cross-validated error.
- foldid:
- Fold assignment. A vector of length n.
- nfold:
- The number of folds used in cross-validation. 
- sig_obj:
- Organic lasso estimate of the error standard deviation, selected by cross-validation. 
- sig_obj_path:
- Organic lasso estimates of the error standard deviation. A vector of length nlam.
- type:
- whether the output is of a natural or an organic lasso. 
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_cv <- olasso_cv(x = sim$x, y = sim$y[, 1])
Fit a linear model with organic lasso
Description
Calculate a solution path of the organic lasso estimate (of error standard deviation) with a list of tuning parameter values. In particular, this function solves the squared-lasso problems and returns the objective function values as estimates of the error variance:
\tilde{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1^2.
Usage
olasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01,
  thresh = 1e-08, intercept = TRUE)
Arguments
| x | An n-by-p design matrix | 
| y | A response vector of size n | 
| lambda | A user-specified list of tuning parameter values. Defaults to NULL, in which case the program computes its own path of values from nlam and flmin | 
| nlam | The number of lambda values. Defaults to 100 | 
| flmin | The ratio of the smallest to the largest value in lambda. Defaults to 0.01 | 
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 | 
| intercept | Indicator of whether an intercept should be fitted. Defaults to TRUE | 
Details
This function also outputs the naive and the degree-of-freedom adjusted estimates, in analogy to nlasso_path.
Value
A list object containing:
- n and p:
- The dimension of the problem. 
- lambda:
- The path of tuning parameter used. 
- a0:
- Estimate of intercept. A vector of length nlam.
- beta:
- Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p-by-nlam. The j-th column represents the estimate of the coefficients corresponding to the j-th tuning parameter in lambda.
- sig_obj_path:
- Organic lasso estimates of the error standard deviation. A vector of length nlam.
- sig_naive:
- Naive estimate of the error standard deviation based on the squared-lasso regression. A vector of length nlam.
- sig_df:
- Degree-of-freedom adjusted estimate of the error standard deviation, based on the squared-lasso regression. A vector of length nlam.
- type:
- whether the output is of a natural or an organic lasso. 
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_path <- olasso_path(x = sim$x, y = sim$y[, 1])
Solve the organic lasso problem with a single value of lambda. The lambda values are for slow rates, which could give less satisfying results.
Description
Solve the organic lasso problem with a single value of lambda. The lambda values are for slow rates, which could give less satisfying results.
Usage
olasso_slow(x, y, thresh = 1e-08)
Arguments
| x | An n-by-p design matrix | 
| y | A response vector of size n | 
| thresh | Threshold value for the underlying optimization algorithm to claim convergence. Defaults to 1e-08 | 
plot a natural.cv object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.cv'
plot(x, ...)
Arguments
| x | an object of class natural.cv | 
| ... | additional arguments (not used here, only for S3 generic/method consistency) | 
plot a natural.path object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.path'
plot(x, ...)
Arguments
| x | an object of class natural.path | 
| ... | additional arguments (not used here, only for S3 generic/method consistency) | 
print a natural.path object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.path'
print(x, ...)
Arguments
| x | an object of class natural.path | 
| ... | additional arguments (not used here, only for S3 generic/method consistency) | 
Standardize the n-by-p design matrix X to have column means zero and ||X_j||_2^2 = n for all j
Description
Standardize the n-by-p design matrix X to have column means zero and ||X_j||_2^2 = n for all j
Usage
standardize(x, center = TRUE)
Arguments
| x | design matrix | 
| center | should we set column means equal to zero |
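The standardization described here can be written in a few lines of base R. The function sketch_standardize below is a hypothetical illustration of the stated property (zero column means and squared column norms equal to n), not the package's own standardize().

```r
# Sketch only: NOT the package's standardize().
sketch_standardize <- function(x, center = TRUE) {
  n <- nrow(x)
  # Optionally subtract each column's mean.
  if (center) x <- sweep(x, 2, colMeans(x))
  # Rescale each column so that sum(x[, j]^2) equals n.
  sweep(x, 2, sqrt(colSums(x^2) / n), "/")
}

set.seed(123)
x <- matrix(rnorm(50 * 10, mean = 3, sd = 2), 50, 10)
xs <- sketch_standardize(x)
range(colMeans(xs))     # numerically zero
range(colSums(xs^2))    # numerically equal to nrow(x)
```

This scaling differs slightly from base R's scale(), which divides by the sample standard deviation and therefore yields squared column norms of n - 1.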