| Type: | Package | 
| Title: | Efficiently Impute Large Scale Incomplete Matrix | 
| Version: | 0.2.4 | 
| Date: | 2024-07-22 | 
| Author: | Zhe Gao [aut, cre], Jin Zhu [aut], Junxian Zhu [aut], Xueqin Wang [aut], Yixuan Qiu [cph], Gael Guennebaud [cph, ctb], Jitse Niesen [cph, ctb], Ray Gardner [ctb] | 
| Maintainer: | Zhe Gao <gaozh8@mail.ustc.edu.cn> | 
| Description: | Efficiently impute large-scale matrices with missing values via unbiased low-rank matrix approximation. The main approach is the Hard-Impute algorithm proposed in https://www.jmlr.org/papers/v11/mazumder10a.html, which gains a substantial computational advantage from truncated singular value decomposition. | 
| License: | GPL-3 | file LICENSE | 
| Imports: | Rcpp (≥ 0.12.6) | 
| LinkingTo: | Rcpp, RcppEigen | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.1 | 
| NeedsCompilation: | yes | 
| Packaged: | 2024-07-22 12:57:11 UTC; AMA | 
| Suggests: | knitr | 
| VignetteBuilder: | knitr | 
| Repository: | CRAN | 
| Date/Publication: | 2024-07-22 22:10:05 UTC | 
Data standardization
Description
Standardize the rows and/or columns of a matrix to have zero mean and/or unit variance.
Usage
biscale(x, thresh.sd = 1e-05, maxit.sd = 100, control = list(...), ...)
Arguments
| x | an  | 
| thresh.sd | convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. | 
| maxit.sd | maximum number of iterations. | 
| control | a list of parameters that control the details of the standardization procedure. See biscale.control. | 
| ... | arguments to be used to form the default control argument if it is not supplied directly. | 
Value
A list containing the following components
| x.st | The matrix after standardization. | 
| alpha | The row means after the iterative process. | 
| beta | The column means after the iterative process. | 
| tau | The row standard deviations after the iterative process. | 
| gamma | The column standard deviations after the iterative process. | 
References
Hastie, Trevor, Rahul Mazumder, Jason D. Lee, and Reza Zadeh. Matrix completion and low-rank SVD via fast alternating least squares. The Journal of Machine Learning Research 16, no. 1 (2015): 3367-3402.
Examples
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
###### Standardize both mean and variance
xs <- biscale(x_na)
###### Only standardize mean ######
xs_mean <- biscale(x_na, row.mean = TRUE, col.mean = TRUE)
###### Only standardize variance ######
xs_std <- biscale(x_na, row.std = TRUE, col.std = TRUE)
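To illustrate what an iterative row and column standardization with missing entries can look like, here is a minimal base R sketch that only centers the means. It is not the package's implementation: the helper name `center_rows_cols` is hypothetical, variance scaling is left out, and the stopping rule (largest update below a threshold) is an assumption chosen for simplicity rather than the Frobenius-norm criterion documented for `thresh.sd`.

```r
# Hypothetical helper (not part of the package): alternately remove row and
# column means, using only the observed entries, until the updates are tiny.
center_rows_cols <- function(x, thresh = 1e-5, maxit = 100) {
  alpha <- rep(0, nrow(x))  # accumulated row offsets
  beta  <- rep(0, ncol(x))  # accumulated column offsets
  for (it in seq_len(maxit)) {
    d_alpha <- rowMeans(x, na.rm = TRUE)
    d_alpha[is.nan(d_alpha)] <- 0   # rows without observed entries
    x <- sweep(x, 1, d_alpha)       # subtract row means
    d_beta <- colMeans(x, na.rm = TRUE)
    d_beta[is.nan(d_beta)] <- 0     # columns without observed entries
    x <- sweep(x, 2, d_beta)        # subtract column means
    alpha <- alpha + d_alpha
    beta  <- beta + d_beta
    if (max(abs(d_alpha), abs(d_beta)) < thresh) break
  }
  list(x.st = x, alpha = alpha, beta = beta)
}

set.seed(1)
x <- matrix(rnorm(20 * 15, mean = 3), 20, 15)
x[sample(length(x), 60)] <- NA      # 20% of the entries are missing
res <- center_rows_cols(x)
range(rowMeans(res$x.st, na.rm = TRUE))
range(colMeans(res$x.st, na.rm = TRUE))
```

A single pass is not enough when entries are missing, because centering the columns shifts the row means again; that is why the sketch loops until both sets of updates become negligible.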
Control for standard procedure
Description
Various parameters that control aspects of the standardization procedure.
Usage
biscale.control(
  row.mean = FALSE,
  row.std = FALSE,
  col.mean = FALSE,
  col.std = FALSE
)
Arguments
| row.mean | if  | 
| row.std | if  | 
| col.mean | similar to  | 
| col.std | similar to  | 
Value
A list with components named as the arguments.
Efficiently impute missing values for a large scale matrix
Description
Fit a low-rank matrix approximation to a matrix with missing values. The algorithm iterates like EM: filling the missing values with the current guess, and then approximating the complete matrix via truncated SVD.
Usage
eimpute(
  x,
  r,
  svd.method = c("tsvd", "rsvd"),
  noise.var = 0,
  thresh = 1e-05,
  maxit = 100,
  init = FALSE,
  init.mat = 0,
  override = FALSE,
  control = list(...),
  ...
)
Arguments
| x | an  | 
| r | the rank of low-rank matrix for approximating  | 
| svd.method | a character string indicating the truncated SVD method.
If  | 
| noise.var | the variance of noise. | 
| thresh | convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. | 
| maxit | maximal number of iterations. | 
| init | if init = FALSE (the default), the missing entries are initialized with the mean. | 
| init.mat | the initialization matrix. | 
| override | logical value indicating whether the observed elements in  | 
| control | a list of parameters that control the details of the standardization procedure. See biscale.control. | 
| ... | arguments to be used to form the default control argument if it is not supplied directly. | 
Value
A list containing the following components
| x.imp | the matrix after completion. | 
| rmse | the relative mean square error of matrix completion, i.e., training error. | 
| iter.count | the number of iterations. | 
References
Rahul Mazumder, Trevor Hastie and Rob Tibshirani (2010). Spectral Regularization Algorithms for Learning Large Incomplete Matrices. Journal of Machine Learning Research 11, 2287-2322.
Nathan Halko, Per-Gunnar Martinsson and Joel A. Tropp (2011). Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review 53(2), 217-288.
Examples
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
x_impute <- eimpute(x_na, r)
head(x_impute[["x.imp"]][, 1:6])
x_impute[["rmse"]]
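The EM-like iteration from the Description of eimpute (fill the missing entries with the current guess, then approximate the completed matrix by a truncated SVD) can be sketched in a few lines of base R. This is an illustrative sketch, not the package's code: `hard_impute` is a hypothetical name, it uses the full `svd()` from base R rather than a dedicated truncated or randomized solver, it always keeps the observed entries unchanged, and it measures convergence by the ratio of squared Frobenius norms.

```r
# Hypothetical helper (not part of the package): rank-r hard-impute iteration.
hard_impute <- function(x, r, thresh = 1e-5, maxit = 100) {
  miss <- is.na(x)
  z <- x
  z[miss] <- mean(x, na.rm = TRUE)          # initial guess: overall observed mean
  for (it in seq_len(maxit)) {
    s <- svd(z, nu = r, nv = r)             # leading r singular vectors
    low <- s$u %*% (s$d[seq_len(r)] * t(s$v))  # rank-r approximation of z
    z_new <- x
    z_new[miss] <- low[miss]                # refill only the missing entries
    rel <- sum((z_new - z)^2) / max(sum(z^2), .Machine$double.eps)
    z <- z_new
    if (rel < thresh) break                 # successive estimates barely differ
  }
  list(x.imp = z, iter.count = it)
}

# Noise-free rank-2 matrix with 30% of the entries removed
set.seed(1)
truth <- matrix(rnorm(50 * 2), 50, 2) %*% matrix(rnorm(2 * 40), 2, 40)
x <- truth
miss <- sample(length(x), round(0.3 * length(x)))
x[miss] <- NA

fit <- hard_impute(x, r = 2)
err_imp  <- mean((fit$x.imp[miss] - truth[miss])^2)         # hard-impute error
err_mean <- mean((mean(x, na.rm = TRUE) - truth[miss])^2)   # mean-fill baseline
c(hard_impute = err_imp, mean_fill = err_mean)
```

Comparing against the mean-fill baseline shows what the low-rank step contributes; on large matrices the full `svd()` call is the bottleneck, which is the cost a truncated SVD avoids.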
Incomplete data generator
Description
Generate a matrix with missing values, where the positions of the missing values are chosen uniformly at random.
Usage
incomplete.generator(m, n, r, snr = 3, prop = 0.5, seed = 1)
Arguments
| m | the number of rows of the matrix. | 
| n | the number of columns of the matrix. | 
| r | the rank of the matrix. | 
| snr | the signal-to-noise ratio in generating the matrix. Default  | 
| prop | the proportion of missing observations. Default  | 
| seed | the random seed. Default  | 
Details
The matrix is generated as UV + \epsilon, where U is an m by r matrix and V is an r by n matrix, both with entries following the standard normal distribution. The noise \epsilon follows a normal distribution with mean 0 and variance \frac{r}{snr}.
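The construction described in Details can be sketched in base R as follows. This is an illustration, not the package's code: `gen_incomplete` is a hypothetical name, and the way the missing positions are drawn (a fixed count chosen with `sample()`) is an assumption. The noise variance r/snr is a natural choice if the entries of U and V are independent: each entry of UV is then a sum of r products of independent standard normals and has variance r, so the ratio of signal variance to noise variance equals snr.

```r
# Hypothetical helper (not part of the package): low-rank signal plus noise,
# with a proportion `prop` of the entries set to NA uniformly at random.
gen_incomplete <- function(m, n, r, snr = 3, prop = 0.5, seed = 1) {
  set.seed(seed)
  u <- matrix(rnorm(m * r), m, r)                         # m by r factor
  v <- matrix(rnorm(r * n), r, n)                         # r by n factor
  eps <- matrix(rnorm(m * n, sd = sqrt(r / snr)), m, n)   # noise, variance r / snr
  x <- u %*% v + eps
  x[sample(m * n, round(prop * m * n))] <- NA             # missing positions
  x
}

x_sim <- gen_incomplete(100, 100, 10)
dim(x_sim)
mean(is.na(x_sim))
```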
Value
A matrix with missing values.
Examples
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
Search rank magnitude of the best approximating matrix
Description
Estimate a suitable rank for fitting a low-rank matrix approximation to a matrix with missing values. The algorithm uses GIC or cross-validation (CV) to search for the rank within a given range, and then fills the missing values using the estimated rank.
Usage
r.search(
  x,
  r.min = 1,
  r.max = "auto",
  svd.method = c("tsvd", "rsvd"),
  rule.type = c("gic", "cv"),
  noise.var = 0,
  init = FALSE,
  init.mat = 0,
  maxit.rank = 1,
  nfolds = 5,
  thresh = 1e-05,
  maxit = 100,
  override = FALSE,
  control = list(...),
  ...
)
Arguments
| x | an  | 
| r.min | the start rank for searching. Default  | 
| r.max | the max rank for searching. | 
| svd.method | a character string indicating the truncated SVD method.
If  | 
| rule.type | a character string indicating the information criterion rule.
If  | 
| noise.var | the variance of noise. | 
| init | if init = FALSE (the default), the missing entries are initialized with the mean. | 
| init.mat | the initialization matrix. | 
| maxit.rank | maximal number of iterations in searching rank. Default  | 
| nfolds | number of folds in cross validation. Default  | 
| thresh | convergence threshold, measured as the relative change in the Frobenius norm between two successive estimates. | 
| maxit | maximal number of iterations. | 
| override | logical value indicating whether the observed elements in  | 
| control | a list of parameters that control the details of the standardization procedure. See biscale.control. | 
| ... | arguments to be used to form the default control argument if it is not supplied directly. | 
Value
A list containing the following components
| x.imp | the matrix after completion with the estimated rank. | 
| r.est | the rank estimation. | 
| rmse | the relative mean square error of matrix completion, i.e., training error. | 
| iter.count | the number of iterations. | 
Examples
################# Quick Start #################
m <- 100
n <- 100
r <- 10
x_na <- incomplete.generator(m, n, r)
head(x_na[, 1:6])
x_impute <- r.search(x_na, 1, 15, "rsvd", "gic")
x_impute[["r.est"]]
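The idea behind choosing the rank by cross-validation can be sketched in base R: hide part of the observed entries, impute with each candidate rank, and keep the rank with the smallest error on the hidden entries. The sketch below is an illustration under stated assumptions, not the procedure r.search implements: `hard_impute` and `cv_rank` are hypothetical helpers, the folds are drawn over the observed entries, and the GIC rule is not covered.

```r
# Hypothetical helper: compact rank-r hard-impute, returns the filled matrix.
hard_impute <- function(x, r, thresh = 1e-5, maxit = 100) {
  miss <- is.na(x)
  z <- x
  z[miss] <- mean(x, na.rm = TRUE)
  for (it in seq_len(maxit)) {
    s <- svd(z, nu = r, nv = r)
    low <- s$u %*% (s$d[seq_len(r)] * t(s$v))
    z_new <- x
    z_new[miss] <- low[miss]
    rel <- sum((z_new - z)^2) / max(sum(z^2), .Machine$double.eps)
    z <- z_new
    if (rel < thresh) break
  }
  z
}

# Hypothetical helper: k-fold cross-validation over the observed entries.
cv_rank <- function(x, ranks, nfolds = 5, seed = 1) {
  set.seed(seed)
  obs <- which(!is.na(x))
  fold <- sample(rep_len(seq_len(nfolds), length(obs)))  # fold label per observed entry
  cv_error <- sapply(ranks, function(r) {
    mean(sapply(seq_len(nfolds), function(k) {
      held <- obs[fold == k]
      x_train <- x
      x_train[held] <- NA                  # hide this fold
      fit <- hard_impute(x_train, r)
      mean((fit[held] - x[held])^2)        # error on the hidden entries
    }))
  })
  list(r.est = ranks[which.min(cv_error)], cv.error = cv_error)
}

# Rank-2 signal with small noise and 20% missing entries
set.seed(2)
x <- matrix(rnorm(60 * 2), 60, 2) %*% matrix(rnorm(2 * 50), 2, 50) +
  matrix(rnorm(60 * 50, sd = 0.1), 60, 50)
x[sample(length(x), round(0.2 * length(x)))] <- NA

cv <- cv_rank(x, ranks = 1:4)
cv$r.est
round(cv$cv.error, 3)
```

Each candidate rank requires `nfolds` full imputations, so this kind of search costs noticeably more than a single fit at a known rank.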