| Title: | Factor Model Estimation Using Proxy Variables | 
| Version: | 1.0 | 
| Description: | Functions to estimate a factor model using discrete and continuous proxy variables. The function 'dproxyme' estimates a factor model of discrete proxy variables using an EM algorithm (Dempster, Laird, Rubin (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>; Hu (2008) <doi:10.1016/j.jeconom.2007.12.001>; Hu (2017) <doi:10.1016/j.jeconom.2017.06.002>). The function 'cproxyme' estimates a linear factor model (Cunha, Heckman, and Schennach (2010) <doi:10.3982/ECTA6551>). | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.1.1 | 
| Imports: | dplyr, nnet, pracma, stats, utils, gtools | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2021-06-01 16:45:31 UTC; yujung | 
| Author: | Yujung Hwang | 
| Maintainer: | Yujung Hwang <yujungghwang@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2021-06-04 07:40:05 UTC | 
cproxyme
Description
This function estimates a linear factor model using continuous proxy variables. The model has the following form:
proxy = intercept + factorloading * (latent variable) + measurement error
The measurement error is assumed to follow a Normal distribution with mean zero and a variance that is estimated.
Usage
cproxyme(dat, anchor = 1, weights = NULL)
Arguments
| dat | A data frame of proxy variables, with one proxy per column. | 
| anchor | The column index of the anchoring proxy variable. Default is 1. That is, the code uses the first column of the dat data frame as the anchoring variable. | 
| weights | An optional weight vector | 
Value
Returns a list of 5 components :
- alpha0
- This is a vector of intercepts in the linear factor model. The k-th entry is the intercept for the k-th proxy variable. 
- alpha1
- This is a vector of factor loadings. The k-th entry is the factor loading of the k-th proxy variable. The factor loading of the anchoring variable is normalized to 1. 
- varnu
- This is a vector of measurement error variances. The k-th entry is the variance of the measurement error in the k-th proxy variable. The measurement error is assumed to follow a Normal distribution with mean 0. 
- mtheta
- This is the mean of the latent variable. It is equal to the mean of the anchoring proxy variable. 
- vartheta
- This is the variance of the latent variable. 
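The following base R sketch illustrates how the quantities above relate to the model proxy = intercept + factorloading * (latent variable) + measurement error. It simulates three proxies and recovers the parameters from sample moments, using the anchor normalization (loading 1 and, as an assumption here, intercept 0 for the anchor). The covariance-ratio formulas are a textbook moment argument that additionally assumes mutually independent measurement errors; they are not taken from the package source, and 'cproxyme' may compute its estimates differently. All object names are hypothetical.

```r
set.seed(1)
n <- 50000

# Latent variable and true parameters (illustrative values only)
theta  <- rnorm(n, mean = 2, sd = 1.5)   # var(theta) = 2.25
alpha0 <- c(0, 1, -0.5)                  # anchor intercept assumed to be 0
alpha1 <- c(1, 0.5, 2)                   # anchor loading normalized to 1
sd_nu  <- c(0.5, 0.3, 0.8)               # measurement error std. deviations

# proxy_k = alpha0_k + alpha1_k * theta + nu_k, errors independent across k
Z <- sapply(1:3, function(k) alpha0[k] + alpha1[k] * theta + rnorm(n, sd = sd_nu[k]))
colnames(Z) <- paste0("proxy", 1:3)

# With independent errors, cov(Z_k, Z_j) = alpha1_k * alpha1_j * var(theta),
# so ratios of covariances identify the loadings relative to the anchor.
loading2 <- cov(Z[, 2], Z[, 3]) / cov(Z[, 1], Z[, 3])
loading3 <- cov(Z[, 3], Z[, 2]) / cov(Z[, 1], Z[, 2])

# Latent mean and variance follow from the anchor normalization
mean_theta <- mean(Z[, 1])
var_theta  <- cov(Z[, 1], Z[, 2]) / loading2

# Intercepts and error variances are then backed out proxy by proxy
intercept2 <- mean(Z[, 2]) - loading2 * mean_theta
var_nu1    <- var(Z[, 1]) - var_theta
```

With a large simulated sample, loading2, loading3, mean_theta and var_theta should land close to 0.5, 2, 2 and 2.25 respectively.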
Author(s)
Yujung Hwang, yujungghwang@gmail.com
References
- Cunha, F., Heckman, J. J., & Schennach, S. M. (2010)
- Estimating the technology of cognitive and noncognitive skill formation. Econometrica, 78(3), 883-931. doi: 10.3982/ECTA6551 
- Hwang, Yujung (2021)
- Bounding Omitted Variable Bias Using Auxiliary Data. Working Paper. 
Examples
dat1 <- data.frame(proxy1=c(1,2,3),proxy2=c(0.1,0.3,0.6),proxy3=c(2,3,5))
cproxyme(dat=dat1,anchor=1)
## you can specify weights
cproxyme(dat=dat1,anchor=1,weights=c(0.1,0.5,0.4))
dproxyme
Description
This function estimates measurement stochastic matrices of discrete proxy variables.
Usage
dproxyme(
  dat,
  sbar = 2,
  initvar = 1,
  initvec = NULL,
  seed = 210313,
  tol = 0.005,
  maxiter = 200,
  miniter = 10,
  minobs = 100,
  maxiter2 = 1000,
  trace = FALSE,
  weights = NULL
)
Arguments
| dat | A data frame of proxy variables, with one proxy per column. | 
| sbar | The number of discrete latent types. Default is 2. | 
| initvar | A column index of a proxy variable to initialize the EM algorithm. Default is 1. That is, the proxy variable in the first column of "dat" is used for initialization. | 
| initvec | This vector defines how to group the initvar to initialize the EM algorithm. | 
| seed | Seed. Default is 210313 (birthday of this package). | 
| tol | The tolerance for the EM algorithm. Default is 0.005. | 
| maxiter | The maximum number of iterations for the EM algorithm. Default is 200. | 
| miniter | The minimum number of iterations for the EM algorithm. Default is 10. | 
| minobs | Compute likelihood of a proxy variable only if there are more than "minobs" observations. Default is 100. | 
| maxiter2 | Maximum number of iterations for "multinom". Default is 1000. | 
| trace | Whether to trace EM algorithm progress. Default is FALSE. | 
| weights | An optional weight vector | 
Value
Returns a list of 5 components :
- M_param
- This is a list of estimated measurement (stochastic) matrices. The k-th matrix is the measurement matrix of the proxy variable stored in the k-th column of the dat data frame (or matrix). The ij-th element of a measurement matrix is the conditional probability of observing the j-th (largest) proxy response value given that the latent type is i. 
- M_param_col
- This is a list of column labels of 'M_param' matrices 
- M_param_row
- This is a list of row labels of 'M_param' matrices. It is simply c(1:sbar). 
- mparam
- This is a list of multinomial logit coefficients which were used to compute 'M_param' matrices. These coefficients are useful to compute the likelihood of proxy responses. 
- typeprob
- This is a type probability matrix of size N-by-sbar. The ij-th entry of this matrix gives the probability that observation i has type j. 
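To make the link between measurement matrices and type probabilities concrete, here is a minimal base R sketch of a Bayes-rule update of the kind used in an E-step. It assumes that the proxies are independent conditional on the latent type, that responses are coded as column indices, and that the prior type shares are known; the matrices, prior and object names are made-up illustrations, and this is not the internal code of 'dproxyme'.

```r
# Hypothetical measurement matrices for two binary proxies, sbar = 2 types.
# Row i, column j: probability of response j given latent type i.
M1 <- matrix(c(0.8, 0.2,
               0.3, 0.7), nrow = 2, byrow = TRUE)
M2 <- matrix(c(0.9, 0.1,
               0.4, 0.6), nrow = 2, byrow = TRUE)

# Assumed prior probabilities of the two latent types
prior <- c(0.5, 0.5)

# Three observations; each response is stored as a column index of M1 / M2
resp <- cbind(proxy1 = c(1, 2, 2), proxy2 = c(1, 1, 2))

# Likelihood of each observation under each type (conditional independence)
lik <- sapply(1:2, function(s) M1[s, resp[, 1]] * M2[s, resp[, 2]])

# Bayes rule: weight by the prior, then normalize each row to sum to 1
post <- sweep(lik, 2, prior, "*")
post <- post / rowSums(post)
```

Here post plays the role of an N-by-sbar type probability matrix: each row sums to 1, and an observation with responses that are likely under type 1 receives most of its mass in the first column.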
Author(s)
Yujung Hwang, yujungghwang@gmail.com
References
- Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin (1977)
- "Maximum likelihood from incomplete data via the EM algorithm." Journal of the Royal Statistical Society: Series B (Methodological) 39.1 : 1-22. doi: 10.1111/j.2517-6161.1977.tb01600.x 
- Hu, Yingyao (2008)
- Identification and estimation of nonlinear models with misclassification error using instrumental variables: A general solution. Journal of Econometrics, 144(1), 27-61. doi: 10.1016/j.jeconom.2007.12.001 
- Hu, Yingyao (2017)
- The econometrics of unobservables: Applications of measurement error models in empirical industrial organization and labor economics. Journal of Econometrics, 200(2), 154-168. doi: 10.1016/j.jeconom.2017.06.002 
- Hwang, Yujung (2021)
- Identification and Estimation of a Dynamic Discrete Choice Model with Endogenous Time-Varying Unobservable States Using Proxies. Working Paper. 
- Hwang, Yujung (2021)
- Bounding Omitted Variable Bias Using Auxiliary Data. Working Paper. 
Examples
dat1 <- data.frame(proxy1=c(1,2,3),proxy2=c(2,3,4),proxy3=c(4,3,2))
## the default minobs is 100; lower it here because the toy data has only 3 rows
dproxyme(dat=dat1,sbar=2,initvar=1,minobs=3)
## you can specify weights
dproxyme(dat=dat1,sbar=2,initvar=1,minobs=3,weights=c(0.1,0.5,0.4))
makeDummy
Description
This function creates dummy variables from a discrete variable.
Usage
makeDummy(tZ)
Arguments
| tZ | An input vector | 
Value
Returns dZ, a matrix of size length(tZ)-by-card(tZ), where card(tZ) is the number of distinct values in tZ.
The ij-th element of dZ is 1 if tZ[i] is equal to the j-th largest value of tZ, and 0 otherwise. Each row of dZ sums to 1 by construction.
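A rough base R equivalent of this construction is sketched below. The helper name make_dummy_sketch is hypothetical, and the column order (distinct values sorted in increasing order) is an assumption, since the text above does not spell out the ordering; 'makeDummy' itself may order or label its columns differently.

```r
# Hypothetical helper, not part of the package: one indicator column
# per distinct value of tZ, columns assumed sorted in increasing order.
make_dummy_sketch <- function(tZ) {
  lv <- sort(unique(tZ))
  # outer() compares every tZ[i] with every distinct value lv[j];
  # adding 0 converts the logical matrix to numeric 0/1.
  dZ <- outer(tZ, lv, "==") + 0
  colnames(dZ) <- lv
  dZ
}

dZ <- make_dummy_sketch(c(2, 1, 3, 2))
rowSums(dZ)  # every row contains exactly one 1
```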
Author(s)
Yujung Hwang, yujungghwang@gmail.com
Examples
makeDummy(c(1,2,3))
weighted.cov
Description
This function computes an unbiased sample weighted covariance. The function uses only pairwise complete observations.
Usage
weighted.cov(x, y, w = NULL)
Arguments
| x | An input vector to compute a covariance, cov(x,y) | 
| y | An input vector to compute a covariance, cov(x,y) | 
| w | A weight vector | 
Value
Returns an unbiased sample weighted covariance
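For orientation, the sketch below implements one common definition of an unbiased weighted covariance, the "reliability weights" correction with denominator V1 - V2/V1, where V1 = sum(w) and V2 = sum(w^2). Whether 'weighted.cov' uses exactly this correction is an assumption; the helper names are hypothetical. With equal weights the denominator becomes n - 1, so the result matches the usual sample covariance.

```r
# Hypothetical helper, not the package implementation.
weighted_cov_sketch <- function(x, y, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))   # no weights: equal weights
  keep <- !is.na(x) & !is.na(y) & !is.na(w)  # pairwise complete cases only
  x <- x[keep]; y <- y[keep]; w <- w[keep]
  v1 <- sum(w)
  v2 <- sum(w^2)
  mx <- sum(w * x) / v1                    # weighted means
  my <- sum(w * y) / v1
  sum(w * (x - mx) * (y - my)) / (v1 - v2 / v1)
}

# The weighted variance is the special case y = x
weighted_var_sketch <- function(x, w = NULL) weighted_cov_sketch(x, x, w)

x <- c(1, 3, 5); y <- c(2, 3, 1); w <- c(0.1, 0.5, 0.4)
weighted_cov_sketch(x, y)      # agrees with cov(x, y)
weighted_cov_sketch(x, y, w)   # unchanged if w is rescaled, e.g. 10 * w
```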
Author(s)
Yujung Hwang, yujungghwang@gmail.com
Examples
# If you do not specify weights, 
# it returns the usual unweighted sample covariance 
weighted.cov(x=c(1,3,5),y=c(2,3,1)) 
weighted.cov(x=c(1,3,5),y=c(2,3,1),w=c(0.1,0.5,0.4))
weighted.var
Description
This function computes an unbiased sample weighted variance.
Usage
weighted.var(x, w = NULL)
Arguments
| x | A vector to compute a variance, var(x) | 
| w | A weight vector | 
Value
Returns an unbiased sample weighted variance
Author(s)
Yujung Hwang, yujungghwang@gmail.com
Examples
## If you do not specify weights, 
## it returns the usual unweighted sample variance
weighted.var(x=c(1,3,5)) 
weighted.var(x=c(1,3,5),w=c(0.1,0.5,0.4))