Title: Method for Clustering Partially Observed Data
Version: 1.1
Description: Software for k-means clustering of partially observed data from Chi, Chi, and Baraniuk (2016) <doi:10.1080/00031305.2015.1086685>.
URL: http://jocelynchi.com/kpodclustr
Depends: R (≥ 3.1.0)
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 7.1.0
Encoding: UTF-8
NeedsCompilation: no
Packaged: 2020-06-23 15:46:38 UTC; jtc
Author: Jocelyn T. Chi [aut, cre], Eric C. Chi [aut, ctb], Richard G. Baraniuk [aut]
Maintainer: Jocelyn T. Chi <jtchi@ncsu.edu>
Repository: CRAN
Date/Publication: 2020-06-24 09:10:06 UTC
Function for assigning clusters to rows in a matrix
Description
assign_clustpp
Function for assigning clusters to rows in a matrix
Usage
assign_clustpp(X, init_centers, kmpp_flag = TRUE, max_iter = 20)
Arguments
X: Data matrix containing missing entries whose rows are observations and columns are features
init_centers: Centers for initializing k-means
kmpp_flag: (Optional) Logical flag indicating whether to initialize with k-means++ (default TRUE)
max_iter: (Optional) Maximum number of iterations (default 20)
Author(s)
Jocelyn T. Chi
Examples
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
Orig <- Data$Orig
clusts <- assign_clustpp(Orig, k)
Function for finding indices of missing data in a matrix
Description
findMissing
Function for finding indices of missing data in a matrix
Usage
findMissing(X)
Arguments
X: Data matrix containing missing entries whose rows are observations and columns are features
Value
A numeric vector containing indices of the missing entries in X
Author(s)
Jocelyn T. Chi
Examples
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
missing <- findMissing(X)
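The idea of locating missing entries can be illustrated in base R without the package. The sketch below assumes that missing entries are coded as NA and that indices are reported in R's column-major order; whether findMissing returns exactly this form is an assumption, not something this manual states.

```r
# Small illustrative matrix with two missing entries (not package data)
X <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 2, byrow = TRUE)

# Linear (column-major) indices of the NA entries
missing_idx <- which(is.na(X))
print(missing_idx)   # 3 6

# The same entries as (row, col) positions
missing_pos <- which(is.na(X), arr.ind = TRUE)
print(missing_pos)
```

Linear indices are convenient for assignments such as X[missing_idx] <- value, while the arr.ind form is easier to read when inspecting which observations are affected.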
Function for initial imputation for k-means
Description
initialImpute
Initial imputation for k-means
Usage
initialImpute(X)
Arguments
X: Data matrix containing missing entries whose rows are observations and columns are features
Value
A data matrix containing no missing entries
Author(s)
Jocelyn T. Chi
Examples
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
X_copy <- initialImpute(X)
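This manual does not say which fill rule initialImpute applies. As a rough illustration of what an initial imputation can look like, the base R sketch below replaces each missing entry with the mean of the observed values in its column. The helper name impute_col_means and the column-mean rule are assumptions made for this example only.

```r
# Small illustrative matrix with two missing entries (not package data)
X <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 2, byrow = TRUE)

# Hypothetical helper: fill each NA with the mean of its column's observed values.
# A column with no observed values would be filled with NaN.
impute_col_means <- function(X) {
  col_means <- colMeans(X, na.rm = TRUE)
  miss <- which(is.na(X), arr.ind = TRUE)
  X[miss] <- col_means[miss[, "col"]]
  X
}

X_filled <- impute_col_means(X)
print(X_filled)
```

Any fill rule that produces a complete matrix would serve as a starting point, since k-means itself cannot be run on data containing NA.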
k-means++
Description
kmpp
Computes initial centroids via k-means++
Usage
kmpp(X, k)
Arguments
X: Data matrix whose rows are observations and columns are features
k: Number of clusters
Value
A data matrix whose rows contain initial centroids for the k clusters
Examples
n <- 10
p <- 2
X <- matrix(rnorm(n*p),n,p)
k <- 3
kmpp(X,k)
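For readers unfamiliar with k-means++ seeding, the following base R sketch shows the standard procedure: the first centroid is a row drawn uniformly at random, and each later centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The function kmpp_sketch is a hypothetical illustration and is not the package's kmpp; it assumes X is a complete numeric matrix with at least k distinct rows.

```r
# Hypothetical illustration of k-means++ seeding; not the package's kmpp()
kmpp_sketch <- function(X, k) {
  n <- nrow(X)
  idx <- integer(k)
  # First centroid: a row chosen uniformly at random
  idx[1] <- sample.int(n, 1)
  # Squared distance from every row to its nearest chosen centroid
  d2 <- rowSums(sweep(X, 2, X[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    # Next centroid: sampled with probability proportional to d2
    idx[j] <- sample.int(n, 1, prob = d2)
    d2 <- pmin(d2, rowSums(sweep(X, 2, X[idx[j], ])^2))
  }
  X[idx, , drop = FALSE]
}

set.seed(1)
X <- matrix(rnorm(20), 10, 2)
init <- kmpp_sketch(X, 3)
print(init)
```

Rows already chosen have squared distance zero, so they cannot be drawn again. This tends to spread the initial centroids across the data.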
Function for performing k-POD
Description
kpod
Function for performing k-POD, a method for k-means clustering on partially observed data
Usage
kpod(X, k, kmpp_flag = TRUE, maxiter = 100)
Arguments
X: Data matrix containing missing entries whose rows are observations and columns are features
k: Number of clusters
kmpp_flag: (Optional) Logical flag indicating whether to initialize with k-means++ (default TRUE)
maxiter: (Optional) Maximum number of iterations (default 100)
Value
cluster: Clustering assignment obtained with k-POD
cluster_list: List containing the clustering assignments obtained in each iteration
obj_vals: List containing the value of the k-means objective function in each iteration
fit: Fit of the clustering assignment obtained with k-POD (calculated as 1 - (total withinss / totss))
fit_list: List containing the fit of the clustering assignment obtained in each iteration
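The fit measure above can be reproduced on complete data with stats::kmeans, whose result carries tot.withinss and totss. The sketch below uses simulated data chosen only for illustration and assumes that "total withinss" corresponds to the tot.withinss component.

```r
set.seed(1)
# Two well-separated groups of 20 points each (illustrative data)
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), 20, 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), 20, 2))

res <- kmeans(X, centers = 2)

# Fit as defined above: 1 - (total within-cluster SS / total SS)
fit <- 1 - res$tot.withinss / res$totss
print(fit)
```

Because the total sum of squares splits into within-cluster and between-cluster parts, this quantity equals res$betweenss / res$totss. Values near 1 indicate tight, well-separated clusters.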
Author(s)
Jocelyn T. Chi
Examples
p <- 5
n <- 200
k <- 3
sigma <- 0.15
missing <- 0.20
Data <- makeData(p,n,k,sigma,missing)
X <- Data$Missing
Orig <- Data$Orig
truth <- Data$truth
kpod_result <- kpod(X,k)
kpodclusters <- kpod_result$cluster
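The general shape of a k-POD-style iteration can be sketched in base R: fill the missing entries, run k-means on the filled matrix, overwrite the missing entries with the corresponding coordinates of each row's assigned centroid, and repeat until the assignments stop changing. The function kpod_sketch below is a hypothetical simplification, not the package's kpod. The overall-mean starting fill, the warm start from the previous centroids, and the stopping rule are assumptions of this example.

```r
# Hypothetical simplification of a k-POD-style loop; not the package's kpod()
kpod_sketch <- function(X, k, maxiter = 100) {
  miss <- is.na(X)
  # Assumed starting fill: mean of all observed entries
  X_fill <- X
  X_fill[miss] <- mean(X, na.rm = TRUE)
  cluster <- rep(0L, nrow(X))
  centers <- k
  for (iter in seq_len(maxiter)) {
    res <- kmeans(X_fill, centers = centers)
    # Overwrite missing entries with the assigned centroid's coordinates
    fitted <- res$centers[res$cluster, , drop = FALSE]
    X_fill[miss] <- fitted[miss]
    # Stop once the assignments no longer change
    if (identical(res$cluster, cluster)) break
    cluster <- res$cluster
    # Warm start the next k-means run so cluster labels stay aligned
    centers <- res$centers
  }
  list(cluster = res$cluster, X_filled = X_fill)
}

set.seed(1)
truth <- rep(1:2, each = 30)
X <- matrix(rnorm(60 * 3, sd = 0.2), 60, 3) + 3 * (truth - 1)
X[sample(length(X), 18)] <- NA   # mask 10% of the entries
out <- kpod_sketch(X, 2)
print(table(out$cluster, truth))
```

Cluster labels are arbitrary, so the table should be read up to a relabeling of the rows. The package function additionally records per-iteration assignments, objective values, and fits, which this sketch omits.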
Make test data
Description
makeData
Function for making test data
Usage
makeData(p, n, k, sigma, missing, seed = 12345)
Arguments
p: Number of features (or variables)
n: Number of observations
k: Number of clusters
sigma: Variance
missing: Desired proportion of missing entries, given as a fraction (for example, 0.05 for 5%)
seed: (Optional) Seed (default seed is 12345)
Author(s)
Jocelyn T. Chi
Examples
p <- 2
n <- 100
k <- 3
sigma <- 0.25
missing <- 0.05
X <- makeData(p,n,k,sigma,missing)$Orig