The DEmixR package provides tools for fitting and
evaluating two-component mixture models (normal and
lognormal) using robust global optimization via Differential Evolution
(DEoptim), with local refinement by quasi-Newton
optimization.
It is designed to be more robust than standard EM-based methods, particularly in challenging situations where the mixture components overlap strongly or have complex likelihood surfaces.
You can install the released version of DEmixR from CRAN
with:
prelim_plots() – Diagnostic plots (histogram, QQ, PP,
log-QQ) select_best_mixture() – Compare lognormal vs normal
mixtures using BIC fit_norm2() – Fit a two-component
normal mixture fit_lognorm2() – Fit a
two-component lognormal mixture
bootstrap_mix2() – Bootstrap mixture parameters (parametric
or nonparametric) evaluate_init() – Refine initial
parameter values with local optimization
Use prelim_plots() for initial data exploration and
distribution assessment Use select_best_mixture() when
unsure whether normal or lognormal
distribution is appropriate Use
fit_norm2()/fit_lognorm2() when you know the
appropriate distribution family Use bootstrap_mix2() for
uncertainty quantification and confidence intervals Use
evaluate_init() for testing specific starting values or
refining solutions
We first generate synthetic data from a two-component
normal mixture and draw the basic plots with
DEmixR.
# Load the package
library(DEmixR)
# Simulation data
set.seed(123)
x <- c(rnorm(3000, mean = 0, sd = 1),
       rnorm(2000, mean = 3, sd = 1.2))
# Diagnostic plots
prelim_plots(x, which = c("hist", "qq"))# Automatically choose the best mixture family
best_model <- select_best_mixture(x, n_runs = 3, NP = 50, itermax = 2000)##   |                                                                              |                                                                      |   0%  |                                                                              |+++++++++++++++++++++++                                               |  33%  |                                                                              |+++++++++++++++++++++++++++++++++++++++++++++++                       |  67%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%## [1] "normal"##   normal 
## 19547.24##   |                                                                              |                                                                      |   0%  |                                                                              |++++++++++++++                                                        |  20%  |                                                                              |++++++++++++++++++++++++++++                                          |  40%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++                            |  60%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++              |  80%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%##           p          m1          s1          m2          s2 
## 0.599738895 0.005086783 0.983210109 2.984878390 1.181022679## [1] -9752.326## [1] 19514.65## [1] 19547.24# Simulation data
set.seed(123)
y <- c(rlnorm(3000, meanlog = 0, sdlog = 0.5),
       rlnorm(2000, meanlog = 1, sdlog = 0.7))
# Fit a two-component lognormal mixture
fit_ln <- fit_lognorm2(y, n_runs = 5, parallelType = 0)##   |                                                                              |                                                                      |   0%  |                                                                              |++++++++++++++                                                        |  20%  |                                                                              |++++++++++++++++++++++++++++                                          |  40%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++                            |  60%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++              |  80%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%##           p          m1          s1          m2          s2 
##  0.57503793 -0.01180295  0.48857490  0.95267679  0.69951039# Bootstrap confidence intervals
boot_res <- bootstrap_mix2(fit_ln, B = 100, parametric = TRUE, parallelType = 0)
boot_res$central##           p          m1          s1          m2          s2 
##  0.57428385 -0.01681622  0.48844615  0.95834689  0.69416098##               p          m1        s1        m2        s2
## 2.5%  0.4115206 -0.06678695 0.4343919 0.7041057 0.6095682
## 97.5% 0.7041991  0.05896932 0.5272649 1.2061600 0.7729422## $success
## [1] TRUE
## 
## $par
## [1]  0.57497232 -0.01185789  0.48853997  0.95258083  0.69950326
## 
## $logLik
## [1] -7547.12
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"# Simulated biomarker data with two populations
set.seed(456)
biomarker <- c(rlnorm(4000, meanlog = 2.5, sdlog = 0.4),  # Healthy group
               rlnorm(1000, meanlog = 3.8, sdlog = 0.6))  # Disease group
# Initial exploration
prelim_plots(biomarker, which = c("hist", "logqq"))# Fit lognormal mixture
fit_bio <- try(fit_lognorm2(biomarker, n_runs = 5, parallelType = 0, quiet = 2))##   |                                                                              |                                                                      |   0%  |                                                                              |++++++++++++++                                                        |  20%  |                                                                              |++++++++++++++++++++++++++++                                          |  40%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++                            |  60%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++              |  80%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%if (!inherits(fit_bio, "try-error")) {
  cat("Biomarker analysis results:\n")
  print(fit_bio$par)
  
  # Estimate proportion in each group
  cat("\nEstimated proportion in group 1 (healthy):", 
      round(fit_bio$par[1], 3), "\n")
  cat("Estimated proportion in group 2 (disease):", 
      round(1 - fit_bio$par[1], 3), "\n")
  
  # Bootstrap for confidence intervals
  boot_bio <- try(bootstrap_mix2(fit_bio, B = 100, quiet = 2))
  if (!inherits(boot_bio, "try-error")) {
    cat("\n95% CI for proportion in group 1:\n")
    print(boot_bio$ci[, "p"])
  }
}## Biomarker analysis results:
##         p        m1        s1        m2        s2 
## 0.7942352 2.5045852 0.3877220 3.7999468 0.6123135 
## 
## Estimated proportion in group 1 (healthy): 0.794 
## Estimated proportion in group 2 (disease): 0.206 
## 
## 95% CI for proportion in group 1:
##      2.5%     97.5% 
## 0.7543291 0.8244741This section illustrates some of the optional parameters for fine-tuning performance and diagnostics.
fit_*# Fit lognormal mixture with custom optimization settings
fit_ln <- fit_lognorm2(
  x,
  NP = 150,           # population size for DEoptim
  n_runs = 20,        # independent runs to avoid local optima
  itermax = 200,      # maximum iterations per run
  parallelType = 1,   # use parallelization across platforms
  quiet = 1,          # print minimal progress info
  par_init = NULL     # optionally supply initial values
)Key options: - NP: larger = better
exploration, but slower. - n_runs: increase for more
robustness; with 1000, it is very slow, but have ultra robustness -
parallelType: 0 = serial,
1 = socket clusters (Windows/Mac/Linux),
2 = forking (Mac/Linux only). - quiet:
0 = verbose, 1 = minimal,
2 = silent.
bootstrap_mix2boot_res <- bootstrap_mix2(
  fit_ln,
  B = 500,            # number of bootstrap replications
  parallelType = 1,   # parallel computation
  parametric = TRUE,  # parametric bootstrap
  ci_level = 0.90,    # confidence interval level
  quiet = 1           # show progress bar
)
boot_res$central   # means for normal and medians for lognormal
boot_res$ci        # bootstrap confidence intervalsNotes: - Supports both parametric and nonparametric bootstrap. - Uses an internal function to avoid label switching.
prelim_plotsprelim_plots(
  x,
  which = c("hist", "qq", "pp", "logqq"),
  hist_bins = 40,
  col_hist = "grey85",
  col_density = "darkorange",
  col_qq = "grey60",
  col_line = "darkorange"
)Options: - which: choose one or more of
"hist", "qq", "pp", "logqq". - hist_bins:
number of bins for histogram. - col_*: customize plot
colors.
select_best_mixturesel <- select_best_mixture(
  x,
  n_runs = 5,
  NP = 100,
  itermax = 150,
  quiet = 2
)
sel$best$family
sel$best$parWhat it does: - Fits both Normal and Lognormal mixtures. - Compares them via BIC. - Returns the best-performing family.
evaluate_initinit_eval <- evaluate_init(
  x,
  family = "normal",
  n_starts = 10,      # number of random initializations
  NP = 50,            # DEoptim population size
  itermax = 100,      # maximum iterations
  quiet = 2           # suppress output
)
print(init_eval)Usage: - Helps check the stability of optimization. - Useful to diagnose poor convergence.
The DEmixR package provides robust and flexible tools for mixture model analysis, making it a strong alternative to existing EM-based approaches. Its use of global optimization helps avoid local minima and improve reliability, especially in complex or overlapping mixtures.
Note: Harmless socket connection warnings may appear when using parallelization.
prelim_plots() to understand your
data distributionselect_best_mixture() when uncertain about
distribution familyn_runs for challenging optimization
problemsevaluate_init() if
convergence is problematic For more information, see the help files for
each function: ?fit_norm2, ?bootstrap_mix2,
etc.