Version: | 0.5-5 |
Date: | 2018-06-04 |
Title: | Finite Mixture Distribution Models |
Packaged: | 2018-06-04 18:18:27 UTC; pdmmac1 |
Author: | Peter Macdonald <pdmmac@mcmaster.ca>, with contributions from Juan Du <duduyy@hotmail.com> |
Maintainer: | Peter Macdonald <pdmmac@mcmaster.ca> |
Depends: | R (≥ 1.4.0) |
Imports: | graphics, stats |
Description: | Fit finite mixture distribution models to grouped data and conditional data by maximum likelihood using a combination of a Newton-type algorithm and the EM algorithm. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://www.r-project.org/, https://ms.mcmaster.ca/peter/mix/mix.html |
NeedsCompilation: | no |
Repository: | CRAN |
Date/Publication: | 2018-06-04 18:30:51 UTC |
ANOVA Tables for Mixture Model Objects
Description
Compute analysis of variance tables for one or two mixture model objects.
Usage
## S3 method for class 'mix'
anova(object, mixobj2, ...)
Arguments
object |
an object of class "mix". |
mixobj2 |
an object of the same type to be compared with |
... |
additional objects of the same type. |
Value
An object of class "anova" inheriting from class "data.frame". When given a single argument, this function produces a table which tests whether the model is significant. The table contains the residual degrees of freedom, chi-square statistic and P value. If the argument is not of class "mix", this function returns NULL. When given two objects, it tests the models against one another and lists them in order of the number of parameters fitted. For the model with fewer parameters fitted, the change in degrees of freedom is given. This only makes statistical sense if the models are nested. If one of the arguments does not belong to the class "mix", the function gives the anova table for the other argument; if neither does, it returns NULL.
Warning
The comparison between two models is only valid if they are fitted to the same data set and the two models are nested.
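The comparison of two nested models can be illustrated with a chi-square difference test in base R. This is a minimal sketch and not the package's implementation: the statistics and degrees of freedom below are made-up values, and all variable names are hypothetical.

```r
# Hypothetical goodness-of-fit summaries for two nested fits (values are made up)
chisq_small <- 30.2; df_small <- 18   # model with fewer parameters fitted
chisq_big   <- 21.7; df_big   <- 15   # model with more parameters fitted

# Change in the chi-square statistic and in the residual degrees of freedom
delta_chisq <- chisq_small - chisq_big
delta_df    <- df_small - df_big

# Upper-tail probability of the change under a chi-square reference distribution
p_value <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)
p_value
```

A small P value suggests that the extra parameters improve the fit, but only when the models are nested and fitted to the same data.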
See Also
The model fitting function mix, the generic function anova.
Examples
data(pike65) # load the grouped data `pike65'
data(pikepar) # load the initial values of parameters for the data `pike65'
fitpike3 <- mix(pike65, pikepar, "lnorm", mixconstr(conmu = "MFX",
fixmu = c(FALSE, FALSE, FALSE, FALSE, TRUE), consigma = "CCV"), emsteps = 3)
anova(fitpike3)
fitpike4 <- mix(pike65, pikepar, "lnorm", mixconstr(consigma = "CCV"), emsteps = 3)
anova(fitpike4)
anova(fitpike3, fitpike4)
anova(fitpike4, fitpike3)
Grouped Binomial Data
Description
We randomly generate four groups of binomially distributed data with means 4, 8, 12 and 16 and corresponding variances 3.2, 4.8, 4.8 and 3.2. We then mix the four groups, with 100 observations from each group, i.e., with equal proportions. After grouping the mixture data, we obtain the grouped data bindat.
The bindat data frame has 21 rows and 2 columns.
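The construction described above can be sketched in base R. This is only an illustration under stated assumptions: the binomial size of 20 is taken from the size argument used in the examples of this manual, the seed is arbitrary, and counting each value 0 to 20 separately is an assumed grouping, so the result will not reproduce bindat itself.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
size <- 20                         # assumed binomial size
prob <- c(0.2, 0.4, 0.6, 0.8)      # success probabilities implied by the means

comp_mean <- size * prob               # 4, 8, 12, 16
comp_var  <- size * prob * (1 - prob)  # 3.2, 4.8, 4.8, 3.2

# 100 observations from each component, i.e. equal proportions
x <- unlist(lapply(prob, function(p) rbinom(100, size = size, prob = p)))

# Count how many observations take each value 0, 1, ..., 20
freq <- table(factor(x, levels = 0:size))
grouped <- data.frame(x = 0:size, freq = as.vector(freq))
grouped
```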
Usage
data(bindat)
Format
This data frame contains the following columns:
- x
the boundaries of grouping intervals.
- freq
the frequencies of observations falling into each interval.
Examples
data(bindat)
data(binpar)
plot.mixdata(bindat)
fit <- mix(bindat, binpar, "binom", mixconstr(conpi = "PFX",
fixpi = c(TRUE, TRUE, TRUE, TRUE), consigma = "BINOM", size = c(20, 20, 20, 20)))
fit
plot(fit)
Starting Values of Parameters for the Binomial Data Set
Description
Starting values of parameters for fitting a mixture distribution to the data set bindat.
The binpar data frame has 4 rows and 3 columns.
Usage
data(binpar)
Format
This data frame contains the following columns:
- pi
the starting values for proportions.
- mu
the starting values for means.
- sigma
the starting values for standard deviations.
Examples
data(binpar)
Cassie's Length-Frequency Example
Description
Data for Cassie's (1954) analysis of size frequency distributions.
The cassie data frame has 40 rows and 2 columns.
Usage
data(cassie)
Format
This data frame contains the following columns:
- length
the boundaries of grouping intervals.
- freq
the frequencies of observations falling into each interval.
Source
Cassie, R.M. (1954). Some uses of probability paper in the analysis of size frequency distributions. Aust. J. Mar. Freshwater Res. 5, 513-522.
The data, lengths (in) of 256 snapper (Chrysophrys auratus Forster) taken by a trawl with a mesh of about 1.5 in, are given in Table 5 of that paper. Cassie's results are given in his Table 1.
References
http://www.math.mcmaster.ca/peter/mix/demex/excass.html
Examples
data(cassie)
plot.mixdata(cassie)
Extract Mixture Model Coefficients
Description
coef.mix is a function which extracts mixture model coefficients from objects returned by the model fitting function mix. It is called via the generic function coef.
Usage
## S3 method for class 'mix'
coef(object, natpar = FALSE, ...)
Arguments
object |
an object of class "mix". |
natpar |
a logical scalar specifying whether the natural parameters should be given. |
... |
other arguments. |
Value
A data frame containing three variables, which are, in order, the proportions, means, and standard deviations. If natpar is TRUE, the natural parameters of the component distributions are also displayed.
See Also
mix for model fitting.
Examples
data(pike65) # load the grouped data `pike65'
data(pikepar) # load the initial values of parameters for the data `pike65'
fit <- mix(pike65, pikepar, "lnorm", mixconstr(consigma = "CCV"), emsteps = 3)
coef(fit)
coef(fit, natpar = TRUE)
Add Conditional Data to Grouped Data
Description
Automatically combines grouped data with conditional data when the conditional samples are entered.
Usage
conditdat(mixdat, k, conditsamples)
Arguments
mixdat |
a data frame containing grouped data, whose first column should be the right boundaries of grouping intervals, and the second one should be the numbers of observations falling into each interval. |
k |
the number of components. |
conditsamples |
a vector containing the conditional data, made up of the conditional samples placed one after another; the first element of each sample is a number indicating which interval the sample comes from. |
Value
A data frame containing the grouped data with conditional data.
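The layout of conditsamples can be made visible by reshaping the flat vector in base R. This is a hedged sketch of the data layout only, not of what conditdat does internally. It assumes, as the example in this section suggests, that each sample holds the interval number followed by one count per component; the column names are hypothetical.

```r
k <- 5  # number of components

# Two conditional samples taken from the example in this section
conditsamples <- c(c(4, 9, 2, 0, 0, 0), c(5, 8, 6, 0, 0, 0))

# One row per sample: the interval number, then k counts (assumed layout)
samples <- matrix(conditsamples, ncol = k + 1, byrow = TRUE)
colnames(samples) <- c("interval", paste0("comp", seq_len(k)))
samples
```

Each row then reads as "in interval 4, 9 sampled observations belong to component 1 and 2 to component 2", and so on.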
See Also
mixgroup for constructing grouped and conditional data.
Examples
data(pike65) # load the data set `pike65'
pike65 # display the data set `pike65'
conditdat(pike65, k = 5, conditsamples =
c(c(4, 9, 2, 0, 0, 0), c(5, 8, 6, 0, 0, 0),
c(12, 0, 2, 34, 0, 0), c(13, 0, 0, 21, 0, 0),
c(15, 0, 0, 5, 5, 0), c(16, 0, 0, 6, 5, 1),
c(17, 0, 0, 5, 7, 0), c(18, 0, 0, 4, 4, 3),
c(19, 0, 0, 0, 8, 0), c(20, 0, 0, 0, 2, 1),
c(21, 0, 0, 0, 1, 5), c(22, 0, 0, 0, 2, 4)))
# add conditional data to the grouped data `pike65'
A Mixture Data of Three Exponential Distributions
Description
A total of 1000 observations was generated by computer to follow the mixture distribution 1/3 E(1) + 1/3 E(4) + 1/3 E(16) where E(m) denotes an exponential distribution with mean m.
The expdat data frame has 25 rows and 2 columns.
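A mixture of this form can be simulated in base R. The sketch below is an illustration only: the seed is arbitrary and the draws will not reproduce expdat. Note that rexp is parameterized by the rate, which is the reciprocal of the mean.

```r
set.seed(42)            # arbitrary seed
n <- 1000
means <- c(1, 4, 16)    # component means of E(1), E(4), E(16)

# Choose a component for each observation with equal probability 1/3
comp <- sample(seq_along(means), n, replace = TRUE)

# Draw from the chosen component; rate = 1 / mean
x <- rexp(n, rate = 1 / means[comp])

# Theoretical mean of the mixture: (1 + 4 + 16) / 3
mix_mean <- mean(means)
c(theoretical = mix_mean, simulated = mean(x))
```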
Usage
data(expdat)
Format
This data frame contains the following columns:
- x
the boundaries of grouping intervals.
- freq
the frequencies of observations falling into each interval.
Source
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
References
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
http://www.math.mcmaster.ca/peter/mix/demex/exexp.html
Examples
data(expdat)
plot.mixdata(expdat)
A Mixed Data with Fifteen Normal Components
Description
Fifteen normal components grouped over eighty intervals.
The fiftn80 data frame has 80 rows and 2 columns.
Usage
data(fiftn80)
Format
This data frame contains the following columns:
- x
the boundaries of grouping intervals.
- freq
the frequencies of observations falling into each interval.
Details
A total of 820 observations were generated by computer to follow the distribution 1/15 N(5, 1) + 1/15 N(10, 1) + ... + 1/15 N(75, 1) where N(m, s) denotes a normal distribution with mean m and standard deviation s.
Source
http://www.math.mcmaster.ca/peter/mix/demex/ex1580.html
Examples
data(fiftn80)
plot.mixdata(fiftn80)
Compute Mixture Model Fitted Values
Description
fitted.mix is a function which computes fitted values from objects returned by the modeling function mix. It is called via the generic function fitted.
Usage
## S3 method for class 'mix'
fitted(object, digits = NULL, ...)
Arguments
object |
an object of class "mix". |
digits |
the number of decimal places to be retained. |
... |
other arguments. |
Value
List with the following components:
mixed |
the estimated mixed data, that is, the fitted numbers of observations falling into each interval. |
joint |
the estimated joint data, that is, the fitted numbers of observations from each component falling into every interval. |
conditional |
the estimated conditional data to be
returned if |
conditprob |
the estimated conditional probabilities of observations from a given interval belonging to each component. |
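The relationship between joint values and conditional probabilities can be illustrated with Bayes' rule in base R. This is a sketch under stated assumptions, not the code used by fitted.mix: the proportions and interval probabilities are made-up values for two components and three intervals, and the variable names are hypothetical.

```r
pi_hat <- c(0.5, 0.5)               # hypothetical mixing proportions

# Hypothetical P(interval j | component i): rows are intervals, columns components
p_int <- cbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.3, 0.6))

# Joint probability of interval j and component i
joint <- sweep(p_int, 2, pi_hat, `*`)

# Conditional probability of component i given interval j (rows sum to 1)
condit <- joint / rowSums(joint)
condit
```

Multiplying the joint probabilities by the total number of observations would give fitted counts per component and interval.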
See Also
mix for fitting mixture distributions.
Examples
data(pike65)
data(pikepar)
fit1 <- mix(pike65, pikepar, "lnorm", mixconstr(consigma = "CCV"), emsteps = 3)
fitted(fit1)
data(pike65sg)
fit2 <- mix(pike65sg, pikepar, "gamma", mixconstr(consigma = "CCV"), usecondit = TRUE)
fitted(fit2, digits = 2)
Estimate Parameters of One-Component Mixture Distribution
Description
groupstats is a function which estimates the proportion, mean and standard deviation for a mixture distribution with one component.
Usage
groupstats(mixdat)
Arguments
mixdat |
A data frame containing grouped data, whose first column should be right boundaries of grouping intervals where the first and last intervals are open-ended; whose second column should consist of the frequencies indicating numbers of observations falling into each interval. |
Value
A list containing the following components:
pi |
the value is |
mu |
the estimated mean of |
sigma |
the estimated standard deviation of |
See Also
mixgroup for grouping data, mixparam for constructing starting values of parameters.
Examples
data(pike65)
groupstats(pike65)
Compute Probabilities of an Observation Falling into a Grouping Interval
Description
Compute the probabilities of an observation falling into each grouping interval, given the component distribution from which the observation comes.
Usage
grpintprob(mixdat, mixpar, dist, constr)
Arguments
mixdat |
a data frame containing grouped data, whose first column should be right boundaries of grouping intervals where the first and last intervals are open-ended; whose second column should consist of the frequencies indicating numbers of observations falling into each interval. |
mixpar |
a data frame containing the parameter values of component distributions, which are, in order, the proportions, means, and standard deviations. |
dist |
the distribution of components, it can be one of
|
constr |
a list of constraints on parameters of component distributions. |
Value
A matrix in which each column contains the probabilities of an observation from one component falling into each grouping interval.
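Interval probabilities of this kind can be obtained as differences of a cumulative distribution function at the right boundaries. The base-R sketch below is not the package's implementation: the boundaries are hypothetical, the binomial size of 20 follows the examples in this manual, and right-closed intervals are an assumption.

```r
# Hypothetical right boundaries; the last interval is open-ended
right <- c(3.5, 7.5, 11.5, 15.5, Inf)
size  <- 20
prob  <- c(0.2, 0.4, 0.6, 0.8)   # one success probability per component

# One column per component, one row per grouping interval
interval_prob <- sapply(prob, function(p) {
  cdf <- pbinom(right, size = size, prob = p)  # CDF at each right boundary
  diff(c(0, cdf))                              # probability of each interval
})

colSums(interval_prob)  # each column sums to 1
```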
Examples
data(bindat)
data(binpar)
grpintprob(bindat, binpar, "binom", mixconstr(consigma = "BINOM", size = c(20, 20, 20, 20)))
Estimate Parameters of Mixture Distributions
Description
Find a set of overlapping component distributions that gives the best fit to grouped data and conditional data, using a combination of a Newton-type method and the EM algorithm.
Usage
mix(mixdat, mixpar, dist = "norm", constr = list(conpi = "NONE",
conmu = "NONE", consigma = "NONE", fixpi = NULL, fixmu = NULL,
fixsigma = NULL, cov = NULL, size = NULL), emsteps = 1,
usecondit = FALSE, exptol = 5e-06, print.level = 0, ...)
Arguments
mixdat |
A data frame containing grouped data, whose first column should be right boundaries of grouping intervals where the first and last intervals are open-ended; whose second column should consist of the frequencies indicating numbers of observations falling into each interval. If conditional data are available, this data frame should have k + 2 columns, where k is the number of components, whose element in row j and column i + 2 is the number of observations from the jth interval belonging to the ith component. |
mixpar |
A data frame containing starting values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
dist |
the distribution of components, it can be one of
|
constr |
a list of constraints on parameters of
component distributions. See function |
emsteps |
a non-negative integer specifying the number of EM steps to be performed. |
usecondit |
logical. If |
exptol |
a positive scalar giving the tolerance at which the scaled fitted value is considered large enough to be a degree of freedom. |
print.level |
this argument determines the level of printing
which is done during the optimization process. The default
value of |
... |
additional arguments to the optimization function nlm. |
Value
A list containing the following items:
parameters |
A data frame containing estimated values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
se |
A data frame containing estimated values for standard errors of parameters of component distributions. |
distribution |
the distribution used to fit the data. |
constraint |
the constraints on parameters. |
chisq |
the goodness-of-fit chi-square statistic. |
df |
degrees of freedom of the fitted mixture model. |
P |
a significance level (P-value) for the goodness-of-fit test. |
vmat |
covariance matrix for the estimated parameters. |
mixdata |
the original data, i.e. the argument |
usecondit |
the value of the argument |
References
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
See Also
mixgroup for grouping data, mixparam for organizing the parameter values, mixconstr for constructing constraints, nlm for additional arguments.
Examples
data(pike65)
data(pikepar)
fitpike1 <- mix(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)
fitpike1
plot(fitpike1)
data(pike65sg)
fitpike2 <- mix(pike65sg, pikepar, "lnorm", emsteps = 3, usecondit = TRUE)
fitpike2
plot(fitpike2)
data(bindat)
data(binpar)
fitbin1 <- mix(bindat, binpar, "binom",
constr = mixconstr(consigma = "BINOM", size = c(20, 20, 20, 20)))
plot(fitbin1)
fitbin2 <- mix(bindat, binpar, "binom", constr = mixconstr(conpi = "PFX",
fixpi = c(TRUE, TRUE, TRUE, TRUE),
consigma = "BINOM", size = c(20, 20, 20, 20)))
plot(fitbin2)
Construct Constraints on Parameters
Description
Construct constraints on parameters and check whether the constraints are valid. See the reference for details.
Usage
mixconstr(conpi = "NONE", conmu = "NONE", consigma = "NONE",
fixpi = NULL, fixmu = NULL, fixsigma = NULL, cov = NULL,
size = NULL)
Arguments
conpi |
a constraint on proportions, it can be either
|
conmu |
a constraint on means, it can be |
consigma |
a constraint on standard deviations, it can be
|
fixpi |
|
fixmu |
similar to |
fixsigma |
similar to |
cov |
|
size |
|
Value
A list containing the following components, which are, in order, conpi, conmu, consigma, fixpi, fixmu, fixsigma, cov, size.
References
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
See Also
mixgroup for grouping data, mixparam for constructing starting values of parameters.
Examples
mixconstr()
mixconstr(conmu = "MEQ", consigma = "SFX", fixsigma = c(TRUE, FALSE, TRUE, TRUE, FALSE))
mixconstr(consigma = "BINOM", size = c(25, 25, 25))
Mixed Data
Description
as.mixdata checks if its argument is mixed data; if true, it returns the data with class "mixdata"; if false, it returns NULL.
is.mixdata returns TRUE if its argument is of class "mixdata" and FALSE otherwise.
Usage
as.mixdata(x)
is.mixdata(x)
Arguments
x |
object to be tested. |
Details
Mixed data consist of grouped data and conditional data (if available). Grouped data is either a data frame or a matrix, whose first column should be right boundaries of grouping intervals where the first and last intervals are open-ended; whose second column should consist of the frequencies indicating numbers of observations falling into each interval. If conditional data are available, mixed data should have k + 2 columns, where k is the number of components, whose element in row j and column i + 2 is the number of observations from the jth interval belonging to the ith component.
See Also
mixgroup to construct mixed data.
Examples
data(pike65) # load data set `pike65'
pike65 # display the mixed data `pike65'
data(pike65sg) # load data set `pike65sg'
pike65sg # display the mixed data `pike65sg'
data(pikepar)
as.mixdata(pikepar)
as.mixdata(pike65)
is.mixdata(pike65)
is.mixdata(as.mixdata(pike65))
Construct Grouped Data from Raw Data
Description
Group raw data in the form of numbers of observations over successive intervals.
Usage
mixgroup(x, breaks = NULL, xname = NULL, k = NULL, usecondit = FALSE)
Arguments
x |
a data frame or matrix containing raw data, whose first column should be the measurements to be grouped, and second column, if available, includes the numbers indicating which component each individual belongs to. |
breaks |
one of: a vector giving the boundaries of the intervals into which the raw data are grouped; a single number giving the number of intervals; a character string naming an algorithm to compute the number of intervals; or a function to compute the number of intervals. In the last three cases the number is a suggestion only. |
xname |
the name of measurement. |
k |
the number of components. |
usecondit |
if |
Value
A data frame containing grouped data derived from the raw data. Its first column holds the right boundaries of the grouping intervals, where the first and last intervals are open-ended; its second column holds the frequencies, i.e. the numbers of observations falling into each interval. If usecondit is TRUE and the numbers indicating which component each individual comes from are available, the conditional data are displayed with the grouped data. The conditional data can be regarded as a table whose element in row j and column i is the number of observations from the jth interval belonging to the ith component.
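The grouped-data layout described above can be reproduced with cut and table in base R. This is a hedged sketch rather than the code behind mixgroup: the raw values and breaks are made up, and treating intervals as right-closed is an assumption.

```r
# Hypothetical raw measurements (not the package's pikeraw data)
x <- c(21.3, 24.8, 25.1, 30.0, 33.7, 41.2, 18.9, 27.5)

# Interval boundaries; the first and last intervals are open-ended
breaks <- c(-Inf, 22, 26, 30, Inf)

# Assign each measurement to an interval (a, b]
bins <- cut(x, breaks = breaks, right = TRUE)

# First column: right boundaries; second column: frequencies
grouped <- data.frame(right = breaks[-1], freq = as.vector(table(bins)))
grouped
```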
See Also
hist for more information about the argument breaks, is.mixdata for checking the class of data sets, mixparam for organizing the parameter values, mixconstr for constructing constraints.
Examples
data(pikeraw) # load raw data `pikeraw'
pikeraw # display the data set `pikeraw'
mixgroup(pikeraw) # group raw data
pikemd <- mixgroup(pikeraw, breaks = c(0, seq(19.75, 65.75, 2), 80))
plot(pikemd)
mixgroup(pikeraw, breaks = c(0, seq(19.75, 65.75, 2), 80), usecondit = TRUE, k = 5)
# construct grouped data associated with conditional data
mixgroup(pikeraw, usecondit = TRUE)
mixgroup(pikeraw, usecondit = TRUE, k = 3) # grouping data with a warning message
mixgroup(pikeraw, usecondit = TRUE, k = 8)
Find the Parameters to be Estimated
Description
When there are constraints on the parameters, only some of the parameters need to be estimated; the others are determined by the constraints. This function finds the parameters to be estimated. See the reference for details.
Usage
mixpar2theta(mixpar, constr, mixprop = TRUE)
Arguments
mixpar |
A data frame containing the values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
constr |
a list of constraints on parameters of component distributions. |
mixprop |
if |
Value
A vector containing the values for the parameters to be estimated.
References
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
See Also
mix for fitting a mixture model, mixtheta2par for computing all the parameters from the estimated parameters.
Construct Starting Values for Parameters
Description
Construct starting values for parameters of a mixture model.
Usage
mixparam(mu, sigma, pi = NULL)
Arguments
mu |
a vector of means of component distributions, which should be in ascending order. |
sigma |
a vector of standard deviations of the component distributions, corresponding to the means. |
pi |
the corresponding mixing proportions of components.
If |
Value
A data frame containing three variables, which are, in order, the proportions, means, and standard deviations.
See Also
mixgroup for grouping data, mixconstr for constructing constraints.
Examples
mixparam(mu = c(20, 30, 40), sigma = c(2, 3, 4))
mixparam(c(20, 30, 40), c(3), c(0.15, 0.78, 0.07))
Compute All of Parameters from the Estimated Parameters
Description
When there are constraints on the parameters, only some of the parameters need to be estimated; the others are determined by the constraints. This function computes all of the parameters from the estimated ones.
Usage
mixtheta2par(mixtheta, mixpar, constr, mixprop = TRUE)
Arguments
mixtheta |
a vector containing the values for the estimated
parameters, usually, a result of the function |
mixpar |
A data frame containing the values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
constr |
a list of constraints on parameters of component
distributions. See function |
mixprop |
if |
Value
A data frame containing three variables, which are, in order, the proportions, means, and standard deviations.
References
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
See Also
mixpar2theta for finding the parameters to be estimated.
Scale Mixture Data with Three Normal Components
Description
Scale mixture of three normal distributions.
The normals data frame has 25 rows and 2 columns.
Usage
data(normals)
Format
This data frame contains the following columns:
- x
the boundaries of grouping intervals.
- freq
the frequencies of observations falling into each interval.
Details
A total of 249 observations were generated by computer to follow the mixture distribution 1/3 N(12.5, 1) + 1/3 N(12.5, 3) + 1/3 N(12.5, 5) where N(m, s) denotes a normal distribution with mean m and standard deviation s.
Source
http://www.math.mcmaster.ca/peter/mix/demex/exscle.html
Examples
data(normals)
plot.mixdata(normals)
Karl Pearson's Crab Data
Description
The data give the ratio of "forehead" breadth to body length for 1000 crabs sampled at Naples by Professor W.F.R. Weldon.
The pearson data frame has 29 rows and 2 columns.
Usage
data(pearson)
Format
This data frame contains the following columns:
- ratio
the boundaries of grouping intervals.
- freq
the frequencies of observations falling into each interval.
Source
Pearson, K. (1894). Contributions to the mathematical theory of evolution. Phil. Trans. Roy. Soc. London A 185, 71-110.
References
http://www.math.mcmaster.ca/peter/mix/demex/excrabs.html
Examples
data(pearson)
plot.mixdata(pearson)
Starting Values of Parameters for the Pearson's Data
Description
Starting values of parameters for fitting a mixture distribution to the data set pearson.
The pearsonpar data frame has 2 rows and 3 columns.
Usage
data(pearsonpar)
Format
This data frame contains the following columns:
- pi
the starting values for proportions.
- mu
the starting values for means.
- sigma
the starting values for standard deviations.
Source
Pearson, K. (1894). Contributions to the mathematical theory of evolution. Phil. Trans. Roy. Soc. London A 185, 71-110.
References
http://www.math.mcmaster.ca/peter/mix/demex/excrabs.html
Examples
data(pearsonpar)
Heming Lake Pike Data
Description
The raw data pikeraw give the lengths of 523 pike (Esox lucius), and there are known to be five age-groups in the sample. We grouped the lengths over 25 intervals to obtain the grouped data, given as separate samples for each age group determined by scale reading.
The pikdat5 data frame has 25 rows and 6 columns.
Usage
data(pikdat5)
Format
This data frame contains the following columns:
- length
the boundaries of grouping intervals.
- age1
the numbers of observations from each interval belonging to the first age group.
- age2
the numbers of observations from each interval belonging to the second age group.
- age3
the numbers of observations from each interval belonging to the third age group.
- age4
the numbers of observations from each interval belonging to the fourth age group.
- age5
the numbers of observations from each interval belonging to the fifth age group.
Source
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
References
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
Examples
data(pikdat5)
Length-Frequency Data for Heming Lake Pike
Description
The raw data pikeraw give the lengths of 523 pike (Esox lucius). We grouped the lengths over 25 intervals to obtain these length-frequency data.
The pike65 data frame has 25 rows and 2 columns.
Usage
data(pike65)
Format
This data frame contains the following columns:
- length
the boundaries of grouping intervals.
- freq
the frequencies of observations falling into each interval.
Source
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
References
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
Examples
data(pike65)
data(pikepar)
plot.mixdata(pike65)
fit <- mix(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)
plot(fit)
Length-Frequency Data with Subsamples for Heming Lake Pike
Description
The raw data pikeraw give the lengths of 523 pike (Esox lucius), and there are known to be five age-groups in the sample. After grouping the data, we take subsamples from some intervals to determine the age group, and then obtain this data set.
The pike65sg data frame has 25 rows and 7 columns.
Usage
data(pike65sg)
Format
This data frame contains the following columns:
- length
the boundaries of grouping intervals.
- freq
the frequencies of observations falling into each interval.
- age1
the numbers of observations in the subsamples belonging to the first age group.
- age2
the numbers of observations in the subsamples belonging to the second age group.
- age3
the numbers of observations in the subsamples belonging to the third age group.
- age4
the numbers of observations in the subsamples belonging to the fourth age group.
- age5
the numbers of observations in the subsamples belonging to the fifth age group.
Source
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
References
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
Examples
data(pike65sg)
data(pikepar)
fit1 <- mix(pike65sg, pikepar, "gamma", mixconstr(consigma = "CCV"), usecondit = TRUE)
plot(fit1)
fit2 <- mix(pike65sg, pikepar, "gamma", usecondit = TRUE)
plot(fit2)
Starting Values of Parameters for the Pike Data
Description
Starting values of parameters for fitting a mixture distribution to the data set pike65.
The pikepar data frame has 5 rows and 3 columns.
Usage
data(pikepar)
Format
This data frame contains the following columns:
- pi
the starting values for proportions.
- mu
the starting values for means.
- sigma
the starting values for standard deviations.
Source
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
References
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
Examples
data(pikepar)
A Sample of Pike Lengths
Description
The data give the lengths of 523 pike (Esox lucius), sampled in 1965 from Heming Lake, Manitoba, Canada. There are known to be five age-groups in the sample. For each fish, the age group is determined by scale reading.
The pikeraw data frame has 523 rows and 2 columns.
Usage
data(pikeraw)
Format
This data frame contains the following columns:
- length
the lengths of 523 pike
- age
the age groups of 523 pike
Source
Macdonald, P.D.M. and T.J. Pitcher (1979). Age-groups from size-frequency data: a versatile and efficient method of analysing distribution mixtures. Journal of the Fisheries Research Board of Canada 36, 987-1001.
References
Macdonald, P.D.M. (1987). Analysis of length-frequency distributions. In R.C. Summerfelt and G.E. Hall [editors], Age and Growth of Fish, Iowa State University Press, Ames, Iowa. pp 371-384.
http://www.math.mcmaster.ca/peter/mix/demex/expike.html
Examples
data(pikeraw)
Mix Object Plotting
Description
A function for plotting Mix objects. It is called via the generic function plot.
Usage
## S3 method for class 'mix'
plot(x, mixpar = NULL, dist = "norm", root = FALSE, ytop = NULL,
clwd = 1, main, sub, xlab, ylab, bty, BW = FALSE, ...)
Arguments
x |
an object of class "mix". |
mixpar |
|
dist |
the distribution of components, it can be
|
root |
if |
ytop |
a scalar which determines the top of the y-axis. |
clwd |
a positive number denoting line width, defaulting to |
main |
an overall title for the plot. |
sub |
a subtitle for the plot. |
xlab |
a title for the x-axis. |
ylab |
a title for the y-axis. |
bty |
A character string which determines the type of box which is
drawn about plots. If |
BW |
logical; if TRUE the plot will be drawn in black and white. |
... |
additional arguments to the function |
Details
If the argument x is an object of class "mix", the plot is a
histogram of the grouped data taken from the element mixdata of x.
Although the leftmost (first) and rightmost (mth) intervals are
always open-ended, on the histogram the first interval is shown
as being twice the width of the second interval and the mth is
shown as being twice the width of the (m - 1)st interval. When the
fitted distribution is one of "lnorm", "gamma" and "weibull", the
left boundary of the first interval is taken to be zero, since
negative values and zeroes are not allowed for these distributions.
For the distributions "binom", "nbinom" and "pois", negative data
are not permitted, so the left boundary of the first interval is
taken to be -0.5. The component distributions, weighted by their
respective proportions, and the mixture distribution are computed
from the estimated parameter values in the element parameters of x
and superimposed on the histogram. The distribution of the
components is taken from the element distribution. If sub, xlab,
ylab and bty are not specified, default values are used. The
positions of the means are indicated with triangles.
When the argument root is TRUE, a hanging rootogram is displayed.
If only grouped data are given, this option plots the histogram
with the square root of relative frequency on the y-axis. If there
is a model as well as data, the y-axis is again the square root of
relative frequency, and in addition the bars of the histogram,
instead of rising from 0, are shifted up or down so that the
mid-point of the top of each bar lies exactly on the curve
indicating the mixture distribution; the bottom of the bar may
therefore be above or below the x-axis. If the bar goes below the
x-axis, the portion below is shown as a blue rectangle. If the bar
does not reach the x-axis, the space between the bottom of the bar
and the x-axis is shown as a blue rectangle. If the blue rectangles
in some region of the x-axis lie almost all above or almost all
below the axis, the mixture curve is not fitting well in that region.
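The hanging-rootogram construction described above can be sketched in base R without the package. This is only an illustration under stated assumptions: the interval boundaries, counts, and the single normal curve standing in for the fitted mixture are made up, the names breaks, counts, fit_mean and fit_sd are hypothetical, and plot.mix may scale its bars differently (for example on a density scale).

```r
# Hypothetical grouped data: interval boundaries and counts (not from the package).
breaks <- c(0, 1, 2, 3, 4, 5, 6)
counts <- c(3, 12, 30, 28, 14, 5)

# Assumed fitted model: a single normal curve standing in for the mixture.
fit_mean <- 3
fit_sd <- 1.1

# Observed and expected relative frequencies per interval.
obs <- counts / sum(counts)
expd <- diff(pnorm(breaks, fit_mean, fit_sd))

# Each bar hangs from the fitted value: its top sits at sqrt(expected)
# and its length is sqrt(observed), so its bottom may end above or below 0.
top <- sqrt(expd)
bottom <- top - sqrt(obs)

left <- breaks[-length(breaks)]
right <- breaks[-1]

# Draw the bars, the zero line, and the fitted values at the interval mid-points.
plot(range(breaks), range(c(top, bottom, 0)), type = "n",
     xlab = "x", ylab = "sqrt(relative frequency)")
rect(left, bottom, right, top)
abline(h = 0)
lines((left + right) / 2, top, type = "b")
```

The distance between each bar's bottom and the zero line plays the role of the blue rectangles: a run of bars that all stop short of zero, or all overshoot it, points to a region where the curve fits poorly.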
See Also
mixparam for organizing the parameter values, mix for fitting
mixture models, plot.mixdata for plotting Mixdata objects,
plot.default for additional arguments.
Examples
data(pike65)
data(pikepar)
fit1 <- mix(pike65, pikepar, "lnorm",
constr = mixconstr(consigma = "CCV"), emsteps = 3)
plot(fit1)
plot(fit1, root = TRUE)
data(bindat)
data(binpar)
fit2 <- mix(bindat, binpar, "binom",
constr = mixconstr(consigma = "BINOM", size = c(20, 20, 20, 20)))
plot(fit2)
plot(fit2, root = TRUE)
Mixdata Object Plotting
Description
A function for plotting Mixdata objects. It is called
via the generic function plot.
Usage
## S3 method for class 'mixdata'
plot(x, mixpar = NULL, dist = "norm", root = FALSE, ytop = NULL,
clwd = 1, main, sub, xlab, ylab, bty, ...)
Arguments
x |
an object of class "mixdata". |
mixpar |
|
dist |
the distribution of components, it can be
|
root |
if |
ytop |
a scalar which determines the top of the y-axis. |
clwd |
a positive number denoting line width, defaulting to 1. |
main |
an overall title for the plot. |
sub |
a subtitle for the plot. |
xlab |
a title for the x-axis. |
ylab |
a title for the y-axis. |
bty |
A character string which determines the type of box which is
drawn about plots. If |
... |
additional arguments to the function |
Details
If the argument mixpar is NULL, then only the histogram of the
data will be displayed; if mixpar gives the values of parameters,
the component distributions and the mixture distribution are
computed from the parameter values and superimposed on the
histogram.
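The overlay computed from mixpar can be sketched in base R as a weighted sum of component densities. This is a minimal illustration, assuming normal components and made-up values in the pi / mu / sigma column layout; par_df, comp and mixture are hypothetical names, and the package's own plotting code is not reproduced here.

```r
# Hypothetical parameter values in the pi / mu / sigma layout.
par_df <- data.frame(pi = c(0.3, 0.7), mu = c(2, 5), sigma = c(0.5, 1))

# Evaluate each weighted normal component on a grid, one column per component.
x <- seq(0, 9, length.out = 200)
comp <- sapply(seq_len(nrow(par_df)), function(k) {
  par_df$pi[k] * dnorm(x, par_df$mu[k], par_df$sigma[k])
})

# The mixture density is the sum of the weighted components.
mixture <- rowSums(comp)

# Components drawn as dashed lines, the mixture as a thicker solid line.
matplot(x, comp, type = "l", lty = 2, col = "grey40", ylab = "density")
lines(x, mixture, lwd = 2)
```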
See Also
plot.mix for plotting Mix objects, plot.default for additional
arguments.
Examples
data(cassie)
as.mixdata(cassie) # if the result isn't `NULL', then cassie is mixed data
plot.mixdata(cassie)
data(pikeraw)
data(pikepar)
pikemd <- mixgroup(pikeraw, breaks = c(0, seq(19.75, 65.75, 2), 80))
plot(pikemd)
plot(pikemd, pikepar, "lnorm")
fit <- mix(pikemd, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"), emsteps = 3)
plot(fit)
plot(pikemd, pikepar, "lnorm", root = TRUE)
plot(fit, root = TRUE)
Grouped Poisson Data
Description
The poisdat data frame has 15 rows and 2 columns.
Usage
data(poisdat)
Format
This data frame contains the following columns:
- X: the boundaries of grouping intervals.
- samppois: the frequencies of observations falling into each interval.
Examples
data(poisdat)
plot.mixdata(poisdat)
Starting Values of Parameters for the Poisson Data Set
Description
Starting values of parameters for fitting a mixture distribution to the data set poisdat.
The poispar data frame has 4 rows and 3 columns.
Usage
data(poispar)
Format
This data frame contains the following columns:
- pi: the starting values for proportions.
- mu: the starting values for means.
- sigma: the starting values for standard deviations.
Examples
data(poispar)
Print Mix Object
Description
print.mix is a function which prints an object of class "mix"
and returns it invisibly. It is called via the generic
function print.
Usage
## S3 method for class 'mix'
print(x, digits = 4, ...)
Arguments
x |
an object of class "mix". |
digits |
how many significant digits are to be used. |
... |
further arguments passed to or from other methods. |
Details
This function prints only the information about the mixture
model, namely the estimated parameters of the mixture,
the distribution of components and the constraints on
the parameters. The values of the parameters are
rounded to the specified number of decimal places (default 4).
The whole object can be printed out using the function
print.default.
See Also
mix for model fitting, print.default for printing the whole object.
Examples
data(pike65)
data(pikepar)
fit <- mix(pike65, pikepar, "gamma", mixconstr(consigma = "CCV"), emsteps = 3)
fit
print(fit)
print.mix(fit)
print.default(fit)
Summarizing Mixture Model Fits
Description
summary method for class "mix". It is called via
the generic function summary.
Usage
## S3 method for class 'mix'
summary(object, digits = 4, ...)
Arguments
object |
an object of class "mix". |
digits |
how many significant digits are to be used. |
... |
additional arguments affecting the summary produced. |
Value
A list containing the following items:
parameters |
a data frame containing the values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
standard errors |
a data frame giving the standard errors of estimated parameters. |
anova table |
analysis of variance table for the
|
See Also
mix for model fitting, summary for summarizing other kinds of
objects, anova.mix for information about the anova table.
Examples
data(pike65)
data(pikepar)
fit <- mix(pike65, pikepar, "lnorm", mixconstr(consigma = "CCV"), emsteps = 3)
fit
summary(fit)
Check Constraints
Description
Check if constraints on parameters are valid. See the reference for details.
Usage
testconstr(mixdat, mixpar, dist, constr)
Arguments
mixdat |
a data frame containing grouped data, whose first
column should be right boundaries of grouping intervals, whose
second column should consist of the frequencies indicating
numbers of observations falling into each interval. If conditional
data are available, this data frame should have |
mixpar |
a data frame containing the values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
dist |
the distribution of components, it can be one of
|
constr |
a list of constraints on parameters of component
distributions. See function |
Value
If the constraints are valid, this function returns the
logical value TRUE. If not, it gives an error
message explaining the reason.
References
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
See Also
mixgroup for grouping data, mixparam for organizing the parameter
values, mixconstr for constructing constraints.
Examples
## Not run:
testconstr(pike65, pikepar, "lnorm", constr = mixconstr(consigma = "CCV"))
testconstr(bindat, binpar, "binom", constr = mixconstr())
testconstr(bindat, binpar, "binom", constr = mixconstr(consigma = "BINOM"))
testconstr(bindat, binpar, "pois", constr = mixconstr(conmu = "MEQ", consigma = "POIS"))
## End(Not run)
Check Parameters
Description
Check if the values of parameters are valid. See the reference for details.
Usage
testpar(mixpar, dist, constr)
Arguments
mixpar |
a data frame containing the values for parameters of component distributions, which are, in order, the proportions, means, and standard deviations. |
dist |
the distribution of components, it can be one of
|
constr |
a list of constraints on parameters of component
distributions. See function |
Details
No parameter value may be missing (NA or NaN) or infinite (Inf),
and the proportions must lie between 0 and 1. The standard
deviations cannot be negative. The components must be indexed so
that the means are in non-decreasing order. If any two consecutive
means are equal, then the corresponding standard deviations must be
in strictly ascending order. Furthermore, the parameter values
should be consistent with the constraints and the distribution
of components. For example, if one wants to constrain the means to
lie along a growth curve, then (\mu_3 - \mu_2) < (\mu_2 - \mu_1)
is required. Also, negative means are not permitted by the
constraints "FCV", "CCV", "BINOM", "NBINOM" and "POIS", or by any
distribution other than the Normal. If Binomial components with the
constraint "BINOM" are fitted, then the relation
\mu_i > (\sigma_i)^2 must be satisfied, and Negative Binomial
components with the constraint "NBINOM" require
\mu_i < (\sigma_i)^2.
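The distribution-independent rules above can be sketched as a small base R check. This is a hypothetical helper, not the package's testpar: the name check_par_sketch is invented for illustration, the bounds on the proportions are assumed to be inclusive, and the constraint-specific and distribution-specific rules are left out.

```r
# Hypothetical helper, not the package's testpar(): it checks only the
# rules that do not depend on the distribution or the constraints.
check_par_sketch <- function(mixpar) {
  p <- mixpar$pi
  mu <- mixpar$mu
  s <- mixpar$sigma
  # No NA, NaN, or infinite values anywhere.
  if (any(!is.finite(c(p, mu, s)))) return(FALSE)
  # Proportions within [0, 1] (assumed inclusive); sigma non-negative.
  if (any(p < 0 | p > 1) || any(s < 0)) return(FALSE)
  # Means must be in non-decreasing order.
  d_mu <- diff(mu)
  if (any(d_mu < 0)) return(FALSE)
  # Tied consecutive means need strictly increasing standard deviations.
  tied <- d_mu == 0
  if (any(diff(s)[tied] <= 0)) return(FALSE)
  TRUE
}

ok <- data.frame(pi = c(0.4, 0.6), mu = c(1, 3), sigma = c(1, 1))
bad <- data.frame(pi = c(0.4, 0.6), mu = c(3, 1), sigma = c(1, 1))
check_par_sketch(ok)   # TRUE: all rules satisfied
check_par_sketch(bad)  # FALSE: means are decreasing
```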
Value
logical. If TRUE, the parameters are valid. If FALSE, some of
the parameters are invalid. Since this function is for
internal use, it doesn't give error messages.
References
Macdonald, P.D.M. and Green, P.E.J. (1988) User's Guide to Program MIX: An Interactive Program for Fitting Mixtures of Distributions. ICHTHUS DATA SYSTEMS.
See Also
mixparam for organizing the parameter values, mixconstr for
constructing constraints, testconstr for checking constraints.
Compute Shape and Scale Parameters for Weibull Distribution
Description
Compute the shape and scale parameters of the Weibull distribution given the mean, standard deviation and location.
Usage
weibullpar(mu, sigma, loc = 0)
Arguments
mu |
the mean of the Weibull distribution. |
sigma |
the standard deviation of the Weibull distribution. |
loc |
the location parameter of the Weibull distribution, defaulting to 0. |
Value
A data frame containing three parameters, which are, in order, shape, scale, and location.
See Also
weibullparinv for computing the mean and standard deviation
from the parameters shape, scale and location.
Examples
weibullpar(2, 1.2)
weibullpar(2, 1.2, 1)
Compute the Mean and Standard Deviation of Weibull Distribution
Description
Compute the mean and standard deviation of the Weibull distribution given the values of shape, scale and location.
Usage
weibullparinv(shape, scale, loc = 0)
Arguments
shape |
the shape parameter of the Weibull distribution. |
scale |
the scale parameter of the Weibull distribution. |
loc |
the location parameter of the Weibull distribution, defaulting to 0. |
Value
A data frame containing three parameters, which are, in order, mean, standard deviation and location.
See Also
weibullpar for computing the parameters shape and scale
from the mean and standard deviation.
Examples
wpar <- weibullpar(2, 1.2)
weibullparinv(wpar$shape, wpar$scale)
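The relation between the two parameterizations can be sketched in base R from the standard Weibull moment formulas: mean = loc + scale * gamma(1 + 1/shape) and sd = scale * sqrt(gamma(1 + 2/shape) - gamma(1 + 1/shape)^2). The helpers below are hypothetical and are not the package's weibullpar or weibullparinv; the names weibull_moments and weibull_from_moments, the root-finding approach, and the search bracket for the shape are all assumptions made for illustration.

```r
# Hypothetical helpers built on the standard Weibull moment formulas;
# they are not the package's weibullpar() / weibullparinv().
weibull_moments <- function(shape, scale, loc = 0) {
  g1 <- gamma(1 + 1 / shape)
  g2 <- gamma(1 + 2 / shape)
  data.frame(mu = loc + scale * g1, sigma = scale * sqrt(g2 - g1^2), loc = loc)
}

weibull_from_moments <- function(mu, sigma, loc = 0) {
  # The coefficient of variation of (X - loc) depends on the shape alone.
  cv <- sigma / (mu - loc)
  cv_gap <- function(shape) {
    g1 <- gamma(1 + 1 / shape)
    sqrt(gamma(1 + 2 / shape) - g1^2) / g1 - cv
  }
  # Assumed search bracket for the shape; widen it for extreme cv values.
  shape <- uniroot(cv_gap, interval = c(0.1, 100), tol = 1e-10)$root
  scale <- (mu - loc) / gamma(1 + 1 / shape)
  data.frame(shape = shape, scale = scale, loc = loc)
}

# Round trip: start from mean 2 and standard deviation 1.2, then go back.
wp <- weibull_from_moments(2, 1.2)
back <- weibull_moments(wp$shape, wp$scale)
back
```

Solving for the shape through the coefficient of variation reduces the inversion to a one-dimensional root search, after which the scale follows directly from the mean.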