% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cellMarkers.R
\name{cellMarkers}
\alias{cellMarkers}
\title{Identify cell markers}
\usage{
cellMarkers(
  scdata,
  bulkdata = NULL,
  subclass,
  cellgroup = NULL,
  nsubclass = 25,
  ngroup = 10,
  expfilter = 0.5,
  noisefilter = 2,
  noisefraction = 0.25,
  min_cells = 10,
  remove_subclass = NULL,
  dual_mean = FALSE,
  meanFUN = "logmean",
  postFUN = NULL,
  verbose = TRUE,
  sliceMem = 16,
  cores = 1L,
  ...
)
}
\arguments{
\item{scdata}{Single-cell data matrix with genes in rows and cells in
columns. Can be a sparse matrix or a DelayedMatrix. Must have rownames
representing gene IDs or gene symbols.}

\item{bulkdata}{Optional data matrix containing bulk RNA-Seq data with genes
in rows. This matrix is only used for its rownames (gene IDs), to ensure
that cell markers are selected from genes in the bulk dataset.}

\item{subclass}{Vector of cell subclasses matching the columns in \code{scdata}.}

\item{cellgroup}{Optional grouping vector of major cell types matching the
columns in \code{scdata}. \code{subclass} is assumed to contain subclasses which are
subsets within \code{cellgroup} overarching classes.}

\item{nsubclass}{Number of genes to select for each single cell subclass.
Either a single number or a vector with the number of genes for each
subclass.}

\item{ngroup}{Number of genes to select for each cell group. Either a single
number or a vector with the number of genes for each group.}

\item{expfilter}{Genes whose maximum mean expression (log2 scale) across cell
types is below this value are removed and not considered for the signature.}

\item{noisefilter}{Sets an upper bound for the \code{noisefraction} cut-off below
which gene expression is set to 0. In effect, gene expression above this
level is always retained in the signature. Setting this higher allows more
suppression via \code{noisefraction} and can favour more highly expressed genes.}

\item{noisefraction}{Numeric value. The maximum mean log2 gene expression across
cell types is calculated, and values in cell types below this fraction of the
maximum are set to 0. Set in conjunction with \code{noisefilter}. Note: if this is
set too high (too close to 1), it can have a deleterious effect on deconvolution.}

\item{min_cells}{Numeric value specifying minimum number of cells in a
subclass category. Subclass categories with fewer cells will be ignored.}

\item{remove_subclass}{Character vector of \code{subclass} levels to be removed
from the analysis.}

\item{dual_mean}{Logical, whether to calculate the arithmetic mean of counts as
well as mean(log2(counts + 1)). This is mainly useful for simulation.}

\item{meanFUN}{Either a character value or a function for calculating the mean,
which is passed to \code{\link[=scmean]{scmean()}}. Options include \code{"logmean"} (the default) or
\code{"trimmean"}, which is a trimmed mean after excluding the top/bottom 5\% of
values.}

\item{postFUN}{Optional function applied to \code{genemeans} matrices after the
mean has been calculated. If \code{meanFUN} is set to \code{"trimmean"}, then
\code{postFUN} is set to \code{log2s}. See \code{\link[=scmean]{scmean()}}.}

\item{verbose}{Logical whether to show messages.}

\item{sliceMem}{Maximum amount of memory in GB to allow for each subsetted count
matrix object. When \code{scdata} is subsetted by each cell subclass, if the
amount of memory would be above \code{sliceMem} then slicing is activated and
the subsetted count matrix is divided into chunks which are processed
separately. This is indicated by the addition of '...' in the printed timings.
The largest usable value is just under 17.2 GB (2^34 / 1e9). Above this the
subsetted matrix breaches the long vector limit (>2^31 elements).}

\item{cores}{Integer, number of cores to use for parallelisation using
\code{mclapply()}. Parallelisation is not available on Windows. Warning:
parallelisation increases memory requirements. See \code{\link[=scmean]{scmean()}}.}

\item{...}{Additional arguments passed to \code{\link[=scmean]{scmean()}} such as \code{use_future}.}
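
As a rough illustration of how \code{noisefilter} and \code{noisefraction} interact,
the base R sketch below applies one reading of the descriptions above: the
cut-off for each gene is \code{noisefraction} times its maximum mean expression,
capped at \code{noisefilter}. The toy matrix and the exact form of the cut-off are
assumptions made for illustration and may differ from the package internals.

```r
# Toy matrix of mean log2 expression (illustrative values only):
# genes in rows, cell types in columns.
genemeans <- matrix(
  c(12, 2.5, 1.0,
     4, 0.8, 1.5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("Tcell", "Bcell", "Mono"))
)

noisefraction <- 0.25
noisefilter <- 2

# Maximum mean expression per gene across cell types
rowmax <- apply(genemeans, 1, max)

# Assumed cut-off: a fraction of the row maximum, never above noisefilter
cutoff <- pmin(rowmax * noisefraction, noisefilter)

# Values below the gene's cut-off are set to 0
# (the cut-off vector is recycled down columns, so each row uses its own value)
filtered <- genemeans
filtered[genemeans < cutoff] <- 0
filtered
```

In this toy case the fractional cut-off for geneA would be 3, but it is capped
at 2 by \code{noisefilter}, so the value 2.5 is retained.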
}
\value{
A list object with S3 class 'cellMarkers' containing:
\item{call}{the matched call}
\item{best_angle}{named list containing a matrix for each cell type with
genes in rows. Rows are ranked by lowest specificity angle for that cell
type and highest maximum expression. Columns are:
\code{angle} the specificity angle in radians,
\code{angle.deg} the same angle in degrees,
\code{max} the maximum mean expression across all cell types,
\code{rank} the rank of the mean gene expression for that cell type compared to
the other cell types}
\item{group_angle}{named list of matrices similar to \code{best_angle}, for each
cell subclass}
\item{geneset}{character vector of selected gene markers for cell types}
\item{group_geneset}{character vector of selected gene markers for cell
subclasses}
\item{genemeans}{matrix of mean log2(counts + 1) gene expression with genes in rows
and cell types in columns}
\item{genemeans_filtered}{matrix of gene expression for cell types
following noise reduction}
\item{groupmeans}{matrix of mean log2(counts + 1) gene expression with genes in rows
and cell subclasses in columns}
\item{groupmeans_filtered}{matrix of gene expression for cell subclasses
following noise reduction}
\item{cell_table}{factor encoded vector containing the groupings of the
cell types within cell subclasses, determined by which subclass contains
the maximum number of cells for each cell type}
\item{spillover}{matrix of spillover values between cell types}
\item{subclass_table}{contingency table of the number of cells in each
subclass}
\item{opt}{list storing options, namely arguments \code{nsubclass}, \code{ngroup},
\code{expfilter}, \code{noisefilter}, \code{noisefraction}}
\item{genemeans_ar}{if \code{dual_mean} is \code{TRUE}, optional matrix of arithmetic
mean, i.e. log2(mean(counts)+1)}
\item{genemeans_filtered_ar}{optional matrix of arithmetic mean
following noise reduction}
The 'cellMarkers' object is designed to be passed to \code{\link[=deconvolute]{deconvolute()}} to
deconvolute bulk RNA-Seq data. It can be updated rapidly with different
settings using \code{\link[=updateMarkers]{updateMarkers()}}. Ensembl gene IDs can be replaced with
recognisable gene symbols by applying \code{\link[=gene2symbol]{gene2symbol()}}.
}
\description{
Uses a geometric method based on the vector dot product to identify genes which
are the best markers for individual cell types.
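
The geometric idea can be pictured with a small base R sketch. It assumes each
gene is treated as a vector of mean expression across cell types, and that the
specificity angle for a cell type is the angle between that vector and the
cell type's axis, obtained from the dot product with the unit axis vector. The
helper \code{specificity_angle()} and the toy data are hypothetical and are not
part of the package.

```r
# Toy matrix of mean log2 expression (illustrative values only):
# genes in rows, cell types in columns.
genemeans <- matrix(
  c(8,   0.5, 0.2,
    3,   3,   3,
    0.1, 6,   0.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("Tcell", "Bcell", "Mono"))
)

# Hypothetical helper: angle (radians) between each gene vector and each
# cell type axis. The dot product of row i with the unit vector for column j
# is simply mat[i, j], so the cosine is mat[i, j] / length of row i.
specificity_angle <- function(mat) {
  vec_len <- sqrt(rowSums(mat^2))
  cosine <- mat / vec_len                 # each row divided by its own length
  cosine <- pmin(pmax(cosine, -1), 1)     # guard against rounding beyond [-1, 1]
  acos(cosine)
}

angle <- specificity_angle(genemeans)
angle_deg <- angle * 180 / pi

# A small angle means the gene points almost along that cell type's axis,
# i.e. it is expressed mostly in that cell type.
best_for_bcell <- names(which.min(angle[, "Bcell"]))
round(angle_deg, 1)
```

In this toy example geneB, which is expressed equally in all three cell types,
has the same angle to every axis, whereas geneC has by far the smallest angle to
the Bcell axis.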
}
\details{
If \code{verbose = TRUE}, the function displays an estimate of the required
memory. This estimate is only a guide, provided to help users choose the
number of cores for parallelisation. Real memory usage may be higher,
theoretically up to double the estimate, due to R's use of copy-on-modify.
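
The ceiling quoted for \code{sliceMem} can be checked with simple arithmetic. The
sketch below assumes a dense subsetted matrix stored as 8-byte doubles. The
helper \code{dense_gb()} and the chunk count are hypothetical illustrations, and
the package's actual slicing scheme may differ.

```r
# Long vector threshold quoted in the sliceMem description
max_elements <- 2^31
bytes_per_double <- 8

# 2^31 elements * 8 bytes = 2^34 bytes, expressed in GB (1e9 bytes)
limit_gb <- max_elements * bytes_per_double / 1e9
limit_gb

# Hypothetical helper: size in GB of a dense genes x cells matrix of doubles
dense_gb <- function(ngenes, ncells) ngenes * ncells * bytes_per_double / 1e9

# 30,000 genes x 50,000 cells fits under the default sliceMem of 16
small_gb <- dense_gb(30000, 50000)

# 30,000 genes x 100,000 cells exceeds it, so it would need to be split.
# Assumed illustration: the number of chunks needed to stay under sliceMem.
sliceMem <- 16
large_gb <- dense_gb(30000, 100000)
n_chunks <- ceiling(large_gb / sliceMem)
c(small_gb = small_gb, large_gb = large_gb, n_chunks = n_chunks)
```

Because copy-on-modify can temporarily duplicate objects, leaving headroom
beyond these figures is advisable, particularly when \code{cores} is above 1.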
}
\seealso{
\code{\link[=deconvolute]{deconvolute()}} \code{\link[=updateMarkers]{updateMarkers()}} \code{\link[=gene2symbol]{gene2symbol()}}
}
\author{
Myles Lewis
}
