% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Main.codes.R
\name{Data.filter}
\alias{Data.filter}
\title{Filtering for Microbial Features of Low Abundance and Low Prevalence}
\usage{
Data.filter(
  Data,
  metadata,
  OTU_counts_filter_value = 1000,
  OTU_filter_value = NA,
  log_base = NA,
  Group_var = NULL
)
}
\arguments{
\item{Data}{A data frame or a list object which contains the selected biomarker count table
(generated by \code{\link[MicrobTiSDA]{Data.rf.classifier}}), where rows represent OTUs/ASVs
and columns represent samples.}

\item{metadata}{A data frame containing information about all samples, including at least the group and the individual each sample
belongs to (\code{Group} and \code{ID}), the sampling \code{Time} point of each sample, and any other relevant information.}

\item{OTU_counts_filter_value}{An integer giving the minimum total abundance an OTU/ASV must have across all samples. An OTU whose
summed abundance falls below this positive integer threshold is excluded; otherwise it is retained.
The default is 1000. Note: if the input \code{Data} is the table of important OTUs produced by sample classification,
this argument should be \code{NA}, because low-abundance OTUs/ASVs may already have been filtered out during classification by
\code{\link[MicrobTiSDA]{Data.rf.classifier}}.}

\item{OTU_filter_value}{Numeric between 0 and 1. This specifies the minimum prevalence rate of an OTU/ASV across all samples within
each group or individual. OTUs/ASVs with a prevalence rate below the given threshold will be removed.}

\item{log_base}{The base of the logarithm used to transform the counts. The default is \code{NA}, in which case no logarithmic
transformation is applied; this is usually adequate when the dataset is not very large. For large datasets, the base can be 2, "e", or 10.}

\item{Group_var}{A string or a vector. This specifies the grouping variables, which should match the column names in
the \code{metadata} used to designate sample groups, and for pre-processing OTU data of each group or individual separately.
For instance, to split the OTU table based on the \code{Group} variable, set \code{Group_var = "Group"};
to split the data by both the \code{Group} and \code{Diet} categorical variables (if \code{Diet} is present in \code{metadata}) and
study the interaction between grouping variables, set \code{Group_var = c("Group","Diet")}.}
}
\value{
A list of class \code{FilteredData} containing:
\describe{
\item{filtered_table}{The filtered OTU count table, optionally log-transformed.}
\item{parameters}{A list of the filtering parameters used.}
\item{metadata}{The input metadata, possibly augmented with a combined grouping variable if multiple grouping variables were supplied in \code{Group_var}.}
}
}
\description{
This function filters an OTU/ASV table based on overall counts and prevalence thresholds, and
optionally applies a logarithmic transformation. When grouping variables are provided,
the function performs abundance and prevalence filtering within each group separately.
}
\details{
The function executes several key steps:
\enumerate{
\item \strong{Input Validation:}
It first checks whether the input \code{Data} is a data frame or a list generated by \code{\link[MicrobTiSDA]{Data.rf.classifier}}.
If \code{Data} is a list rather than a data frame, its first element is extracted. If \code{Data} is neither a data frame nor an appropriate
list, the function stops with an error.

\item \strong{OTU Count Filtering:}
If an \code{OTU_counts_filter_value} is provided (i.e., not \code{NA}), OTUs with total counts (across all samples) less than or equal to this value are removed.

\item \strong{Logarithmic Transformation:}
If a \code{log_base} is specified (allowed values are \code{10}, \code{2}, or \code{"e"}), a log transformation (with an offset of 1 to avoid log(0)) is applied to the data.
If \code{log_base} is \code{NA}, the data remains untransformed.

\item \strong{Prevalence Filtering without Grouping:}
When \code{Group_var} is not provided (\code{NULL}), if an \code{OTU_filter_value} is specified, the function filters out OTUs whose prevalence (the proportion of samples
with a non-zero count) is less than the threshold. If \code{OTU_filter_value} is not provided, a warning is issued and no prevalence filtering is applied.

\item \strong{Group-based Prevalence Filtering:}
If one or more grouping variables are specified in \code{Group_var}, the function first checks that these variables exist in \code{metadata}.
For each group (or combination of groups if multiple variables are provided), the prevalence of each OTU is calculated, and OTUs are retained if they
meet the prevalence threshold in at least one group. The filtered OTU table is then returned.
}
}
\examples{
# Example OTU table
set.seed(123)
otu_table <- as.data.frame(matrix(sample(0:100, 100, replace = TRUE), nrow = 10))
rownames(otu_table) <- paste0("OTU", 1:10)
colnames(otu_table) <- paste0("Sample", 1:10)


# Example metadata
metadata <- data.frame(
  Group = rep(c("A", "B"), each = 5),
  row.names = paste0("Sample", 1:10)
)

# Filter OTU table without grouping
filtered_data <- Data.filter(
  Data = otu_table,
  metadata = metadata,
  OTU_counts_filter_value = 50,
  OTU_filter_value = 0.2
)
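
# A minimal base R sketch of the two criteria described in Details, shown
# on a tiny hand-made table so the outcome is easy to check by eye.
# It illustrates the idea only and is not the package's internal code;
# the names toy, toy_total, toy_prevalence and toy_kept are hypothetical,
# and the thresholds (3 and 0.25) are chosen purely for illustration.
toy <- data.frame(
  S1 = c(10, 0, 5),
  S2 = c(20, 0, 0),
  S3 = c(30, 1, 0),
  S4 = c(40, 0, 0),
  row.names = c("OTU_a", "OTU_b", "OTU_c")
)
toy_total <- rowSums(toy)              # total count of each OTU over all samples
toy_prevalence <- rowMeans(toy > 0)    # proportion of samples with a non-zero count
# Keep OTUs whose total exceeds the count threshold and whose prevalence
# reaches the prevalence threshold (boundary handling assumed from Details)
toy_kept <- rownames(toy)[toy_total > 3 & toy_prevalence >= 0.25]
toy_kept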

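# Filter OTU table with grouping.
# The simulated row totals fall below the default OTU_counts_filter_value
# of 1000, so a lower threshold is set explicitly here.
filtered_data_grouped <- Data.filter(
  Data = otu_table,
  metadata = metadata,
  OTU_counts_filter_value = 50,
  OTU_filter_value = 0.5,
  Group_var = "Group"
)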
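
# A minimal base R sketch of group-wise prevalence filtering as described
# in Details: prevalence is computed within each group, and an OTU is kept
# if it reaches the threshold in at least one group. This is an
# illustration under those stated assumptions, not the package's internal
# code; toy_counts, toy_group, group_prev and toy_kept_grouped are
# hypothetical names.
toy_counts <- data.frame(
  S1 = c(3, 0), S2 = c(4, 0), S3 = c(0, 0), S4 = c(0, 7),
  row.names = c("OTU_x", "OTU_y")
)
toy_group <- c("A", "A", "B", "B")     # group label of S1 to S4
# Prevalence of each OTU within each group (rows: OTUs, columns: groups)
group_prev <- sapply(split(seq_along(toy_group), toy_group), function(idx) {
  rowMeans(toy_counts[, idx, drop = FALSE] > 0)
})
group_prev
# OTU_x is present in every sample of group A, so it passes a 0.75
# threshold there even though its prevalence over all four samples is 0.5
toy_kept_grouped <- rownames(toy_counts)[apply(group_prev >= 0.75, 1, any)]
toy_kept_grouped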


}
\author{
Shijia Li
}
