Chemical analysis of proteins based on their amino acid compositions. Amino acid compositions can be read from FASTA files and used to calculate chemical metrics including carbon oxidation state and stoichiometric hydration state as described in Dick et al. (2020). Other properties that can be calculated include protein length, grand average of hydropathy (GRAVY), isoelectric point (pI), molecular weight (MW), standard molal volume (V0), and metabolic costs (Akashi and Gojobori, 2002; Wagner, 2005; Zhang et al., 2018). A database of amino acid compositions of human proteins derived from UniProt is provided.
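Carbon oxidation state can be computed directly from an amino acid composition. The following is a minimal sketch in base R, not canprot's own code. The amino acid formula table and the helper `zc_from_counts()` are hypothetical and written out only for illustration. It uses ZC = (Z - h + 3n + 2o + 2s) / c with Z = 0 for a nonionized molecule. Each water molecule lost in forming a peptide bond contributes -2 + 2 = 0 to the numerator and no carbon, so summing the free amino acids gives the same ZC as the protein.

```r
# Elemental composition (C, H, N, O, S) of the 20 free amino acids
# as neutral (nonionized) molecules.
aa_formula <- rbind(
  Ala = c(3, 7, 1, 2, 0),  Arg = c(6, 14, 4, 2, 0),
  Asn = c(4, 8, 2, 3, 0),  Asp = c(4, 7, 1, 4, 0),
  Cys = c(3, 7, 1, 2, 1),  Gln = c(5, 10, 2, 3, 0),
  Glu = c(5, 9, 1, 4, 0),  Gly = c(2, 5, 1, 2, 0),
  His = c(6, 9, 3, 2, 0),  Ile = c(6, 13, 1, 2, 0),
  Leu = c(6, 13, 1, 2, 0), Lys = c(6, 14, 2, 2, 0),
  Met = c(5, 11, 1, 2, 1), Phe = c(9, 11, 1, 2, 0),
  Pro = c(5, 9, 1, 2, 0),  Ser = c(3, 7, 1, 3, 0),
  Thr = c(4, 9, 1, 3, 0),  Trp = c(11, 12, 2, 2, 0),
  Tyr = c(9, 11, 1, 3, 0), Val = c(5, 11, 1, 2, 0)
)
colnames(aa_formula) <- c("C", "H", "N", "O", "S")

# Hypothetical helper: carbon oxidation state from a named vector of
# amino acid counts, e.g. c(Gly = 1, Leu = 1).
zc_from_counts <- function(counts) {
  # Rows in the same order as the counts; multiplying recycles counts
  # down each column, so row i is scaled by counts[i]
  f <- aa_formula[names(counts), , drop = FALSE]
  elements <- colSums(f * counts)
  # Z = 0 for a neutral molecule; H, N, O, S taken as +1, -3, -2, -2
  numerator <- -elements[["H"]] + 3 * elements[["N"]] +
    2 * elements[["O"]] + 2 * elements[["S"]]
  numerator / elements[["C"]]
}

zc_from_counts(c(Gly = 1))          # glycine: 1
zc_from_counts(c(Gly = 1, Leu = 1)) # -0.5
```

For the Gly + Leu composition, the result (-0.5) matches the value calculated from the formula of the dipeptide Gly-Leu (C8H16N2O3), which illustrates why the condensation water can be ignored.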
See the vignettes at https://chnosz.net/canprot/vignettes/.
First install the remotes package from CRAN, then install canprot from GitHub. This also installs several other R packages as dependencies:
install.packages("remotes")
remotes::install_github("jedick/canprot")
Three demos are available. One of them is shown below.
demo("thermophiles")
#demo("locations")
#demo("redoxins")
This is a scatter plot of standard specific entropy (S° per gram) and carbon oxidation state (ZC) for proteins in Nitrososphaeria (syn. Thaumarchaeota) metagenome-assembled genomes (MAGs) reported by Luo et al. (2024). S° is calculated from amino acid group contributions (Dick et al., 2006) via canprot::S0g(). The plot shows that, at similar carbon oxidation state, proteins tend to have higher specific entropy in MAGs from thermal habitats than in MAGs from nonthermal habitats. This implies that, after correcting for ZC, proteins in thermophiles have a more negative derivative of the standard Gibbs energy per gram of protein with respect to temperature. See the "Demos for canprot" vignette for a similar plot for genomes of methanogenic archaea.
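The link between specific entropy and the temperature derivative of Gibbs energy is the standard thermodynamic relation at constant pressure. Writing it per gram (M denotes the molar mass of the protein, a symbol introduced here, which does not depend on temperature):

```latex
\left(\frac{\partial G^\circ}{\partial T}\right)_P = -S^\circ
\quad\Longrightarrow\quad
\left(\frac{\partial \,(G^\circ / M)}{\partial T}\right)_P = -\frac{S^\circ}{M}
```

A higher specific entropy S°/M therefore corresponds directly to a more negative temperature derivative of the standard Gibbs energy per gram.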