\documentclass{article} \usepackage{natbib} \usepackage{graphics} \usepackage{amsmath} \usepackage{indentfirst} \usepackage[utf8]{inputenc} \usepackage{hyperref} \newcommand{\Rfunction}[1]{{\texttt{#1}}} \newcommand{\Rpackage}[1]{{\textit{#1}}} %\VignetteIndexEntry{Tutorial: Reading, Cleaning, and Aggregating} \begin{document} \setkeys{Gin}{width=1.1\textwidth} <>= options(keep.source = TRUE, width = 60) foo <- packageDescription("primate") @ \title{Aggregation and merging in the \\ \Rpackage{primate} R-package (Version \Sexpr{foo$Version}):\\ Using locomotion and height SQL database tables to assess arboreal risk. } \author{ David Schruth \texttt{dschruth@anthropoidea.net}\\ \\ Marc Myers \texttt{mmyers@primate.org}\\ Noel Rowe \texttt{nrowe@primate.org}\\ } \maketitle \section{Licensing} This package is licensed under the Apache License v2.0: it is therefore free to use and redistribute, however, we, the copyright holders, wish to maintain primary control over any further development. Please be sure to cite us if you use this package in work leading to publication. \section{Installation} You currently need to have java installed on your system in order to use RJDBC which depends on rJava and jre and the jdk. \begin{verbatim} # sudo apt-get install default-jre default-jdk #R CMD javareconf -e # within the same shell that you run R \end{verbatim} Building the \Rpackage{primate} package from source requires that you have the proper dependency packages, \Rpackage{caroline}, installed from CRAN. This can typically be accomplished via the following commands from within the R command line environment: \begin{verbatim} install.packages(c('caroline','RJDBC')) \end{verbatim} After a successful installation the \Rpackage{primate} package can be loaded in the normal way: by starting R and invoking the following \Rfunction{library} command: <>= library(primate) @ \section{Introduction} \Sexpr{foo$Description} \section{Reading in Data} First we need to read the data into R using either locally cached datasets ("pkg.tab") or by using functions to access a remote SQL server mirror (commented out below). The local caches are a fraction of the full dataset and temporary as they will soon exceed the maximum allowable package size. <>= primates.tab <- AWP.read.pkg.tab(tab.nm='dbo_tblGrovesMonkeys', id.clmn='MonkeyNumberGroves') heights.tab <- AWP.read.pkg.tab(tab.nm='HMeasure') locomo.tab <- AWP.read.pkg.tab(tab.nm='Locomotion') locomot.types <- AWP.read.pkg.tab(tab.nm='dbo_LMType') taxa.clmns <- c('Superfamily','Family','Genus','Species') @ \section{Aggregate and Merge} First we need to aggreate the data so that we have one record per species. We will use the groupBy() and nerge() functions (from the \Rpackage{caroline} package) as well as the regroup.gnsp() functions from the \Rpackage{primate} package. <>= ### HEIGHTS ### getHMmeans <- function(code){ ret <- groupBy(df=heights.tab[heights.tab$HMTypeCd==code,], aggregation='mean', by='PrimateID', clmns='Mean') names(ret) <- code; return(ret) } heights <- list() levels <- c('GRND','G5','510','1020','2030','3040','40+') for(lev in levels) heights[[lev]] <- getHMmeans(lev) #not sure why these numbers are so high... are they observations? #names(heights) <- c(paste('h',names(heights)[1:6],sep=''),'h40up') AWPheight <- nerge(heights) AWPheightsp <- nerge(list(h=tab2df(AWPheight), p=primates.tab[,taxa.clmns])) AWPheightgs <- regroup.gnsp(df=AWPheightsp,clmns=colnames(AWPheightsp)[1:7]) names(AWPheightgs) <- c(sub('X','h',x=names(AWPheightgs)[-ncol(AWPheightgs)]),"h40up") h.levs <- names(AWPheightgs) ### LOCOMOTION ### AWPlocLst<-list() loc.modes <- c("BRAC","CLIM","LEAP","QUAD","SUSP") for(loc in loc.modes){ dfsub=locomo.tab[locomo.tab$LMTypeCd==loc,] AWPlocLst[[loc]] <- groupBy(df=dfsub,aggregation='mean',by='PrimateID',clmns='PctOfTime') } AWPlocDF <- nerge(AWPlocLst, all=T) AWPlocDFsp <- nerge(list(AWPlocDF, primates.tab[,taxa.clmns])) ## add primate names and re-group by them instead (the ids are non-unique at the genus_species level) loc.pct.modes <- paste("PctOfTime",loc.modes,sep='.') AWPlocDFgs <- regroup.gnsp(df=AWPlocDFsp,clmns=loc.pct.modes) @ \section{Merge} Finally we merge these two aggregated tables together and look for an interdisciplinary correlation. <>= lh <- loc.high <- nerge(list(h=AWPheightgs,l=AWPlocDFgs)) risky.modes <- loc.pct.modes[c(1,3)] tmp <- lapply(h.levs, function(x) {ret <- subset(loc.high,get(x)>3 ,risky.modes); names(ret) <-sub('PctOfTime\\.','',x=risky.modes);ret}) names(tmp)<- h.levs @ <>= oldpar <- par(mfrow=c(1,length(tmp)), mar=c(2,2,3,0)) sapply(names(tmp), function(i){boxplot(tmp[[i]], ylim=c(0,100), varwidth=T, main=i)}) par(oldpar) @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Brachiation and leaping frequencies by canopy height} \label{fig:one} \end{figure} \begin{thebibliography}{} \bibitem[Rowe & Myers, \textit{et~al}. (2017)]{Rowe2017} Noel Rowe and Marc Myers, (2017), "All the World's Primates", Pagonias Press, Charlestown RI \end{thebibliography} \end{document}