%\VignetteIndexEntry{DEU_PISA2012_description}

\documentclass{article}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc} % jhh: changed to utf8
\usepackage{pdflscape} % added to typeset tables in landscape format
\usepackage{Sweave}
\usepackage{apacite}
\usepackage{tabularx}
\usepackage{fancyvrb}

\title{Description for the PISA 2012 data included in the package pairwise}
\date{26.03.2014}
\author{Joerg-Henrik Heine}


\begin{document}
\SweaveOpts{concordance=TRUE}

\maketitle

\section{Loading data}

First install and load the package 'pairwise'.

<<>>=
library(pairwise)
@

Once the package 'pairwise' is installed and loaded, the selected PISA 2012 data for the German subsample can be loaded by typing the following command into the R console.

<<>>=
data(DEU_PISA2012)
@

\section{Structure of the data}

The data are organized as a multiply nested list.

The name of each list entry, at any level, should give an idea of its content. For a first overview, look at the names at the first level of the list object.

<<>>=
names(DEU_PISA2012)
@

At the first list level of the object \texttt{DEU\_PISA2012} there are \Sexpr{length(DEU_PISA2012)} list entries. For a closer inspection of the content, check the object class of every list entry:

<<>>=
lapply(DEU_PISA2012,class)
@

Three of the five list entries at the first level are data frames. They hold ID variables (first list entry), additional variables such as gender (second list entry), and case weights and replicate weights (last list entry).
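
As a small sketch of how to confirm this from the object itself, the following chunk selects the first-level entries that are data frames and returns their dimensions. The helper name \texttt{is\_df} is introduced here for illustration only.

<<>>=
# flag the first-level entries that are data frames
is_df <- sapply(DEU_PISA2012, is.data.frame)
# rows (cases) and columns (variables) of each data frame
lapply(DEU_PISA2012[is_df], dim)
@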

\subsection{ID variables}
<<>>=
names(DEU_PISA2012$id)
@
\begin{itemize}
  \item the variable \verb+\Sexpr{names(DEU_PISA2012$id)[1]}+ is a unique identifier for every case with regard to the international data set, which is not included in this package.
  \item the variable \Sexpr{names(DEU_PISA2012$id)[2]} assigns every case to a school; the sample for the German subsample was drawn using school lists.
  \item the variable \Sexpr{names(DEU_PISA2012$id)[3]} is a unique identifier for every case with regard to the international German subsample.
  \item the variable \Sexpr{names(DEU_PISA2012$id)[4]} indicates which booklet each participant from the German subsample was assigned to. The following command returns the frequencies for the booklet variable:
<<>>=
table(DEU_PISA2012$id$BOOKID)
@   
There are \Sexpr{length(table(DEU_PISA2012$id$BOOKID))} booklets. Booklets ``1'' to ``13'' are part of the rotated design for the assessment of the three competencies (math, reading, science). Booklet ``20'' is the so-called UH booklet, a shortened version of the regular booklets that contains items from the regular item set.
  \item the variable \Sexpr{names(DEU_PISA2012$id)[5]} indicates which of the rotated questionnaire booklets, assessing the non-cognitive constructs, was assigned to each participant from the German subsample.
<<>>=
table(DEU_PISA2012$id$QuestID)
@   
There are \Sexpr{length(table(DEU_PISA2012$id$QuestID))} different questionnaire booklets. \\ Questionnaire booklets ``1'' to ``3'' are regular versions and ``5'' was a short version (UH version).
\end{itemize}
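
The two booklet variables can also be cross-tabulated to see how cognitive booklets and questionnaire booklets were combined. This is only a sketch for exploration; the object name \texttt{book\_by\_quest} is chosen here for illustration, and the text above does not state which combinations occur.

<<>>=
# rows: cognitive booklet, columns: questionnaire booklet
book_by_quest <- with(DEU_PISA2012$id, table(BOOKID, QuestID))
book_by_quest
@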

\subsection{Additional variables (covariate)}

<<>>=
names(DEU_PISA2012$covariate)
@

\begin{itemize}
  \item   \Sexpr{names(DEU_PISA2012$covariate)[1]}  indicates whether an easier booklet, containing only easy items from the regular item set, was given to the participant. As no easier booklet was assigned to any participant in the international German subsample, this variable is effectively a constant.
  \item   \Sexpr{names(DEU_PISA2012$covariate)[2]}  indicates the relative grade of each participant, in relation to the target population.
  \item   \Sexpr{names(DEU_PISA2012$covariate)[3]}  indicates the national study program, which is `Schulart' in German.
  \item   \Sexpr{names(DEU_PISA2012$covariate)[4]}  indicates whether a participant repeated a class (at any grade).
  \item   \Sexpr{names(DEU_PISA2012$covariate)[5]}  gives the age of the participant.
  \item   \Sexpr{names(DEU_PISA2012$covariate)[6]}  gives the gender of the participant.
 \end{itemize} 


\subsection{Cognitive variables (PISA competencies)}

<<>>=
names(DEU_PISA2012$cog)
@

This list level contains the plausible values \texttt{pv}, drawn in the international scaling procedure, as well as the scored responses \texttt{dat} of the participants to the questions corresponding to the three PISA competencies (\textbf{math}, \textbf{reading} and \textbf{science}).

\subsubsection{Plausible values (pv)}
<<>>=
names(DEU_PISA2012$cog$pv)
@

For each of the three PISA competencies, 5 plausible values were drawn; they are stored as a list of length 5.

<<>>=
names(DEU_PISA2012$cog$pv$MATH)
@

<<>>=
names(DEU_PISA2012$cog$pv$READ)
@

<<>>=
names(DEU_PISA2012$cog$pv$SCIE)
@

Each of the respective list entries covers all participants in the German subsample and therefore has a length of \Sexpr{length(DEU_PISA2012$cog$pv$MATH$PV1MATH)} (cases), e.g.:
<<>>=
length(DEU_PISA2012$cog$pv$MATH$PV1MATH)
@
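
Because a statistic is usually computed once per plausible value and then averaged, a minimal sketch of that pattern for the weighted mean in math may be helpful. It assumes that each plausible value is a numeric vector without missing values and that its cases are in the same order as the final student weight \texttt{W\_FSTUWT} in \texttt{DEU\_PISA2012\$weights}; the object names \texttt{w} and \texttt{pv\_means} are illustrative.

<<>>=
# final student weight (assumed to be in the same case order as the plausible values)
w <- DEU_PISA2012$weights$W_FSTUWT
# weighted mean, computed separately for each of the five plausible values
pv_means <- sapply(DEU_PISA2012$cog$pv$MATH, weighted.mean, w = w)
pv_means
# point estimate: average over the five plausible values
mean(pv_means)
@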


\subsubsection{Scored data and missing incidence matrices (dat)}
<<>>=
names(DEU_PISA2012$cog$dat)
@
The list level \texttt{DEU\_PISA2012\$cog\$dat} contains the scored data and missing incidence matrices for each of the three PISA competencies.

The complete (sub-)structure of the list level \texttt{DEU\_PISA2012\$cog\$dat} is returned by the following R command.
<<>>=
rapply(DEU_PISA2012$cog$dat,names,classes = "list",how="list")
@


As an example we demonstrate the structure using the \texttt{MATH} domain:

<<>>=
names(DEU_PISA2012$cog$dat$MATH)
@

For each of the three competence domains there are three list entries at that list level, which contain the scored responses (\texttt{resp}), an incidence matrix for items missing by design (\texttt{inc7}) and an incidence matrix for not reached items (\texttt{inc8}). All three entries are stored as matrices with the same dimensionality.

<<>>=
dim(DEU_PISA2012$cog$dat$MATH$resp)
dim(DEU_PISA2012$cog$dat$MATH$inc7)
dim(DEU_PISA2012$cog$dat$MATH$inc8)
@

The values in \texttt{DEU\_PISA2012\$cog\$dat\$MATH\$resp} range from \Sexpr{min(DEU_PISA2012$cog$dat$MATH$resp,na.rm=TRUE)} to \Sexpr{max(DEU_PISA2012$cog$dat$MATH$resp,na.rm=TRUE)} (with \texttt{NA} values removed). These values represent the scored responses of the participants following a partial-credit scoring approach, with the following meaning:
\begin{itemize}
  \item 0 = wrong
  \item 1 = correct, or partially correct (\emph{if there is a third category for the respective item})
  \item 2 = correct (\emph{if there is a third category for the respective item})
\end{itemize}

The \texttt{NA} values in the \texttt{resp} matrix have two meanings: either missing by (rotated) design or not reached. The meaning of each \texttt{NA} value is coded in the two incidence matrices named \texttt{inc7} and \texttt{inc8}.
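
The relation between the \texttt{NA} values and the two incidence matrices can be inspected with a cross-tabulation. The chunk below is a sketch that is not evaluated here, because it rests on an assumption not stated above: that a flagged cell in \texttt{inc7} or \texttt{inc8} is coded 1 (or \texttt{TRUE}) and every other cell 0 (or \texttt{FALSE}). The object name \texttt{flagged} is illustrative.

<<eval=FALSE>>=
resp <- DEU_PISA2012$cog$dat$MATH$resp
inc7 <- DEU_PISA2012$cog$dat$MATH$inc7
inc8 <- DEU_PISA2012$cog$dat$MATH$inc8
# assumption: flagged cells are coded 1 (or TRUE), all other cells 0 (or FALSE)
flagged <- (inc7 == 1) | (inc8 == 1)
# cross-tabulate NA cells in resp against cells flagged in inc7 or inc8
table(resp_is_NA = is.na(resp), flagged = flagged)
@

If the coding assumption holds and every \texttt{NA} is either missing by design or not reached, one would expect all counts to fall on the diagonal of this table.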


\subsection{``Non-cognitive'' variables}

<<>>=
names(DEU_PISA2012$ncog)
@

This list level contains the scales and item responses of the ``non-cognitive'' constructs assessed in the student questionnaire.

\subsubsection{Weighted likelihood estimates for constructs (WLE) }
<<>>=
names(DEU_PISA2012$ncog$wle)
@

The level \texttt{DEU\_PISA2012\$ncog\$wle} contains the weighted likelihood estimates (WLE) for a selection of \Sexpr{length(names(DEU_PISA2012$ncog$wle))} constructs out of the 52 constructs assessed in PISA 2012. The selection of the \Sexpr{length(names(DEU_PISA2012$ncog$wle))} constructs is more or less arbitrary and follows the personal interest of the author of the package 'pairwise'. The WLEs result from the international scaling procedure and are based on the responses to the respective items in the student questionnaire.

\subsubsection{Item responses for ``non-cognitive'' constructs (dat)}
<<>>=
names(DEU_PISA2012$ncog$dat)
@

For each of the \Sexpr{length(names(DEU_PISA2012$ncog$dat))} ``non-cognitive'' constructs there are four list entries at that list level.

As an example we demonstrate the further structure using the ``non-cognitive'' construct \texttt{CLSMAN}:

<<>>=
names(DEU_PISA2012$ncog$dat$CLSMAN)
@

These \Sexpr{length(names(DEU_PISA2012$ncog$dat$CLSMAN))} matrices contain the scored responses (\texttt{resp}), an incidence matrix for responses missing by design (\texttt{inc7}), an incidence matrix for invalid responses (\texttt{inc8}) and an incidence matrix for responses missing by testee (\texttt{inc9}). All four entries are stored as matrices with the same dimensionality.

<<>>=
lapply(DEU_PISA2012$ncog$dat$CLSMAN,dim)
@

For each of the \Sexpr{length(names(DEU_PISA2012$ncog$dat))} ``non-cognitive'' constructs the list level entry \texttt{resp} is a matrix with named columns giving the international PISA item names.

<<>>=
colnames(DEU_PISA2012$ncog$dat$CLSMAN$resp)
@
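
Since every construct carries a \texttt{resp} matrix, the number of items per construct can be collected in one step. This is a small sketch; the object name \texttt{n\_items} is introduced here for illustration.

<<>>=
# number of items (columns of resp) for each construct
n_items <- sapply(DEU_PISA2012$ncog$dat, function(construct) ncol(construct$resp))
n_items
@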


\subsection{Case weights and replicate weights}

The last list level, \texttt{DEU\_PISA2012\$weights}, contains the final student weight (\texttt{W\_FSTUWT}), the final student replicate BRR-Fay weights (\texttt{W\_FSTR1} to \texttt{W\_FSTR80}) and a few other weighting variables used in the international analysis procedure.
<<>>=
names(DEU_PISA2012$weights)
@
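
As a rough sketch of how the replicate weights are typically used, the following chunk estimates the sampling standard error of the weighted mean of the first plausible value in math. It is not evaluated here and rests on assumptions not stated in this document: a Fay factor of $k = 0.5$, so that the sampling variance is $\frac{1}{G(1-k)^2}\sum_{r=1}^{G}(\hat\theta_r - \hat\theta)^2$ with $G = 80$ replicates, and plausible values without missing entries in the same case order as the weights. Only one plausible value is used, so the additional uncertainty due to imputation is not covered. All object names are illustrative.

<<eval=FALSE>>=
w   <- DEU_PISA2012$weights
pv1 <- DEU_PISA2012$cog$pv$MATH$PV1MATH
# estimate based on the final student weight
est <- weighted.mean(pv1, w$W_FSTUWT)
# the same estimate, recomputed with each of the 80 replicate weights
rep_names <- paste0("W_FSTR", 1:80)
rep_est <- sapply(rep_names, function(nm) weighted.mean(pv1, w[[nm]]))
# BRR-Fay sampling variance; the Fay factor k = 0.5 is an assumption
k <- 0.5
se <- sqrt(sum((rep_est - est)^2) / (length(rep_names) * (1 - k)^2))
c(estimate = est, se = se)
@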

\section{Analysis with the German PISA 2012 data set}

Compute the percentage of responses missing by design:

for the mathematics competence assessment
  
<<>>=
(sum(DEU_PISA2012$cog$dat$MATH$inc7))/(prod(dim(DEU_PISA2012$cog$dat$MATH$inc7)))*100
@

for the reading competence assessment
  
<<>>=
(sum(DEU_PISA2012$cog$dat$READ$inc7))/(prod(dim(DEU_PISA2012$cog$dat$READ$inc7)))*100
@
  
for the science competence assessment
  
<<>>=
(sum(DEU_PISA2012$cog$dat$SCIE$inc7))/(prod(dim(DEU_PISA2012$cog$dat$SCIE$inc7)))*100
@
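
The three computations above can also be condensed into a single call over all domains. This sketch relies on the fact that the mean of an incidence matrix equals its sum divided by the number of cells, provided the matrix contains no \texttt{NA} values; the object name \texttt{pct\_by\_design} is illustrative.

<<>>=
# percentage of cells flagged as missing by design, per competence domain
pct_by_design <- sapply(DEU_PISA2012$cog$dat,
                        function(domain) mean(domain$inc7) * 100)
pct_by_design
@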


\end{document}