TMB Model Validation

class: center, middle, inverse, title-slide

# TMB Model Validation
### Andrea Havron NOAA Fisheries, OST

---

layout: true

.footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service]

code.cpp{
  font-size: 14px;
}
code.r{
  font-size: 14px;
}

.remark-slide-content > h1 {
  font-size: 40px;
}
.remark-slide-content > h3 {
  font-size: 25px;
  text-align: left !important;
  margin-left: 235px
}

</style>

---
# Relevancy to Fishery Stock Assessments

- .large[Models currently using TMB]:
  - WHAM
  - SAM
  - LIME
  - VAST/sdmTMB

- .large[Next Gen Stock Assessment Projects using TMB]:
  - FIMS
  - gadget3
  
- .large[Methods relevant to models with or without random effects]
  - Composition data: Trijoulet et al. 2023
  
- .large[Clear guidelines lacking]

---
# Quantile Residuals

.pull-left[
`$r_i = \phi^{-1}\big(F(Y_i,\Theta)\big)$`

- `$F(Y,\Theta)$`: cdf of `$f(y,\Theta)$`

- `$\phi^{-1}$`: cdf inverse of the standard normal distribution

* **PIT Residuals**: Adds random noise to quantile residuals from discrete distributions using the probabilty integral transform

]

.pull-right[
.img90[
![](data:image/png;base64,#static/dgamma2dnorm.png)
]]
 
Dunn & Smyth, 1996
---
# Pearson residuals - Issues
.large[**Pearson assumes independent standard normal**]

.pull-left-narrow[
.large[**Wrong**] **Pearson residual** 
`$r = (y - \mu)/sd(y)$` 
.img70[
![](data:image/png;base64,#static/pearson.png)]
]

.three-column[
**Gamma variance depends on mean**: 
`$y \sim Gamma(\alpha = 1, \beta = 2)$` 
`$var(y) = \mu\beta$` 
<center>
.img70[
![](data:image/png;base64,#static/dgamma.png)]
</center>
]

.pull-right-narrow[
.large[**Right**] **Quantile residual** 
`$r = qnorm(pgamma(y, 1, 2))$` 
.img80[
![](data:image/png;base64,#static/qqnorm_quantile.png)]
]

![](data:image/png;base64,#static/caution-small.png) Pearson residuals will invalidate correct models more often than expected when assumptions invalid 
---
# Pearson residuals - Issues
.large[**Pearson assumes identical and independently distributed data (iid)**]

<center>
.img50[
![](data:image/png;base64,#static/MN-resid.png)]
</center>

![](data:image/png;base64,#static/caution-small.png) Pearson residuals will invalidate correct models more often than expected when assumptions invalid

---
# Validating GLMMs

`\begin{align}
RE &\sim Normal(0, \sigma_{re}) \\
\mu &= FE + RE \\
y &\sim f(g(\mu)^{-1}, \sigma_{obs})
\end{align}`

.pull-left[
**Pearson residuals invalid** 
.p[
- Distribution is not approximately standard normal
- Observations are correlated
]]

.pull-right[
**Quantile residual issues** 
.p[
- Often no closed form expression to the joint cdf
- Still need to account for correlation in residuals
]]

---
# No closed form solution to the joint cdf
.large[ **Solutions: Approximations to the quantile resdiual**]
 
.p[ 
.pull-left[
**TMB R package**
- Analytical Calculations
- Laplace approximations to the cdf
- Bayesian simulations
- Methods: Full gaussian, one-step gaussian, one-step generic, cdf, MCMC via tmbstan
 
Kristensen et al., 2016 
Thygesen, U. et al., 2017 
Monnahan & Kristensen, 2018
]

.pull-right[
**DHARMa R package**
- Simulation-based approach
- Methods: Empirical cdf
 
Hartig, F., 2022
]
]
---
# Account for correlation in residuals

.large[**Solutions**: 
**1. Apply decorrelation transformation** 
**2. Rely on the Principle of Conditioning**]
.p[ 
.pull-left[
**TMB**
- Full gaussian: Cholesky decorrelation transformation
- One-step methods: Prequential conditioning
- Bayesian simulations: Random Effect conditioning
 
Kristensen et al., 2016 
Thygesen, U. et al., 2017 
Monnahan & Kristensen, 2018
]

.pull-right[
**DHARMa**
- Cholesky decorrelation transformation
- Random effect conditioning
 
Hartig, F., 2022
]
]

---
# TMB Methods

|Method          |Definition                                                                           |
|:---------------|:------------------------------------------------------------------------------------|
|FullGaussian    |Best approach when data and random effects are both normally distributed             |
|oneStepGaussian |Most efficient one-step method when data and random effects are approximately normal |
|cdf             |One-step method that does not require normality but does require a closed form cdf   |
|oneStepGeneric  |One-step method useful when no closed form cdf but slow                              |
* The cdf and oneStepGeneric methods are the only methods available for discrete data: randomized quantile residuals
* oneStepGeneric is dependent on Laplace approximations of likelihoods, so probably important to use TMB::checkConsistency.
* OSA residuals are challenging for some distributions (eg. multinomial, delta-lognormal, tweedie).

---
# Full Gaussian Method

Analytical calculation that assumes joint distribution of data and random effects is Gaussian. Applies rotation to transform to univariate space.

.pull-left-wide[
**Correlated MNV**

]
.pull-right-narrow[
 
`\begin{align}
y &\sim MVN(0, \Sigma) \\
\end{align}`
]
---
# Full Gaussian Method

Analytical calculation that assumes joint distribution of data and random effects is Gaussian. Applies rotation to transform to univariate space.

.pull-left-wide[
**Scale to unit variance**

<img src="data:image/png;base64,#static/demo_pairs2.png" width="75%" style="display: block; margin: auto;" />
]
.pull-right-narrow[
 
`\begin{align}
y &\sim MVN(0, \Sigma) \\
r &= y / sqrt(diag(\Sigma))
\end{align}`
]
---
# Full Gaussian Method

Analytical calculation that assumes joint distribution of data and random effects is Gaussian. Applies rotation to transform to univariate space. 
.pull-left-wide[
**Cholesky Transformation**

<img src="data:image/png;base64,#static/demo_pairs3.png" width="75%" style="display: block; margin: auto;" />
]
.pull-right-narrow[
 
`\begin{align}
y &\sim MVN(0, \Sigma) \\
L &= cholesky(\Sigma)^{T} \\
r &= L^{-1}y
\end{align}`
]

---
# TMB One-Step Methods
 
Steps through the data calculating quantile residuals for observation `$i$` given all previous observations:

`$r_{i} = \phi^{-1}(F(Y_{i}|Y_{1},\ldots, Y_{i-1},\Theta))$`

`$\phi^{-1}$`: inverse cdf of the standard normal

`$F(Y,\Theta)$`: cdf of `$f(Y,\Theta)$`

* These residuals are independent standard normal when the model is appropriate for the data

* For non-random effects models these are equivalent to more familiar quantile residuals used with GLMs because
`$F(Y_{i}|Y_{1},\ldots, Y_{i-1},\Theta) = F(Y_{i}|\Theta)$`

|Method |Definition |
|:---------------|:------------------------------------------------------------------------------------|
|oneStepGaussian |Most efficient one-step method when data and random effects are approximately normal |
|cdf |One-step method that does not require normality but does require a closed form cdf |
|oneStepGeneric |One-step method useful when no closed form cdf but slow |
---
# TMB One-Step Methods

.pull-left[
C++ code needs to be modified:

```cpp
DATA_VECTOR(y);
DATA_VECTOR_INDICATOR(keep, y);
...
for(int i=0; i < N.size(); i++){
 nll -= keep[i] * dpois(y[i], exp(eta[i]), true);
 /* For cdf OSA residuals only: */
 Type cdf = squeeze( ppois(y[i], exp(eta[i])) );
 nll -= keep.cdf_lower[i] * log( cdf ); // NaN protected
 nll -= keep.cdf_upper[i] * log( 1.0 - cdf ); // NaN protected
}

```
]

.pull-right[
Call from R:

```r
#TMB model
obj <- MakeADFun(data, parameters, 
 random = "u")
opt <- nlminb(obj$par, obj$fn, obj$gr)

#TMB residual calculation for continuous data
oneStepPredict(obj, observation.name = "y",
               data.term.indicator = "keep",
               method = "oneStepGaussian")

#TMB residual calculation for discrete data
oneStepPredict(obj, observation.name = "y",
 data.term.indicator = "keep",
 method = "cdf", discrete = TRUE)
```
]
---
# TMB MCMC Method 
Bayesian simulation approach
.pull-left[
**Simplified algorithm**
.p[
1. Map fixed parameters to their MLEs
2. Create a new object
3. Draw a single posterior from an MCMC chain
4. Use the posterior random vector to recalculate the expected value and plug into cdf calculations
5. Relies on conditional independence rather than rotation
]
]

.pull-right[

```r
#TMB model
obj <- MakeADFun(data, parameters, 
 random = "u")
opt <- nlminb(obj$par, obj$fn, obj$gr)

#Fix estimated values to their MLEs
map <- c(parameters = factor(NA)*length(opt$par),
 u = factor(NA)*length(u))
#Draw one posterior from a tmbstan run
obj2 <- MakeADFun(..., map = map)
sample <- extract( tmbstan(obj2, iter = warmup+1) )$u

#residuals using posterior u
Fy <- pnorm(y, u, sigma)
r <- qnorm(Fy)
```
]
---
# ecdf method with DHARMa

.pull-left[

|Method |Definition |
|:------------------|:------------------------------------------------------------------|
|Conditional ecdf |Simulate new observations conditional on the fitted random effects |
|Unconditional ecdf |Simulate new observations given new simulated random effects |
]
.pull-right[
<img src="data:image/png;base64,#static/dharma-ecdf.png" width="100%" style="display: block; margin: auto;" />
.small[source: Florian Hartig, DHARMa package]
]

---
# ecdf method with DHARMa 
.large[**Validate the data and random effect model simultaneously**]

.pull-left-narrow[
**DHARMa**
- **Unconditional**
]
.pull-right-wide[
 
Empircal cdf approximation using simulated random effects and rotated data (optional)

```r
#Simulate new random effects and data use MLEs
u.sim <- rnorm(mu, tau)
y.sim <- rnorm(u.sim, sigma)

#cholesky decomposition based on estimated Sigma
L <- t(chol(Sigma))

#rotate simulated data and observations
y.sim <- apply(y.sim, 2, function(x) solve(L,x))
y <- solve(L, y)

#empirical cdf on rotated data and simulation
r <- ecdf(y.sim)(y)
```

]
---
# ecdf method with DHARMa 
.large[**Validate the data and random effect model simultaneously**]

.pull-left-narrow[
**DHARMa**
- **Unconditional**
]

.pull-right-wide[
 
Empircal cdf approximation using simulated random effects and rotated data (optional)

```r
#TMB model
obj <- MakeADFun(data, parameters, 
 random = "u")
opt <- nlminb(obj$par, obj$fn, obj$gr)

obj$env$data$simRE <- 1
y.sim <- obj$simulate()$y

#DHARMa residual function:
createDHARMa(y.sim, y, rotation = "estimated")
```

]
---
# ecdf method with DHARMa 
.large[**Validate the data model conditional on random effect MLEs**]

.pull-left-narrow[
**DHARMa**
- Unconditional
- **Conditional**
]
.pull-right-wide[
 
Empircal cdf approximation using estimated random effects

```r
#Simulate data using MLEs
y.sim <- rnorm(u, sigma)

#TMB model
obj <- MakeADFun(data, parameters, 
 random = "u")
opt <- nlminb(obj$par, obj$fn, obj$gr)

obj$env$data$simRE <- 0
y.sim <- obj$simulate()$y

#empirical cdf on conditionally iid data
r <- ecdf(y.sim)(y)
```
]
---
#Simulation Study

![](data:image/png;base64,#static/study-overview.png)
---
# Linear Regression example 
When models do not contain random effects, quantile residual approximations from a normal model should collapse to Pearson

.pull-left-narrow[
`\begin{align}
\mu &= X\beta\\
y &\sim N(\mu, \sigma)
\end{align}`
]
.pull-right-wide[
<img src="data:image/png;base64,#04_TMBvalidation_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /><img src="data:image/png;base64,#04_TMBvalidation_files/figure-html/unnamed-chunk-15-2.png" style="display: block; margin: auto;" />
]
---
# Linear Regression example 
When models do not contain random effects, quantile residual approximations from a normal model should collapse to Pearson

.pull-left-narrow[
**Mis-specified model**: 
`\begin{align}
OM:&\\
y &= X\beta + \epsilon\\
\epsilon &\sim LN(0,1)\\
EM:&\\
y &= X\beta + \epsilon\\
\epsilon &\sim N(0,1)
\end{align}`
]
.pull-right-wide[
<img src="data:image/png;base64,#04_TMBvalidation_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /><img src="data:image/png;base64,#04_TMBvalidation_files/figure-html/unnamed-chunk-17-2.png" style="display: block; margin: auto;" />
]

---
# Random walk example 
When models contain correlation, quantile residual approximations outperform Pearson

.pull-left-narrow[
`\begin{align}
\mu_{i} &= u_{i-1} + a \\
u_{i} &\sim N(\mu_{i},\tau) \\
y_{i} &\sim N(u_{i}, \sigma) 
\end{align}`

`\begin{align}
a &= 0.75 \\
\tau &= 1 \\ 
\sigma &= 1 
\end{align}`
]
.pull-right-wide[
<img src="data:image/png;base64,#04_TMBvalidation_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /><img src="data:image/png;base64,#04_TMBvalidation_files/figure-html/unnamed-chunk-19-2.png" style="display: block; margin: auto;" />
]
---
# Random walk example 
When models contain correlation, quantile residual approximations outperform Pearson

.pull-left-narrow[
**Mis-specified model**: 
`\begin{align}
OM:&\\
\mu_{i} &= u_{i-1} + a \\
u_{i} &\sim N(\mu_{i},\tau) \\
y_{i} &\sim N(u_{i}, \sigma) 
\end{align}`

`\begin{align}
EM:&\\
\mu_{i} &= u_{i-1} \\
u_{i} &\sim N(\mu_{i},\tau) \\
y_{i} &\sim N(u_{i}, \sigma) 
\end{align}`
]
.pull-right-wide[
<img src="data:image/png;base64,#04_TMBvalidation_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /><img src="data:image/png;base64,#04_TMBvalidation_files/figure-html/unnamed-chunk-21-2.png" style="display: block; margin: auto;" />
]

---

# Guidelines in Progress

- Models with correlated random effect structure need to account for correlation in residuals

- TMB's Full Gaussian method is preferred if data and RE are normal
  - DHARMa rotation needs to be implemented. This method only works when the observations are approximately normal.
  - TMB's one-step methods account for correlation through conditioning 
  - DHARMa's conditional methods account for correlation through conditioning but may underperform one-step methods
  - MCMC approach accounts for correlation but may lack power to detect mis-specification

- TMB limitations:

- One-step methods can be difficult to use when data are a mixture of continuous and discrete observations
  
- DHARMa limitations:

- Estimating the data covariance matrix is more prone to error when data are not normal