Fitting the STAGE Model

This vignette shows how to:

fit the STAGE model to a simple simulated dataset,
extract posterior transition points, and
extend to a hierarchical (multi-population) model.

Simulating data

To demonstrate the workflow, we simulate a simple dataset of lengths in $[L, U]$ with a smooth transition around a true $m_{50}$ .

N <- 200
L <- 1000
U <- 1500
true_m50 <- 1250
true_d   <- 100

# Simulate covariate
x <- runif(N, L, U)

# Smooth transition in P(mature)
p <- plogis((x - true_m50) / (true_d / 4))

# Binary maturity indicator
y <- rbinom(N, 1, p)

head(tibble(x = x, y = y))
#> # A tibble: 6 × 2
#>       x     y
#>   <dbl> <int>
#> 1 1133.     0
#> 2 1186.     0
#> 3 1286.     1
#> 4 1454.     1
#> 5 1101.     0
#> 6 1449.     1

We now fit the STAGE model using fit_stage(). Because STAGE is a Bayesian model fit via Stan, we keep chains and iter small in this vignette to keep runtime acceptable; for real applications you should increase these.

fit <- fit_stage(x, y, L = L, U = U, chains = 2, iter = 1000)
fit

You can inspect the posterior with standard tools such as summary(fit$fit), bayesplot, or posterior utilities if you want more diagnostics.

Extracting the transition point

The helper transition_point() returns posterior summaries of the global transition point $m_{50}$ .

transition_point(fit)

For a single-population fit, this returns a list with a single element global: a named numeric vector with posterior mean, median, q2.5, and q97.5.

Predicting maturity probabilities

We can also predict $\Pr(\text{mature} \mid x)$ over a grid of lengths, and plot the posterior mean transition curve.

x_grid <- seq(L, U, length.out = 200)

p_hat <- predict(fit, x_grid, type = "prob")

plot(
  x_grid, p_hat, type = "l",
  xlab = "Length (x)",
  ylab = "P(mature)",
  las = 1
)

Hierarchical (multi-population) model

In many applications we have multiple populations (e.g. regions, stocks, or sexes) and we expect:

a shared global transition point $m_{50}$ , and
population-specific deviations.

STAGE implements this using a hierarchical structure $\texttt{m50\_pop}[j] = m_{50} + z_j \,\sigma_\alpha,$ with a non-centred parameterisation for efficient sampling.

A simple two-group example:

group <- rep(c("A", "B"), each = N/2)

fit2 <- fit_stage(
  x, y,
  group = group,
  L = L, U = U,
  chains = 2, iter = 1000
)

transition_point(fit2)

transition_point(fit2) returns a list with:

global: posterior summary of the global $m_{50}$ (mean, median, q2.5, q97.5), and
pop1, pop2, …: population-specific entries with summaries of m50_pop[j].

Summary

In this vignette we:

introduced the plateau–Gaussian likelihood underlying STAGE,
visualised the class densities and resulting transition probabilities,
simulated a simple dataset,
fit the STAGE model to estimate the transition point, and
extended the model to a hierarchical multi-population setting.

In practice, you will want to:

choose $[L, U]$ to capture the transition zone plus some plateau on each side,
check posterior diagnostics (divergences, R-hat, effective sample sizes),
and interpret $m_{50}$ and its uncertainty in the biological context.

Future vignettes will expand on:

comparing STAGE to logistic regression and Gaussian LDA, and
sensitivity to the choice of $[L, U]$ and prior settings.