Skip to contents

Overview

This vignette describes the likelihood and parameterisation used in the STAGE model.

STAGE is a hierarchical Bayesian generative model designed to estimate a transition point m50m_{50} on variable xx between two classes—for example, the length at which individuals transition from immature to mature. The model produces a posterior distribution for m50m_{50}, representing the value of xx at which the probabilities Pr(y=0)\Pr(y=0) and Pr(y=1)\Pr(y=1) are equal. The model can also be used to derive other useful quantities (such as class probabilities Pr(y=1x)\Pr(y = 1 \mid x)).

Conceptually, STAGE is a Bayesian generative classifier for estimating a transition point (e.g. length at maturity). Each class is fit with an asymmetric plateau–Gaussian density:

  • the lower class has a uniform plateau followed by a Gaussian tail down into the transition,
  • the higher class has a Gaussian tail up out of the transition followed by a uniform plateau.

This construction focuses inference on the overlap region and prevents distant observations (e.g. very small immatures, very large matures) from unduly influencing the transition point.

The central idea is to model each class with a piecewise density that combines a uniform plateau in regions far from the transition, and a Gaussian tail in the transition region. The left group (y=0y = 0) is modelled with a uniform on the left that begins at the lower truncation value LL. The distribution remains flat until it approaches the transition range, at which point the distribution declines according to a Gaussian. The right group (y=1y = 1) is a mirror image of the left—it increases according to a Gaussian distribution and then levels off to a uniform, terminating at the upper truncation value UU.

The model focuses inference on the transition zone, where the values of xx of the two classes overlap, and down-weights observations that are biologically uninformative because they are far away from the transition point.

The uniform–Gaussian pdf used here is adapted from a plateau proposal distribution combining a uniform and two Gaussians described by Lau & Krumscheid (2022).

Model summary

We observe pairs (xi,yi)(x_i, y_i) for i=1,,Ni = 1, \dots, N, where

  • xix_i is a continuous covariate (e.g., length), truncated to an interval [L,U][L, U], and
  • yi{0,1}y_i \in \{0, 1\} is a binary indicator of state (e.g., immature / mature).

The model assumes that the distribution of xx conditional on state is piecewise:

  • For y=0y = 0 (immature individuals), the distribution of xx is approximately uniform up to a point μ0\mu_0, after which it follows a Gaussian decay.
  • For y=1y = 1 (mature individuals), the distribution is Gaussian below a point μ1\mu_1, and uniform above μ1\mu_1 up to the upper truncation UU.

The transition point m50m_{50} is the value of xx where the probability of observing either class is equal:

Pr(y=0x=m50)=Pr(y=1x=m50). \Pr(y = 0 \mid x = m_{50}) = \Pr(y = 1 \mid x = m_{50}).

Additional parameters govern the width and sharpness of the transition:

  • dd controls the distance between μ0\mu_0 and μ1\mu_1 (the width of the transition region),
  • σ\sigma (denoted sigma_x in the Stan code) controls the spread (standard deviation) of the Gaussian tails.

By allowing different behaviours on either side of the transition, the model can capture asymmetric transition patterns: the immature class can decline towards the transition at a different rate than the mature class rises out of it.

This is particularly useful in settings where:

  • the boundary between classes is not sharp, and
  • the data exhibit gradual overlap between the two classes.

Likelihood

The likelihood is built from two unnormalised log-density functions for the two classes. We work on the log scale for numerical stability and then subtract log-normalising constants to obtain valid densities.

Let 0(x)\ell_0(x) and 1(x)\ell_1(x) denote the unnormalised log-densities for y=0y = 0 and y=1y = 1, respectively. The corresponding (unnormalised) densities are f̃k(x)=exp{k(x)}\tilde f_k(x) = \exp\{\ell_k(x)\}.

We restrict attention to the truncated interval [L,U][L, U]; the densities are zero outside this range.

Lower class (y=0y = 0)

For yi=0y_i = 0, the unnormalised piecewise log-density is

0(xiμ0,σ,L)={0if Lxiμ0,(xiμ0)22σ2if xi>μ0. \ell_0(x_i \mid \mu_0, \sigma, L) = \begin{cases} 0 & \text{if } L \leq x_i \leq \mu_0, \\ -\dfrac{(x_i - \mu_0)^2}{2\sigma^2} & \text{if } x_i > \mu_0 . \end{cases}

Here:

  • the plateau region [L,μ0][L, \mu_0] has constant log-density 0 (density 1),
  • the tail region xi>μ0x_i > \mu_0 decays according to a Gaussian kernel.

Any constant could be added to 0()\ell_0(\cdot) without changing the posterior; we fix the plateau at 0 so that the plateau density is 1 after exponentiation.

Higher class (y=1y = 1)

For yi=1y_i = 1, the unnormalised log-density is

1(xiμ1,σ,U)={(xiμ1)22σ2if xi<μ1,0if μ1xiU. \ell_1(x_i \mid \mu_1, \sigma, U) = \begin{cases} -\dfrac{(x_i - \mu_1)^2}{2\sigma^2} & \text{if } x_i < \mu_1, \\ 0 & \text{if } \mu_1 \leq x_i \leq U . \end{cases}

Now:

  • the Gaussian tail is below μ1\mu_1, and
  • the plateau covers the upper region [μ1,U][\mu_1, U].

In both cases we have a plateau–Gaussian shape: a flat density where the class “dominates” and a Gaussian tail in the transition region where the other class is present.

Normalisation constants

To convert f̃k(x)=exp{k(x)}\tilde f_k(x) = \exp\{\ell_k(x)\} into proper probability density functions on [L,U][L, U], we divide by constants C0C_0 and C1C_1 chosen so that the densities integrate to 1.

For the immature class (y=0y = 0):

C0=Lμ01dx+μ0exp((xμ0)22σ2)dx=(μ0L)+2πσ2. C_0 = \int_L^{\mu_0} 1 \, dx + \int_{\mu_0}^{\infty} \exp\!\left( -\frac{(x - \mu_0)^2}{2\sigma^2} \right) dx = (\mu_0 - L) + \frac{\sqrt{2\pi}\,\sigma}{2}.

For the mature class (y=1y = 1):

C1=μ1exp((xμ1)22σ2)dx+μ1U1dx=2πσ2+(Uμ1). C_1 = \int_{-\infty}^{\mu_1} \exp\!\left( -\frac{(x - \mu_1)^2}{2\sigma^2} \right) dx + \int_{\mu_1}^{U} 1 \, dx = \frac{\sqrt{2\pi}\,\sigma}{2} + (U - \mu_1).

Intuitively:

  • (μ0L)(\mu_0 - L) and (Uμ1)(U - \mu_1) are the widths of the uniform plateaus,
  • 2πσ/2\sqrt{2\pi}\,\sigma / 2 is the contribution from the half-Gaussian tail.

The corresponding log-likelihood contribution for observation ii is

logi={0(xi)logC0if yi=0,1(xi)logC1if yi=1. \log \mathcal{L}_i = \begin{cases} \ell_0(x_i) - \log C_0 & \text{if } y_i = 0, \\ \ell_1(x_i) - \log C_1 & \text{if } y_i = 1. \end{cases}

Summing over all observations,

logp(x,ym50,d,σ)=i=1N[yi(xi)logCyi], \log p(x, y \mid m_{50}, d, \sigma) = \sum_{i=1}^{N} \big[ \ell_{y_i}(x_i) - \log C_{y_i} \big],

which is exactly the likelihood implemented in the Stan code used by fit_stage().

Prior specification

The priors reflect prior beliefs about plausible transition points and transition widths in typical length-at-maturity problems. In practice, these are defaults that should be adapted to the scale of the data and prior biological knowledge.

By default, STAGE uses weakly informative normal priors:

  • Transition point m50m_{50} (Stan parameter m50):

m50𝒩(1300,50), m_{50} \sim \mathcal{N}(1300, 50),

which centres the transition point around 1300 (mm) with moderate uncertainty.

  • Transition width dd:

d𝒩(100,100), d \sim \mathcal{N}(100, 100),

which allows the width of the overlap region to vary around 100 units with a wide spread. Note that dd is the distance between the class-specific Gaussian centres.

  • Gaussian spread σ\sigma (Stan parameter sigma_x):

σ𝒩(100,50), \sigma \sim \mathcal{N}(100, 50),

representing prior beliefs about the scale of variability in the transition tails.

Because dd and sigma_x are constrained to be positive in the Stan model, their Normal priors are effectively half-Normal distributions (the prior mass below zero is folded onto the positive half).

In practice, you should adapt these priors to your scale (e.g., lengths, ages) and prior biological knowledge. The STAGE implementation allows you to pass in hyperparameters via helper functions such as stage_priors().

Transformed parameters

The Gaussian “centres” for the two classes are defined in terms of the transition point m50m_{50} and the width parameter dd:

μ0=m50d2,μ1=m50+d2. \mu_0 = m_{50} - \frac{d}{2}, \qquad \mu_1 = m_{50} + \frac{d}{2}.

Thus:

  • dd controls the distance between the immature and mature Gaussian means,
  • m50m_{50} is the midpoint between them.

From class densities to class probabilities (Bayes rule)

The likelihood above defines the class-conditional densities f0(x)=p(xy=0)f_0(x) = p(x \mid y = 0) and f1(x)=p(xy=1)f_1(x) = p(x \mid y = 1) on [L,U][L, U], obtained by normalising the unnormalised densities f̃k(x)=exp{k(x)}\tilde f_k(x) = \exp\{\ell_k(x)\} with the constants CkC_k.

To obtain the probability that an individual with a particular xx value is in the higher class (y=1y = 1), we use Bayes rule:

Pr(y=1x)=π1f1(x)π0f0(x)+π1f1(x), \Pr(y = 1 \mid x) = \frac{\pi_1 f_1(x)}{\pi_0 f_0(x) + \pi_1 f_1(x)},

and similarly

Pr(y=0x)=π0f0(x)π0f0(x)+π1f1(x), \Pr(y = 0 \mid x) = \frac{\pi_0 f_0(x)}{\pi_0 f_0(x) + \pi_1 f_1(x)},

where π0\pi_0 and π1\pi_1 are the prior class probabilities (e.g., prior probabilities of being immature or mature before seeing xx).

In many applications, such as estimating length-at-maturity, we are interested in a transition point that is not driven by sample sizes or prevalence, so it is natural to use equal class priors, π0=π1=1/2\pi_0 = \pi_1 = 1/2. In that case, Bayes rule simplifies to

Pr(y=1x)=f1(x)f0(x)+f1(x),Pr(y=0x)=f0(x)f0(x)+f1(x). \Pr(y = 1 \mid x) = \frac{f_1(x)}{f_0(x) + f_1(x)}, \qquad \Pr(y = 0 \mid x) = \frac{f_0(x)}{f_0(x) + f_1(x)}.

These are the class probabilities that the STAGE model uses for prediction, and they are what you see when you plot Pr(y=1x)\Pr(y = 1 \mid x) as a function of xx.

Connection to m50m_{50}

By definition, the transition point m50m_{50} is the value of xx where the two classes are equally probable:

Pr(y=0x=m50)=Pr(y=1x=m50)=12. \Pr(y = 0 \mid x = m_{50}) = \Pr(y = 1 \mid x = m_{50}) = \frac{1}{2}.

Using Bayes rule with equal class priors, this equality is equivalent to

f0(m50)=f1(m50), f_0(m_{50}) = f_1(m_{50}),

i.e. the two normalised class densities are equal at m50m_{50}.

Why m50m_{50} is the midpoint between μ0\mu_0 and μ1\mu_1. In the transition region (μ0<x<μ1\mu_0 < x < \mu_1), both unnormalised densities take Gaussian form:

f̃0(x)=exp((xμ0)22σ2),f̃1(x)=exp((xμ1)22σ2). \tilde f_0(x) = \exp\!\left(-\frac{(x - \mu_0)^2}{2\sigma^2}\right), \qquad \tilde f_1(x) = \exp\!\left(-\frac{(x - \mu_1)^2}{2\sigma^2}\right).

Since both Gaussians share the same σ\sigma, the unnormalised densities are equal at the arithmetic midpoint x*=(μ0+μ1)/2=m50x^* = (\mu_0 + \mu_1)/2 = m_{50}. With equal class priors, the normalising constants C0C_0 and C1C_1 cancel in the likelihood ratio, so the 50% crossing point is exactly m50m_{50} regardless of the survey bounds LL and UU. This is the key invariance property: changing the sampling domain does not shift the estimated transition point.

Visualising a single STAGE transition

We can visualise a simple STAGE configuration, showing:

  • the immature and mature densities, and
  • the resulting transition probability Pr(y=1x)\Pr(y=1 \mid x).

Relationship to the stage implementation

In the stage package:

  • The single-population model directly samples m50 (m50m_{50}), d, and sigma_x (σ\sigma). The transition point is a primary sampling parameter, not a derived quantity.
  • The multi-population (hierarchical) model introduces population-specific transition points via a random-effects structure: 𝚖𝟻𝟶_𝚙𝚘𝚙[j]=m50+zjσα, \texttt{m50\_pop}[j] = m_{50} + z_j \, \sigma_\alpha, where m50m_{50} is the global mean transition point, zjz_j are standard-normal effects, and σα\sigma_\alpha is the population-level standard deviation.

The Stan code used in fit_stage() is a direct implementation of the likelihood and parameterisation described above.

Future vignettes will provide a more detailed walk-through of the Stan code and extensions such as alternative priors and model comparison.

References

Lau, F. Din-Houn, and Sebastian Krumscheid. 2022. “Plateau Proposal Distributions for Adaptive Component-Wise Multiple-Try Metropolis.” METRON 80 (3): 343–70. https://doi.org/10.1007/s40300-022-00235-y.