Lecture 20: Latent variables and Rank Likelihood

1 Data: General Social Survey

In this dataset we will be looking at an ordinal variable: educational attainment.

load("../data/socmob.RData") 
head(socmob)

  INCOME DEGREE CHILDREN PINCOME PDEGREE PCHILDREN AGE
1     NA      1        3       3       1         5  59
2     11      0        3      NA       0         7  59
3      8      1        1      NA       0         9  25
4     25      3        2      NA       0         5  55
5    100      3        2       4       3         2  56
6     40      4        0      NA       4         5  36

\(\mathrm{DEG}_{i}=\) highest degree obtained by individual \(i\),
\(\mathrm{CHILD}_{i}=\) number of children of individual \(i\),
\(\mathrm{PDEG}_{i}=\) binary indicator of whether either parent of \(i\) obtained a college degree.

Question: how is educational attainment related to the number of children and the education of the parents?

2 Regression

Q: Why Can’t We Just Use Linear Regression?

0 = no high school,
1 = high school,
2 = associate’s,
3 = bachelor’s,
4 = graduate degree.

\[ \mathrm{DEG}_{i}=\beta_{1}+\beta_{2} \times \mathrm{CHILD}_{i}+\beta_{3} \times \mathrm{PDEG}_{i}+\beta_{4} \times \mathrm{CHILD}_{i} \times \mathrm{PDEG}_{i}+\varepsilon_{i}, \]


2.1 Latent variable formulation

Imagine there’s a hidden continuous variable \(Z\) representing educational propensity or academic achievement tendency.

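\[ \begin{aligned} \varepsilon_{1}, \ldots, \varepsilon_{n} & \sim \text { i.i.d. } \operatorname{normal}(0,1) \\ Z_{i} & = \underline{\beta}^{T} \underline{x}_{i}+\varepsilon_{i} \\ Y_{i} & =g\left(Z_{i}\right)\\ \underline{x}_{i}&=\left(\mathrm{CHILD}_{i}, \mathrm{PDEG}_{i}, \mathrm{CHILD}_{i} \times \mathrm{PDEG}_{i}\right) \end{aligned} \]

Here \(g\) is a non-decreasing function that maps the latent \(Z_i\) to one of \(K\) ordered categories through thresholds \(g_{1}<\cdots<g_{K-1}\). Labeling the categories \(1,\ldots,K\), we have \(Y_i=k\) when \(g_{k-1}<Z_i<g_{k}\), with \(g_0=-\infty\) and \(g_K=\infty\).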
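To see the generative story in action, here is a minimal simulation sketch. Everything numeric in it is made up for illustration: the covariates are random stand-ins for CHILD and PDEG, and the coefficient and cut-point values are assumptions, not estimates from the survey. Categories are labeled 1 to 5 here rather than 0 to 4.

```r
# Hypothetical simulation of the latent variable model (all values are invented).
set.seed(1)
n     <- 500
child <- rpois(n, 2)                    # stand-in for CHILD_i
pdeg  <- rbinom(n, 1, 0.3)              # stand-in for PDEG_i
X     <- cbind(child, pdeg, child * pdeg)  # x_i = (CHILD, PDEG, CHILD x PDEG), no intercept
beta  <- c(-0.1, 0.8, 0.1)              # assumed "true" coefficients
g     <- c(-1.0, 0.3, 0.6, 1.3)         # assumed cut-points g_1 < ... < g_4, so K = 5

z <- as.vector(X %*% beta) + rnorm(n)   # Z_i = beta^T x_i + eps_i, eps_i ~ normal(0, 1)
y <- findInterval(z, g) + 1             # Y_i = g(Z_i): count cut-points below z_i, then add 1
table(y)                                # observed category counts
```

Only `y` and `X` would be observed in practice; `z`, `beta`, and `g` are what the sampler has to recover.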

3 Approximating the joint posterior

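We approximate the joint posterior distribution of all unknown quantities with a Gibbs sampler that cycles through the full conditional distributions derived below:

\[ \underline{\beta}, g_{1}, \ldots, g_{K-1}, Z_{1}, \ldots, Z_{n}\mid \underline{y}, \underline{X} \]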

3.1 Full conditional distribution of \(\underline{\beta}\)

\[ p( \underline{\beta} \mid \underline{y}, \underline{z}, \underline{g}) \ =\ p( \underline{\beta} \mid \underline{z}) \propto p( \underline{\beta})\, p( \underline{z} \mid \underline{\beta}) \]

\[ \begin{aligned} \text{Prior: }\qquad \underline{\beta} &\sim MVN\left(\mathbf{0}, n\left(\mathbf{X}^{T} \mathbf{X}\right)^{-1}\right)\\ \text{Full conditional: } \underline{\beta}\mid \underline{z} &\sim MVN\left(\mathbf{m}, \mathbf{V}\right)\\ \mathbf{V}&=\operatorname{Var}[ \underline{\beta} \mid \underline{z}] =\frac{n}{n+1}\left(\mathbf{X}^{T} \mathbf{X}\right)^{-1}, \\ \mathbf{m}&=\mathrm{E}[ \underline{\beta} \mid \underline{z}] =\frac{n}{n+1}\left(\mathbf{X}^{T} \mathbf{X}\right)^{-1} \mathbf{X}^{T} \underline{z} \end{aligned} \]
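One way this update could be coded is sketched below in base R. The function name `sample_beta` and the toy `X` and `z` are illustrative assumptions; the formulas for \(\mathbf{m}\) and \(\mathbf{V}\) are the ones given above.

```r
# Draw beta from its full conditional MVN(m, V) given the current latent vector z.
sample_beta <- function(X, z) {
  n <- nrow(X)
  V <- n / (n + 1) * solve(crossprod(X))   # V = n/(n+1) (X^T X)^{-1}
  m <- as.vector(V %*% crossprod(X, z))    # m = n/(n+1) (X^T X)^{-1} X^T z
  L <- t(chol(V))                          # lower-triangular factor with L L^T = V
  as.vector(m + L %*% rnorm(ncol(X)))      # m + L e, with e ~ N(0, I), has covariance V
}

# Toy usage with invented data
set.seed(2)
X <- cbind(rnorm(200), rbinom(200, 1, 0.5))
z <- as.vector(X %*% c(1, -0.5)) + rnorm(200)
sample_beta(X, z)
```

Since \(\mathbf{V}\) does not depend on \(\underline{z}\), a longer sampler could compute `V` and its Cholesky factor once outside the loop and only recompute `m` at each iteration.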

3.2 Full conditional distribution of \(\underline{Z}\)

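\[ \begin{aligned} p\left(z_{i} \mid \underline{\beta}\right) &\propto \operatorname{dnorm}\left(z_{i}, \underline{\beta}^{T} \underline{x}_{i}, 1\right)\\ p\left(z_{i} \mid \underline{\beta}, \underline{y}, \underline{g}\right) &\propto \operatorname{dnorm}\left(z_{i}, \underline{\beta}^{T} \underline{x}_{i}, 1\right) \times \delta_{(a, b)}\left(z_{i}\right) \end{aligned} \]

Here \((a, b)=\left(g_{y_i-1}, g_{y_i}\right)\) is the interval implied by the observed category \(y_i\), with \(g_0=-\infty\) and \(g_K=\infty\), and \(\delta_{(a,b)}(z)\) is the indicator that \(a<z<b\). The full conditional of \(Z_i\) is therefore a normal distribution constrained to \((a,b)\).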
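A constrained normal draw of this kind can be done by the inverse-CDF method. The sketch below assumes categories coded \(1,\ldots,K\), so that \(a=g_{y_i-1}\) and \(b=g_{y_i}\) with \(g_0=-\infty\), \(g_K=\infty\). The helper names `rtnorm1` and `sample_z` are hypothetical, as are the toy data.

```r
# Inverse-CDF draw from normal(mean, 1) constrained to the interval (a, b); vectorized.
rtnorm1 <- function(mean, a, b) {
  u <- runif(length(mean), pnorm(a - mean), pnorm(b - mean))
  mean + qnorm(u)
}

# Update every z_i given beta, the thresholds g, and the observed categories y (coded 1..K).
sample_z <- function(X, beta, y, g) {
  gfull <- c(-Inf, g, Inf)                      # gfull[k + 1] holds g_k, with g_0 and g_K added
  ez <- as.vector(X %*% beta)                   # beta^T x_i
  rtnorm1(ez, a = gfull[y], b = gfull[y + 1])   # a = g_{y_i - 1}, b = g_{y_i}
}

# Toy usage with invented data
set.seed(3)
X <- cbind(rnorm(50), rbinom(50, 1, 0.5))
beta <- c(0.5, -0.3)
g <- c(-0.5, 0.5)
y <- findInterval(as.vector(X %*% beta) + rnorm(50), g) + 1
z <- sample_z(X, beta, y, g)
```

The inverse-CDF approach can lose precision when the interval sits far out in a tail of the normal distribution; for this lecture's setting it is usually adequate.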

3.3 Full conditional distribution of \(\underline{g}\)

Once we have \(\underline{Z}\), the values for the thresholds \(g_1,\ldots,g_{K-1}\) are constrained by the observed data \(\underline{y}\) and the latent variables \(\underline{Z}\).

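Constraints: given \(\underline{y}\) and \(\underline{z}\), the threshold \(g_k\) must lie above every \(z_i\) with \(y_i=k\) and below every \(z_i\) with \(y_i=k+1\):

\[ g_k > z_i \text{ for all } i \text{ such that } y_i=k \]

\[ g_k < z_i \text{ for all } i \text{ such that } y_i=k+1 \]

\[ \mathrm{Support}\{ \underline{g}\}=\{ \underline{g}:a_k<g_k<b_k\}, \qquad a_k=\max\{z_i: y_i=k\}, \quad b_k=\min\{z_i: y_i=k+1\} \]

With a \(\operatorname{normal}(\mu_k, \sigma_k^2)\) prior on \(g_k\), the full conditional is that prior constrained to \((a_k, b_k)\):

\[ p\left(g_{k} \mid \underline{\beta}, \underline{y}, \underline{z}, \underline{g_{-k}}\right) \propto \operatorname{dnorm}\left(g_{k}, \mu_k, \sigma_k\right) \times \delta_{(a_k, b_k)}\left(g_{k}\right) \]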
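The threshold update can be sketched the same way. The code below takes the lower bound for \(g_k\) to be the largest \(z_i\) in category \(k\) and the upper bound to be the smallest \(z_i\) in category \(k+1\), and it assumes every category \(1,\ldots,K\) appears at least once in `y`. The function name `sample_g` and the prior values in the usage lines are illustrative assumptions.

```r
# Draw each threshold g_k from normal(mu_k, sigma_k^2) constrained to (a_k, b_k).
sample_g <- function(z, y, K, mu, sigma) {
  g <- numeric(K - 1)
  for (k in 1:(K - 1)) {
    a <- max(z[y == k])        # g_k must exceed every z_i with y_i = k
    b <- min(z[y == k + 1])    # g_k must be below every z_i with y_i = k + 1
    u <- runif(1, pnorm((a - mu[k]) / sigma[k]), pnorm((b - mu[k]) / sigma[k]))
    g[k] <- mu[k] + sigma[k] * qnorm(u)   # inverse-CDF draw on (a, b)
  }
  g
}

# Toy usage with invented latent values and K = 3 categories
set.seed(4)
z <- rnorm(200)
y <- findInterval(z, c(-0.5, 0.5)) + 1
g <- sample_g(z, y, K = 3, mu = rep(0, 2), sigma = rep(10, 2))
```

Because neighboring categories usually have latent values that nearly touch, the interval for each threshold tends to be narrow, which is one reason this chain can mix slowly.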

4 Data: Educational Attainment

Remember what the coefficients \(\underline\beta\) represent in this model: the effect of the predictors on the latent variable \(Z\), which in turn determines the observed ordinal variable \(Y\) through the thresholds \(g_k\).
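To make the link from \(\underline{\beta}\) to the observed categories concrete: under the model, \(\operatorname{Pr}(Y_i=k \mid \underline{x}_i)=\Phi\left(g_k-\underline{\beta}^{T}\underline{x}_i\right)-\Phi\left(g_{k-1}-\underline{\beta}^{T}\underline{x}_i\right)\), where \(\Phi\) is the standard normal CDF, \(g_0=-\infty\) and \(g_K=\infty\). The sketch below evaluates this for one covariate vector. The function name `pred_probs` and all numeric values are made up for illustration and are not estimates from the data.

```r
# Category probabilities Pr(Y = k | x), k = 1..K, for given beta and thresholds g.
pred_probs <- function(x, beta, g) {
  cdf <- pnorm(c(g, Inf) - sum(x * beta))   # Pr(Y <= k | x) for k = 1..K
  diff(c(0, cdf))                           # successive differences give Pr(Y = k | x)
}

# Hypothetical individual: 2 children, a parent with a college degree
x <- c(2, 1, 2 * 1)                          # (CHILD, PDEG, CHILD x PDEG)
p <- pred_probs(x, beta = c(-0.1, 0.8, 0.1), g = c(-1.0, 0.3, 0.6, 1.3))
p
```

A positive coefficient shifts the whole latent distribution to the right, which moves probability from lower categories to higher ones without changing the thresholds.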

Results from the probit regression analysis.

5 Rank Likelihood: a simpler approach

The ordinal probit model served us well, but it required estimating the threshold parameters \(g_1, \ldots, g_{K-1}\). What if we don’t care about these thresholds? What if we only want to understand how predictors relate to outcomes?

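Insight: because \(g\) is non-decreasing, the observed \(\underline{y}\) tells us the ordering of the \(Z_i\)’s whatever the thresholds are: if \(y_{i_1}<y_{i_2}\), then \(Z_{i_1}<Z_{i_2}\). The probability of this ordering depends on \(\underline{\beta}\) but not on \(g\).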

Idea: Just use the ranking information and forget about estimating thresholds. We do not estimate \(g_1,\ldots,g_{K-1}\)!

\[ \begin{aligned} p( \underline{\beta} \mid \underline{Z} \in R( \underline{y})) & \propto p( \underline{\beta}) \times \operatorname{Pr}( \underline{Z} \in R( \underline{y}) \mid \underline{\beta}) \\ & =p( \underline{\beta}) \times \int_{R( \underline{y})} \prod_{i=1}^{n} \operatorname{dnorm}\left(z_{i}, \underline{\beta}^{T} \underline{x}_{i}, 1\right) d z_{i} \end{aligned} \]

\[ R( \underline{y})=\left\{z \in \mathbb{R}^{n}: z_{i_{1}}<z_{i_{2}} \text { if } y_{i_{1}}<y_{i_{2}}\right\} \]
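Under the rank likelihood, the full conditional of each \(z_i\), given \(\underline{\beta}\) and the other \(z\)’s, is \(\operatorname{normal}(\underline{\beta}^{T}\underline{x}_i, 1)\) constrained to lie above every \(z_j\) with \(y_j<y_i\) and below every \(z_j\) with \(y_j>y_i\). A minimal sketch of one sweep of that update follows. The function name `sample_z_rank`, the starting values, and the toy data are assumptions for illustration.

```r
# One Gibbs sweep over z under the rank likelihood: each z_i stays inside R(y).
sample_z_rank <- function(z, X, beta, y) {
  ez <- as.vector(X %*% beta)
  for (i in seq_along(z)) {
    a <- max(c(-Inf, z[y < y[i]]))   # largest z_j among observations with smaller y
    b <- min(c(Inf, z[y > y[i]]))    # smallest z_j among observations with larger y
    u <- runif(1, pnorm(a - ez[i]), pnorm(b - ez[i]))
    z[i] <- ez[i] + qnorm(u)         # inverse-CDF draw from normal(ez_i, 1) on (a, b)
  }
  z
}

# Toy usage with invented data
set.seed(5)
n <- 60
X <- cbind(rnorm(n), rbinom(n, 1, 0.5))
beta <- c(0.5, -0.3)
y <- findInterval(as.vector(X %*% beta) + rnorm(n), c(-0.5, 0.5)) + 1
z <- qnorm(rank(y, ties.method = "random") / (n + 1))   # starting values that respect R(y)
for (s in 1:5) z <- sample_z_rank(z, X, beta, y)
```

In a full sampler this sweep would alternate with a draw of \(\underline{\beta}\) given \(\underline{z}\); no threshold update is needed. The loop as written costs on the order of \(n^2\) operations per sweep, so a larger dataset would call for a smarter implementation.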


Marginal posterior distributions of \(\left(\beta_{1}, \beta_{2}, \beta_{3}\right)\), under the ordinal probit regression model (in gray) and the rank likelihood (in black).

5.1 Summary: Choosing Your Model

When deciding between the Ordinal Probit and Rank Likelihood approaches, consider the trade-off between granularity (knowing the cut-points) and robustness (making fewer assumptions).

The Convergence of Results

When the number of categories \(K\) is small and the sample size \(n\) is large, the two models typically yield nearly identical posterior distributions for \(\underline{\beta}\). Intuitively, with many observations in each category, the rank constraints \(R(\underline{y})\) carry nearly as much information about \(\underline{\beta}\) as the estimated thresholds \(g_k\) do.

Comparison Matrix

Feature	Ordinal Probit	Rank Likelihood
Primary Goal	Prediction of specific categories.	Understanding covariate effects \((\underline{\beta})\).
Parameters	Estimates \(\underline{\beta}\) and thresholds \(g_k\).	Estimates \(\underline{\beta}\) only.
Assumptions	Assumes \(g_k\) are fixed, meaningful boundaries.	Assumes only the order of \(Y\) is meaningful.
Flexibility	Rigid; sensitive to how categories are defined.	“Transformation-invariant” to the scale of \(Y\).

Decision Guide

Choose Ordinal Probit if:

Thresholds Matter: You need to interpret the “distance” between categories (e.g., how much harder is it to get a Graduate degree vs. a Bachelor’s?).
Predictive Probabilities: You need to calculate \(\operatorname{Pr}(Y_i = k \mid \underline{x}_i)\).
Data Structure: You have few categories (small \(K\)) and a massive dataset.

Choose Rank Likelihood if:

Scale Invariance: The categories are somewhat arbitrary (e.g., “Satisfied” vs. “Very Satisfied”) and you don’t want your \(\beta\) estimates to depend on the specific labeling.
Complexity: \(K\) is very large (approaching a continuous scale), making the estimation of \(K-1\) thresholds computationally expensive or unstable.
Robustness: You want a “semi-parametric” approach that focuses purely on the relationship between the predictors and the latent propensity \(Z\).

Key Takeaway:

Rank likelihood is essentially the “lazy” (but often more robust) cousin of the ordinal probit. It skips estimating \(g\) while typically giving very similar inference on your predictors.