Summary: Marriage and the Intergenerational Mobility of Women

Eriksson, Niemesh, Rashid & Craig (NBER WP 34821, February 2026)

Overview

This paper asks how assortative mating — the tendency to marry within one’s socioeconomic group — changed in the United States between 1850 and 1920, and what those changes meant for women’s intergenerational economic mobility. The central finding is that women’s economic mobility improved substantially during this period, well before married women gained broad access to the labor market. The authors argue this improvement was driven largely by a decline in assortative mating, which weakened the link between a woman’s family background and the economic status of the man she married.

The paper makes three principal contributions: (1) new direct intergenerational links for women constructed from Massachusetts marriage registers; (2) a structural model that recovers an unobserved assortative mating parameter from observable data; and (3) counterfactual analyses that quantify the contribution of changing spousal sorting to trends in women’s mobility.

Data

The authors construct a novel two-generation linked dataset from Massachusetts marriage registers (1850–1914), digitized by FamilySearch, covering over 1.2 million marriage certificates. A key innovation is that these registers record women’s birth surnames, making it possible to link women across censuses and generations despite surname changes at marriage — the fundamental obstacle to tracking women historically.

The record-linkage proceeds in two steps:

  1. Marriage register → post-marriage adult census: couples are matched using names and birth years. This yields 294,105 matched couples (24% match rate).
  2. Adult census → pre-marriage childhood census: each spouse is then individually linked back to their childhood household to observe their father’s occupation. This step benefits from the marriage register’s listing of parents’ names, information unavailable in standard census-to-census links, and yields ~91,000 father-son pairs and ~95,000 father-daughter pairs.

For the sorting analysis, both spouses must be successfully linked to their fathers, yielding 38,760 couples (3% overall match rate). Links are made by a supervised classifier (logistic regression, following Feigenbaum 2016) trained on hand-linked records, with hyperparameters chosen to keep false positives below 10% while maximizing the true positive rate. Cross-validation yields false positive rates of 11% (step 1) and 8% (step 2), with true positive rates of 77% and 93%, respectively.

Economic status is measured using an occupational wealth score derived from total property values reported in the 1870 census, stratified by occupation, census region, and immigrant status. Individuals are then ranked within the cohort-specific national wealth distribution. This produces a percentile rank for each father and adult child, which forms the basis of all regression analyses.
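The ranking step can be illustrated with a short sketch. The tuple layout, the toy scores, and the midpoint convention for percentiles (ties share an average position) are assumptions for illustration; the paper’s exact ranking convention is not stated in this summary.

```python
from collections import defaultdict


def percentile_ranks(people):
    """people: iterable of (person_id, cohort, wealth_score).
    Returns {person_id: percentile rank in (0, 100)} computed within cohort.
    Tied scores share the average of the positions they occupy."""
    by_cohort = defaultdict(list)
    for pid, cohort, score in people:
        by_cohort[cohort].append((score, pid))

    ranks = {}
    for members in by_cohort.values():
        members.sort()
        n = len(members)
        i = 0
        while i < n:
            # Find the end of the block of tied scores starting at position i.
            j = i
            while j + 1 < n and members[j + 1][0] == members[i][0]:
                j += 1
            avg_pos = (i + j) / 2 + 1  # 1-based average position of the block
            for k in range(i, j + 1):
                ranks[members[k][1]] = 100 * (avg_pos - 0.5) / n
            i = j + 1
    return ranks


# Hypothetical occupational wealth scores for two cohorts.
people = [
    ("a", 1850, 500.0), ("b", 1850, 1500.0),
    ("c", 1850, 1500.0), ("d", 1850, 4000.0),
    ("e", 1880, 800.0), ("f", 1880, 2500.0),
]
ranks = percentile_ranks(people)
print(ranks)
```

Because ranks are computed within cohort, the same wealth score can map to different percentiles in different cohorts.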

The Structural Model

This is the methodological centerpiece of the paper. Because married women rarely had recorded occupations in 19th-century censuses, the direct spousal correlation in economic status — the standard definition of assortative mating — cannot be observed. The authors adopt the framework of Espín-Sánchez, Ferrie, and Vickers (2023) to recover this unobserved parameter from three observable correlations.

The Three Observable Correlations

Let \(X^h_i\) denote the husband’s occupational wealth rank, \(X^f_i\) his father’s rank, and \(X^{fl}_i\) his wife’s father’s rank (father-in-law). The variables are normalized before entering the structural model — the paper explicitly states that “status variables in the structural model are normalized and interpreted as correlations.”

The three empirical relationships estimated from the data are:

\[b^h_f = E[X^h_i \cdot X^f_i]\]

\[b^h_{fl} = E[X^h_i \cdot X^{fl}_i]\]

\[b^f_{fl} = E[X^f_i \cdot X^{fl}_i]\]

These capture, respectively:

  • Men’s intergenerational mobility (son on father),
  • Women’s measured intergenerational mobility (husband on wife’s father — the “father-in-law correlation”), and
  • A proxy for marital sorting (husband’s father on wife’s father).

The first two are standard rank-rank mobility estimates. The third is sometimes used in the literature as a proxy for assortative mating, but the authors show this is problematic.

The Structural Inheritance Equations

The model assumes that status is transmitted across generations via:

\[X^h_i = \beta_f X^f_i + \beta_m X^m_i + e^h_i\]

\[X^w_i = \beta_{fl} X^{fl}_i + \beta_{ml} X^{ml}_i + e^w_i\]

where \(\beta_f\) and \(\beta_m\) capture inheritance from the husband’s father and mother respectively, and \(\beta_{fl}\), \(\beta_{ml}\) do likewise for the wife. By the model’s symmetry assumption (equal inheritance parameters for men and women), \(\beta_{fl} = \beta_f\) and \(\beta_{ml} = \beta_m\).

These two equations model status transmission separately for husband and wife, which is necessary because the status of both spouses enters the model. The husband’s status \(X^h_i\) is observed (his occupation in the census). The wife’s status \(X^w_i\) is unobserved and enters as a latent variable whose relationship to observable quantities must be given structure: without an equation for \(X^w_i\), \(\rho = E[X^h_i \cdot X^w_i]\) cannot be expressed in terms of observables.

The symmetry assumption (\(\beta_{fl} = \beta_f\) and \(\beta_{ml} = \beta_m\)) says that sons and daughters inherit status from their parents with the same parameters. It does not require fathers and mothers to transmit status equally, so \(\beta_f\) and \(\beta_m\) may differ. This is the identifying restriction that makes the system tractable. It is a strong assumption, and the paper acknowledges it. The justification is essentially that gender-specific inheritance cannot be separately identified in this period without data on women’s own economic outcomes, which is precisely what is missing. Under symmetry, the wife’s two unobserved inheritance parameters are replaced by the same \(\beta_f\) and \(\beta_m\) that govern the husband’s side, reducing the system to three unknowns. The paper discusses this as Proposition 1 of Espín-Sánchez et al. (2023), which also requires that cross-parental correlations between the two families are equal, essentially that the marriage market in the parental generation was symmetric in a particular way.

The Assortative Mating Parameter

The structural assortative mating parameter \(\rho\) is defined as:

\[\rho = E[X^h_i \cdot X^w_i]\]

This is the direct correlation in economic status between spouses — the economically meaningful concept. The wife’s status \(X^w_i\) is unobserved, so \(\rho\) must be inferred indirectly.

Linking Structure to Observables

Multiplying the inheritance equations through by the relevant father’s status and taking expectations (with exclusion restrictions on the error terms) yields:

\[b^h_f = \beta_f + \rho \beta_m \tag{1}\]

\[b^h_{fl} = b^f_{fl}(\beta_f + \beta_m) \tag{2}\]

\[\rho = b^f_{fl}(\beta_f + \beta_m)^2 \tag{3}\]

This is a system of three equations in three unknowns: \(\beta_f\), \(\beta_m\), and \(\rho\). The solution is:

\[\beta_f + \beta_m = \frac{b^h_{fl}}{b^f_{fl}}\]

\[\rho = \frac{(b^h_{fl})^2}{b^f_{fl}}\]

This is the key estimating equation. The structural assortative mating parameter equals the squared women’s mobility slope divided by the father-on-father-in-law slope. Once \(\beta_f + \beta_m\) is recovered, \(\beta_m\) can be separated from \(\beta_f\) using equation (1).
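As a sketch of how equation (1) arises and how the separation works (this algebra follows from the equations as stated in this summary and is not quoted from the paper): multiply the husband’s inheritance equation by \(X^f_i\) and take expectations, using the normalization \(E[(X^f_i)^2] = 1\), the exclusion restriction \(E[e^h_i X^f_i] = 0\), and the condition, implied by the form of (1), that the correlation between the husband’s own parents also equals \(\rho\):

\[b^h_f = \beta_f E[(X^f_i)^2] + \beta_m E[X^m_i X^f_i] + E[e^h_i X^f_i] = \beta_f + \rho \beta_m\]

Writing \(S = \beta_f + \beta_m = b^h_{fl} / b^f_{fl}\) (a shorthand introduced here) and substituting \(\beta_f = S - \beta_m\) into (1) gives, for \(\rho \neq 1\):

\[\beta_m = \frac{S - b^h_f}{1 - \rho}, \qquad \beta_f = S - \beta_m\]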

Why the Father-Father-in-Law Correlation Is a Poor Proxy

The model clarifies a critical point: \(b^f_{fl}\) (the correlation between the husband’s father and the wife’s father) is not the same as \(\rho\). From equation (3):

\[b^f_{fl} = \frac{\rho}{(\beta_f + \beta_m)^2}\]

This means \(b^f_{fl}\) can rise even as \(\rho\) falls, provided \((\beta_f + \beta_m)^2\) falls proportionally more than \(\rho\). This is exactly what happens in the data: the father-on-father-in-law correlation rises by 31% from 1850–1870 to 1900–1920, yet the structural \(\rho\) falls by 61%. Prior literature using \(b^f_{fl}\) as a proxy for sorting would incorrectly conclude that assortative mating was rising.

Using the Model for Counterfactuals

Once \(\rho\), \(\beta_f\), and \(\beta_m\) are estimated for each cohort, the authors conduct counterfactual analyses by substituting the \(\rho\) from one cohort into the structural equations for another cohort, while holding the inheritance terms constant. This yields a counterfactual women’s mobility estimate — what the rank-rank slope \(b^h_{fl}\) would have been had sorting been stronger or weaker.

Regression Models and Results

Intergenerational Mobility (Rank-Rank Regressions)

The main mobility equation is estimated separately for each of four 20-year marriage cohorts (1850–1870, 1860–1880, 1880–1900, 1900–1920):

\[\text{Adult Rank} = \alpha + \beta_0 \text{Woman} + \beta_1 \text{Rank}_{Father} + \beta_2 (\text{Woman} \times \text{Rank}_{Father}) + \varepsilon\]

Key features of estimation:

  • 2SLS: the father’s wealth score rank is instrumented with a second observation of his rank from a nearby census (following Ward 2023), which corrects for attenuation bias from measurement error.
  • Controls: quartic polynomials in father’s age and husband’s age (both centered at 40) address life-cycle bias.
  • Inverse propensity-score weighting (IPW) is applied in the Massachusetts-born subsample to correct for non-random selection into the linked sample.
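To see why a second observation of the father’s rank helps, the following simulation (synthetic data, not the paper’s sample) compares OLS on one noisy measure with an IV estimate that uses a second noisy measure as the instrument. It omits ranks, age controls, and weights, and all variable names and parameter values are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(0)
n = 20000
true_slope = 0.4

# Latent father status, observed twice with independent measurement error.
father_true = [random.gauss(0, 1) for _ in range(n)]
father_obs1 = [f + random.gauss(0, 1) for f in father_true]
father_obs2 = [f + random.gauss(0, 1) for f in father_true]
child = [true_slope * f + random.gauss(0, 1) for f in father_true]

# OLS on a single noisy measure is biased toward zero (attenuation).
ols_slope = cov(father_obs1, child) / cov(father_obs1, father_obs1)

# With one instrument, 2SLS reduces to cov(z, y) / cov(z, x).
iv_slope = cov(father_obs2, child) / cov(father_obs2, father_obs1)

print(f"true={true_slope}, OLS={ols_slope:.3f}, IV={iv_slope:.3f}")
```

The instrument works here because the two measurement errors are independent of each other, so the second observation is correlated with the first only through the father’s true status.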

Coefficients of interest: \(\beta_1\) gives the rank-rank slope for men (persistence); \(\beta_1 + \beta_2\) gives the slope for women.

Key Results (Massachusetts-born sample)

Cohort      Men (\(\beta_1\))   Women (\(\beta_1 + \beta_2\))
---------   -----------------   -----------------------------
1850–1870   0.382               0.366
1860–1880   0.376               0.370
1880–1900   0.265               0.325
1900–1920   0.208               0.300

Both genders start at similar, high levels of persistence. Men’s rank-rank slope falls by 46% over the period (0.382 to 0.208); women’s falls by only 18% (0.366 to 0.300). A gender gap opens by the 1880–1900 cohort and widens further by 1900–1920, with men becoming substantially more mobile than women.

Father on Father-in-Law Correlation

The same 2SLS rank-rank framework is applied to estimate:

\[Y^f_i = \alpha + b^f_{fl} Y^{fl}_i + v_i\]

where \(Y^f_i\) and \(Y^{fl}_i\) denote the ranks of the husband’s father and the wife’s father, and the wife’s father’s rank is instrumented with a second observation of his status. This yields the \(b^f_{fl}\) estimates used in the structural model. Results show a 31% increase from 0.32 (1850–70) to 0.42–0.44 (1880–1920), which, taken at face value, would misleadingly suggest rising assortative mating.

Structural Assortative Mating Parameter

Plugging the three estimated correlations into the structural system yields the following \(\rho\) estimates (from Table A7):

Cohort      \(b^h_f\)   \(b^h_{fl}\)   \(b^f_{fl}\)   \(\rho\)
---------   ---------   ------------   ------------   --------
1850–1870   0.46        0.34           0.35           0.34
1860–1880   0.45        0.33           0.33           0.33
1880–1900   0.33        0.26           0.42           0.16
1900–1920   0.26        0.23           0.41           0.13

The structural assortative mating parameter drops from 0.34 to 0.13, a 61% decline, even as the father-father-in-law proxy rises. The inherited-status sum \((\beta_f + \beta_m)\) falls from 0.97 to 0.57, reconciling the two trends.
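The arithmetic behind these figures can be checked with a short sketch that applies \(\beta_f + \beta_m = b^h_{fl} / b^f_{fl}\) and \(\rho = (b^h_{fl})^2 / b^f_{fl}\) to the table values. The function and variable names are hypothetical, and because the inputs are the rounded two-decimal values shown above, the outputs can differ from the reported figures in the second decimal.

```python
# (b_h_fl, b_f_fl) per cohort, taken from the table above (rounded values).
cohorts = {
    "1850-1870": (0.34, 0.35),
    "1860-1880": (0.33, 0.33),
    "1880-1900": (0.26, 0.42),
    "1900-1920": (0.23, 0.41),
}


def solve_structural(b_h_fl, b_f_fl):
    """Recover (beta_f + beta_m, rho) from the two observable slopes."""
    beta_sum = b_h_fl / b_f_fl
    rho = b_h_fl ** 2 / b_f_fl
    return beta_sum, rho


results = {cohort: solve_structural(*b) for cohort, b in cohorts.items()}
for cohort, (beta_sum, rho) in results.items():
    print(f"{cohort}: beta_f + beta_m = {beta_sum:.2f}, rho = {rho:.2f}")
```

The output makes the reconciliation visible: the proxy \(b^f_{fl}\) rises between the first and last cohorts while both the inheritance sum and \(\rho\) fall.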

Counterfactual Results

Substituting the high 1850–1870 \(\rho = 0.34\) into the 1900–1920 cohort’s structural equations (while keeping that cohort’s inheritance parameters) yields a counterfactual women’s rank-rank slope of 0.592, compared to the observed 0.233 — meaning women’s persistence would have been 154% higher had sorting remained at its mid-century level. Conversely, applying the low 1900–1920 \(\rho = 0.13\) to the 1850–1870 cohort reduces its counterfactual slope from 0.344 to 0.135, a 61% reduction.
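One reading of this exercise that is consistent with equations (2) and (3): dividing (3) by (2) gives \(b^h_{fl} = \rho / (\beta_f + \beta_m)\), so a counterfactual slope follows from swapping in another cohort’s \(\rho\) while keeping the inheritance sum fixed. The sketch below applies that relation to the rounded values quoted in this summary; it is an approximation under that assumption, not the paper’s procedure, and the function name is hypothetical.

```python
def counterfactual_slope(rho_cf, beta_sum):
    """Women's rank-rank slope implied by b_h_fl = rho / (beta_f + beta_m)."""
    return rho_cf / beta_sum


# 1900-1920 inheritance sum (0.57) combined with the 1850-1870 rho (0.34).
late_with_early_rho = counterfactual_slope(0.34, 0.57)

# 1850-1870 inheritance sum (0.97) combined with the 1900-1920 rho (0.13).
early_with_late_rho = counterfactual_slope(0.13, 0.97)

# Percentage changes relative to the observed slopes (0.233 and 0.344).
pct_higher = 100 * (late_with_early_rho / 0.233 - 1)
pct_lower = 100 * (1 - early_with_late_rho / 0.344)

print(f"late cohort, early rho: {late_with_early_rho:.3f} ({pct_higher:.0f}% higher)")
print(f"early cohort, late rho: {early_with_late_rho:.3f} ({pct_lower:.0f}% lower)")
```

With rounded inputs the results land close to, but not exactly on, the reported 0.592 and 0.135.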

Decomposition: Why Did Assortative Mating Fall?

The authors decompose the structural \(\rho\) using an IV-adapted law-of-total-covariance across demographic subgroups. For immigration (the most important factor):

  • The growing share of immigrant-parent couples: holding this share constant at early-cohort levels removes 52% of the decline in \(\rho\).
  • The narrowing wealth score gap between immigrant and native families: holding this gap constant removes 36% of the decline.
  • Changes in within-group slopes contribute little.

Internal migration and urbanization explain essentially none of the aggregate decline — all subgroups (rural/urban, stayer/migrant) experienced similar parallel declines in sorting, suggesting no group-specific mechanism there.
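The identity underlying this kind of decomposition is the law of total covariance: the pooled covariance equals a share-weighted sum of within-group covariances plus a between-group term driven by group shares and gaps in group means. The sketch below verifies the plain identity on made-up ranks for two groups; it does not reproduce the paper’s IV adaptation, and the group names and numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def pop_cov(x, y):
    """Population covariance (divides by n, so the identity holds exactly)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


# Hypothetical (wife's father's rank, husband's rank) pairs by parental origin.
groups = {
    "native_parents": ([60, 70, 80, 90], [55, 65, 70, 85]),
    "immigrant_parents": ([10, 20, 30, 40], [20, 15, 35, 30]),
}

all_x = [v for x, _ in groups.values() for v in x]
all_y = [v for _, y in groups.values() for v in y]
n_total = len(all_x)

# Within-group term: share-weighted covariance inside each group.
within = sum(len(x) / n_total * pop_cov(x, y) for x, y in groups.values())

# Between-group term: depends on group shares and the gaps in group means.
between = sum(
    len(x) / n_total * (mean(x) - mean(all_x)) * (mean(y) - mean(all_y))
    for x, y in groups.values()
)

total = pop_cov(all_x, all_y)
print(f"total={total:.3f}, within={within:.3f}, between={between:.3f}")
```

In this framing, changing a group’s share or narrowing the gap between group means alters the between-group term, which corresponds to the share and gap channels listed above.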

Relation to the Literature

The paper engages a large literature on intergenerational mobility and assortative mating. A few key points of engagement:

  • Olivetti & Paserman (2015) use given-name pseudo-links to estimate women’s mobility; the current paper provides direct links via birth surnames, avoiding selection biases inherent to pseudo-linking.
  • Buckles et al. (2023) use genealogist-constructed family trees nationally; they find rising father-father-in-law correlations (0.63–0.75), which this paper reconciles by showing this proxy does not track the structural \(\rho\).
  • Bailey & Lin (2024) use Ohio birth/death/marriage records (LIFE-M); they find rank-rank mobility levels similar to this paper’s for overlapping cohorts.
  • Ward (2023) is followed methodologically for the IV correction of measurement error in father’s status; the paper finds higher persistence than much earlier historical work precisely because of this correction.
  • Espín-Sánchez, Ferrie & Vickers (2023) supply the structural model; this paper applies it empirically at scale to a new historical setting.
  • The finding that U.S. mobility did not monotonically decline reinforces Ward (2023) and challenges simpler narratives.

Conclusion

The paper’s central message is that the marriage market — not the labor market — was the primary channel of women’s intergenerational economic mobility in 19th-century America. The structural model is the key methodological contribution: it disentangles the unobservable spousal correlation (\(\rho\)) from the observable father-father-in-law correlation by embedding both in a common inheritance framework, and uses the ratio \(\rho = (b^h_{fl})^2 / b^f_{fl}\) as the estimating equation. Immigration — through both the growing share of immigrant-parent families and the narrowing wealth gap between immigrant and native families — accounts for most of the measured decline in assortative mating, and thus most of the measured improvement in women’s mobility, well before broad female labor force participation began.