Lecture 3: Panel Experiments and Dynamic Causal Effects
Emory University
Spring 2026
Main reference: Bojinov, Rambachan, and Shephard (2021) — “Panel Experiments and Dynamic Causal Effects: A Finite Population Perspective”
Definition (Potential Outcome). The potential outcome for unit \(i\) at time \(t\) along treatment path \(\mathbf{d}_{i,1:T} \in \mathcal{D}^T\) is: \[Y_{it}(\mathbf{d}_{i,1:T})\]
Assumption (Non-Anticipating Potential Outcomes). For all units \(i\), periods \(t\), and treatment paths \(\mathbf{d}_{i,1:T}, \tilde{\mathbf{d}}_{i,1:T} \in \mathcal{D}^T\): \[Y_{it}(\mathbf{d}_{i,1:T}) = Y_{it}(\tilde{\mathbf{d}}_{i,1:T}) \quad \text{whenever} \quad \mathbf{d}_{i,1:t} = \tilde{\mathbf{d}}_{i,1:t}\]
Interpretation: Future treatments cannot affect current outcomes, so we may write \(Y_{it}(\mathbf{d}_{i,1:t})\) instead of \(Y_{it}(\mathbf{d}_{i,1:T})\).
Q: What if treatment is announced in advance?
A: Define the “treatment” as the announcement, not implementation. Non-anticipation then holds relative to the announcement date.
Lecture 2 introduced: \(Y_{it}(\mathbf{d}_{i,1:T})\) where \(\mathbf{d}_{i,1:T} = (d_{i,1}, \ldots, d_{i,T})\) (Robins)
What’s new in this lecture?
This lecture provides the experimental foundation for treatment path dependence. Later: DiD uses parallel trends instead of randomization.
Key insight: Different paths lead to different potential outcomes at \(t=6\): \[Y_{i6}(\text{Path A}) \neq Y_{i6}(\text{Path B}) \neq Y_{i6}(\text{Path C})\]
A dynamic causal effect compares potential outcomes along different treatment paths: \[\tau_{it}(\mathbf{d}_{i,1:t}, \tilde{\mathbf{d}}_{i,1:t}) := Y_{it}(\mathbf{d}_{i,1:t}) - Y_{it}(\tilde{\mathbf{d}}_{i,1:t})\]
Problem: With \(|\mathcal{D}|^t\) possible treatment paths, the number of pairwise comparisons grows exponentially with \(t\).
Solution: Restrict attention to the most recent \(p+1\) periods.
Definition (Lag-\(p\) Dynamic Causal Effect). For \(0 \leq p < t\) and treatment sequences \(\mathbf{d}, \tilde{\mathbf{d}} \in \mathcal{D}^{p+1}\): \[\tau_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p) := Y_{it}(\mathbf{d}^{obs}_{i,1:t-p-1}, \mathbf{d}) - Y_{it}(\mathbf{d}^{obs}_{i,1:t-p-1}, \tilde{\mathbf{d}})\] where \(\mathbf{d}^{obs}_{i,1:t-p-1}\) denotes unit \(i\)’s realized path up to period \(t{-}p{-}1\).
Note: \(p < t\) ensures there are enough past periods. Choosing \(p\): bias-variance tradeoff (larger \(p\) captures more carryover but needs more data).
Lag-\(p\) effect with \(p=2\), \(\mathbf{d}=(1,1,1)\), \(\tilde{\mathbf{d}}=(0,0,0)\): \[\tau_{it}((1,1,1), (0,0,0); 2) = Y_{it}(\mathbf{d}^{obs}_{i,1:t-3}, 1, 1, 1) - Y_{it}(\mathbf{d}^{obs}_{i,1:t-3}, 0, 0, 0)\]
Effect of treatment in periods \(t{-}2, t{-}1, t\) vs. control, conditional on the observed earlier path.
Lag-0 dynamic causal effect (\(p=0\)): \[\tau_{it}(d, \tilde{d}; 0) = Y_{it}(\mathbf{d}^{obs}_{i,1:t-1}, d) - Y_{it}(\mathbf{d}^{obs}_{i,1:t-1}, \tilde{d})\]
Key: The lag-0 effect is conditional on history. For the same unit, \(\tau_{it}(1,0;0)\) can differ depending on prior treatment status!
Example: Suppose the treatment is a pain medication. Whether unit \(i\) took the medication in period \(t-1\) changes the effect of a dose today (e.g., built-up tolerance or residual relief), so the same contrast \(\tau_{it}(1,0;0)\) differs across histories.
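A hypothetical numeric illustration (all numbers invented for this slide): \[\text{if } d^{obs}_{i,t-1} = 0:\; \tau_{it}(1,0;0) = 8 - 3 = 5, \qquad \text{if } d^{obs}_{i,t-1} = 1:\; \tau_{it}(1,0;0) = 9 - 7 = 2.\] Here yesterday's dose builds tolerance, shrinking today's contemporaneous effect from 5 to 2.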
Definition (Average Lag-\(p\) Dynamic Causal Effects). For \(p < T\) and \(\mathbf{d}, \tilde{\mathbf{d}} \in \mathcal{D}^{p+1}\):
Time-\(t\) average (across units): \(\bar{\tau}_{\cdot t}(\mathbf{d}, \tilde{\mathbf{d}}; p) := \frac{1}{N} \sum_{i=1}^{N} \tau_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p)\)
Unit-\(i\) average (across time): \(\bar{\tau}_{i \cdot}(\mathbf{d}, \tilde{\mathbf{d}}; p) := \frac{1}{T-p} \sum_{t=p+1}^{T} \tau_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p)\)
Total average (across all \(i, t\)): \(\bar{\tau}(\mathbf{d}, \tilde{\mathbf{d}}; p) := \frac{1}{N(T-p)} \sum_{i=1}^{N} \sum_{t=p+1}^{T} \tau_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p)\)
Note: Sums start at \(t = p+1\) because we need \(p\) prior periods. Compare to Lecture 2: ATE\((t)\), unit-specific, overall ATE.
Recall from Lecture 2: We cannot learn causal effects directly from data without structure.
The fundamental problem: For each unit we observe the outcome along only one treatment path; all other potential outcomes are missing.
What structure can help?
This lecture: Exploit known assignment probabilities from randomization to construct unbiased estimators via inverse probability weighting.
What makes this a panel experiment? The assignment mechanism is known.
Definition (Sequentially Randomized Assignments). Assignments are sequentially randomized if for all \(t \in \{1, \ldots, T\}\): \[\Pr(\mathbf{D}_{1:N,t} | \mathbf{D}_{1:N,1:t-1}, \mathbf{Y}_{1:N,1:T}(\cdot)) = \Pr(\mathbf{D}_{1:N,t} | \mathbf{D}_{1:N,1:t-1}, \mathbf{Y}_{1:N,1:t-1}(\mathbf{D}_{1:N,1:t-1}))\] where \(\mathbf{Y}_{1:N,1:T}(\cdot)\) denotes the full collection of potential outcomes.
Interpretation: - Assignment at \(t\) can depend on past assignments and past observed outcomes - But not on future potential outcomes or counterfactual past outcomes - This is the panel analogue of “unconfounded” assignment - Knowing which outcomes would have been realized under alternative paths provides no additional information about current assignment
Important: Since we’re in an experiment, assignment probabilities are known to the researcher.
Sequential randomization allows cross-unit dependence in assignment. For unit-level HT estimation, we additionally need:
Definition (Individualistic Assignment). Assignments are individualistic for unit \(i\) if: \[\Pr(D_{it} | D_{-i,t}, \mathcal{F}_{1:N,t-1,T}) = \Pr(D_{it} | \mathbf{D}_{i,1:t-1}, \mathbf{Y}_{i,1:t-1})\] where \(\mathcal{F}_{1:N,t-1,T}\) is the filtration generated by treatments through \(t-1\) and all potential outcomes.
Interpretation: Unit \(i\)'s assignment at \(t\) depends only on its own past treatments and outcomes, not on other units' assignments or on any potential outcomes.
Example: Bernoulli assignment where \(\Pr(D_{it} = 1) = q\) for all \(i, t\) independently.
Definition (Adapted Propensity Score). For unit \(i\) at time \(t\) and treatment sequence \(\mathbf{d} = (d_{t-p}, \ldots, d_t) \in \mathcal{D}^{p+1}\): \[\pi_{i,t-p}(\mathbf{d}) := \Pr(\mathbf{D}_{i,t-p:t} = \mathbf{d} | \mathbf{D}_{i,1:t-p-1}, \mathbf{Y}_{i,1:t-1})\]
What this measures: Probability of observing path \(\mathbf{d}\) over periods \(t{-}p\) to \(t\), conditional on the unit’s past history. (The subscript \(t{-}p\) denotes the start of the treatment window.)
Why “adapted”? - The propensity score can change over time as information accumulates - But since the experiment is designed, we know these probabilities
Assumption (Probabilistic Assignment / Overlap). There exist constants \(0 < c_L < c_U < 1\) such that \(c_L < \pi_{i,t-p}(\mathbf{d}) < c_U\) for all \(i\), \(t\), and \(\mathbf{d}\).
Step 1: Identify the experimental design (Bernoulli, block, adaptive, etc.)
Step 2: Compute period-by-period probabilities. For path \(\mathbf{d} = (d_{t-p}, \ldots, d_t)\): \[\pi_{i,t-p}(\mathbf{d}) = \prod_{s=t-p}^{t} \Pr(D_{is} = d_s | \text{history up to } s-1)\]
Step 3: Apply to the specific design at hand (worked Bernoulli and staggered examples appear below)
Key insight: We only need \(\pi_{i,t-p}(\mathbf{d})\) for the observed path — and we know it by design!
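As a minimal sketch of Steps 1 and 2, consider an iid Bernoulli(\(q\)) design, where the per-period probabilities simply multiply (the function below is illustrative, not from the paper):

```python
import numpy as np

def bernoulli_path_prob(d_path, q):
    """Adapted propensity score pi_{i,t-p}(d) under an iid Bernoulli(q) design:
    assignment history is irrelevant, so the per-period probabilities multiply."""
    d_path = np.asarray(d_path)
    per_period = np.where(d_path == 1, q, 1 - q)  # Pr(D_is = d_s) for each s
    return float(per_period.prod())

# Probability of the all-treated path (1, 1, 1) with q = 0.5: 0.5**3 = 0.125
print(bernoulli_path_prob([1, 1, 1], q=0.5))
```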
Definition (\((i,t)\)-th Contribution to Lag-\(p\) Effect Estimator). For \(\mathbf{d}, \tilde{\mathbf{d}} \in \mathcal{D}^{p+1}\): \[\hat{\tau}_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p) = \frac{Y_{it} \cdot \mathbf{1}\{\mathbf{D}_{i,t-p:t} = \mathbf{d}\}}{\pi_{i,t-p}(\mathbf{d})} - \frac{Y_{it} \cdot \mathbf{1}\{\mathbf{D}_{i,t-p:t} = \tilde{\mathbf{d}}\}}{\pi_{i,t-p}(\tilde{\mathbf{d}})}\]
Intuition: - When unit \(i\) follows path \(\mathbf{d}\): contribute \(Y_{it} / \pi_{i,t-p}(\mathbf{d})\) - When unit \(i\) follows path \(\tilde{\mathbf{d}}\): contribute \(-Y_{it} / \pi_{i,t-p}(\tilde{\mathbf{d}})\) - Otherwise: contribute zero. IPW corrects for different path probabilities
This is an estimation building block — we aggregate these \((i,t)\) contributions into plug-in averages to estimate population-level effects.
Definition (Plug-in Average Estimators).
Time-\(t\) average: \(\hat{\bar{\tau}}_{\cdot t}(\mathbf{d}, \tilde{\mathbf{d}}; p) := \frac{1}{N} \sum_{i=1}^{N} \hat{\tau}_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p)\)
Unit-\(i\) average: \(\hat{\bar{\tau}}_{i \cdot}(\mathbf{d}, \tilde{\mathbf{d}}; p) := \frac{1}{T-p} \sum_{t=p+1}^{T} \hat{\tau}_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p)\)
Total average: \(\hat{\bar{\tau}}(\mathbf{d}, \tilde{\mathbf{d}}; p) := \frac{1}{N(T-p)} \sum_{i=1}^{N} \sum_{t=p+1}^{T} \hat{\tau}_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p)\)
These estimate \(\bar{\tau}_{\cdot t}\), \(\bar{\tau}_{i \cdot}\), and \(\bar{\tau}\) from earlier — unbiased under randomization.
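A hedged sketch of these building blocks in code, assuming an iid Bernoulli(\(q\)) design so that adapted propensity scores are simple products (array shapes and names are illustrative):

```python
import numpy as np

def ht_contribution(Y, D, t, d, d_tilde, q):
    """(i,t)-th HT contributions (one per unit) for lag-p paths d, d_tilde,
    under iid Bernoulli(q): pi_{i,t-p}(d) is a product of per-period probs."""
    d, d_tilde = np.asarray(d), np.asarray(d_tilde)
    p = len(d) - 1
    window = D[:, t - p:t + 1]                      # observed paths, shape (N, p+1)
    pi_d = np.prod(np.where(d == 1, q, 1 - q))
    pi_dt = np.prod(np.where(d_tilde == 1, q, 1 - q))
    on_d = (window == d).all(axis=1)                # indicator {D_{i,t-p:t} = d}
    on_dt = (window == d_tilde).all(axis=1)
    return Y[:, t] * on_d / pi_d - Y[:, t] * on_dt / pi_dt

def plug_in_averages(Y, D, d, d_tilde, q):
    """Time-t averages for t = p, ..., T-1 (0-indexed) and the total average."""
    T, p = Y.shape[1], len(d) - 1
    per_t = np.array([ht_contribution(Y, D, t, d, d_tilde, q).mean()
                      for t in range(p, T)])
    return per_t, per_t.mean()                      # total average pools over i, t
```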
Theorem (Bojinov, Rambachan, and Shephard 2021, Theorem 3.1). Under individualistic and probabilistic assignment: \[\mathbb{E}[\hat{\tau}_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p) | \mathcal{F}_{i,t-p-1}] = \tau_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p)\] where \(\mathcal{F}_{i,t-p-1}\) is unit \(i\)’s realized history through period \(t{-}p{-}1\). The estimation error is a martingale difference sequence through time and conditionally independent across units.
Implications: Conditional unbiasedness implies unconditional unbiasedness by iterated expectations, and the martingale-difference structure of the errors is what delivers the CLTs below.
Variance: Can be conservatively estimated from the data. Plug-in averages are also unbiased.
Setup: \(N = 4\), binary treatment, Bernoulli(0.5), at period \(t\) (\(\mathcal{F}_{i,t-p-1}\) realized). \(2^4 = 16\) possible assignments.
| \(D_{1t}\) | \(D_{2t}\) | \(D_{3t}\) | \(D_{4t}\) | \(\hat{\bar{\tau}}_{\cdot t}\) |
|---|---|---|---|---|
| 0 | 0 | 0 | 1 | some number |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| 1 | 1 | 1 | 0 | some number |
“Unbiased” \(=\) average of \(\hat{\bar{\tau}}_{\cdot t}\) over all 16 assignments equals \(\bar{\tau}_{\cdot t}\).
Design-based \(\neq\) model-based. Potential outcomes are fixed; only assignments are random. We average over the randomization distribution, not over hypothetical repeated samples.
Full panel: Total randomization space is \(2^{NT}\), but unbiasedness works period by period (Theorem 3.1 conditions on \(\mathcal{F}_{i,t-p-1}\)), then aggregates via iterated expectations.
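To make "averaging over the randomization distribution" concrete, the sketch below fixes hypothetical potential outcomes for \(N = 4\) units, enumerates all \(2^4 = 16\) equally likely Bernoulli(0.5) assignments, and checks that the HT estimates average exactly to \(\bar{\tau}_{\cdot t}\):

```python
import numpy as np
from itertools import product

# Hypothetical fixed potential outcomes for N = 4 units at period t
Y1 = np.array([8.0, 5.0, 6.0, 4.0])     # Y_it(d_obs, 1)
Y0 = np.array([3.0, 2.0, 1.0, 2.0])     # Y_it(d_obs, 0)
q = 0.5
true_avg = (Y1 - Y0).mean()             # the estimand: time-t average effect

# Enumerate all 2^4 = 16 equally likely assignments; only D is random
estimates = []
for D in product([0, 1], repeat=4):
    D = np.array(D)
    Y_obs = np.where(D == 1, Y1, Y0)    # realized outcomes under this draw
    estimates.append((Y_obs * (D == 1) / q - Y_obs * (D == 0) / (1 - q)).mean())

print(np.mean(estimates), true_avg)     # equal: the HT estimator is unbiased
```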
Three CLTs govern inference depending on the averaging dimension:
Theorem (Bojinov, Rambachan, and Shephard 2021, Theorem 3.2). Under individualistic, probabilistic assignment with bounded potential outcomes:
Cross-sectional (as \(N \to \infty\)): \(\frac{\sqrt{N}\{\hat{\bar{\tau}}_{\cdot t}(\mathbf{d}, \tilde{\mathbf{d}}; p) - \bar{\tau}_{\cdot t}(\mathbf{d}, \tilde{\mathbf{d}}; p)\}}{\sigma_{\cdot t}} \xrightarrow{d} N(0, 1)\)
Time-series (as \(T \to \infty\)): \(\frac{\sqrt{T-p}\{\hat{\bar{\tau}}_{i \cdot}(\mathbf{d}, \tilde{\mathbf{d}}; p) - \bar{\tau}_{i \cdot}(\mathbf{d}, \tilde{\mathbf{d}}; p)\}}{\sigma_{i \cdot}} \xrightarrow{d} N(0, 1)\)
Panel (as \(NT \to \infty\)): \(\frac{\sqrt{N(T-p)}\{\hat{\bar{\tau}}(\mathbf{d}, \tilde{\mathbf{d}}; p) - \bar{\tau}(\mathbf{d}, \tilde{\mathbf{d}}; p)\}}{\sigma} \xrightarrow{d} N(0, 1)\)
Key implications: - Time-specific inference (\(N \to \infty\)): Test effects at a given period across units - Unit-specific inference (\(T \to \infty\)): Test effects for a given unit across time - Total average inference (\(NT \to \infty\)): Pool across both dimensions
Practical considerations: - Variance can be conservatively estimated from the data - Enables valid confidence intervals and hypothesis tests
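One standard design-based choice (a sketch, not necessarily the paper's exact formula) is to use the sample variance of the unit-level HT contributions; it over-estimates the design variance because it also absorbs cross-unit effect heterogeneity, so the resulting interval is conservative:

```python
import numpy as np
from scipy import stats

def conservative_ci(contributions, level=0.95):
    """CI for the time-t average effect from the N unit-level HT
    contributions tau_hat_it. Their sample variance over-estimates the
    design variance (it absorbs effect heterogeneity across units),
    so the resulting interval is conservative."""
    contributions = np.asarray(contributions, dtype=float)
    N = len(contributions)
    est = contributions.mean()
    se = contributions.std(ddof=1) / np.sqrt(N)
    z = stats.norm.ppf(0.5 + level / 2)   # normal critical value (CLT-based)
    return est, (est - z * se, est + z * se)
```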
Two testing approaches:
1. Conservative tests: CLT + variance upper bound \(\to\) test the weak null \(H_0: \bar{\tau} = 0\) (the average effect is zero)
2. Randomization tests: exact tests under the sharp null \(\tau_{it} = 0\) for all \((i,t)\) (no unit has any effect at any time)
Under the sharp null \(H_0: \tau_{it}(\mathbf{d}, \tilde{\mathbf{d}}; p) = 0\) for all \((i,t)\): observed outcomes are unchanged under any counterfactual assignment, so the exact distribution of any test statistic across re-randomizations is known.
Advantages: - Exact: No asymptotic approximation needed — valid in finite samples - Follows directly from Fisher’s classical randomization inference
Limitation: The sharp null (\(\tau_{it} = 0\) for every unit at every time) is stronger than the weak null (\(\bar{\tau} = 0\), the average is zero). The CLT-based conservative test handles the weak null.
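A minimal Monte Carlo version of the randomization test for a lag-0 contrast under a Bernoulli(\(q\)) design; the data and setup below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomization_test(Y_obs, D_obs, q, num_draws=10_000):
    """Monte Carlo randomization test of the sharp null (lag-0, Bernoulli(q)).
    The sharp null fixes every outcome regardless of assignment, so we
    re-draw assignments and recompute the HT statistic each time."""
    def ht(D):
        return (Y_obs * (D == 1) / q - Y_obs * (D == 0) / (1 - q)).mean()

    observed = ht(D_obs)
    draws = rng.binomial(1, q, size=(num_draws, len(Y_obs)))
    null_stats = np.array([ht(D) for D in draws])
    return float(np.mean(np.abs(null_stats) >= np.abs(observed)))  # two-sided

# Hypothetical data: 4 units, Bernoulli(0.5) assignment at a single period
print(randomization_test(np.array([8.0, 3.0, 6.0, 2.0]),
                         np.array([1, 0, 1, 0]), q=0.5))
```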
Setup: \(N\) units, \(T=3\) periods. Period \(t{=}1\) is pre-treatment. Treatment randomized at \(t{=}2\) only: \(\Pr(D_{i2} = 1) = q\). No treatment at \(t{=}1\) or \(t{=}3\). All units share \(\mathbf{d}^{obs}_{i,1} = 0\).
Adapted propensity score: \(\pi_{i,2}(1) = q\), \(\pi_{i,2}(0) = 1 - q\) — known by design.
Key insight: Even in a simple RCT with follow-up, a naive “treatment effect” conflates contemporaneous and lagged effects. The lag-\(p\) framework separates them cleanly.
What if treatment is absorbing? Once treated, always treated: paths become \((0,0,0)\) and \((0,1,1)\).
Connection to staggered adoption: This is the simplest case of absorbing treatment. With many adoption dates, the same conflation problem scales up — motivating the lag-\(p\) decomposition.
Setup: Each unit independently assigned \(D_{it} = 1\) with probability \(q\) at every period.
Notable feature: No serial correlation in treatment — successive treatments are independent: \(\text{Cov}(D_{it}, D_{is}) = 0\) for \(s \neq t\). We will see why this independence property matters when we discuss fixed effects estimators.
Setup: Treatment is absorbing — once treated, always treated. Units adopt at different times \(g \in \{1, \ldots, T\}\).
Notable feature: Treatment is perfectly serially correlated once adopted — if \(D_{it} = 1\), then \(D_{is} = 1\) for all \(s > t\). Contrast with iid Bernoulli above!
We’ll see why this serial correlation structure matters for FE estimators.
Setup: \(N = 4\) units, period \(t = 2\), Bernoulli(0.5), lag-0 effect \(\hat{\bar{\tau}}_{\cdot 2}(1, 0; 0)\).
| Unit \(i\) | \(D_{i2}\) | \(Y_{i2}\) | HT contribution |
|---|---|---|---|
| 1 | 1 | 8 | \(8/0.5 = 16\) |
| 2 | 0 | 3 | \(-3/0.5 = -6\) |
| 3 | 1 | 6 | \(6/0.5 = 12\) |
| 4 | 0 | 2 | \(-2/0.5 = -4\) |
where HT contribution \(= \frac{Y_{i2} \cdot \mathbf{1}\{D_{i2} = 1\}}{\pi_{i2}(1)} - \frac{Y_{i2} \cdot \mathbf{1}\{D_{i2} = 0\}}{\pi_{i2}(0)}\)
\[\hat{\bar{\tau}}_{\cdot 2} = \tfrac{1}{4}(16 - 6 + 12 - 4) = 4.5\]
Check: With equal propensity scores (\(\pi_{i2}(1) = 0.5\) for all \(i\)), HT \(=\) difference in means: \(\bar{Y}_{\text{treated}} - \bar{Y}_{\text{control}} = 7 - 2.5 = 4.5\) \(\checkmark\)
What if propensity scores are unequal? Does simple mean comparison still work?
Same units, but now an adaptive design: propensity scores vary across units.
| Unit \(i\) | \(D_{i2}\) | \(Y_{i2}\) | \(\pi_{i2}(D_{i2})\) | HT contribution |
|---|---|---|---|---|
| 1 | 1 | 8 | 0.8 | \(8/0.8 = 10\) |
| 2 | 0 | 3 | 0.7 | \(-3/0.7 = -4.29\) |
| 3 | 1 | 6 | 0.3 | \(6/0.3 = 20\) |
| 4 | 0 | 2 | 0.4 | \(-2/0.4 = -5\) |
\[\hat{\bar{\tau}}_{\cdot 2}^{HT} = \tfrac{1}{4}(10 - 4.29 + 20 - 5) = 5.18\]
Naive difference in means: \(\bar{Y}_{\text{treated}} - \bar{Y}_{\text{control}} = 7 - 2.5 = 4.5 \neq 5.18\)
Why the difference? Unit 3 (\(D_{i2} = 1\), \(\pi = 0.3\)) was unlikely to be treated — its outcome is more “informative” about treatment effects and gets upweighted. Naive means ignore this, producing bias.
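The few lines below reproduce both tables (4.5 in the equal-propensity case, roughly 5.18 in the adaptive case):

```python
import numpy as np

Y = np.array([8.0, 3.0, 6.0, 2.0])       # outcomes Y_i2 from the tables
D = np.array([1, 0, 1, 0])               # assignments D_i2
sign = np.where(D == 1, 1.0, -1.0)       # +Y/pi if treated, -Y/pi if control

pi_equal = np.full(4, 0.5)               # Bernoulli(0.5): equal propensities
print((sign * Y / pi_equal).mean())      # 4.5 = difference in means

pi_adapt = np.array([0.8, 0.7, 0.3, 0.4])  # Pr of each unit's observed arm
print((sign * Y / pi_adapt).mean())      # ~5.18: naive means would be biased
```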
Setting: Andreoni and Samuelson (2006) study cooperative behavior in a twice-repeated prisoners’ dilemma.
Design: \(N = 110\) participants, \(T = 20\) rounds, randomly matched into pairs each round. Outcome: cooperation in period 1 of the game.
Treatment (\(D_{it} \in \{0, 1\}\)): The payoff structure parameter \(\lambda\) is randomly varied each round.
Treatment is randomly assigned each round \(\Rightarrow\) Bernoulli-like design. But past game structures may affect current behavior through learning — exactly the carryover concern.
| | \(p=0\) | \(p=1\) | \(p=2\) | \(p=3\) |
|---|---|---|---|---|
| Point estimate \(\hat{\bar{\tau}}^{\dagger}(1,0;p)\) | 0.285 | 0.058 | 0.134 | 0.089 |
| Conservative \(p\)-value | 0.000 | 0.226 | 0.013 | 0.126 |
| Randomization \(p\)-value | 0.000 | 0.263 | 0.012 | 0.114 |
\(\dagger\): Overall average HT estimator \(\hat{\bar{\tau}}\) as defined earlier, pooling across all units and rounds. Conservative tests use CLT; randomization tests simulate under sharp null.
Findings: A large, highly significant contemporaneous effect (\(p=0\)); no detectable lag-1 effect; a sizable and significant lag-2 effect; an insignificant lag-3 effect.
Interpretation: The lag-2 effect suggests learning dynamics — players update beliefs based on past experiences. This carryover would bias naive FE regressions.
We’ve established that Horvitz-Thompson estimators are unbiased.
But practitioners often use simpler fixed effects estimators.
Do they work under carryover effects?
Common practice: Run OLS with unit fixed effects: \[Y_{it} = \alpha_i + \beta D_{it} + \epsilon_{it}\]
Why it seems reasonable: The unit fixed effects absorb time-invariant heterogeneity, and this specification is the standard default in applied panel work.
Question: Does the unit FE estimator \(\hat{\beta}_{UFE}\) recover a meaningful causal effect when there are carryover effects?
Answer: In general, no — and the bias can be substantial.
To precisely characterize FE bias, we work within a linear structural model for potential outcomes:
Definition (Bojinov, Rambachan, and Shephard 2021, Definition 7). A linear potential outcome panel satisfies: \[Y_{it}(d_{i,1:t}) = \beta_{it,0} d_{it} + \beta_{it,1} d_{i,t-1} + \cdots + \beta_{it,t-1} d_{i,1} + \epsilon_{it}\]
Interpretation: This is a structural model for potential outcomes that separates the contemporaneous effect \(\beta_{it,0}\) from carryover effects \(\beta_{it,s}\), \(s > 0\). It nests the no-carryover case (\(\beta_{it,s} = 0\) for \(s > 0\)).
Definition: The unit fixed effects (UFE) estimator is: \[\hat{\beta}_{UFE} = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{Y}_{it} \tilde{D}_{it}}{\sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{D}_{it}^2}\]
where \(\tilde{A}_{it} = A_{it} - \bar{A}_{i\cdot}\) denotes the within-unit deviation from unit \(i\)’s time average.
Equivalently: OLS coefficient from regressing \(Y_{it}\) on \(D_{it}\) with unit fixed effects: \[Y_{it} = \alpha_i + \beta D_{it} + \text{error}_{it}\]
UFE uses within-unit variation in treatment over time to identify effects. It removes time-invariant unit heterogeneity.
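A minimal implementation, assuming balanced \(N \times T\) outcome and treatment arrays:

```python
import numpy as np

def ufe(Y, D):
    """Within (unit fixed effects) estimator of beta in Y_it = a_i + b*D_it + e_it.
    Y and D are (N, T) arrays; demean within units, then take the OLS ratio."""
    Y_til = Y - Y.mean(axis=1, keepdims=True)   # tilde-Y: within-unit deviations
    D_til = D - D.mean(axis=1, keepdims=True)   # tilde-D
    return (Y_til * D_til).sum() / (D_til ** 2).sum()
```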
Proposition (Bojinov, Rambachan, and Shephard 2021, Proposition 4.1). Under a linear potential outcome panel, as \(N \to \infty\): \[\hat{\beta}_{UFE} \xrightarrow{p} \underbrace{\frac{\sum_{t=1}^{T} \tilde{\kappa}_{D,\beta,t,t}}{\sum_{t=1}^{T} \tilde{\sigma}^2_{D,t}}}_{\text{Target}} + \underbrace{\frac{\sum_{t=1}^{T} \sum_{s=1}^{t-1} \tilde{\kappa}_{D,\beta,t,s}}{\sum_{t=1}^{T} \tilde{\sigma}^2_{D,t}}}_{\text{Carryover Bias}} + \underbrace{\frac{\sum_{t=1}^{T} \tilde{\delta}_t}{\sum_{t=1}^{T} \tilde{\sigma}^2_{D,t}}}_{\text{Specification Error}}\]
(Each quantity defined on the next slide.)
Three components:
1. Target: A variance-weighted average of contemporaneous effects \(\beta_{it,0}\)
2. Carryover bias: Arises when past treatment affects outcomes and treatment is serially correlated
3. Specification error: Arises if untreated potential outcomes \(Y_{it}(\mathbf{0})\) vary over time
UFE is unbiased only when both carryover bias and specification error are zero.
Within-unit deviation: \(\tilde{D}_{it} = D_{it} - \bar{D}_{i\cdot}\), where \(\bar{D}_{i\cdot} = T^{-1}\sum_{t=1}^{T} D_{it}\)
Treatment variation: \(\tilde{\sigma}^2_{D,t} = \lim_{N\to\infty} N^{-1} \sum_i \text{Var}(\tilde{D}_{it})\) \(\quad\) How much within-unit treatment variation exists at time \(t\)?
Effect–correlation interaction: \(\tilde{\kappa}_{D,\beta,t,s} = \lim_{N\to\infty} N^{-1} \sum_i \beta_{it,s} \cdot \text{Cov}(\tilde{D}_{it}, \tilde{D}_{is})\) \(\quad\) How much does carryover from period \(s\) “leak” into the period-\(t\) estimate?
Specification error: \(\tilde{\delta}_t\) captures time-varying untreated outcomes \(\quad\) Vanishes if \(Y_{it}(\mathbf{0})\) is time-invariant (conditional on unit FE)
The critical insight: Carryover bias \(= \sum_{s<t} \beta_{it,s} \cdot \text{Cov}(\tilde{D}_{it}, \tilde{D}_{is})\). This is the product of carryover effects and serial correlation in treatment. If either is zero, the bias vanishes.
Even without carryover or specification error, UFE estimates: \[\frac{\sum_{t=1}^{T} \tilde{\kappa}_{D,\beta,t,t}}{\sum_{t=1}^{T} \tilde{\sigma}^2_{D,t}} = \frac{\sum_{t} \left( N^{-1} \sum_i \beta_{it,0} \cdot \text{Var}(\tilde{D}_{it}) \right)}{\sum_{t} \left( N^{-1} \sum_i \text{Var}(\tilde{D}_{it}) \right)}\]
\(\to\) Weights \(\propto \text{Var}(\tilde{D}_{it})\): Units/periods with more treatment variation get more weight.
Key questions: - Are these weights policy-relevant? Do we care more about \((i,t)\) cells with high treatment variation? - What if effects are heterogeneous (\(\beta_{it,0}\) varies)? A weighted average \(\neq\) the simple average. - How does this compare to the equally-weighted averages from Horvitz-Thompson?
Takeaway: Unbiasedness requires specifying the target. UFE is unbiased for a particular weighted average — but weights come from design, not policy.
Carryover bias \(\propto\) (carryover effects) \(\times\) (serial correlation in treatment)
| | No Serial Correlation (iid Bernoulli) | Serial Correlation (staggered) |
|---|---|---|
| No Carryover (\(\beta_{it,s}=0\) for \(s>0\)) | Unbiased | Unbiased |
| Carryover Present (\(\beta_{it,s} \neq 0\)) | Unbiased | BIASED |
Surprising: Under iid Bernoulli, UFE is unbiased even with carryover! Why? \(\text{Cov}(D_{it}, D_{is}) = 0\) for \(s \neq t\), so the carryover bias term vanishes (up to an \(O(1/T)\) term that within-unit demeaning induces).
Why staggered = guaranteed bias: Under staggered adoption, \(D_{it} = 1\) for all \(t \geq g_i\) \(\Rightarrow\) \(\text{Cov}(\tilde{D}_{it}, \tilde{D}_{is}) > 0\) for post-adoption periods. Any \(\beta_{it,s} \neq 0\) makes the bias non-zero.
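A small simulation of this 2×2 logic, using a linear potential-outcome panel with \(\beta_0 = 1\) and one-period carryover \(\beta_1 = 0.5\) (all design choices here are illustrative): UFE is nearly unbiased under iid Bernoulli but substantially biased under staggered adoption.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T = 5_000, 20
beta0, beta1 = 1.0, 0.5                  # contemporaneous and 1-period carryover

def outcomes(D):
    """Linear potential-outcome panel: Y_it = b0*D_it + b1*D_{i,t-1} + noise."""
    D_lag = np.hstack([np.zeros((N, 1)), D[:, :-1]])
    return beta0 * D + beta1 * D_lag + rng.normal(size=(N, T))

def ufe(Y, D):
    Y_til = Y - Y.mean(axis=1, keepdims=True)
    D_til = D - D.mean(axis=1, keepdims=True)
    return (Y_til * D_til).sum() / (D_til ** 2).sum()

# iid Bernoulli: treatments serially uncorrelated, carryover bias only O(1/T)
D_bern = rng.binomial(1, 0.5, size=(N, T)).astype(float)
print("Bernoulli:", ufe(outcomes(D_bern), D_bern))   # close to beta0 = 1.0

# Staggered adoption: D_it = 1{t >= g_i}, perfectly persistent once adopted
g = rng.integers(1, T + 2, size=N)                   # g = T+1 means never treated
D_stag = (np.arange(1, T + 1)[None, :] >= g[:, None]).astype(float)
print("Staggered:", ufe(outcomes(D_stag), D_stag))   # well above beta0: biased
```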
TWFE: \(Y_{it} = \alpha_i + \lambda_t + \beta D_{it} + \text{error}_{it}\)
Proposition (Bojinov, Rambachan, and Shephard 2021, Proposition 4.2). TWFE has the same three-component bias structure as UFE, with different weights.
Important distinction from DiD literature: - Staggered DiD issues (Goodman-Bacon 2021; Sun and Abraham 2021; Borusyak, Jaravel, and Spiess 2024; Imai and Kim 2021): arise even without carryover, under PT assumptions - Here: bias arises specifically because of carryover effects
Athey and Imbens (2022) also take a design-based approach to staggered adoption, but don’t focus on carryover effects.
Takeaway: When carryover effects are possible, use Horvitz-Thompson estimators — not unit FE or TWFE.
Experiments give us unbiased estimation.
But what if we can’t randomize?
From design-based to assumption-based identification.
So far: Propensity scores known by design \(\Rightarrow\) Horvitz-Thompson is unbiased.
What if we don’t run the experiment? Blackwell and Glynn (2018) address this using the same framework:
Blackwell and Glynn’s key insight: With time-varying covariates affected by past treatment, standard regression cannot consistently estimate lagged effects (post-treatment bias). Solutions exist (inverse probability weighting, structural models) — see Blackwell and Glynn (2018) for details.
Key limitation: Sequential ignorability cannot handle time-constant unmeasured confounders — this is the fundamental gap that motivates DiD and parallel trends in the rest of this course.
What’s unique to this lecture: Identification comes from a known, randomized assignment mechanism, giving a design-based, finite-population perspective that requires no outcome model and no parallel-trends assumption.
Main message: When carryover effects are possible, don’t default to TWFE. DiD provides an alternative that handles unmeasured time-constant confounders.
Next lecture: Efficient Estimation with Staggered Designs.
Questions?

ECON 730 | Causal Panel Data | Pedro H. C. Sant’Anna