class: title-slide

# Econ 520: Data Science for Economists
## Lecture 14: Linear Regression Adjustments under Unconfoundedness: Inference
<br>

<p align=center> Pedro H. C. Sant'Anna </p>
<div style="margin-top: -.7cm;"></div>
<p align=center> Emory University </p>
<br>

<p align=center> Spring 2024 </p>

---
class: center, middle
name: prologue

# Main Goal

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# Main Goals

<style type="text/css">
.small-code .remark-code{
  font-size: 40%
}
.medium-code .remark-code{
  font-size: 55%
}
</style>

- The main goal of this lecture is to discuss how we can make inference about the ATE and ATT using linear regression adjustments, under unconfoundedness.

- We will cover its principles, both in theory and in practice.

- We will talk about two-step estimation procedures and their complications.

- We will talk about the bootstrap and its application to this problem.

- We will illustrate the concepts via Monte Carlo simulations.

---
class: center, middle
name: CI

# Estimating ATE and ATT using Linear Regression Adjustments

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# Unconfoundedness + Overlap

- In the previous lecture, we discussed how we can estimate the ATE and ATT using linear regression adjustments.

- The basis of the argument was the unconfoundedness and overlap assumptions.

- With these two assumptions, we can write the ATE and ATT as

`$$\begin{eqnarray*}
ATE &\equiv& E[Y(1)- Y(0)] ~~~~~~~~~~~~= E[m_{D=1}(X)] - E[m_{D=0}(X)],\\
ATT &\equiv& E[Y(1)- Y(0)|D=1] = E[Y|D=1] - E[m_{D=0}(X)|D=1],
\end{eqnarray*}$$`

where `\(m_{D=d}(X) \equiv E[Y|D=d,X]\)` is a conditional expectation function (that can be estimated with regressions).

---
# Plug-in estimators

- We can estimate the ATE and ATT by plugging in the estimates of the conditional expectations.
- To do that, all we need to do is come up with a model for `\(E[Y|D=1,X]\)` and `\(E[Y|D=0,X]\)`, i.e., models for `\(m_{D=1}(X)\)` and `\(m_{D=0}(X)\)`. - But .hi[Econ 320] has already taught you how to do that, at least to start the game! - We can use linear regression to estimate these conditional expectations. --- # Plug-in estimators - We can pose a linear model for `\(E[Y|D=1,X]\)`, i.e., `\(m_{D=1}(X) = X'\beta^{D=1}\)`, and use the linear regression model among treated units to estimate these unknown coefficients, i.e., run the following regression: `$$\begin{eqnarray*} Y_i &=& X_i'\beta^{D=1} + \varepsilon^{D=1}_i~ \text{among treated units}. \end{eqnarray*}$$` - We can pose a similar model for `\(E[Y|D=0,X]\)`, i.e., `\(m_{D=0}(X) = X'\beta^{D=0}\)`, and use the linear regression model among untreated units to estimate these unknown coefficients, i.e., run the following regression: `$$\begin{eqnarray*} Y_i = X_i'\beta^{D=0} + \varepsilon^{D=0}_i ~ \text{among untreated units}. \end{eqnarray*}$$` - The implied `\(CATE(X)\)` behind these models is `$$\begin{eqnarray*} {CATE}(X) = m_{D=1}(X) - m_{D=0}(X)= X'\beta^{D=1} - X'\beta^{D=0}. \end{eqnarray*}$$` --- # Plug-in estimators - The issue is that we don't know the true values of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`. - But we can estimate them! - Once we have estimates of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`, we can plug them into the formulas for the ATE and ATT to get the plug-in estimators of these parameters as `$$\begin{eqnarray*} \widehat{ATE} &=& E_n[X'\widehat{\beta}^{D=1}] - E_n[X'\widehat{\beta}^{D=0}], \\ \widehat{ATT} &=& E_n[X'\widehat{\beta}^{D=1}|D=1] - E_n[X'\widehat{\beta}^{D=0}|D=1] \end{eqnarray*}$$` where, for a generic variable `\(Z\)`, `\(E_n[Z]\)` and `\(E_n[Z|D=1]\)` are the sample averages of `\(Z\)` among all units and among treated units, respectively. --- # What if we have more flexible models for `\(m_{D=d}(X)\)`? 
- .hi[What if we were to be more flexible about the functional form of the conditional expectations?]

- .hi[What would be the implications of that?]

- .hi[Would it be a good idea?]

- .hi[Are there challenges?]

- .hi[How would you implement these more flexible models?]

---
class: center, middle
name: Inference

# What about inference?

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# But what about inference?

- In practice, we do not only want to learn about the values of `\(ATE\)` and `\(ATT\)`.

- We also want to know how precise our estimates are.

- We want to know how confident we are that our estimates are close to the true values of `\(ATE\)` and `\(ATT\)`.

- We want to know if we can reject the null hypothesis that `\(ATE\)` and `\(ATT\)` are zero.

- We want to construct confidence intervals for `\(ATE\)` and `\(ATT\)`.

- We want to test if `\(ATE\)` and `\(ATT\)` are different from each other.

---
# How can we do that?

- Let's first focus on the ATE.

- With some abuse of notation, we can write our regression-adjusted (RA) estimator of the ATE as

`$$\begin{eqnarray*}
\widehat{ATE} &=& \dfrac{1}{n}\sum_{i=1}^n (\widehat{Y}_i(1) - \widehat{Y}_i(0)),
\end{eqnarray*}$$`

where `\(\widehat{Y}_i(1)\)` and `\(\widehat{Y}_i(0)\)` are the predicted values of `\(Y_i\)` from the linear regression models for treated and untreated units, respectively, i.e., for every unit `\(i\)`,

`$$\begin{eqnarray*}
\widehat{Y}_i(1) &=& X_i'\widehat{\beta}^{D=1}, \\
\widehat{Y}_i(0) &=& X_i'\widehat{\beta}^{D=0}.
\end{eqnarray*}$$`

---
# Quantifying uncertainty

- To make inference about the `\(ATE\)`, we need to quantify the uncertainty around our point estimate `\(\widehat{ATE}\)`.

- But our point estimate `\(\widehat{ATE}\)` looks like a simple average of `\(n\)` random variables.

- We know how to make inference about the mean of a random variable, right?
- .hi[What kind of statistical tools can we use to make asymptotically valid inference about the mean of a random variable?]

- .hi[What are the assumptions that we need to make to use these tools?]

- .hi[How can we apply these tools to our case?]

---
# Quantifying uncertainty

- Without making distributional assumptions, we would invoke the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) to make inference about the ATE.

- To justify these results, we often rely on the summands of the sample mean being independent and identically distributed (i.i.d.) random variables.

- In our case, that would entail assuming that `\(\widehat{Y}_i(1) - \widehat{Y}_i(0)\)` are i.i.d.

- If this were the case, we could compute the standard error of `\(\widehat{ATE}\)` as

`$$\begin{eqnarray*}
\widehat{\text{SE}}(\widehat{ATE}) = \bigg(\dfrac{1}{n^2}\sum_{i=1}^n (\widehat{Y}_i(1) - \widehat{Y}_i(0) - \widehat{ATE})^2\bigg)^{\frac{1}{2}}.
\end{eqnarray*}$$`

- .hi[But can we actually do this and trust the results?]

---
# Quantifying uncertainty

- Unfortunately, even when our data are i.i.d., `\(\widehat{Y}_i(1) - \widehat{Y}_i(0)\)` are not i.i.d. random variables.

- This happens essentially because we are using the entire sample to estimate `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`.

- And the estimated coefficients are used to predict the counterfactual outcomes for all units.

- This leads to a correlation between the fitted values `\(\widehat{Y}_i(1)\)` and `\(\widehat{Y}_i(0)\)` across units.

- In summary, because we replace the true coefficients with their estimates, the summands of the sample mean are not i.i.d. random variables.

- .hi[Thus, we need to tackle this issue to make valid inference about the ATE and ATT.]

---
# Addendum: two-step estimation

- Our RA estimator for the ATE is a special case of a more general two-step estimation procedure.

- In the first step, we estimate the conditional expectations `\(m_{D=1}(X)\)` and `\(m_{D=0}(X)\)`.
- In the second step, we use the estimated conditional expectations to predict the counterfactual outcomes for all units.

- This two-step estimation procedure is very common in the literature.

- When making inference, we need to account for the uncertainty in both steps.

- If we ignore the uncertainty in the first step, our standard errors (and confidence intervals) will not be valid.

---
# Adjusting the standard errors

- The key to making valid inference about the ATE is to find a way to write `\(\sqrt{n} (\widehat{ATE} - ATE)\)` as a sum of i.i.d. random variables with mean zero and finite variance (plus some asymptotically negligible term), i.e.,

`$$\begin{eqnarray*}
\sqrt{n} (\widehat{ATE} - ATE) = \dfrac{1}{\sqrt{n}}\sum_{i=1}^n IF_{i, \widehat{ate}} + o_p(1),
\end{eqnarray*}$$`

where `\(IF_{i, \widehat{ate}}\)` is the influence function of the RA estimator for the ATE.

- The influence function quantifies the effect of a small perturbation in the data on the estimator.

---
# IF of the RA estimator for ATE

- In our case, the influence function of our RA estimator for the ATE is given by

`$$\begin{eqnarray*}
IF_{i, \widehat{ate}} &=& X_i'(\beta^{D=1} - \beta^{D=0}) - ATE \\
&& + (D_i \varepsilon^{D=1}_i X_i')E[DXX']^{-1}E[X] \\
&& - ((1 - D_i) \varepsilon^{D=0}_i X_i')E[(1-D) XX']^{-1}E[X]
\end{eqnarray*}$$`

- First term: uncertainty from the second stage (if we were to know the true `\(\beta\)`'s); note the centering by `\(ATE\)`, which guarantees the mean-zero property above.

- Second and third terms: uncertainty from the first stage (estimation of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`).

- Based on these results, we can prove that, as `\(n\rightarrow \infty\)`, the RA estimator for the ATE is asymptotically normal, i.e.,

`$$\begin{eqnarray*}
\sqrt{n} (\widehat{ATE} - ATE) \rightarrow_d N(0,Var(IF_{i, \widehat{ate}})).
\end{eqnarray*}$$`

---
# IF of the RA estimator for ATE

- Thus, if we can estimate the IF, we can make valid inference about the ATE.
- But recall that the IF contains unknown quantities, such as the error terms `\(\varepsilon^{D=1}_i\)` and `\(\varepsilon^{D=0}_i\)`, as well as the true coefficients `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`, and population expectations:

`$$\begin{eqnarray*}
IF_{i, \widehat{ate}} &=& X_i'(\beta^{D=1} - \beta^{D=0}) - ATE \\
&& + (D_i \varepsilon^{D=1}_i X_i')E[DXX']^{-1}E[X] \\
&& - ((1 - D_i) \varepsilon^{D=0}_i X_i')E[(1-D) XX']^{-1}E[X]
\end{eqnarray*}$$`

- To leverage the influence function, we need to estimate the unknown quantities.

---
# Inference for ATE

- Plug-in principle for the win!

`$$\begin{eqnarray*}
\widehat{IF}_{i, \widehat{ate}} &=& X_i'(\widehat{\beta}^{D=1} - \widehat{\beta}^{D=0}) - \widehat{ATE} \\
&& + (D_i \widehat{\varepsilon}^{D=1}_i X_i')E_n[DXX']^{-1}E_n[X] \\
&& - ((1 - D_i) \widehat{\varepsilon}^{D=0}_i X_i')E_n[(1-D) XX']^{-1}E_n[X],
\end{eqnarray*}$$`

where `\(\widehat{\beta}^{D=1}\)` and `\(\widehat{\beta}^{D=0}\)` are the OLS estimates of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`, and `\(\widehat{\varepsilon}^{D=1}_i\)` and `\(\widehat{\varepsilon}^{D=0}_i\)` are the residuals from the first-stage regressions.

- We can then estimate the asymptotic variance of the RA estimator for the ATE as

`$$\begin{eqnarray*}
\widehat{Var}(\widehat{ATE}) = \dfrac{1}{n}\sum_{i=1}^n \widehat{IF}_{i,\widehat{ate}}^{2}.
\end{eqnarray*}$$`

---
# Inference for ATE

- The standard errors are then given by `\(\sqrt{\widehat{Var}(\widehat{ATE})\bigg/n}\)`.

- With the standard errors in hand, we can construct confidence intervals and hypothesis tests.

- 95\% confidence intervals for the ATE are given by `\(\widehat{ATE} \pm 1.96 \times \sqrt{\widehat{Var}(\widehat{ATE})/n}\)`.

- We can reject the null hypothesis that the ATE is zero at the 5\% level if `\(|\widehat{ATE}|\bigg/ \sqrt{\widehat{Var}(\widehat{ATE})/n} > 1.96\)`.

- We can also compute p-values for the null hypothesis that the ATE is zero.

---
# Inference for ATE, in practice

- .hi[I am sure you are thinking:] This is all nice and dandy, but how do I do this in practice?
- Luckily, there are packages that automate this for you!

- Here, we will use the `targeted` package in `R`.

- A convenient way to install (if necessary) and load everything is by running the code chunk below.

```r
## Load and install the packages that we'll be using today
if (!require("pacman")) install.packages("pacman")
pacman::p_load(estimatr, tidyverse, targeted, boot, ggplot2)
```

---
# Let's use our simulated data from Lecture 13

```r
# Create a function to simulate the data
simulate_data <- function(n_obs){
  # Generate confounders
  x1 <- rnorm(n_obs, mean = 0, sd = 1); x2 <- rnorm(n_obs, mean = 0, sd = 2)
  x3 <- rpois(n_obs, lambda = 1)      ; x4 <- rbinom(n_obs, size = 1, prob = 0.2)
  # put them all in a matrix
  x <- cbind(1, x1, x2, x3, x4)
  # Generate Potential Outcomes as linear functions of the confounders
  beta_Y0 <- c(1, 2, 3, 4, 5); beta_Y1 <- c(2, 3, 4, 5, 5)
  epsilon_Y0 <- rnorm(n_obs); epsilon_Y1 <- rnorm(n_obs)
  y1 <- x %*% beta_Y1 + epsilon_Y1
  y0 <- x %*% beta_Y0 + epsilon_Y0
  # Generate the treatment indicator
  beta_pscore <- c(1, 1, -1, 1, -1)/3
  pscore <- plogis(x %*% beta_pscore)
  d <- 1*(runif(n_obs) <= pscore)
  # observed outcome
  y <- y1*d + y0*(1-d)
  df <- data.frame(y=y, d=d, x1=x1, x2=x2, x3=x3, x4=x4)
  return(df)
}
```

---
# Let's estimate the ATE

```r
set.seed(20240404)
n_obs <- 1000
df <- simulate_data(n_obs)

# Let's estimate the ATE
atehat <- targeted::ate(y ~ d | d*(x1+x2+x3+x4) | 1, data = df, binary = FALSE)
```

---
# Let's estimate the ATE

.small-code[

```r
summary(atehat)
```

```
## 
## Augmented Inverse Probability Weighting estimator
## Response y (Outcome model: gaussian):
## y ~ d * (x1 + x2 + x3 + x4)
## Exposure d (Propensity model: logistic regression):
## d ~ 1
## 
##             Estimate Std.Err    2.5%  97.5%    P-value
## d=0           6.3514 0.25937  5.8430 6.8597 2.016e-132
## d=1           8.4453 0.33804  7.7828 9.1079 9.228e-138
## Outcome model:
## (Intercept)   1.0277 0.06831  0.8939 1.1616  3.658e-51
## d             0.9430 0.09261  0.7615 1.1245  2.367e-24
## x1            1.9659 0.04750  1.8728 2.0590  0.000e+00
## x2            2.9663 0.02573  2.9159 3.0167  0.000e+00
## x3            4.0460 0.04998  3.9481 4.1440  0.000e+00
## x4            4.8494 0.11763  4.6188 5.0799  0.000e+00
## d:x1          1.0288 0.06183  0.9076 1.1500  3.684e-62
## d:x2          1.0623 0.03169  1.0002 1.1244 2.337e-246
## d:x3          1.0113 0.06122  0.8913 1.1313  2.724e-61
## d:x4          0.1570 0.15275 -0.1424 0.4564  3.041e-01
## Propensity model:
## (Intercept) -0.4895 0.06515 -0.6172 -0.3619  5.726e-14
## 
## Average Treatment Effect (constrast: 'd=1' - 'd=0'):
## 
##     Estimate Std.Err  2.5% 97.5%   P-value
## ATE    2.094  0.1088 1.881 2.307 1.308e-82
```

]

---
# Inference via bootstrap

- Sometimes, the sampling variance of our estimator is not well approximated by the asymptotic formula we discussed earlier.

- In such cases, we can use the bootstrap to estimate the variance of the ATE.

- The bootstrap is a resampling method that estimates the sampling distribution of a statistic by resampling the data with replacement.

- It is a powerful tool that can be used to estimate the variance of a statistic, construct confidence intervals, and conduct hypothesis tests.

- It is particularly useful when the asymptotic variance is difficult to derive or estimate.

- Or when a package is not available to estimate the variance of your estimator.

---
# Inference via bootstrap: ATT and ATE

- We can use the bootstrap to estimate the standard errors for the ATT (which `targeted` does not provide).

- The `boot` package in `R` is a powerful package for conducting the bootstrap.

- The `boot` package requires a function that computes the statistic of interest.

- The function should take two arguments: the data and the indices of the data to resample.

- The `boot` function then resamples the data and computes the statistic of interest many times.

- The standard errors are then estimated as the standard deviation of the bootstrap estimates.
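---
# Inference via bootstrap: the idea in a few lines

- Before using the `boot` package, it may help to see the resampling recipe written by hand. The sketch below is a minimal, illustrative version: it assumes the `df` data frame created by `simulate_data()` above is in memory, and uses plain `lm()` instead of `estimatr::lm_robust()` to keep it short.

```r
# Hand-rolled nonparametric bootstrap for the RA estimator of the ATE.
# Assumes `df` (from simulate_data() above) exists; B is the number of draws.
set.seed(20240404)
B <- 999
ate_boot <- numeric(B)
for (b in 1:B) {
  idx <- sample(nrow(df), replace = TRUE)   # resample rows with replacement
  dfb <- df[idx, ]
  # Re-run both first-stage regressions on the bootstrap sample
  reg1 <- lm(y ~ x1 + x2 + x3 + x4, data = subset(dfb, d == 1))
  reg0 <- lm(y ~ x1 + x2 + x3 + x4, data = subset(dfb, d == 0))
  # Plug-in ATE for this bootstrap draw
  ate_boot[b] <- mean(predict(reg1, newdata = dfb) - predict(reg0, newdata = dfb))
}
sd(ate_boot)  # bootstrap standard error of the ATE
```

- The `boot` package automates exactly this loop, and adds bias estimates and several confidence-interval methods on top.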
---
# Inference via bootstrap: ATT and ATE

```r
# Function to compute the ATE and ATT
bootfun_ate_att <- function(data, indices, ...) {
  df1 <- data[indices, ]
  reg_treated   <- estimatr::lm_robust(y ~ x1 + x2 + x3 + x4,
                                       data = df1 %>% filter(d == 1))
  reg_untreated <- estimatr::lm_robust(y ~ x1 + x2 + x3 + x4,
                                       data = df1 %>% filter(d == 0))
  y1_hat <- predict(reg_treated, newdata = df1)
  y0_hat <- predict(reg_untreated, newdata = df1)
  cate_hat <- y1_hat - y0_hat
  ate_hat <- as.numeric(mean(cate_hat))
  att_hat <- as.numeric(mean(cate_hat[df1$d == 1]))
  out <- cbind(ate_hat, att_hat)
  return(out)
}

# Conduct the bootstrap
boot_out <- boot::boot(data = df, statistic = bootfun_ate_att, R = 1000)
```

---
# Inference via bootstrap: ATT and ATE

.small-code[

```r
# Let's see the results
# Original point estimates
boot_out$t0
```

```
##       ate_hat  att_hat
## [1,] 2.093953 1.815709
```

```r
# Bootstrap standard errors
apply(boot_out$t, 2, sd)
```

```
## [1] 0.1093330 0.1280646
```

```r
# print all the results
boot_out
```

```
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot::boot(data = df, statistic = bootfun_ate_att, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original         bias std. error
## t1* 2.093953 -0.002027230  0.1093330
## t2* 1.815709 -0.001023231  0.1280646
```

]

---
# Inference via bootstrap: ATT and ATE

```r
# Can we test if the ATE is equal to the ATT?
# Point estimate
ate_minus_att <- boot_out$t0[1] - boot_out$t0[2]
ate_minus_att
```

```
## [1] 0.2782438
```

```r
# Bootstrap standard error of the difference
se_ate_minus_att <- sd(boot_out$t[,1] - boot_out$t[,2])
se_ate_minus_att
```

```
## [1] 0.067948
```

```r
# t-statistic
t_stat <- ate_minus_att/se_ate_minus_att
```

---
# Inference via bootstrap: ATT and ATE

```r
# p-value of the difference
p_value <- 2*pnorm(-abs(t_stat))
p_value
```

```
## [1] 4.222561e-05
```
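---
# Inference via bootstrap: percentile intervals

- The intervals above rely on the normal approximation. As a sketch of an alternative, the same `boot_out` object can produce percentile confidence intervals, which are built directly from the quantiles of the bootstrap draws; the `index` argument follows the column order returned by `bootfun_ate_att`.

```r
# Percentile bootstrap 95% confidence intervals from the boot_out object above
# index = 1 -> ATE, index = 2 -> ATT (column order of bootfun_ate_att's output)
boot::boot.ci(boot_out, conf = 0.95, type = "perc", index = 1)
boot::boot.ci(boot_out, conf = 0.95, type = "perc", index = 2)
```

- Percentile intervals can be attractive when the bootstrap distribution is visibly skewed, since they do not force the interval to be symmetric around the point estimate.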