class: title-slide

# Econ 520: Data Science for Economists
## Lecture 14: Linear Regression Adjustments under Unconfoundedness: Inference
<br>

<p align=center> Pedro H. C. Sant'Anna </p>
<div style="margin-top: -.7cm;"></div>
<p align=center> Emory University </p>
<br>

<p align=center> Spring 2024 </p>

---
class: center, middle
name: prologue

# Main Goal

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# Main Goals

<style type="text/css">
.small-code .remark-code{
  font-size: 40%
}
.medium-code .remark-code{
  font-size: 55%
}
</style>

- The main goal of this lecture is to discuss how we can make inference about the ATE and ATT using linear regression adjustments, under unconfoundedness.

- We will cover its principles, both in theory and in practice.

- We will talk about two-step estimation procedures and their complications.

- We will talk about the bootstrap and its application to this problem.

- We will illustrate the concepts via Monte Carlo simulations.

---
class: center, middle
name: CI

# Estimating ATE and ATT using Linear Regression Adjustments

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# Unconfoundedness + Overlap

- In the previous lecture, we discussed how we can estimate the ATE and ATT using linear regression adjustments.

- The basis of the argument was the unconfoundedness and overlap assumptions.

- With these two assumptions, we can write the ATE and ATT as

`$$\begin{eqnarray*}
ATE &\equiv& E[Y(1)- Y(0)] ~~~~~~~~~~~~= E[m_{D=1}(X)] - E[m_{D=0}(X)],\\
ATT &\equiv& E[Y(1)- Y(0)|D=1] = E[Y|D=1] - E[m_{D=0}(X)|D=1],
\end{eqnarray*}$$`

where `\(m_{D=d}(X) \equiv E[Y|D=d,X]\)` is a conditional expectation function (that can be estimated with regressions).

---
# Plug-in estimators

- We can estimate the ATE and ATT by plugging in the estimates of the conditional expectations.
- To do that, all we need to do is come up with a model for `\(E[Y|D=1,X]\)` and `\(E[Y|D=0,X]\)`, i.e., models for `\(m_{D=1}(X)\)` and `\(m_{D=0}(X)\)`. - But .hi[Econ 320] has already taught you how to do that, at least to start the game! - We can use linear regression to estimate these conditional expectations. --- # Plug-in estimators - We can pose a linear model for `\(E[Y|D=1,X]\)`, i.e., `\(m_{D=1}(X) = X'\beta^{D=1}\)`, and use the linear regression model among treated units to estimate these unknown coefficients, i.e., run the following regression: `$$\begin{eqnarray*} Y_i &=& X_i'\beta^{D=1} + \varepsilon^{D=1}_i~ \text{among treated units}. \end{eqnarray*}$$` - We can pose a similar model for `\(E[Y|D=0,X]\)`, i.e., `\(m_{D=0}(X) = X'\beta^{D=0}\)`, and use the linear regression model among untreated units to estimate these unknown coefficients, i.e., run the following regression: `$$\begin{eqnarray*} Y_i = X_i'\beta^{D=0} + \varepsilon^{D=0}_i ~ \text{among untreated units}. \end{eqnarray*}$$` - The implied `\(CATE(X)\)` behind these models is `$$\begin{eqnarray*} {CATE}(X) = m_{D=1}(X) - m_{D=0}(X)= X'\beta^{D=1} - X'\beta^{D=0}. \end{eqnarray*}$$` --- # Plug-in estimators - The issue is that we don't know the true values of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`. - But we can estimate them! - Once we have estimates of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`, we can plug them into the formulas for the ATE and ATT to get the plug-in estimators of these parameters as `$$\begin{eqnarray*} \widehat{ATE} &=& E_n[X'\widehat{\beta}^{D=1}] - E_n[X'\widehat{\beta}^{D=0}], \\ \widehat{ATT} &=& E_n[X'\widehat{\beta}^{D=1}|D=1] - E_n[X'\widehat{\beta}^{D=0}|D=1] \end{eqnarray*}$$` where, for a generic variable `\(Z\)`, `\(E_n[Z]\)` and `\(E_n[Z|D=1]\)` are the sample averages of `\(Z\)` among all units and among treated units, respectively. --- # What if we have more flexible models for `\(m_{D=d}(X)\)`? 
- .hi[What if we were to be more flexible about the functional form of the conditional expectations?]

- .hi[What would be the implications of that?]

- .hi[Would it be a good idea?]

- .hi[Are there challenges?]

- .hi[How would you implement these more flexible models?]

---
class: center, middle
name: Inference

# What about inference?

<html><div style='float:left'></div><hr color='#EB811B' size=1px width=1100px></html>

---
# But what about inference?

- In practice, we do not only want to learn about the values of `\(ATE\)` and `\(ATT\)`.

- We also want to know how precise our estimates are.

- We want to know how confident we are that our estimates are close to the true values of `\(ATE\)` and `\(ATT\)`.

- We want to know if we can reject the null hypothesis that `\(ATE\)` and `\(ATT\)` are zero.

- We want to construct confidence intervals for `\(ATE\)` and `\(ATT\)`.

- We want to test if `\(ATE\)` and `\(ATT\)` are different from each other.

---
# How can we do that?

- Let's first focus on the ATE.

- With some abuse of notation, we can write our regression-adjusted (RA) estimator of the ATE as

`$$\begin{eqnarray*}
\widehat{ATE} &=& \dfrac{1}{n}\sum_{i=1}^n (\widehat{Y}_i(1) - \widehat{Y}_i(0)),
\end{eqnarray*}$$`

where `\(\widehat{Y}_i(1)\)` and `\(\widehat{Y}_i(0)\)` are the predicted values of `\(Y_i\)` from the linear regression models for treated and untreated units, respectively, i.e., for every unit `\(i\)`,

`$$\begin{eqnarray*}
\widehat{Y}_i(1) &=& X_i'\widehat{\beta}^{D=1}, \\
\widehat{Y}_i(0) &=& X_i'\widehat{\beta}^{D=0}.
\end{eqnarray*}$$`

---
# Quantifying uncertainty

- To make inference about the `\(ATE\)`, we need to quantify the uncertainty around our point estimate `\(\widehat{ATE}\)`.

- But our point estimate `\(\widehat{ATE}\)` looks like a simple average of `\(n\)` random variables.

- We know how to make inference about the mean of a random variable, right?
- .hi[What kind of statistical tools can we use to make asymptotically valid inference about the mean of a random variable?]

- .hi[What are the assumptions that we need to make to use these tools?]

- .hi[How can we apply these tools to our case?]

---
# Quantifying uncertainty

- Without making distributional assumptions, we would invoke the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) to make inference about the ATE.

- To justify these results, we often rely on the summands of the sample mean being independent and identically distributed (i.i.d.) random variables.

- In our case, that would entail assuming that `\(\widehat{Y}_i(1) - \widehat{Y}_i(0)\)` are i.i.d.

- If this were the case, we could compute the standard error of `\(\widehat{ATE}\)` as

`$$\begin{eqnarray*}
\widehat{\text{SE}}(\widehat{ATE}) = \bigg(\dfrac{1}{n^2}\sum_{i=1}^n (\widehat{Y}_i(1) - \widehat{Y}_i(0) - \widehat{ATE})^2\bigg)^{\frac{1}{2}}.
\end{eqnarray*}$$`

- .hi[But can we actually do this and trust the results?]

---
# Quantifying uncertainty

- Unfortunately, even when our data are i.i.d., `\(\widehat{Y}_i(1) - \widehat{Y}_i(0)\)` are not i.i.d. random variables.

- This happens essentially because we are using the entire sample to estimate `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`.

- And the estimated coefficients are used to predict the counterfactual outcomes for all units.

- This leads to a correlation between the fitted values `\(\widehat{Y}_i(1)\)` and `\(\widehat{Y}_i(0)\)` across units.

- In summary, because we replace the true coefficients with their estimates, the summands of the sample mean are not i.i.d. random variables.

- .hi[Thus, we need to tackle this issue to make valid inference about the ATE and ATT.]

---
# Addendum: two-step estimation

- Our RA estimator for the ATE is a special case of a more general two-step estimation procedure.

- In the first step, we estimate the conditional expectations `\(m_{D=1}(X)\)` and `\(m_{D=0}(X)\)`.
- In the second step, we use the estimated conditional expectations to predict the counterfactual outcomes for all units.

- This two-step estimation procedure is very common in the literature.

- When making inference, we need to account for the uncertainty in both steps.

- If we ignore the uncertainty in the first step, our standard errors (and confidence intervals) will not be valid.

---
# Adjusting the standard errors

- The key to making valid inference about the ATE is to find a way to write `\(\sqrt{n} (\widehat{ATE} - ATE)\)` as a sum of i.i.d. random variables with mean zero and finite variance (plus some asymptotically negligible term), i.e.,

`$$\begin{eqnarray*}
\sqrt{n} (\widehat{ATE} - ATE) = \dfrac{1}{\sqrt{n}}\sum_{i=1}^n IF_{i, \widehat{ate}} + o_p(1),
\end{eqnarray*}$$`

where `\(IF_{i, \widehat{ate}}\)` is the influence function of the RA estimator for the ATE.

- The influence function quantifies the effect of a small perturbation in the data on the estimator.

---
# IF of the RA estimator for ATE

- In our case, the influence function of our RA estimator for the ATE is given by

`$$\begin{eqnarray*}
IF_{i, \widehat{ate}} &=& X_i'(\beta^{D=1} - \beta^{D=0}) - ATE \\
&& + (D_i \varepsilon^{D=1}_i X_i')E[DXX']^{-1}E[X] \\
&& - ((1 - D_i) \varepsilon^{D=0}_i X_i')E[(1-D) XX']^{-1}E[X]
\end{eqnarray*}$$`

- First term: uncertainty from the second stage (if we were to know the true `\(\beta\)`'s); note the centering by `\(ATE\)`, which guarantees the mean-zero property above.

- Second and third terms: uncertainty from the first stage (estimation of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`).

- Based on these results, we can prove that, as `\(n\rightarrow \infty\)`, the RA estimator for the ATE is asymptotically normal, i.e.,

`$$\begin{eqnarray*}
\sqrt{n} (\widehat{ATE} - ATE) \rightarrow_d N(0,Var(IF_{i, \widehat{ate}})).
\end{eqnarray*}$$`

---
# IF of the RA estimator for ATE

- Thus, if we can estimate the IF, we can make valid inference about the ATE.
- But recall that the IF contains unknown quantities, such as the error terms `\(\varepsilon^{D=1}_i\)` and `\(\varepsilon^{D=0}_i\)`, as well as the true coefficients `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`, and population expectations:

`$$\begin{eqnarray*}
IF_{i, \widehat{ate}} &=& X_i'(\beta^{D=1} - \beta^{D=0}) - ATE \\
&& + (D_i \varepsilon^{D=1}_i X_i')E[DXX']^{-1}E[X] \\
&& - ((1 - D_i) \varepsilon^{D=0}_i X_i')E[(1-D) XX']^{-1}E[X]
\end{eqnarray*}$$`

- To leverage the influence function, we need to estimate the unknown quantities.

---
# Inference for ATE

- Plug-in principle for the win!

`$$\begin{eqnarray*}
\widehat{IF}_{i, \widehat{ate}} &=& X_i'(\widehat{\beta}^{D=1} - \widehat{\beta}^{D=0}) - \widehat{ATE} \\
&& + (D_i \widehat{\varepsilon}^{D=1}_i X_i')E_n[DXX']^{-1}E_n[X] \\
&& - ((1 - D_i) \widehat{\varepsilon}^{D=0}_i X_i')E_n[(1-D) XX']^{-1}E_n[X],
\end{eqnarray*}$$`

where `\(\widehat{\beta}^{D=1}\)` and `\(\widehat{\beta}^{D=0}\)` are the OLS estimates of `\(\beta^{D=1}\)` and `\(\beta^{D=0}\)`, and `\(\widehat{\varepsilon}^{D=1}_i\)` and `\(\widehat{\varepsilon}^{D=0}_i\)` are the residuals from the first-stage regressions.

- We can then estimate the asymptotic variance of the RA estimator for the ATE as

`$$\begin{eqnarray*}
\widehat{Var}(\widehat{ATE}) = \dfrac{1}{n}\sum_{i=1}^n \widehat{IF}_{i,\widehat{ate}}^{2}.
\end{eqnarray*}$$`

---
# Inference for ATE

- The standard errors are then given by `\(\sqrt{\widehat{Var}(\widehat{ATE})\bigg/n}\)`.

- With the standard errors in hand, we can construct confidence intervals and hypothesis tests.

- 95\% confidence intervals for the ATE are given by `\(\widehat{ATE} \pm 1.96 \times \sqrt{\widehat{Var}(\widehat{ATE})/n}\)`.

- We can reject the null hypothesis that the ATE is zero at the 5\% level if `\(|\widehat{ATE}|\bigg/ \sqrt{\widehat{Var}(\widehat{ATE})/n} > 1.96\)`.

- We can also compute p-values for the null hypothesis that the ATE is zero.

---
# Inference for ATE, in practice

- .hi[I am sure you are thinking:] This is all nice and dandy, but how do I do this in practice?
- Luckily, there are packages that automate this for you!

- Here, we will use the `targeted` package in `R`.

- A convenient way to install (if necessary) and load everything is by running the code chunk below.

```r
## Load and install the packages that we'll be using today
if (!require("pacman")) install.packages("pacman")
pacman::p_load(estimatr, tidyverse, targeted, boot, ggplot2)
```

---
# Let's use our simulated data from Lecture 13

```r
# Create a function to simulate the data
simulate_data <- function(n_obs){
  # Generate confounders
  x1 <- rnorm(n_obs, mean = 0, sd = 1); x2 <- rnorm(n_obs, mean = 0, sd = 2)
  x3 <- rpois(n_obs, lambda = 1)      ; x4 <- rbinom(n_obs, size = 1, prob = 0.2)
  # put them all in a matrix
  x <- cbind(1, x1, x2, x3, x4)
  # Generate Potential Outcomes as linear functions of the confounders
  beta_Y0 <- c(1, 2, 3, 4, 5); beta_Y1 <- c(2, 3, 4, 5, 5)
  epsilon_Y0 <- rnorm(n_obs); epsilon_Y1 <- rnorm(n_obs)
  y1 <- x %*% beta_Y1 + epsilon_Y1
  y0 <- x %*% beta_Y0 + epsilon_Y0
  # Generate the treatment indicator
  beta_pscore <- c(1, 1, -1, 1, -1)/3
  pscore <- plogis(x %*% beta_pscore)
  d <- 1*(runif(n_obs) <= pscore)
  # observed outcome
  y <- y1*d + y0*(1-d)
  df <- data.frame(y=y, d=d, x1=x1, x2=x2, x3=x3, x4=x4)
  return(df)
}
```

---
# Let's estimate the ATE

```r
set.seed(20240404)
n_obs <- 1000
df <- simulate_data(n_obs)

# Let's estimate the ATE
atehat <- targeted::ate(y ~ d | d*(x1+x2+x3+x4) | 1, data = df, binary = FALSE)
```

---
# Let's estimate the ATE

.small-code[

```r
summary(atehat)
```

```
## 
## Augmented Inverse Probability Weighting estimator
## Response y (Outcome model: gaussian):
## y ~ d * (x1 + x2 + x3 + x4)
## Exposure d (Propensity model: logistic regression):
## d ~ 1
## 
##             Estimate Std.Err    2.5%  97.5%    P-value
## d=0           6.3514 0.25937  5.8430 6.8597 2.016e-132
## d=1           8.4453 0.33804  7.7828 9.1079 9.228e-138
## Outcome model:
## (Intercept)   1.0277 0.06831  0.8939 1.1616  3.658e-51
## d             0.9430 0.09261  0.7615 1.1245  2.367e-24
## x1            1.9659 0.04750  1.8728 2.0590  0.000e+00
## x2            2.9663 0.02573  2.9159 3.0167  0.000e+00
## x3            4.0460 0.04998  3.9481 4.1440  0.000e+00
## x4            4.8494 0.11763  4.6188 5.0799  0.000e+00
## d:x1          1.0288 0.06183  0.9076 1.1500  3.684e-62
## d:x2          1.0623 0.03169  1.0002 1.1244 2.337e-246
## d:x3          1.0113 0.06122  0.8913 1.1313  2.724e-61
## d:x4          0.1570 0.15275 -0.1424 0.4564  3.041e-01
## Propensity model:
## (Intercept) -0.4895 0.06515 -0.6172 -0.3619  5.726e-14
## 
## Average Treatment Effect (constrast: 'd=1' - 'd=0'):
## 
##     Estimate Std.Err  2.5% 97.5%   P-value
## ATE    2.094  0.1088 1.881 2.307 1.308e-82
```

]

---
# Inference via bootstrap

- Sometimes, the sampling variance of our estimator is not well approximated by the asymptotic formula we discussed earlier.

- In such cases, we can use the bootstrap to estimate the variance of the ATE.

- The bootstrap is a resampling method that estimates the sampling distribution of a statistic by resampling the data with replacement.

- It is a powerful tool that can be used to estimate the variance of a statistic, construct confidence intervals, and conduct hypothesis tests.

- It is particularly useful when the asymptotic variance is difficult to derive or estimate.

- Or when a package is not available to estimate the variance of your estimator.

---
# Inference via bootstrap: ATT and ATE

- We can use the bootstrap to estimate the standard errors for the ATT (which `targeted` does not provide).

- The `boot` package in `R` is a powerful package for conducting the bootstrap.

- The `boot` package requires a function that computes the statistic of interest.

- The function should take two arguments: the data and the indices of the data to resample.

- The `boot` function then resamples the data and computes the statistic of interest many times.

- The standard errors are then estimated as the standard deviation of the bootstrap estimates.
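---
# Inference via bootstrap: the idea in a few lines

- Before using the `boot` package, it may help to see the resampling recipe written by hand. The sketch below is a minimal, illustrative version: it assumes the `df` data frame created by `simulate_data()` above is in memory, and uses plain `lm()` instead of `estimatr::lm_robust()` to keep it short.

```r
# Hand-rolled nonparametric bootstrap for the RA estimator of the ATE.
# Assumes `df` (from simulate_data() above) exists; B is the number of draws.
set.seed(20240404)
B <- 999
ate_boot <- numeric(B)
for (b in 1:B) {
  idx <- sample(nrow(df), replace = TRUE)   # resample rows with replacement
  dfb <- df[idx, ]
  # Re-run both first-stage regressions on the bootstrap sample
  reg1 <- lm(y ~ x1 + x2 + x3 + x4, data = subset(dfb, d == 1))
  reg0 <- lm(y ~ x1 + x2 + x3 + x4, data = subset(dfb, d == 0))
  # Plug-in ATE for this bootstrap draw
  ate_boot[b] <- mean(predict(reg1, newdata = dfb) - predict(reg0, newdata = dfb))
}
sd(ate_boot)  # bootstrap standard error of the ATE
```

- The `boot` package automates exactly this loop, and adds bias estimates and several confidence-interval methods on top.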
---
# Inference via bootstrap: ATT and ATE

```r
# Function to compute the ATE and ATT
bootfun_ate_att <- function(data, indices, ...) {
  df1 <- data[indices, ]
  reg_treated   <- estimatr::lm_robust(y ~ x1 + x2 + x3 + x4,
                                       data = df1 %>% filter(d == 1))
  reg_untreated <- estimatr::lm_robust(y ~ x1 + x2 + x3 + x4,
                                       data = df1 %>% filter(d == 0))
  y1_hat <- predict(reg_treated, newdata = df1)
  y0_hat <- predict(reg_untreated, newdata = df1)
  cate_hat <- y1_hat - y0_hat
  ate_hat <- as.numeric(mean(cate_hat))
  att_hat <- as.numeric(mean(cate_hat[df1$d == 1]))
  out <- cbind(ate_hat, att_hat)
  return(out)
}

# Conduct the bootstrap
boot_out <- boot::boot(data = df, statistic = bootfun_ate_att, R = 1000)
```

---
# Inference via bootstrap: ATT and ATE

.small-code[

```r
# Let's see the results
# Original point estimates
boot_out$t0
```

```
##       ate_hat  att_hat
## [1,] 2.093953 1.815709
```

```r
# Bootstrap standard errors
apply(boot_out$t, 2, sd)
```

```
## [1] 0.1093330 0.1280646
```

```r
# print all the results
boot_out
```

```
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot::boot(data = df, statistic = bootfun_ate_att, R = 1000)
## 
## 
## Bootstrap Statistics :
##     original         bias std. error
## t1* 2.093953 -0.002027230  0.1093330
## t2* 1.815709 -0.001023231  0.1280646
```

]

---
# Inference via bootstrap: ATT and ATE

```r
# Can we test if the ATE is equal to the ATT?
# Point estimate
ate_minus_att <- boot_out$t0[1] - boot_out$t0[2]
ate_minus_att
```

```
## [1] 0.2782438
```

```r
# Bootstrap standard error of the difference
se_ate_minus_att <- sd(boot_out$t[,1] - boot_out$t[,2])
se_ate_minus_att
```

```
## [1] 0.067948
```

```r
# t-statistic
t_stat <- ate_minus_att/se_ate_minus_att
```

---
# Inference via bootstrap: ATT and ATE

```r
# p-value of the difference
p_value <- 2*pnorm(-abs(t_stat))
p_value
```

```
## [1] 4.222561e-05
```
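---
# Inference via bootstrap: percentile intervals

- The intervals above rely on the normal approximation. As a sketch of an alternative, the same `boot_out` object can produce percentile confidence intervals, which are built directly from the quantiles of the bootstrap draws; the `index` argument follows the column order returned by `bootfun_ate_att`.

```r
# Percentile bootstrap 95% confidence intervals from the boot_out object above
# index = 1 -> ATE, index = 2 -> ATT (column order of bootfun_ate_att's output)
boot::boot.ci(boot_out, conf = 0.95, type = "perc", index = 1)
boot::boot.ci(boot_out, conf = 0.95, type = "perc", index = 2)
```

- Percentile intervals can be attractive when the bootstrap distribution is visibly skewed, since they do not force the interval to be symmetric around the point estimate.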