1.(100 points) Suppose your data comes from an experiment where the treatment is randomly assigned. Let \(Y(1), Y(0)\) be the potential outcomes under treatment and control, respectively. Let \(D\) be the treatment indicator, and \(X\) be a set of covariates. Since we have a completely randomized experiment, it follows that \((Y(1), Y(0)) \perp D\).

Here, we have a few options to estimate the average treatment effect (ATE). One of the most common methods is the difference-in-means estimator, which is given by \[\begin{eqnarray*} \widehat{ATE}_{dm} &=& \frac{\sum_{i=1}^N D_i Y_i}{\sum_{i=1}^N D_i} - \frac{\sum_{i=1}^N (1-D_i) Y_i}{\sum_{i=1}^N (1-D_i)}\\ &=& \overline{Y}_{D=1} - \overline{Y}_{D=0}, \end{eqnarray*}\] where \(\overline{Y}_{D=1}\) and \(\overline{Y}_{D=0}\) are the sample means of the outcome variable for the treated and control groups, respectively.
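
As a rough illustration, the difference-in-means formula above can be sketched in Python with NumPy. The function name `diff_in_means` and the toy numbers are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def diff_in_means(y, d):
    """Difference-in-means estimate: mean of Y among treated minus mean among controls."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    treated_mean = np.sum(d * y) / np.sum(d)            # sum(D_i Y_i) / sum(D_i)
    control_mean = np.sum((1 - d) * y) / np.sum(1 - d)  # sum((1-D_i) Y_i) / sum(1-D_i)
    return treated_mean - control_mean

# Made-up toy data: treated outcomes 3 and 5, control outcomes 1, 2 and 3.
y_toy = np.array([3.0, 5.0, 1.0, 2.0, 3.0])
d_toy = np.array([1, 1, 0, 0, 0])
ate_dm_toy = diff_in_means(y_toy, d_toy)
print(ate_dm_toy)
```

In this toy example the treated mean is 4 and the control mean is 2, so the estimate is 2.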

Another common method uses linear regression, as you have seen in Econ 320. That is, one can use the following regression model: \[\begin{eqnarray*} Y_i &=& \alpha + \theta D_i + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i, \end{eqnarray*}\] where \(\alpha\) is the intercept, \(\theta\) is the coefficient on the treatment indicator, and \(\beta_1, \dots, \beta_k\) are the coefficients of the covariates. Researchers often interpret estimates of \(\theta\) as estimates of the ATE.
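
One possible way to compute the OLS estimate of \(\theta\) is sketched below, using NumPy's least-squares solver. The function name `ols_ate` and the noise-free toy data are assumptions made for illustration only; any OLS routine would serve the same purpose.

```python
import numpy as np

def ols_ate(y, d, x):
    """Return the OLS coefficient on D from a regression of Y on (1, D, X)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single covariate as an (n, 1) matrix
    design = np.column_stack([np.ones(len(y)), d, x])  # columns: intercept, D, X
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # position 1 holds the coefficient on D

# Made-up, noise-free toy data generated as Y = 1 + 2*D + 3*X.
x_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
d_toy = np.array([1, 0, 1, 0, 0, 1])
y_toy = 1.0 + 2.0 * d_toy + 3.0 * x_toy
theta_hat_toy = ols_ate(y_toy, d_toy, x_toy)
print(theta_hat_toy)
```

Because the toy outcome is an exact linear function of \(D\) and \(X\), the estimated coefficient recovers 2 up to floating-point error.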

A third option is to use regression adjustments, as discussed in Slide 13. That is, one can split the data into treated units and control units, and then regress the outcome variable on the covariates for each group. The average treatment effect can then be estimated as the difference between the average predicted outcomes for the treated and control groups. More precisely, if one uses linear regressions as working models, one can estimate the ATE by running the following regressions: \[\begin{eqnarray*} Y_i &=& \alpha_{D=1} + X_i' \beta_{D=1} + \varepsilon_i, \text{ for units } D_i = 1,\\ Y_i &=& \alpha_{D=0} + X_i' \beta_{D=0} + \varepsilon_i, \text{ for units } D_i = 0, \end{eqnarray*}\] where \(\alpha_{D=1}\) and \(\alpha_{D=0}\) are the intercepts, \(\beta_{D=1}\) and \(\beta_{D=0}\) are the coefficients of the covariates, and \(X_i'\) is the vector of covariates for unit \(i\).

Once these regressions are estimated, the ATE can be estimated as \[\begin{eqnarray*} \widehat{ATE}_{ra} &=& \widehat{\alpha}_{D=1} - \widehat{\alpha}_{D=0} + \overline{X}' \widehat{\beta}_{D=1} - \overline{X}' \widehat{\beta}_{D=0}, \end{eqnarray*}\] where \(\overline{X}\) is the vector of sample means of the covariates across all units.
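
The two-step procedure above might be sketched as follows, with linear working models fitted separately by treatment group. The helper names `fit_ols` and `regression_adjustment_ate` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_ols(y, x):
    """OLS of y on (1, x); returns [intercept, slope_1, ..., slope_k]."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def regression_adjustment_ate(y, d, x):
    """Fit separate linear models by treatment arm, then compare the
    predictions evaluated at the full-sample covariate means."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d).astype(bool)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    coef1 = fit_ols(y[d], x[d])    # treated units only
    coef0 = fit_ols(y[~d], x[~d])  # control units only
    x_bar = x.mean(axis=0)         # covariate means across ALL units
    return (coef1[0] - coef0[0]) + x_bar @ (coef1[1:] - coef0[1:])

# Made-up, noise-free toy data: Y = 1 + X if treated, Y = -X if control.
x_toy = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 1.0])
d_toy = np.array([1, 0, 1, 0, 1, 0])
y_toy = np.where(d_toy == 1, 1.0 + x_toy, -x_toy)
ate_ra_toy = regression_adjustment_ate(y_toy, d_toy, x_toy)
print(ate_ra_toy)
```

In the toy data the fitted intercepts are 1 and 0, the slopes are 1 and -1, and the overall mean of \(X\) is 1, so the estimate is \(1 - 0 + 1 \cdot (1 - (-1)) = 3\) up to floating-point error.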

In this question, you will compare the performance of these three methods in estimating the ATE using Monte Carlo simulations.

We will consider the following data generating process (DGP):

- \(X\) is univariate, and follows a normal distribution with mean 0 and variance 1.
- The treatment indicator \(D\) is independent of everything, and is generated as \(1\{u < 0.2\}\), where \(u\) is a random variable following a uniform distribution.
- The treated and untreated potential outcomes are generated as \(Y_i(1) = X_i + \varepsilon_i(1)\) and \(Y_i(0) = -X_i + \varepsilon_i(0)\), respectively, where \(\varepsilon_i(1)\) and \(\varepsilon_i(0)\) each follow a normal distribution with mean 0 and variance 1 and are independent of each other.
- Observed data is \((Y_i, D_i, X_i)_{i=1}^n\), where \(Y_i = D_i Y_i(1) + (1-D_i) Y_i(0)\).
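
A single draw from this DGP could be generated along the following lines. This is only a sketch: it assumes \(u\) is uniform on \([0, 1]\), and the function name `simulate_randomized`, the sample size, and the seed are arbitrary choices rather than values given in the question.

```python
import numpy as np

def simulate_randomized(n, rng):
    """Draw one sample (Y, D, X) of size n from the randomized-assignment DGP."""
    x = rng.standard_normal(n)          # X ~ N(0, 1)
    u = rng.uniform(size=n)             # assumption: u ~ Uniform(0, 1)
    d = (u < 0.2).astype(int)           # D = 1{u < 0.2}, independent of X
    y1 = x + rng.standard_normal(n)     # Y(1) = X + eps(1)
    y0 = -x + rng.standard_normal(n)    # Y(0) = -X + eps(0)
    y = d * y1 + (1 - d) * y0           # observed outcome
    return y, d, x

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility
y, d, x = simulate_randomized(1000, rng)
print(d.mean())                         # share of treated units in this draw
```

Under the uniform-on-\([0,1]\) assumption, each unit is treated with probability 0.2, so the printed share should be close to that value.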

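Given the above DGP, \(Y_i(1) - Y_i(0) = 2X_i + \varepsilon_i(1) - \varepsilon_i(0)\) and the errors have mean zero, so we know that \(ATE = 2E[X] = 0\). We would like to estimate the ATE as precisely as possible.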
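
A Monte Carlo comparison could be organized roughly as in the sketch below, shown here for the difference-in-means estimator only. The sample size, number of replications, seed, and all function names are assumptions (the question does not fix them), and \(u\) is assumed uniform on \([0, 1]\). Other estimators can be plugged in through the `estimator` argument.

```python
import numpy as np

def simulate_randomized(n, rng):
    """One draw from the randomized DGP (assumes u ~ Uniform(0, 1))."""
    x = rng.standard_normal(n)
    d = (rng.uniform(size=n) < 0.2).astype(int)
    y = d * (x + rng.standard_normal(n)) + (1 - d) * (-x + rng.standard_normal(n))
    return y, d, x

def diff_in_means(y, d):
    """Treated-group mean of Y minus control-group mean of Y."""
    return y[d == 1].mean() - y[d == 0].mean()

def monte_carlo(estimator, n=500, reps=1000, seed=0, true_ate=0.0):
    """Apply `estimator(y, d, x)` to `reps` simulated samples and summarize."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        y, d, x = simulate_randomized(n, rng)
        estimates[r] = estimator(y, d, x)
    return {
        "bias": estimates.mean() - true_ate,
        "sd": estimates.std(ddof=1),
        "rmse": np.sqrt(np.mean((estimates - true_ate) ** 2)),
    }

# n, reps and seed are illustrative defaults, not values from the question.
summary_dm = monte_carlo(lambda y, d, x: diff_in_means(y, d))
print(summary_dm)
```

Reporting bias, standard deviation, and root mean squared error side by side is one common way to compare precision across estimators; other summaries would also be reasonable.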

2.(50 points) Suppose now that selection into treatment is not random, i.e., we do not have an A/B test like in Question 4. More specifically, let’s assume that the treatment indicator \(D\) is a function of the covariates \(X\), i.e., \(D = 1\{u \leq p(X)\}\), where \(u\) is a random variable following a uniform distribution, and \(p(X) = \exp(-1 + 0.5X)/(1+\exp(-1 + 0.5X))\).
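
The covariate-dependent assignment rule could be coded as in this sketch, again assuming \(u\) is uniform on \([0, 1]\). The function names `propensity` and `assign_treatment`, the sample size, and the seed are illustrative assumptions.

```python
import numpy as np

def propensity(x):
    """p(X) = exp(-1 + 0.5 X) / (1 + exp(-1 + 0.5 X)), written in logistic form."""
    z = -1.0 + 0.5 * np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))  # algebraically equal to exp(z) / (1 + exp(z))

def assign_treatment(x, rng):
    """D = 1{u <= p(X)} with u assumed to be Uniform(0, 1)."""
    u = rng.uniform(size=len(x))
    return (u <= propensity(x)).astype(int)

rng = np.random.default_rng(0)      # arbitrary seed
x = rng.standard_normal(10_000)     # X ~ N(0, 1), as in the earlier DGP
d = assign_treatment(x, rng)

# Compare the average covariate value across the two groups.
print(x[d == 1].mean(), x[d == 0].mean())
```

Since \(p(X)\) is increasing in \(X\), units with larger \(X\) are more likely to be treated, so the covariate distribution differs between the treated and control groups.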

Repeat the Monte Carlo simulation in Question 4, but now considering the new treatment assignment. Discuss your results. How does the performance of the estimators change when selection into treatment is not random?