Locally efficient doubly robust DiD estimators for the ATT

drdid computes the locally efficient doubly robust estimators for the ATT in difference-in-differences (DiD) setups. It can be used with panel or stationary repeated cross-section data. Data should be stored in "long" format.

drdid(
  yname,
  tname,
  idname,
  dname,
  xformla = NULL,
  data,
  panel = TRUE,
  estMethod = c("imp", "trad"),
  weightsname = NULL,
  boot = FALSE,
  boot.type = c("weighted", "multiplier"),
  nboot = 999,
  inffunc = FALSE,
  trim.level = 0.995
)

Arguments

yname: The name of the outcome variable.
tname: The name of the column containing the time periods.
idname: The name of the column containing the unit id name.
dname: The name of the column containing the treatment group indicator (=1 if the observation is treated in the post-treatment period, =0 otherwise).
xformla: A formula for the covariates to include in the model. It should be of the form ~ X1 + X2 (intercept should not be listed as it is always automatically included). Default is NULL which is equivalent to xformla=~1.
data: The name of the data.frame that contains the data.
panel: Whether or not the data is a panel dataset. The panel dataset should be provided in long format – that is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE. When panel = FALSE, the data is treated as stationary repeated cross sections.
estMethod: The method used to estimate the nuisance parameters. The default is "imp", which uses weighted least squares to estimate the outcome regressions and inverse probability tilting to estimate the propensity score, leading to the improved locally efficient DR DiD estimator proposed by Sant'Anna and Zhao (2020). The alternative is "trad", which uses OLS to estimate the outcome regressions and maximum likelihood to estimate the propensity score. This leads to the "traditional" locally efficient DR DiD estimator proposed by Sant'Anna and Zhao (2020).
weightsname: The name of the column containing the sampling weights. If NULL, then every observation has the same weight. The weights are normalized and therefore enforced to have mean 1 across all observations.
boot: Logical argument indicating whether bootstrap should be used for inference. Default is FALSE, in which case analytical standard errors are reported.
boot.type: Type of bootstrap to be performed (not relevant if boot = FALSE). Options are "weighted" and "multiplier". If boot = TRUE, default is "weighted".
nboot: Number of bootstrap repetitions (not relevant if boot = FALSE). Default is 999.
inffunc: Logical argument indicating whether the influence function should be returned. Default is FALSE.
trim.level: The level of trimming for the propensity score. Default is 0.995.
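
The normalization described for weightsname can be illustrated with a few lines of base R. This is only a sketch of the stated property (weights rescaled to have mean 1); the vector w and the name w_norm are made up for illustration and are not taken from the package.

```r
# Hypothetical raw sampling weights (illustrative values only)
w <- c(0.5, 1, 2, 4.5)

# Rescale so that the weights average to 1 across all observations
w_norm <- w / mean(w)

# Relative weights are unchanged; only the overall scale differs
print(w_norm)
print(mean(w_norm))
```

Because only the scale changes, the ratio between any two observations' weights is the same before and after normalization.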

Value

A list containing the following components:

ATT: The DR DiD point estimate
se: The DR DiD standard error
uci: Estimate of the upper bound of a 95% CI for the ATT
lci: Estimate of the lower bound of a 95% CI for the ATT
boots: All bootstrap draws of the ATT, in case bootstrap was used to conduct inference. Default is NULL
att.inf.func: Estimate of the influence function. Default is NULL
ps.flag: Convergence flag for the propensity score estimation (only active if estMethod = "imp"): =0 if the trust algorithm converged, =1 if the IPT (original) algorithm converged (in case it was used), =2 if the GLM logit estimator was used (i.e., if neither trust nor IPT converged).
call.param: The matched call.
argu: Some arguments used in the call (panel, estMethod, boot, boot.type, nboot, type="dr")
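
As a quick check on how ATT, se, lci and uci relate, the figures printed in the Examples section are consistent with a normal-approximation interval of the form ATT ± 1.96 × se. The sketch below redoes that arithmetic in base R with the numbers from the first panel-data example. The variable names are illustrative, and the 1.96 critical value is inferred from the printed output rather than stated on this page.

```r
# Point estimate and standard error copied from the first example's output
att <- -615.2344
se  <- 683.1528

# Assumed critical value for a 95% interval (inferred, not documented here)
crit <- 1.96

lci <- att - crit * se                      # lower bound
uci <- att + crit * se                      # upper bound
t_value <- att / se                         # reported "t value"
p_value <- 2 * (1 - pnorm(abs(t_value)))    # two-sided normal p-value

print(round(c(lci = lci, uci = uci, t = t_value, p = p_value), 4))
```

The results agree with the printed interval, t value and p-value up to rounding of the inputs.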

Details

When panel data are available (panel = TRUE), the drdid function implements the locally efficient doubly robust difference-in-differences (DiD) estimator for the average treatment effect on the treated (ATT) defined in equation (3.1) in Sant'Anna and Zhao (2020). This estimator makes use of a logistic propensity score model for the probability of being in the treated group, and of a linear regression model for the outcome evolution among the comparison units.

When only stationary repeated cross-section data are available (panel = FALSE), the drdid function implements the locally efficient doubly robust difference-in-differences (DiD) estimator for the average treatment effect on the treated (ATT) defined in equation (3.4) in Sant'Anna and Zhao (2020). This estimator makes use of a logistic propensity score model for the probability of being in the treated group, and of (separate) linear regression models for the outcome of both treated and comparison units, in both pre and post-treatment periods.

When one sets estMethod = "imp" (the default), the nuisance parameters (propensity score and outcome regression parameters) are estimated using the methods described in Sections 3.1 and 3.2 of Sant'Anna and Zhao (2020). In short, the propensity score parameters are estimated using the inverse probability tilting estimator proposed by Graham, Pinto and Egel (2012), and the outcome regression coefficients are estimated using weighted least squares, where the weights depend on the propensity score estimates; see Sant'Anna and Zhao (2020) for details.

When one sets estMethod = "trad", the propensity score parameters are estimated using maximum likelihood, and the outcome regression coefficients are estimated using ordinary least squares.
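
To make the panel-data case with "trad" nuisance estimation more concrete, the following base R sketch computes a doubly robust DiD point estimate on simulated data: a logit propensity score fitted by maximum likelihood, an OLS regression of the outcome change on covariates among comparison units, and a difference of two weighted means of the regression residuals. It is a simplified illustration under stated assumptions, not the package's code: the data-generating process and all variable names are invented, and sampling weights, propensity score trimming and standard errors are left out.

```r
set.seed(1)

# Simulated two-period panel, already reduced to one row per unit
n <- 5000
x <- rnorm(n)                                   # single covariate (hypothetical)
d <- rbinom(n, 1, plogis(-0.5 + 0.5 * x))       # treatment group indicator
delta_y <- 0.5 * x + 1 * d + rnorm(n)           # outcome change; true ATT is 1
dat <- data.frame(x = x, d = d, delta_y = delta_y)

# Propensity score: logit model fitted by maximum likelihood
ps_fit <- glm(d ~ x, family = binomial(), data = dat)
ps <- fitted(ps_fit)

# Outcome regression: OLS for the outcome change among comparison units,
# then predicted for every unit
or_fit <- lm(delta_y ~ x, data = dat[dat$d == 0, ])
mu0 <- predict(or_fit, newdata = dat)

# Weights for treated units and (odds-reweighted) comparison units
w_treat <- dat$d
w_comp  <- ps * (1 - dat$d) / (1 - ps)

# Doubly robust DiD: difference of normalized weighted means of residuals
resid0 <- dat$delta_y - mu0
att_dr <- mean(w_treat * resid0) / mean(w_treat) -
  mean(w_comp * resid0) / mean(w_comp)

print(att_dr)
```

In this simulation both working models are correctly specified, so the estimate should land close to the true value of 1. The appeal of the doubly robust form is that it remains consistent if only one of the two models is correct.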

The main advantage of using estMethod = "imp" is that the resulting estimator is not only locally efficient and doubly robust for the ATT, but it is also doubly robust for inference; see Sant'Anna and Zhao (2020) for details.

References

Graham, Bryan, Pinto, Cristine, and Egel, Daniel (2012), "Inverse Probability Tilting for Moment Condition Models with Missing Data." Review of Economic Studies, vol. 79 (3), pp. 1053-1079, doi:10.1093/restud/rdr047

Sant'Anna, Pedro H. C., and Zhao, Jun (2020), "Doubly Robust Difference-in-Differences Estimators." Journal of Econometrics, vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003

Examples

# -----------------------------------------------
# Panel data case
# -----------------------------------------------
# Load the package, which provides drdid() and the nsw_long and sim_rc datasets
library(DRDID)
# Form the Lalonde sample with CPS comparison group
eval_lalonde_cps <- subset(nsw_long, nsw_long$treated == 0 | nsw_long$sample == 2)
# Further reduce sample to speed example
set.seed(123)
unit_random <- sample(unique(eval_lalonde_cps$id), 5000)
eval_lalonde_cps <- eval_lalonde_cps[eval_lalonde_cps$id %in% unit_random,]
# -----------------------------------------------
# Implement improved DR locally efficient DiD with panel data
drdid(yname="re", tname = "year", idname = "id", dname = "experimental",
      xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74,
      data = eval_lalonde_cps, panel = TRUE)
#>  Call:
#> drdid(yname = "re", tname = "year", idname = "id", dname = "experimental", 
#>     xformla = ~age + educ + black + married + nodegree + hisp + 
#>         re74, data = eval_lalonde_cps, panel = TRUE)
#> ------------------------------------------------------------------
#>  Further improved locally efficient DR DID estimator for the ATT:
#>  
#>    ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
#> -615.2344   683.1528   -0.9006     0.3678   -1954.2139  723.7452 
#> ------------------------------------------------------------------
#>  Estimator based on panel data.
#>  Outcome regression est. method: weighted least squares.
#>  Propensity score est. method: inverse prob. tilting.
#>  Analytical standard error.
#> ------------------------------------------------------------------
#>  See Sant'Anna and Zhao (2020) for details.

# Implement "traditional" DR locally efficient DiD with panel data
drdid(yname="re", tname = "year", idname = "id", dname = "experimental",
      xformla= ~ age+ educ+ black+ married+ nodegree+ hisp+ re74,
      data = eval_lalonde_cps, panel = TRUE, estMethod = "trad")
#>  Call:
#> drdid(yname = "re", tname = "year", idname = "id", dname = "experimental", 
#>     xformla = ~age + educ + black + married + nodegree + hisp + 
#>         re74, data = eval_lalonde_cps, panel = TRUE, estMethod = "trad")
#> ------------------------------------------------------------------
#>  Locally efficient DR DID estimator for the ATT:
#>  
#>    ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
#> -507.6113   692.9046   -0.7326     0.4638   -1865.7043  850.4817 
#> ------------------------------------------------------------------
#>  Estimator based on panel data.
#>  Outcome regression est. method: OLS.
#>  Propensity score est. method: maximum likelihood.
#>  Analytical standard error.
#> ------------------------------------------------------------------
#>  See Sant'Anna and Zhao (2020) for details.

# -----------------------------------------------
# Repeated cross section case
# -----------------------------------------------
# Use the simulated data provided in the package (sim_rc)
# Implement "improved" DR locally efficient DiD with repeated cross-section data
drdid(yname="y", tname = "post", idname = "id", dname = "d",
      xformla= ~ x1 + x2 + x3 + x4,
      data = sim_rc, panel = FALSE, estMethod = "imp")
#>  Call:
#> drdid(yname = "y", tname = "post", idname = "id", dname = "d", 
#>     xformla = ~x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, 
#>     estMethod = "imp")
#> ------------------------------------------------------------------
#>  Further improved locally efficient DR DID estimator for the ATT:
#>  
#>    ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
#>  -0.2089     0.2002    -1.0431     0.2969    -0.6013     0.1836  
#> ------------------------------------------------------------------
#>  Estimator based on (stationary) repeated cross-sections data.
#>  Outcome regression est. method: weighted least squares.
#>  Propensity score est. method: inverse prob. tilting.
#>  Analytical standard error.
#> ------------------------------------------------------------------
#>  See Sant'Anna and Zhao (2020) for details.

# Implement "traditional" DR locally efficient DiD with repeated cross-section data
drdid(yname="y", tname = "post", idname = "id", dname = "d",
      xformla= ~ x1 + x2 + x3 + x4,
      data = sim_rc, panel = FALSE, estMethod = "trad")
#>  Call:
#> drdid(yname = "y", tname = "post", idname = "id", dname = "d", 
#>     xformla = ~x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, 
#>     estMethod = "trad")
#> ------------------------------------------------------------------
#>  Locally efficient DR DID estimator for the ATT:
#>  
#>    ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
#>  -0.1678     0.2008    -0.8356     0.4034    -0.5614     0.2258  
#> ------------------------------------------------------------------
#>  Estimator based on (stationary) repeated cross-sections data.
#>  Outcome regression est. method: OLS.
#>  Propensity score est. method: maximum likelihood.
#>  Analytical standard error.
#> ------------------------------------------------------------------
#>  See Sant'Anna and Zhao (2020) for details.
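
The components listed under Value can also be read directly from the returned list rather than from the printed summary. A minimal sketch, assuming the DRDID package is installed: the object name out is illustrative, and the call is wrapped in a requireNamespace() guard so that the snippet does nothing when the package is unavailable.

```r
if (requireNamespace("DRDID", quietly = TRUE)) {
  library(DRDID)

  # Same repeated cross-section call as above, additionally requesting
  # the estimated influence function
  out <- drdid(yname = "y", tname = "post", idname = "id", dname = "d",
               xformla = ~ x1 + x2 + x3 + x4,
               data = sim_rc, panel = FALSE, estMethod = "trad",
               inffunc = TRUE)

  # Point estimate, standard error and 95% confidence bounds
  print(c(ATT = out$ATT, se = out$se, lci = out$lci, uci = out$uci))

  # Influence function estimates (NULL unless inffunc = TRUE)
  print(length(out$att.inf.func))
}
```

Storing the result this way is convenient when the estimate feeds into further computations, for example when collecting estimates across several specifications.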