Inverse probability weighted DiD estimators for the ATT

ipwdid computes the inverse probability weighted estimators for the average treatment effect on the treated in difference-in-differences (DiD) setups. It can be used with panel or stationary repeated cross-sectional data, with or without normalized (stabilized) weights. See Abadie (2005) and Sant'Anna and Zhao (2020) for details.

ipwdid(
  yname,
  tname,
  idname,
  dname,
  xformla = NULL,
  data,
  panel = TRUE,
  normalized = TRUE,
  weightsname = NULL,
  boot = FALSE,
  boot.type = c("weighted", "multiplier"),
  nboot = 999,
  inffunc = FALSE,
  trim.level = 0.995
)

Arguments

yname: The name of the outcome variable.
tname: The name of the column containing the time periods.
idname: The name of the column containing the unit id.
dname: The name of the column containing the treatment group indicator (=1 if the observation is treated in the post-treatment period, =0 otherwise).
xformla: A formula for the covariates to include in the model. It should be of the form ~ X1 + X2 (intercept should not be listed as it is always automatically included). Default is NULL which is equivalent to xformla=~1.
data: The name of the data.frame that contains the data.
panel: Whether or not the data is a panel dataset. The panel dataset should be provided in long format – that is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE. When panel = FALSE, the data is treated as stationary repeated cross sections.
normalized: Logical argument indicating whether the IPW weights should be normalized to sum up to one. Default is TRUE.
weightsname: The name of the column containing the sampling weights. If NULL, then every observation has the same weight. The weights are normalized and therefore enforced to have mean 1 across all observations.
boot: Logical argument indicating whether the bootstrap should be used for inference. Default is FALSE, in which case analytical standard errors are reported.
boot.type: Type of bootstrap to be performed (not relevant if boot = FALSE). Options are "weighted" and "multiplier". If boot = TRUE, default is "weighted".
nboot: Number of bootstrap repetitions (not relevant if boot = FALSE). Default is 999.
inffunc: Logical argument indicating whether the influence function should be returned. Default is FALSE.
trim.level: The level of trimming for the propensity score. Default is 0.995.
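
As a small illustration of the weightsname description above, the following base-R sketch rescales a vector of sampling weights so that their mean is 1. The vector w_raw is made up for illustration, and dividing by the mean is assumed here as the simplest way to obtain mean-1 weights; it is not taken from the package source.

```r
# Hypothetical raw sampling weights (illustrative values only)
w_raw <- c(2, 4, 6, 8)

# Rescale so the weights average to 1 across all observations
w_norm <- w_raw / mean(w_raw)

w_norm        # 0.4 0.8 1.2 1.6
mean(w_norm)  # 1
```

Rescaling by a constant does not change the relative weight of any observation.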

Value

A list containing the following components:

ATT: The IPW DiD point estimate
se: The IPW DiD standard error
uci: Estimate of the upper bound of a 95% CI for the ATT
lci: Estimate of the lower bound of a 95% CI for the ATT
boots: All bootstrap draws of the ATT, if the bootstrap was used for inference. Default is NULL
att.inf.func: Estimate of the influence function. Default is NULL
call.param: The matched call.
argu: Some arguments used in the call (panel, normalized, boot, boot.type, nboot, type = "ipw")
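
The analytical output printed for the panel-data example under Examples is consistent with a normal-approximation interval, ATT ± 1.96 × se. The sketch below re-derives the interval, t value and p-value from the reported ATT and se. The 1.96 critical value is an assumption inferred from that printed output. It does not reproduce the bootstrap example, whose interval is narrower than 1.96 × se.

```r
# Values copied from the printed panel-data example (analytical standard error)
att <- -655.9068
se  <- 687.7806

# Normal-approximation 95% interval (assumed critical value: 1.96)
ci <- att + c(-1, 1) * 1.96 * se   # lower and upper bounds (lci, uci)

# t statistic and two-sided p-value under the normal approximation
t_value <- att / se
p_value <- 2 * (1 - pnorm(abs(t_value)))

round(ci, 2)       # -2003.96  692.14
round(t_value, 4)  # -0.9537
round(p_value, 3)  # 0.34
```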

Details

The ipwdid function implements the inverse probability weighted (IPW) difference-in-differences (DiD) estimator for the average treatment effect on the treated (ATT). With normalized = FALSE it computes the estimator proposed by Abadie (2005); with normalized = TRUE (the default) it computes the Hajek-type version defined in equations (4.1) and (4.2) of Sant'Anna and Zhao (2020). Both are available for panel data and for stationary repeated cross-sectional data. The estimator uses a logistic propensity score model for the probability of being in the treated group, with parameters estimated by maximum likelihood.
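
To make the two weighting schemes concrete, here is a minimal base-R sketch of the point estimate for two-period panel data. The helper ipw_did_panel_sketch and the simulated data are hypothetical and written for illustration; this is not the package's internal code, and it omits trimming, sampling weights and standard errors. It assumes the outcome change (post minus pre) has already been computed for each unit.

```r
# Hypothetical helper: IPW DiD point estimate for two-period panel data.
#   dy: outcome change per unit (post minus pre)
#   d : 0/1 treatment group indicator
#   X : data.frame of covariates (glm adds the intercept)
ipw_did_panel_sketch <- function(dy, d, X, normalized = TRUE) {
  # Logistic propensity score fitted by maximum likelihood
  ps_fit <- glm(d ~ ., data = data.frame(d = d, X),
                family = binomial(link = "logit"))
  ps <- fitted(ps_fit)

  w_treat <- d                       # weights for treated units
  w_cont  <- ps * (1 - d) / (1 - ps) # odds weights for comparison units

  if (normalized) {
    # Hajek-type: each set of weights is normalized to sum to one
    sum(w_treat * dy) / sum(w_treat) - sum(w_cont * dy) / sum(w_cont)
  } else {
    # Horvitz-Thompson type, as in Abadie (2005)
    mean((w_treat - w_cont) * dy) / mean(d)
  }
}

# Simulated data (illustrative): the trend depends on x1, the true ATT is 2
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.5 * x1))
dy <- 1 + x1 + 2 * d + rnorm(n)

att_hajek  <- ipw_did_panel_sketch(dy, d, data.frame(x1 = x1))
att_abadie <- ipw_did_panel_sketch(dy, d, data.frame(x1 = x1),
                                   normalized = FALSE)
```

Both estimates should land close to the simulated effect of 2. They differ only in whether the comparison-group weights are rescaled to sum to one.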

References

Abadie, Alberto (2005), "Semiparametric Difference-in-Differences Estimators", Review of Economic Studies, Vol. 72 (1), pp. 1-19, doi:10.1111/0034-6527.00321

Sant'Anna, Pedro H. C. and Zhao, Jun (2020), "Doubly Robust Difference-in-Differences Estimators", Journal of Econometrics, Vol. 219 (1), pp. 101-122, doi:10.1016/j.jeconom.2020.06.003

Examples

# Load the package that provides ipwdid() and the example data sets
library(DRDID)

# -----------------------------------------------
# Panel data case
# -----------------------------------------------
# Form the Lalonde sample with CPS comparison group
eval_lalonde_cps <- subset(nsw_long, nsw_long$treated == 0 | nsw_long$sample == 2)
# Further reduce sample to speed example
set.seed(123)
unit_random <- sample(unique(eval_lalonde_cps$id), 5000)
eval_lalonde_cps <- eval_lalonde_cps[eval_lalonde_cps$id %in% unit_random,]

# Implement IPW DiD with panel data (normalized weights)
ipwdid(yname = "re", tname = "year", idname = "id", dname = "experimental",
       xformla = ~ age + educ + black + married + nodegree + hisp + re74,
       data = eval_lalonde_cps, panel = TRUE)
#>  Call:
#> ipwdid(yname = "re", tname = "year", idname = "id", dname = "experimental", 
#>     xformla = ~age + educ + black + married + nodegree + hisp + 
#>         re74, data = eval_lalonde_cps, panel = TRUE)
#> ------------------------------------------------------------------
#>  IPW DID estimator for the ATT:
#>  
#>    ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
#> -655.9068   687.7806   -0.9537     0.3403   -2003.9567  692.1431 
#> ------------------------------------------------------------------
#>  Estimator based on panel data.
#>  Hajek-type IPW estimator (weights sum up to 1).
#>  Propensity score est. method: maximum likelihood.
#>  Analytical standard error.
#> ------------------------------------------------------------------
#>  See Sant'Anna and Zhao (2020) for details.

# -----------------------------------------------
# Repeated cross section case
# -----------------------------------------------
# Use the simulated data provided in the package
# Implement IPW DiD with repeated cross-section data (normalized weights)
# Use the bootstrap for inference with 199 bootstrap draws (just for illustration)
ipwdid(yname = "y", tname = "post", idname = "id", dname = "d",
       xformla = ~ x1 + x2 + x3 + x4,
       data = sim_rc, panel = FALSE,
       boot = TRUE, nboot = 199)
#>  Call:
#> ipwdid(yname = "y", tname = "post", idname = "id", dname = "d", 
#>     xformla = ~x1 + x2 + x3 + x4, data = sim_rc, panel = FALSE, 
#>     boot = TRUE, nboot = 199)
#> ------------------------------------------------------------------
#>  IPW DID estimator for the ATT:
#>  
#>    ATT     Std. Error  t value    Pr(>|t|)  [95% Conf. Interval] 
#>  -15.8033    9.7237    -1.6252     0.1041    -32.7867    1.1801  
#> ------------------------------------------------------------------
#>  Estimator based on (stationary) repeated cross-sections data.
#>  Hajek-type IPW estimator (weights sum up to 1).
#>  Propensity score est. method: maximum likelihood.
#>  Boostrapped standard error based on 199 bootstrap draws. 
#>  Bootstrap method: weighted . 
#> ------------------------------------------------------------------
#>  See Sant'Anna and Zhao (2020) for details.