Test whether the parallel trends assumption is sensitive to functional form

didFF(
  data,
  yname,
  tname,
  idname,
  gname,
  weightsname = NULL,
  est_method = "dr",
  xformla = NULL,
  panel = TRUE,
  allow_unbalanced_panel = FALSE,
  nevertreated = NULL,
  control_group = base::c("nevertreated", "notyettreated"),
  anticipation = 0,
  nbins = NULL,
  binpoints = NULL,
  numSims = 1e+05,
  seed = 0,
  lb_graph = NULL,
  ub_graph = NULL,
  aggte_type = "group",
  balance_e = NULL,
  min_e = -Inf,
  max_e = Inf,
  distDD = FALSE,
  pl = FALSE,
  cores = parallel::detectCores()
)

Arguments

data

The name of the data.frame that contains the data

yname

The name of the outcome variable

tname

The name of the column containing the time periods

idname

The cross-sectional unit id name

gname

The name of the variable in data that contains the first period when a particular observation is treated. This should be a positive number for all observations in treated groups. It defines which "group" a unit belongs to. It can be 0 or Inf for units in the “never-treated” group.

weightsname

The name of the column containing the sampling weights. If not set, all observations have the same weight (Default is NULL).

est_method

The method to compute group-time average treatment effects. The default is "dr", which uses the doubly robust approach in the DRDID package. Other built-in methods include "ipw" for inverse probability weighting and "reg" for first-step regression estimators.

xformla

A formula for the covariates to include in the model. It should be of the form ~ X1 + X2. Default is NULL which is equivalent to xformla=~1. This is used to create a matrix of covariates which is then passed to the 2x2 DID estimator chosen in est_method.

panel

Whether or not the data is a panel dataset. The panel dataset should be provided in long format – that is, where each row corresponds to a unit observed at a particular point in time. The default is TRUE. When panel=FALSE, the data is treated as repeated cross sections.

allow_unbalanced_panel

Whether or not to allow an unbalanced panel. The default value is FALSE, which means that the panel is "balanced" with respect to time and id: att_gt() will drop all units whose data is not observed in all periods.

nevertreated

A scalar indicating the never-treated cohort. If any cohorts are equal to 0 and all time periods are above 0, then the 0 cohort is taken as never-treated by default; otherwise the default is Inf.
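The default rule above can be sketched in a few lines of base R. The helper name default_nevertreated is hypothetical and is not part of the package; it only mirrors the rule as documented here.

```r
# Hypothetical helper (not a didFF function): mirrors the documented default.
# g: vector of cohort values (the gname column); t: vector of time periods.
default_nevertreated <- function(g, t) {
  if (any(g == 0) && all(t > 0)) 0 else Inf
}

# Cohort 0 is present and all periods are positive, so 0 marks never-treated.
default_nevertreated(g = c(0, 2004, 2006), t = 2003:2007)

# No cohort equals 0, so the default falls back to Inf.
default_nevertreated(g = c(Inf, 2004, 2006), t = 2003:2007)
```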

control_group

Which units to use as the control group. The default is control_group = "nevertreated", which sets the control group to be the group of units that never participate in the treatment. This group does not change across groups or time periods. The other option is to set control_group = "notyettreated". In this case, the control group is the set of units that have not yet participated in the treatment in that time period. This includes all never-treated units, plus units that eventually participate in the treatment but have not participated yet.
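To make the difference between the two options concrete, the following base R sketch lists which units would qualify as controls in a single period. The unit labels and cohort values are made up for illustration, 0 is assumed to mark never-treated units, and anticipation is ignored.

```r
# Made-up cohorts: the period in which each unit is first treated (0 = never).
g <- c(a = 0, b = 2004, c = 2006, d = 2007)
t <- 2005  # the period under consideration

# "nevertreated": only units that never participate in the treatment.
never <- names(g)[g == 0]

# "notyettreated": never-treated units plus units first treated after t.
notyet <- names(g)[g == 0 | g > t]

never
notyet
```

In this illustration unit b is excluded from both sets because it is already treated in 2005, while units c and d enter the control group only under "notyettreated".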

anticipation

The number of time periods before treatment in which units can anticipate participating in the treatment, so that anticipation can affect their untreated potential outcomes. Default is 0.

nbins

A scalar indicating the (maximum) number of bins for the support of the outcome. By default, if the outcome has fewer than 20 values then it is taken to be a discrete variable; otherwise nbins=20 is used. Empty bins are dropped.

binpoints

Alternative to nbins: a vector indicating the interval endpoints to use; if the data range is not covered then min(y) and max(y) are added as endpoints. For a user-specified vector a = c(a_1, a_2, ..., a_n), let b = a if min(y) >= min(a) and b = c(min(y), a) otherwise; then let c = b if max(y) <= max(a) and c = c(b, max(y)) otherwise. Writing m for the length of c, the bins are [c_1, c_2], (c_2, c_3], ..., (c_{m-1}, c_m]. Empty bins are dropped. By default, if the outcome has fewer than 20 values then it is taken to be a discrete variable and its values are used as bin points. Otherwise nbins is used.
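The endpoint rule can be reproduced with base R's cut(). This is a minimal sketch of the rule as described above, not the package's internal code; make_bins is a hypothetical helper, the data are made up, and the endpoints are assumed to be sorted.

```r
# Hypothetical helper (not a didFF function): bins y using endpoints a.
make_bins <- function(y, a) {
  a <- sort(unique(a))
  # Extend the endpoints when the data range is not covered.
  b <- if (min(y) >= min(a)) a else c(min(y), a)
  endpoints <- if (max(y) <= max(a)) b else c(b, max(y))
  # right = TRUE gives intervals (lo, hi]; include.lowest = TRUE closes the
  # first interval on the left, matching [c_1, c_2].
  bins <- cut(y, breaks = endpoints, right = TRUE, include.lowest = TRUE)
  droplevels(bins)  # empty bins are dropped
}

y <- c(0.5, 1.2, 2.7, 3.1, 9.4)  # made-up outcome values
a <- c(1, 2, 3, 4, 5)            # user-specified endpoints
bins <- make_bins(y, a)
table(bins)
```

Here both min(y) and max(y) fall outside the supplied endpoints, so both are added, and the interval between 4 and 5 contains no observations and is dropped.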

numSims

Number of simulation draws used to compute the p-value for the moment inequality test. Default is numSims=100000.

seed

Starting seed for the moment inequality test. Default is seed=0; set seed=NULL for a random seed.

lb_graph

For display only; smallest bin to be plotted.

ub_graph

For display only; largest bin to be plotted.

aggte_type

Which type of (scalar) aggregated treatment effect parameter to compute. Options are "simple", "dynamic", "group", and "calendar". Default is "group".

balance_e

If set (and if aggte_type = "dynamic"), it balances the sample with respect to event time. For example, if balance_e=2, it will drop groups that are not exposed to treatment for at least three periods (the initial period when e=0 as well as the next two periods when e=1 and e=2). This ensures that the composition of groups does not change when event time changes.
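The balancing rule can be illustrated with a small base R calculation. The groups and the last observed period below are made up, and the snippet only mimics the rule as described here; it is not the package's implementation.

```r
# Made-up setup: groups are indexed by their first treated period.
groups <- c(2004, 2006, 2007)
last_period <- 2007
balance_e <- 2

# Largest event time observed for each group (e = 0 in the first treated period).
max_event_time <- last_period - groups

# Keep only groups observed at event times 0, 1, ..., balance_e.
kept <- groups[max_event_time >= balance_e]
kept
```

Only the 2004 group is observed for three treated periods or more, so with balance_e = 2 the 2006 and 2007 groups would be dropped.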

min_e

For aggte_type = "dynamic", this is the smallest event time to compute dynamic effects for. By default, min_e = -Inf so that effects at all feasible event times are computed.

max_e

For aggte_type = "dynamic", this is the largest event time to compute dynamic effects for. By default, max_e = Inf so that effects at all feasible event times are computed.

distDD

Estimate the distributional treatment effects (the distribution of Y(1) minus the implied distribution of Y(0), for the treated). Default is FALSE. The function distDD is provided as a wrapper for distDD=TRUE.

pl

Whether or not to use parallel processing. Default is FALSE.

cores

The number of cores to use for parallel processing. Only relevant if pl = TRUE. Default is cores = parallel::detectCores().

Value

A list object containing: the plot of the implied density under the null; a table with the estimated and implied densities, and the p-value for H0: implied density >= 0; and the average treatment effects.
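A call might look like the sketch below. It assumes that the didFF and did packages are installed and uses the mpdta example data that ships with did (outcome lemp, period year, unit id countyreal, cohort first.treat); the choices of nbins and numSims are arbitrary and only meant to keep the run short. The names of the elements in the returned list are not documented above, so the sketch only inspects the top-level structure.

```r
# Arguments taken from the usage section; values are illustrative assumptions.
args <- list(
  yname = "lemp",
  tname = "year",
  idname = "countyreal",
  gname = "first.treat",
  est_method = "dr",
  control_group = "nevertreated",
  nbins = 10,       # arbitrary choice for illustration
  numSims = 10000,  # fewer draws than the default, to run faster
  seed = 0
)

# Run only if both packages are available.
if (requireNamespace("didFF", quietly = TRUE) &&
    requireNamespace("did", quietly = TRUE)) {
  data("mpdta", package = "did")
  res <- do.call(didFF::didFF, c(list(data = mpdta), args))
  str(res, max.level = 1)  # inspect the returned list without assuming names
}
```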

References

Roth, Jonathan and Sant'Anna, Pedro H. C. (2023), "When is Parallel Trends Sensitive to Functional Form?" Econometrica, vol. 91 (2), pp. 737–747, doi:10.3982/ECTA19402